[ACCEPTED] How to properly trim whitespace from a string in Java? (unicode)
Google recently made guava-libraries available. It may have what you are looking for:
CharMatcher.inRange('\0', ' ').trimFrom(str)
is equivalent to String.trim(), but you can customize what to trim; see the Javadoc for details.
For instance, it has its own definition of WHITESPACE, which differs from the JDK's and follows the latest Unicode standard, so what you need can be written as:
CharMatcher.WHITESPACE.trimFrom(str)
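Conceptually, trimFrom drops every leading and trailing character that the matcher accepts. The following is a JDK-only sketch of that idea, not Guava's implementation; the class PredicateTrim and its trimFrom(String, IntPredicate) method are hypothetical names chosen for illustration. Passing c -> c <= ' ' mirrors the CharMatcher.inRange('\0', ' ') call above and therefore String.trim(). (Newer Guava releases prefer the factory method CharMatcher.whitespace() over the deprecated WHITESPACE constant.)

```java
import java.util.function.IntPredicate;

// Hypothetical helper: a JDK-only analogue of CharMatcher.trimFrom, not Guava's code.
class PredicateTrim {

    // Removes every leading and trailing char for which `matcher` returns true.
    static String trimFrom(String s, IntPredicate matcher) {
        int start = 0;
        int end = s.length();
        // Advance past matching chars at the front.
        while (start < end && matcher.test(s.charAt(start))) {
            start++;
        }
        // Back up past matching chars at the end; end > start keeps the scans from crossing.
        while (end > start && matcher.test(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "\t  hello world \n";
        // Same rule as String.trim(): anything in the range '\0'..' ' is trimmed.
        System.out.println(trimFrom(input, c -> c <= ' ').equals(input.trim())); // prints true
        // A custom rule: trim underscores instead of whitespace.
        System.out.println(trimFrom("__id_42__", c -> c == '_')); // prints id_42
    }
}
```

Swapping the predicate is all it takes to change what counts as "whitespace", which is the same flexibility CharMatcher offers.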
I swear I only saw this after I posted the question: Google just released Guava, a library of core Java utilities.
I haven't tried this yet, but from what I can tell, it is fully Unicode-compliant:
String s = " \t testing \u00a0";
s = CharMatcher.WHITESPACE.trimFrom(s);
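For comparison, String.trim() only removes characters up to U+0020, so on this input it keeps the trailing no-break space (U+00A0). If you would rather avoid the Guava dependency, a regular expression gets close: with the (?U) flag (Pattern.UNICODE_CHARACTER_CLASS, Java 7+), \s matches the Unicode White_Space property, which includes U+00A0. This is a sketch, assuming that definition is what you want; it may differ from Guava's in edge cases depending on the Unicode version each one tracks, and RegexTrim/unicodeTrim are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Guava or the original answer.
class RegexTrim {
    // (?U) enables UNICODE_CHARACTER_CLASS, so \s means \p{IsWhite_Space}.
    // The pattern matches a whitespace run at the start or at the end of the input.
    private static final Pattern EDGE_WHITESPACE = Pattern.compile("(?U)^\\s+|\\s+$");

    static String unicodeTrim(String s) {
        return EDGE_WHITESPACE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = " \t testing \u00a0";
        // String.trim() only removes chars <= ' ', so the trailing no-break space survives.
        System.out.println(s.trim().equals("testing \u00a0")); // prints true
        // The Unicode-aware version removes it as well.
        System.out.println(unicodeTrim(s).equals("testing")); // prints true
    }
}
```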
It's really hard to define what constitutes whitespace. Sometimes I use non-breaking spaces precisely so that they don't get stripped, so it will be hard to find a library that does exactly what you want.
I use my own trim() when I want to trim every whitespace character. Here is the function I use to check for whitespace:
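public static boolean isWhitespace(int ch)
{
    // ASCII space and the control characters TAB, LF, VT, FF, CR (0x09-0x0D).
    if (ch == ' ' || (ch >= 0x9 && ch <= 0xD))
        return true;
    if (ch < 0x85) // short-circuit optimization: nothing else below U+0085 qualifies.
        return false;
    // NEXT LINE, NO-BREAK SPACE, OGHAM SPACE MARK, MONGOLIAN VOWEL SEPARATOR.
    if (ch == 0x85 || ch == 0xA0 || ch == 0x1680 || ch == 0x180E)
        return true;
    // Everything that remains lies between U+2000 and U+3000.
    if (ch < 0x2000 || ch > 0x3000)
        return false;
    // U+2000-U+200A (EN QUAD through HAIR SPACE), LINE and PARAGRAPH SEPARATOR,
    // NARROW NO-BREAK SPACE, MEDIUM MATHEMATICAL SPACE, IDEOGRAPHIC SPACE.
    return ch <= 0x200A || ch == 0x2028 || ch == 0x2029
        || ch == 0x202F || ch == 0x205F || ch == 0x3000;
}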
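A trim() built on this predicate might look like the following minimal sketch. The names OwnTrim and unicodeTrim are hypothetical, and the predicate is copied in so the block compiles on its own. Every code point the predicate accepts is in the Basic Multilingual Plane, so scanning char by char is safe here.

```java
class OwnTrim {
    // Same predicate as above.
    static boolean isWhitespace(int ch) {
        if (ch == ' ' || (ch >= 0x9 && ch <= 0xD)) return true;
        if (ch < 0x85) return false; // short-circuit optimization
        if (ch == 0x85 || ch == 0xA0 || ch == 0x1680 || ch == 0x180E) return true;
        if (ch < 0x2000 || ch > 0x3000) return false;
        return ch <= 0x200A || ch == 0x2028 || ch == 0x2029
            || ch == 0x202F || ch == 0x205F || ch == 0x3000;
    }

    // Hypothetical trim: drop leading and trailing chars accepted by isWhitespace.
    static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isWhitespace(s.charAt(start))) start++;
        while (end > start && isWhitespace(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Leading ideographic space and trailing no-break space are both removed.
        System.out.println("[" + unicodeTrim("\u3000 hello\u00A0\n") + "]"); // prints [hello]
    }
}
```

Whitespace in the middle of the string, such as the no-break space in "a\u00A0b", is left alone, just as with String.trim().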
I've always found trim() to work pretty well for almost every scenario. However, if you really want to include more characters, you can edit the strip method from commons-lang so that it tests not only Character.isWhitespace but also Character.isSpaceChar, which accepts the no-break spaces (U+00A0, U+2007, U+202F) that Character.isWhitespace deliberately excludes. Namely, change the following lines in stripStart and stripEnd, respectively:
while ((start != strLen) && Character.isWhitespace(str.charAt(start)))
while ((end != 0) && Character.isWhitespace(str.charAt(end - 1)))
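A self-contained sketch of the modified check, without depending on commons-lang, is below. isTrimmable and stripBoth are hypothetical names, not commons-lang API; the loops follow the same shape as the two lines above, but with the combined predicate. The second test, Character.isSpaceChar, is what picks up no-break spaces such as U+00A0 and U+2007.

```java
class StripSketch {
    // Hypothetical predicate: the combined test suggested for stripStart/stripEnd.
    static boolean isTrimmable(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Hypothetical strip using the combined predicate at both ends.
    static String stripBoth(String str) {
        int strLen = str.length();
        int start = 0;
        while ((start != strLen) && isTrimmable(str.charAt(start))) {
            start++;
        }
        int end = strLen;
        // end != start (rather than end != 0) keeps the two scans from crossing.
        while ((end != start) && isTrimmable(str.charAt(end - 1))) {
            end--;
        }
        return str.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace('\u00A0')); // prints false
        System.out.println(Character.isSpaceChar('\u00A0'));  // prints true
        System.out.println("[" + stripBoth("\u00A0\t x \u2007") + "]"); // prints [x]
    }
}
```

Note that neither predicate alone is enough: Character.isSpaceChar rejects control characters such as '\t', while Character.isWhitespace rejects the no-break spaces.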
I made small changes to Java's trim() method so that it also trims the no-break space (U+00A0), a non-ASCII character that trim() leaves in place. This method runs faster than most of the other implementations.
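import java.util.Objects;

public static String trimAdvanced(String value) {
    Objects.requireNonNull(value);
    int strLength = value.length();
    int len = strLength;
    int st = 0;
    char[] val = value.toCharArray();
    // Skip leading chars that String.trim() would remove (<= ' ') plus the no-break space U+00A0.
    // The inner parentheses matter: && binds tighter than ||, so without them the
    // U+00A0 test would run even after the bounds check (st < len) has failed.
    while (st < len && (val[st] <= ' ' || val[st] == '\u00A0')) {
        st++;
    }
    // Skip trailing chars the same way; st < len stops the two scans from crossing.
    while (st < len && (val[len - 1] <= ' ' || val[len - 1] == '\u00A0')) {
        len--;
    }
    // Only create a new string if something was actually trimmed (an empty input is returned as-is).
    return (st > 0 || len < strLength) ? value.substring(st, len) : value;
}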