How to properly trim whitespace from a string in Java?

Accepted answer
Score: 60

Google has made guava-libraries available recently. It may have what you are looking for:

CharMatcher.inRange('\0', ' ').trimFrom(str)

is equivalent to String.trim(), but you can customize what to trim; refer to the JavaDoc.

For instance, it has its own definition of WHITESPACE, which differs from the JDK's and is defined according to the latest Unicode standard, so what you need can be written as:

CharMatcher.WHITESPACE.trimFrom(str)
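To illustrate why the JDK's own definitions may fall short, here is a small JDK-only sketch (no Guava required). The class name JdkWhitespaceDemo and the helper classify are made up for this example; it only reports how String.trim()'s rule (characters up to U+0020), Character.isWhitespace and Character.isSpaceChar treat a given character.

```java
final class JdkWhitespaceDemo {

    // Hypothetical helper: reports how three JDK notions of "whitespace" classify one character.
    static String classify(char c) {
        return String.format("U+%04X trim-able=%b isWhitespace=%b isSpaceChar=%b",
                (int) c, c <= ' ', Character.isWhitespace(c), Character.isSpaceChar(c));
    }

    public static void main(String[] args) {
        // Space, tab, non-breaking space, em space.
        char[] samples = {' ', '\t', '\u00A0', '\u2003'};
        for (char c : samples) {
            System.out.println(classify(c));
        }
        // String.trim() only removes characters <= U+0020, so neither end is trimmed here.
        String s = "\u2003testing\u00A0";
        System.out.println(s.trim().length() == s.length());
    }
}
```

None of the three built-in tests accepts all four sample characters, which is the gap a customizable matcher is meant to close.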
Score: 7

I swear I only saw this after I posted the question: Google just released Guava, a library of core Java utilities.

I haven't tried this yet, but from what I can tell, this is fully Unicode compliant:

String s = "  \t testing \u00a0";
s = CharMatcher.WHITESPACE.trimFrom(s);
Score: 3

It's really hard to define what constitutes whitespace. Sometimes I use non-breaking spaces just to make sure they don't get stripped. So it will be hard to find a library that does exactly what you want.

I use my own trim() if I want to trim every whitespace character. Here is the function I use to check for whitespace:

  public static boolean isWhitespace (int ch)
  {
    if (ch == ' ' || (ch >= 0x9 && ch <= 0xD))
      return true;
    if (ch < 0x85) // short-circuit optimization.
      return false;
    if (ch == 0x85 || ch == 0xA0 || ch == 0x1680 || ch == 0x180E)
      return true;
    if (ch < 0x2000 || ch > 0x3000)
      return false;
    return ch <= 0x200A || ch == 0x2028 || ch == 0x2029
      || ch == 0x202F || ch == 0x205F || ch == 0x3000;
  }
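The trim() itself is not shown in this answer, so the following is only a guess at what it might look like. The class name UnicodeTrim and the trim method are hypothetical; isWhitespace is repeated from above so the sketch compiles on its own. Every character the check accepts lies in the Basic Multilingual Plane, so scanning with charAt is enough here.

```java
final class UnicodeTrim {

    // Same test as the isWhitespace function above, repeated for self-containment.
    static boolean isWhitespace(int ch) {
        if (ch == ' ' || (ch >= 0x9 && ch <= 0xD)) return true;
        if (ch < 0x85) return false;
        if (ch == 0x85 || ch == 0xA0 || ch == 0x1680 || ch == 0x180E) return true;
        if (ch < 0x2000 || ch > 0x3000) return false;
        return ch <= 0x200A || ch == 0x2028 || ch == 0x2029
            || ch == 0x202F || ch == 0x205F || ch == 0x3000;
    }

    // Hypothetical trim: drops leading and trailing characters accepted by isWhitespace.
    static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isWhitespace(s.charAt(start))) start++;
        while (end > start && isWhitespace(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }
}
```

With this sketch, UnicodeTrim.trim("  \t testing \u00a0") yields "testing", while interior whitespace is left alone.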
Score: 2

I've always found trim to work pretty well for almost every scenario.

However, if you really want to include more characters, you can edit the strip method from commons-lang to test not only Character.isWhitespace but also Character.isSpaceChar, which seems to be what's missing. Namely, change the following lines in stripStart and stripEnd, respectively:

  • while ((start != strLen) && Character.isWhitespace(str.charAt(start)))
  • while ((end != 0) && Character.isWhitespace(str.charAt(end - 1)))
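As a rough sketch of that change, the two loops could look like this once both tests are combined. This is not the commons-lang source: the class SpaceCharStrip and the helper isStrippable are invented for illustration, and only the loop conditions follow the lines quoted above.

```java
final class SpaceCharStrip {

    // Hypothetical combined test: JDK whitespace or a Unicode space separator.
    static boolean isStrippable(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    static String stripStart(String str) {
        if (str == null) return null;
        int strLen = str.length();
        int start = 0;
        // Same loop shape as the quoted line, with the widened test.
        while ((start != strLen) && isStrippable(str.charAt(start))) start++;
        return str.substring(start);
    }

    static String stripEnd(String str) {
        if (str == null) return null;
        int end = str.length();
        while ((end != 0) && isStrippable(str.charAt(end - 1))) end--;
        return str.substring(0, end);
    }

    static String strip(String str) {
        return stripEnd(stripStart(str));
    }
}
```

Character.isSpaceChar is what picks up the non-breaking spaces (such as U+00A0) that Character.isWhitespace deliberately excludes.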
Score: 0

I made small changes to Java's trim() method so that it also trims the non-breaking space (\u00A0), a non-ASCII character. This method runs faster than most of the implementations.

// Requires: import java.util.Objects;
public static String trimAdvanced(String value) {

        Objects.requireNonNull(value);

        int strLength = value.length();
        int len = strLength;
        int st = 0;

        // Skip leading characters that are <= ' ' or a non-breaking space.
        // The bounds check guards both tests, so no index can run past the string.
        while ((st < len) && ((value.charAt(st) <= ' ') || (value.charAt(st) == '\u00A0'))) {
            st++;
        }
        // Skip trailing characters by the same rule, never crossing st.
        while ((st < len) && ((value.charAt(len - 1) <= ' ') || (value.charAt(len - 1) == '\u00A0'))) {
            len--;
        }

        // Return the original instance when nothing was trimmed.
        return ((st > 0) || (len < strLength)) ? value.substring(st, len) : value;
    }
