How to determine whether a character is a letter in Java?

Accepted answer
Score: 33

Character.isLetter() is much faster than string.matches(), because string.matches() compiles a new Pattern every time. Even with the pattern cached, I think isLetter() would still beat it.


EDIT: Just ran across this again and thought I'd try to come up with some actual numbers. Here's my attempt at a benchmark, checking all three methods (matches() with and without caching the Pattern, and Character.isLetter()). I also made sure that both valid and invalid characters were checked, so as not to skew things.

import java.util.regex.*;

class TestLetter {
    private static final Pattern ONE_CHAR_PATTERN = Pattern.compile("\\p{L}");
    private static final int NUM_TESTS = 10000000;

    public static void main(String[] args) {
        long start = System.nanoTime();
        int counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatches(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of Pattern.matches() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testCharacter(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of isLetter() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatchesNoCache(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of String.matches() took " +
                (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS +
                " valid characters");
    }

    private static boolean testMatches(final String c) {
        return ONE_CHAR_PATTERN.matcher(c).matches();
    }
    private static boolean testMatchesNoCache(final String c) {
        return c.matches("\\p{L}");
    }
    private static boolean testCharacter(final String c) {
        return Character.isLetter(c.charAt(0));
    }
}

And my output:

10000000 tests of Pattern.matches() took 4325146672 ns.
There were 4062500/10000000 valid characters
10000000 tests of isLetter() took 546031201 ns.
There were 4062500/10000000 valid characters
10000000 tests of String.matches() took 11900205444 ns.
There were 4062500/10000000 valid characters

So isLetter() is almost 8x faster, even compared with a cached Pattern. (And the uncached version is nearly 3x slower than the cached one.)

Score: 22

Just checking whether a letter is in A-Z is not enough, because that doesn't include letters with accents or letters in other alphabets.
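To make that concrete, here is a minimal sketch comparing a plain range check with Character.isLetter(). The class name AsciiVsUnicode and the helper isAsciiLetter are made up for illustration; only Character.isLetter() comes from the JDK.

```java
class AsciiVsUnicode {
    // Hypothetical helper: the naive range check for unaccented Latin letters.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // Samples: a, e with acute accent, Greek capital omega, the digit 7.
        char[] samples = { 'a', '\u00e9', '\u03a9', '7' };
        for (char c : samples) {
            System.out.println(c + ": range check=" + isAsciiLetter(c)
                    + ", isLetter=" + Character.isLetter(c));
        }
    }
}
```

With these samples, only 'a' passes the range check, while isLetter() also accepts the accented and the Greek letter; the digit is rejected by both.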

I found out that you can use the regular expression class for 'Unicode letter', or one of its case-sensitive variations:

string.matches("\\p{L}"); // Unicode letter
string.matches("\\p{Lu}"); // Unicode upper-case letter

You can also do this with the Character class:

Character.isLetter(character);

but that is less convenient if you need to check more than one letter.
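For the multi-letter case, a rough sketch of both approaches might look like the following. The class MultiLetterCheck and its two helper methods are hypothetical names, and treating an empty string as "not all letters" is an assumption chosen so that both variants agree.

```java
import java.util.regex.Pattern;

class MultiLetterCheck {
    // Compiled once and reused, so the pattern is not rebuilt on every call.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Regex variant: one or more Unicode letters and nothing else.
    static boolean allLettersRegex(String s) {
        return LETTERS.matcher(s).matches();
    }

    // Character variant: every code point must be a letter.
    // Empty input is rejected to mirror the "+" quantifier above (an assumption).
    static boolean allLettersCodePoints(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        // Samples: "hello" with an accented e, a string containing a digit, and the empty string.
        String[] samples = { "h\u00e9llo", "abc1", "" };
        for (String s : samples) {
            System.out.println("\"" + s + "\": regex=" + allLettersRegex(s)
                    + ", codePoints=" + allLettersCodePoints(s));
        }
    }
}
```

Iterating over code points rather than chars means letters outside the Basic Multilingual Plane are checked as whole characters instead of as separate surrogate halves.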
