How to determine whether a character is a letter in Java? [accepted answer, tag: unicode]
Character.isLetter() is much faster than string.matches(), because string.matches() compiles a new Pattern every time. Even with the pattern cached, I think isLetter() would still beat it.
EDIT: Just ran across this again and thought I'd try to come up with some actual numbers. Here's my attempt at a benchmark, checking all three methods (matches() with and without caching the Pattern, and Character.isLetter()). I also made sure that both valid and invalid characters were checked, so as not to skew things.
    import java.util.regex.*;

    class TestLetter {
        // Compiled once and reused by testMatches().
        private static final Pattern ONE_CHAR_PATTERN = Pattern.compile("\\p{L}");
        private static final int NUM_TESTS = 10000000;

        public static void main(String[] args) {
            // 1. Cached Pattern (i % 128 cycles through letters and non-letters).
            long start = System.nanoTime();
            int counter = 0;
            for (int i = 0; i < NUM_TESTS; i++) {
                if (testMatches(Character.toString((char) (i % 128))))
                    counter++;
            }
            System.out.println(NUM_TESTS + " tests of Pattern.matches() took " +
                    (System.nanoTime() - start) + " ns.");
            System.out.println("There were " + counter + "/" + NUM_TESTS +
                    " valid characters");

            /*********************************/
            // 2. Character.isLetter()
            start = System.nanoTime();
            counter = 0;
            for (int i = 0; i < NUM_TESTS; i++) {
                if (testCharacter(Character.toString((char) (i % 128))))
                    counter++;
            }
            System.out.println(NUM_TESTS + " tests of isLetter() took " +
                    (System.nanoTime() - start) + " ns.");
            System.out.println("There were " + counter + "/" + NUM_TESTS +
                    " valid characters");

            /*********************************/
            // 3. String.matches(), which compiles the Pattern on every call.
            start = System.nanoTime();
            counter = 0;
            for (int i = 0; i < NUM_TESTS; i++) {
                if (testMatchesNoCache(Character.toString((char) (i % 128))))
                    counter++;
            }
            System.out.println(NUM_TESTS + " tests of String.matches() took " +
                    (System.nanoTime() - start) + " ns.");
            System.out.println("There were " + counter + "/" + NUM_TESTS +
                    " valid characters");
        }

        private static boolean testMatches(final String c) {
            return ONE_CHAR_PATTERN.matcher(c).matches();
        }

        private static boolean testMatchesNoCache(final String c) {
            return c.matches("\\p{L}");
        }

        private static boolean testCharacter(final String c) {
            return Character.isLetter(c.charAt(0));
        }
    }
And my output:
    10000000 tests of Pattern.matches() took 4325146672 ns.
    There were 4062500/10000000 valid characters
    10000000 tests of isLetter() took 546031201 ns.
    There were 4062500/10000000 valid characters
    10000000 tests of String.matches() took 11900205444 ns.
    There were 4062500/10000000 valid characters
So isLetter() is almost 8x faster, even against a cached Pattern. (And the uncached String.matches() is nearly 3x slower than the cached Pattern.)
Just checking whether a letter is in A-Z is not enough, because that doesn't include letters with accents or letters in other alphabets.
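To make that concrete, here is a minimal sketch comparing a hand-written A-Z/a-z range check with Character.isLetter(). The class name AsciiVsUnicodeLetter and the helper isAsciiLetter are made up for this illustration; the non-ASCII samples are written as Unicode escapes so the file compiles regardless of source encoding.

```java
public class AsciiVsUnicodeLetter {
    // Hypothetical helper: the naive range check, which only accepts
    // the 52 unaccented Latin letters.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // 'a', 'Z', e-acute, Greek capital omega, Cyrillic small ya, digit 7
        char[] samples = {'a', 'Z', '\u00e9', '\u03a9', '\u044f', '7'};
        for (char c : samples) {
            System.out.println("U+" + Integer.toHexString(c)
                    + " rangeCheck=" + isAsciiLetter(c)
                    + " isLetter=" + Character.isLetter(c));
        }
    }
}
```

The two checks agree on plain ASCII input and diverge as soon as an accented or non-Latin letter shows up.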
I found out that you can use the regular expression class for 'Unicode letter', or one of its case-sensitive variations:
string.matches("\\p{L}"); // Unicode letter
string.matches("\\p{Lu}"); // Unicode upper-case letter
You can also do this with the Character class:
Character.isLetter(character);
but that is less convenient if you need to check more than one letter.
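If you do need to check a whole string, one possible approach is sketched below: either stream the string's code points through Character.isLetter(int), or reuse a precompiled "\\p{L}+" pattern. The class and method names here (AllLetters, allLettersByCodePoint, allLettersByRegex) are invented for the example, and treating the empty string as "not all letters" is an assumption chosen so that both variants behave the same way.

```java
import java.util.regex.Pattern;

public class AllLetters {
    // Compiled once; "+" requires at least one letter.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Iterates code points rather than chars, so letters outside the
    // Basic Multilingual Plane are not split into surrogate halves.
    static boolean allLettersByCodePoint(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetter);
    }

    static boolean allLettersByRegex(String s) {
        return LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Stra\u00dfe", "abc1", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" codePoints=" + allLettersByCodePoint(s)
                    + " regex=" + allLettersByRegex(s));
        }
    }
}
```

Given the benchmark numbers above, the code point version is probably the better default when this runs in a hot loop, though that has not been measured here.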