RegEx: \w - "_" + "-" in UTF-8 (pcre)

Accepted answer
Score: 20

Try this:

(?:[\w\-](?<!_))+

It does a simple match on anything that matches \w (or a dash) and then uses a zero-width lookbehind to ensure that the character that was just matched is not an underscore.
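To see the lookbehind in action, here is a minimal sketch in Python rather than PHP/PCRE. It assumes Python 3, where `\w` on `str` patterns is Unicode-aware by default, which roughly stands in for PCRE's UTF-8 mode; the helper name `find_tokens` and the sample string are made up for illustration.

```python
import re

# Pattern from the answer: consume a word character or a dash, then use a
# zero-width lookbehind to reject it if the character just consumed was "_".
LOOKBEHIND = re.compile(r"(?:[\w\-](?<!_))+")


def find_tokens(text):
    """Return every run of word characters and dashes, excluding underscores."""
    return LOOKBEHIND.findall(text)


print(find_tokens("foo_bar-baz über_straße"))
# → ['foo', 'bar-baz', 'über', 'straße']
```

The underscore is consumed by the class and then immediately rejected by the lookbehind, so it acts as a separator between matches.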

Otherwise you could pick this one:

(?:[^_\W]|-)+

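which is a more set-based approach (note the uppercase W): [^_\W] matches any character that is neither an underscore nor a non-word character, and the alternation adds the dash back in.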
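The set-based variant can be tried the same way. This is again a sketch in Python, not PHP, assuming Python 3's Unicode-aware `\w`/`\W`; `find_tokens` is a hypothetical helper name.

```python
import re

# Set-based pattern from the answer: "not underscore and not a non-word
# character" (i.e. a word character other than "_"), or a literal dash.
SET_BASED = re.compile(r"(?:[^_\W]|-)+")


def find_tokens(text):
    """Return every run of non-underscore word characters and dashes."""
    return SET_BASED.findall(text)


print(find_tokens("foo_bar-baz über_straße"))
# → ['foo', 'bar-baz', 'über', 'straße']
```

No lookaround is involved here, so this form also suits engines that lack lookbehind support.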

OK, I had a lot of fun with Unicode in PHP's flavor of PCRE :D Peekaboo says there is a simple solution available:

[\p{L}\p{N}\-]+

\p{L} matches anything in Unicode that qualifies as a Letter (note: not a word character, thus no underscores), while \p{N} matches anything that looks like a number (including Roman numerals and more exotic things).
\- is just an escaped dash. Although not strictly necessary, I tend to make it a point to escape dashes in character classes... Note that there are dozens of different dashes in Unicode, thus giving rise to the following version:

[\p{L}\p{N}\p{Pd}]+

Where "Pd" is Punctuation Dash, including, but not limited to, our minus-dash-thingy. (Note, again, no underscore here.)
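Python's built-in `re` module does not support `\p{...}`, so the sketch below swaps in a different technique: it looks up each character's general category with `unicodedata.category` and keeps runs of letters (L*), numbers (N*), and dashes. It is only an approximation of the two property-based patterns above, and the names `is_allowed` and `find_runs` are hypothetical.

```python
import unicodedata
from itertools import groupby


def is_allowed(ch, any_dash=True):
    """Approximate [\\p{L}\\p{N}\\p{Pd}] (or [\\p{L}\\p{N}\\-] if any_dash is False)."""
    category = unicodedata.category(ch)  # two-letter code such as "Ll", "Nd", "Pd"
    if category[0] in ("L", "N"):
        return True
    if any_dash:
        return category == "Pd"  # any Unicode dash punctuation
    return ch == "-"  # only the ASCII hyphen-minus


def find_runs(text, any_dash=True):
    """Return maximal runs of allowed characters, mimicking the trailing '+'."""
    grouped = groupby(text, key=lambda ch: is_allowed(ch, any_dash))
    return ["".join(chars) for allowed, chars in grouped if allowed]


# "\u2013" is an en dash (category Pd); "\u2167" is ROMAN NUMERAL EIGHT (category Nl).
sample = "a_b\u2013c-d \u2167"
print(find_runs(sample))                  # en dash kept inside the run
print(find_runs(sample, any_dash=False))  # en dash splits the run
```

The underscore has category Pc (connector punctuation), which is why neither variant lets it through.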

Score: 3

I am not sure which language you use, but in Perl you can simply write [[:alnum:]-]+ when the correct locale is set.
