[ACCEPTED]-RegEx: \w - "_" + "-" in UTF-8-pcre
Try this:
(?:[\w\-](?<!_))+
It does a simple match on anything that is encoded as a \w (or a dash), and then has a zero-width lookbehind that ensures the character that was just matched is not an underscore.
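To see the lookbehind at work, here is a small sketch in Python rather than PHP. It assumes Python 3's standard `re` module, where `\w` is Unicode-aware for `str` patterns; the sample string is made up for illustration.

```python
import re

# Match \w or a dash, then reject the character just consumed
# if it turns out to be an underscore (zero-width lookbehind).
pattern = re.compile(r"(?:[\w\-](?<!_))+")

# Hypothetical sample input, not taken from the question.
sample = "foo_bar-baz über_99"

matches = pattern.findall(sample)
print(matches)  # ['foo', 'bar-baz', 'über', '99']
```

Underscores act as separators here: each one ends the current run, and matching resumes at the next allowed character.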
Otherwise you could pick this one:
(?:[^_\W]|-)+
which is a more set-based approach (note the uppercase W).
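A quick way to convince yourself that the two patterns agree is to run both over a few inputs. This is only a sketch in Python's `re` module (an assumption, since the question is about PCRE), and the sample strings are invented.

```python
import re

# Lookbehind version and set-based version from the answer.
lookbehind = re.compile(r"(?:[\w\-](?<!_))+")
# [^_\W] reads as "not an underscore and not a non-word character",
# i.e. any word character except the underscore.
set_based = re.compile(r"(?:[^_\W]|-)+")

# Hypothetical test inputs.
samples = ["foo_bar-baz", "über_99", "__--__", "a-b_c-d"]

results = {s: set_based.findall(s) for s in samples}
for s in samples:
    # Both patterns should split each sample the same way.
    assert lookbehind.findall(s) == results[s]
    print(s, "->", results[s])
```

The set-based form avoids the lookbehind entirely, which may be handy in engines where lookbehind support is limited.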
OK, I had a lot of fun with Unicode in PHP's flavor of PCRE :D Peekaboo says there is a simple solution available:
[\p{L}\p{N}\-]+
\p{L} matches anything in Unicode that qualifies as a Letter (note: not a word character, thus no underscores), while \p{N} matches anything that looks like a number (including Roman numerals and more exotic things).
\- is just an escaped dash. Although not strictly necessary, I tend to make it a point to escape dashes in character classes. Note that there are dozens of different dashes in Unicode, which gives rise to the following version:
[\p{L}\p{N}\p{Pd}]+
Where "Pd" is Punctuation Dash, including, but not limited to, our minus-dash-thingy. (Note, again, no underscore here.)
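To get a feel for what the L, N and Pd categories cover, the following sketch looks up general categories with Python's `unicodedata` module. Python's built-in `re` does not support `\p{...}`, so this is a stand-in for the character classes above and not a PCRE call; the helper names `allowed` and `runs` are made up for this example.

```python
import unicodedata
from itertools import groupby

def allowed(ch, any_dash=True):
    # Rough per-character stand-in for the classes above:
    # letters (L*) and numbers (N*) always pass; dashes pass either
    # as any Punctuation Dash (Pd) or only as the ASCII hyphen-minus.
    cat = unicodedata.category(ch)
    if cat[0] in ("L", "N"):
        return True
    return cat == "Pd" if any_dash else ch == "-"

def runs(text, any_dash=True):
    # Collect maximal runs of allowed characters, like the + quantifier.
    key = lambda c: allowed(c, any_dash)
    return ["".join(group) for ok, group in groupby(text, key=key) if ok]

# EN DASH (U+2013), ROMAN NUMERAL FOUR (U+2163), ARABIC-INDIC DIGIT THREE (U+0663)
text = "abc\u2013def_\u2163\u0663"

print(runs(text, any_dash=True))   # en dash counts as a dash
print(runs(text, any_dash=False))  # only "-" counts as a dash
```

With `any_dash=True` the en dash keeps `abc` and `def` in one run, while the underscore (category Pc) still splits the input in both modes.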
I am not sure which language you use, but in Perl you can simply write [[:alnum:]-]+ when the correct locale is set.