I am developing an application for a European client, and they use their native character set.
I need a regex that allows accented characters such as eéèêë, and I am not sure how this can be done.
Any suggestions?
If all you want to match is letters (including "international" letters) you can use \p{L}.
You can find some information on regex and Unicode here.
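As a minimal sketch of the \p{L} approach in JavaScript: this assumes an engine that supports Unicode property escapes (ES2018 or later), and the `u` flag is required for \p{L} to be recognized. The names `lettersOnly` and `isLettersOnly` are made up for illustration.

```javascript
// Assumption: the runtime supports Unicode property escapes (ES2018+).
// Without the `u` flag, \p{L} is not treated as a property escape.
const lettersOnly = /^\p{L}+$/u;

// Hypothetical helper: true when the whole string consists of letters.
function isLettersOnly(text) {
  return lettersOnly.test(text);
}

console.log(isLettersOnly("eéèêë"));  // true
console.log(isLettersOnly("Müller")); // true
console.log(isLettersOnly("abc123")); // false: digits are not in \p{L}
```

If digits or underscores should be allowed too, they would have to be added to the class explicitly, for example `[\p{L}\d_]`.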
\p{L} isn't cross-browser yet. Transpiling down from this will give you massively bloated code if you use it a lot.
Here is a short and sweet answer to generally including non-ascii letters that doesn't add a gazillion lines of JavaScript or plugins. Replace a-zA-Z0-9 or \w in your regex with this, and don't use the u flag:
\u00BF-\u1FFF\u2C00-\uD7FF\w
Inserted into all my JavaScript regexes in place of a-zA-Z0-9 or \w, this seems to do the job. My use case was recognizing UTF-8 text in HTML and CSS, and it had to be cross-browser.
I can't believe it is this simple, so I am waiting to be proved wrong, after a day of searching for something that works in Firefox...
I've only tested this using Japanese hiragana with a French accent.
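A rough sketch of how that range could be dropped into a character class, with no `u` flag, as described above. The names `wordLike` and `isWordLike` are hypothetical. Note that the range is broad: it also covers non-letters that fall inside it, such as the multiplication sign (U+00D7).

```javascript
// The range from the answer, wrapped in a character class. No `u` flag.
const wordLike = /^[\u00BF-\u1FFF\u2C00-\uD7FF\w]+$/;

// Hypothetical helper: true when every character is in the class.
function isWordLike(text) {
  return wordLike.test(text);
}

console.log(isWordLike("café"));      // true
console.log(isWordLike("ひらがな"));    // true: hiragana sits in U+3040-U+309F
console.log(isWordLike("two words")); // false: the space is not in the class
console.log(isWordLike("a×b"));       // true: U+00D7 is inside the first range
```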
[e\xE8\xE9\xEA\xEB] will match any one of eéèêë
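For illustration, here is that class in a JavaScript regex. The sketch assumes the input uses precomposed characters; a decomposed "e" followed by a combining accent is two code units and will not match as one character unless the string is normalized first. `eVariant` and `isEVariant` are made-up names.

```javascript
// Matches exactly one of: e, è (E8), é (E9), ê (EA), ë (EB).
const eVariant = /^[e\xE8\xE9\xEA\xEB]$/;

// Hypothetical helper: normalize to NFC so that "e" + combining accent
// is folded into the single precomposed character before testing.
function isEVariant(ch) {
  return eVariant.test(ch.normalize("NFC"));
}

console.log(isEVariant("é"));       // true
console.log(isEVariant("e\u0301")); // true after NFC normalization
console.log(isEVariant("a"));       // false
```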
If you want to match most Western European Latin letters with an accent or diacritic mark in virtually any regular expression engine, try:
[A-Za-zŽžÀ-ÿ]
It matches any character in the following printable and extended ASCII (Windows-1252) sets:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
ŽžÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Characters matched (extended ASCII / Windows-1252 character index, case sensitive):
| char(s) | index(start) | index(end) |
|---|---|---|
| [A-Z] | 65 | 90 |
| [a-z] | 97 | 122 |
| Ž | 142 | --- |
| ž | 158 | --- |
| [À-ÿ] | 192 | 255 |
Test it at https://regex101.com/r/Xbbtm1/1
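A small sketch of the same class in JavaScript, written with \u escapes so the source file encoding does not matter (Ž is U+017D, ž is U+017E, À-ÿ is U+00C0-U+00FF). The names `latin1Letters` and `isLatin1Word` are hypothetical. Two caveats worth checking for your data: the À-ÿ range also contains × and ÷, and letters outside it (for example Ł or ź) are not matched.

```javascript
// Equivalent to [A-Za-zŽžÀ-ÿ], spelled with escapes.
const latin1Letters = /^[A-Za-z\u017D\u017E\u00C0-\u00FF]+$/;

// Hypothetical helper: true when every character is in the class.
function isLatin1Word(text) {
  return latin1Letters.test(text);
}

console.log(isLatin1Word("brûlée")); // true
console.log(isLatin1Word("×"));      // true: U+00D7 lies inside À-ÿ
console.log(isLatin1Word("Łódź"));   // false: Ł (U+0141) and ź (U+017A) are outside the class
```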