Regex for Umlaut

Question

I am using JS Animated Contact Form with this line of validation regex:

rx:{".name":{rx:/^[a-zA-Z'][a-zA-Z-' ]+[a-zA-Z']?$/,target:'input'}, other fields...

I just found out that I can't enter a name like "Müller"; the regex rejects it. What do I have to change to allow umlauts as well?

You could use `\w` for "word" characters, but you'll have to test whether that matches the umlaut. – Martijn, Feb 25 '14 at 14:55

score 45 · Accepted Answer · answered Feb 25 '14 at 14:54

You should use Unicode escape sequences for these characters in your regex, e.g. \u00fc for ü. For German, I found the following table:

Character   Unicode
------------------------------
Ä, ä        \u00c4, \u00e4
Ö, ö        \u00d6, \u00f6
Ü, ü        \u00dc, \u00fc
ß           \u00df

(source http://javawiki.sowas.com/doku.php?id=java:unicode)
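
As a rough sketch of how the table could be applied, here is the pattern from the question with the seven German code points added to each character class. The name `nameRx` is made up for this example, and the hyphen is escaped only for readability. Note that this covers German only; letters such as ë or é are still rejected.

```javascript
// Sketch only: the question's name pattern plus the German letters from the table.
// `nameRx` is a hypothetical name used for illustration.
const nameRx = /^[a-zA-Z\u00c4\u00e4\u00d6\u00f6\u00dc\u00fc\u00df'][a-zA-Z\u00c4\u00e4\u00d6\u00f6\u00dc\u00fc\u00df\-' ]+[a-zA-Z\u00c4\u00e4\u00d6\u00f6\u00dc\u00fc\u00df']?$/;

console.log(nameRx.test("Müller")); // true
console.log(nameRx.test("Weiß"));   // true
console.log(nameRx.test("Zoë"));    // false: ë is not one of the listed letters
```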

IProblemFactory

[The holy grail](http://unicode-table.com/en). You can also write ranges, ie. `[\u00F0-\u02AF]`. – tenub Feb 25 '14 at 14:56

Use this to find hidden characters except German umlauts: https://regexr.com/4pmml – ThreeCheeseHigh Nov 27 '19 at 20:24

In case you are using PHP, use `[\x{00F0}-\x{02AF}]`. – Alberto Sinigaglia Sep 09 '21 at 10:51

score 24 · Answer 2 · edited May 23 '17 at 11:54

Try using this:

/^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/

I have added the unicode range \u00C0-\u017F to the start of each of the square bracket groups.

Given that /^[\u00C0-\u017FA-Za-z]+$/.test("aeiouçéüß") returns true, I expect it should work.

Credit to https://stackoverflow.com/a/11550799/940252.
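
A quick check of the pattern above, as a sketch (the name `rangeRx` is made up for this example). Two things are worth knowing before using it: the first class followed by `+` means a name needs at least two characters, and the range \u00C0-\u017F also contains the signs × (U+00D7) and ÷ (U+00F7), so those are accepted too.

```javascript
// The pattern from this answer, unchanged; `rangeRx` is a hypothetical name.
const rangeRx = /^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/;

console.log(rangeRx.test("Müller"));     // true
console.log(rangeRx.test("Anne-Marie")); // true
console.log(rangeRx.test("ü"));          // false: the first class plus `+` needs two characters
console.log(rangeRx.test("a\u00D7b"));   // true: × (U+00D7) lies inside the range
```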

answered Feb 25 '14 at 15:01

Josh Harrison

`[\u00C0-\u017Fa-zA-Z']?$/` is kind of redundant, what are you trying to do? – Feb 25 '14 at 17:17
I'm not sure as I'm not terribly hot on regex and the OP didn't specify the pattern they're hoping to match. I just worked with their original code. If you can clean it up please do! :) – Josh Harrison Feb 25 '14 at 17:21
I would venture to change that space to something else to capture all non-word characters like hyphens. Here's a test: https://regex101.com/r/zH5uV0/4 – Mike Kormendy Jul 24 '16 at 14:01

`/^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/.test("ü") -> false` – Zane Hitchcox Aug 18 '19 at 04:13

score 6 · Answer 3 · answered Sep 02 '19 at 09:08

I came up with a combination of different ranges:

[A-Za-zÀ-ž\u0370-\u03FF\u0400-\u04FF]

But I see that it misses some letters from @SambitD's proposal; see: https://rubular.com/r/2g00QJK4rBS8Y4
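
To make the coverage concrete, here is a small sketch that wraps the class in anchors (the name `letterClassRx` and the sample words are my own, not from the answer). The three ranges are Latin letters up to ž, Greek, and Cyrillic, so anything beyond those blocks is rejected.

```javascript
// Sketch: the combined class from this answer, anchored to test whole words.
// `letterClassRx` is a hypothetical name used for illustration.
const letterClassRx = /^[A-Za-zÀ-ž\u0370-\u03FF\u0400-\u04FF]+$/;

console.log(letterClassRx.test("Müller")); // true  (Latin-1 letter ü)
console.log(letterClassRx.test("Αθήνα"));  // true  (Greek block)
console.log(letterClassRx.test("Москва")); // true  (Cyrillic block)
console.log(letterClassRx.test("Nguyễn")); // false (ễ, U+1EC5, is outside all three ranges)
```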

score 3 · Answer 4 · answered May 24 '19 at 13:38

I used

A-Za-z-ÁÀȦÂÄǞǍĂĀÃÅǺǼǢĆĊĈČĎḌḐḒÉÈĖÊËĚĔĒẼE̊ẸǴĠĜǦĞG̃ĢĤḤáàȧâäǟǎăāãåǻǽǣćċĉčďḍḑḓéèėêëěĕēẽe̊ẹǵġĝǧğg̃ģĥḥÍÌİÎÏǏĬĪĨỊĴĶǨĹĻĽĿḼM̂M̄ʼNŃN̂ṄN̈ŇN̄ÑŅṊÓÒȮȰÔÖȪǑŎŌÕȬŐỌǾƠíìiîïǐĭīĩịĵķǩĺļľŀḽm̂m̄ŉńn̂ṅn̈ňn̄ñņṋóòôȯȱöȫǒŏōõȭőọǿơP̄ŔŘŖŚŜṠŠȘṢŤȚṬṰÚÙÛÜǓŬŪŨŰŮỤẂẀŴẄÝỲŶŸȲỸŹŻŽẒǮp̄ŕřŗśŝṡšşṣťțṭṱúùûüǔŭūũűůụẃẁŵẅýỳŷÿȳỹźżžẓǯßœŒçÇ

which supports almost all the chars in Europe. Source of truth

No sane programmer would list all characters when there are shorthand character classes and ranges. Please, don't do that. – user1438038, Dec 17 '19 at 14:21

score 1 · Answer 5 · answered Dec 08 '21 at 10:19

In JS, you can use the u flag on regular expressions to enable a special escape sequence, namely \p (lowercase; uppercase \P is its negation). \p{...} is a Unicode-aware property escape, and one of the categories it knows is Letter. This category matches letters from German, Swedish and the other Scandinavian languages, Cyrillic scripts, etc.

In short, use this:

/\p{Letter}/u

Props to this article by Till Sanders.
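
For the contact form in the question, a minimal sketch could look like the following. It keeps the shape of the original pattern and swaps `a-zA-Z` for `\p{L}` (the short form of `\p{Letter}`); the name `unicodeNameRx` is made up for this example. It assumes an engine with ES2018 property escapes, and the `u` flag is required.

```javascript
// Sketch: the question's name pattern rebuilt on \p{L}; needs the `u` flag.
// `unicodeNameRx` is a hypothetical name used for illustration.
const unicodeNameRx = /^[\p{L}'][\p{L}\-' ]+[\p{L}']?$/u;

console.log(unicodeNameRx.test("Müller")); // true
console.log(unicodeNameRx.test("O'Neil")); // true
console.log(unicodeNameRx.test("Иван"));   // true
console.log(unicodeNameRx.test("M4x"));    // false: digits are not letters
```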

score 0 · Answer 6 · answered Mar 22 '18 at 11:49

The problem with the \uXXXX approach is that it is not supported by all regex flavours. For example, Visual C++ does not support it. There, you would need to enumerate the actual letters.

I recommend using a tool like https://www.regexbuddy.com/ that knows as many flavours as possible.
