How do I use a Ruby regex to capture non-English words?

Question

I am trying to validate 'words' with Ruby 1.8.7.

My regex to catch a word is currently:

/[a-zA-Z]\'*\-*/

This only catches English words. Is there a way to match non-English UTF-8 characters?

Possible duplicate http://stackoverflow.com/questions/397788/why-does-w-match-only-english-words-in-javascript-regex — Homam, Jun 05 '11 at 18:09
@Geek. Good point, `/\w+/` is right, but he also needs `/\w+/u` — DigitalRoss, Jun 05 '11 at 19:08

score 4 · Accepted Answer · answered Jun 05 '11 at 19:06

Even the 1.8.x regex engine is UTF-8 aware. You just need the right expression, and it takes slightly more than /\w/: the pattern also needs the /u flag.

s = "résumé and some other words"
puts s[/[a-z]+/u]
puts s[/\w+/u]

and you get:

r
résumé
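
For readers on Ruby 1.9 or later, where `\w` matches only ASCII word characters by default, a rough equivalent can be sketched with the Unicode letter property `\p{L}`. This is a minimal sketch, not part of the original answer: the `WORD` constant and the `word?` helper are hypothetical names, and the rule that a word may contain single apostrophes or hyphens between letters is an assumption based on the regex in the question.

```ruby
# encoding: utf-8
# Hypothetical validator (not from the original answer): a word is one or
# more Unicode letters, optionally joined by single apostrophes or hyphens.
WORD = /\A\p{L}+(?:['-]\p{L}+)*\z/

# Returns true when the whole string is a single word by the rule above.
def word?(s)
  !!(s =~ WORD)
end

s = "résumé and some other words"
first_word = s[/\p{L}+/]   # first run of Unicode letters

puts first_word            # résumé
puts word?("résumé")       # true
puts word?("l'été")        # true
puts word?("abc123")       # false, digits are not letters
```

The POSIX bracket class `[[:alpha:]]` is another Unicode-aware option in those versions and could replace `\p{L}` here.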

DigitalRoss
