
I am trying to validate 'words' with Ruby 1.8.7.

My regex to catch a word is currently:

/[a-zA-Z]\'*\-*/

This only catches English words. Is there a way to catch non-English UTF-8 characters as well?

the Tin Man
ethicalhack3r

1 Answer


Even the 1.8.x regex engine is UTF-8 aware. You just need the right expression, and that takes slightly more than a bare /\w/: the pattern also needs the /u option.

s = "résumé and some other words"
puts s[/[a-z]+/u]  # ASCII-only class stops at the first accented letter
puts s[/\w+/u]     # with /u, \w also matches the UTF-8 letters

and you get:

r
résumé
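
Applied to the validation in the question, a minimal sketch might look like the following. The names `WORD_PATTERN` and `valid_word?` and the sample strings are hypothetical. The 1.9+ branch is an assumption that goes beyond the 1.8.7 case discussed here: on newer Rubies `\w` is ASCII-only, so the sketch falls back to the `\p{L}` letter property there.

```ruby
# Hypothetical pattern: a whole string made of letters, apostrophes and hyphens.
WORD_PATTERN =
  if RUBY_VERSION < "1.9"
    # Ruby 1.8: with /u, \w matches UTF-8 multibyte characters.
    # Note that \w also admits digits and underscores.
    /\A[\w'\-]+\z/u
  else
    # Assumption for Ruby 1.9+: \w is ASCII-only there, so use the
    # Unicode letter property instead. Built from a string so that
    # 1.8 never has to compile \p{L}.
    Regexp.new("\\A[\\p{L}'\\-]+\\z")
  end

# Hypothetical helper: true when str matches WORD_PATTERN from start to end.
def valid_word?(str)
  !!(str =~ WORD_PATTERN)
end

puts valid_word?("résumé")
puts valid_word?("don't")
puts valid_word?("two words")
```

Anchoring with \A and \z makes the check apply to the whole string, so input containing a space or other punctuation is rejected rather than partially matched.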
DigitalRoss