I'm building a French verb conjugation website in Rails where users can enter conjugations of verbs like:

     se abstenir
     m'appelle
     êtes
     achète

I need to validate the format of those verbs with validates_format_of. The apostrophes are easy enough, but what about characters like ê, è and ã?

So far I have:

    word_format = /\A[\w]+[' ]?[\w]*\z/
    validates_format_of (...), :with => word_format

This clearly doesn't work, since \w doesn't match them. Adding áêĩ(...) directly to the regexp gives me an "invalid multibyte char (US-ASCII)" error.

I also need to upcase or downcase those strings, but Ruby ignores the accented characters, resulting in 'VOUS êTES' for example. The trivial answer seems to be doing it by hand, but I hope Ruby/Rails will surprise me again.

It seems to be a hard problem, which I wasn't expecting given the power of Ruby/Rails.

Could anybody give me a clue?

alexandrecosta

2 Answers

It looks like instead of \w you need to use the POSIX bracket expression [[:alpha:]].

    word_format = /\A[[:alpha:]]+[' ]?[[:alpha:]]*\z/
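
To illustrate the difference, here is a minimal sketch comparing \w with the POSIX bracket class, which is written with double brackets ([[:alpha:]]). The constant names and the matches? helper are made up for this example, and it assumes the source file and the strings are UTF-8.

```ruby
# encoding: utf-8

# \w in Ruby only covers ASCII word characters: [a-zA-Z0-9_]
ASCII_WORD = /\A\w+\z/

# [[:alpha:]] matches alphabetic characters, including accented ones in UTF-8 strings
UNICODE_ALPHA = /\A[[:alpha:]]+\z/

# Hypothetical helper: true when the whole word matches the pattern
def matches?(pattern, word)
  !(pattern =~ word).nil?
end

%w[abstenir êtes achète].each do |word|
  puts "#{word}: \\w=#{matches?(ASCII_WORD, word)} [[:alpha:]]=#{matches?(UNICODE_ALPHA, word)}"
end
```

Note that a single-bracket [:alpha] is not a character class name at all; it is an ordinary set containing the characters ':', 'a', 'l', 'p' and 'h'.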
ScottJShea

For upcasing, you'll need to install the UnicodeUtils gem.

    # encoding: utf-8
    require "unicode_utils/upcase"
    puts UnicodeUtils.upcase("êtes Niño") #=> ÊTES NIÑO
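
As a side note, on Ruby 2.4 and later the built-in String#upcase and String#downcase apply Unicode case mapping by default, so the gem may not be needed there. The sketch below assumes such a Ruby version and UTF-8 strings; the constant names are only for illustration. Passing :ascii reproduces the behaviour described in the question.

```ruby
# encoding: utf-8

PHRASE = "vous êtes"

# Ruby 2.4+: full Unicode case mapping is the default
UPPER = PHRASE.upcase
LOWER = UPPER.downcase

# The :ascii option restricts mapping to A-Z/a-z, like older Rubies did
ASCII_UPPER = PHRASE.upcase(:ascii)

puts UPPER
puts LOWER
puts ASCII_UPPER
```

On older Rubies within a Rails app, ActiveSupport's mb_chars proxy (for example "êtes".mb_chars.upcase.to_s) was the usual alternative, though which option fits depends on your versions.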

The regex could look like this:

    word_format = /\A[[:word:]]+[' ]?[[:word:]]*\z/

/[[:word:]]/ matches a character in one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation.
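
A quick way to sanity-check this pattern is to run it against the sample inputs from the question, outside of Rails. This is only a sketch: valid_conjugation? is a hypothetical helper, and the two rejected inputs at the end are made up for contrast.

```ruby
# encoding: utf-8

WORD_FORMAT = /\A[[:word:]]+[' ]?[[:word:]]*\z/

# Hypothetical helper: true when the whole string fits the pattern
def valid_conjugation?(text)
  !(WORD_FORMAT =~ text).nil?
end

samples = ["se abstenir", "m'appelle", "êtes", "achète", "vous  êtes", "êtes!"]
samples.each do |text|
  puts "#{text.inspect}: #{valid_conjugation?(text)}"
end
```

Because [[:word:]] also covers digits and underscores, an input such as "verbe_1" passes as well; swapping in [[:alpha:]] would be the stricter choice if only letters should be accepted.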

Cœur
steenslag