Is it possible to use a regex to match "February 2009", for example?
35
- Is it allowed to match "Undecimber 15000"? – kennytm Apr 16 '10 at 19:05
- The restrictions are January - December, followed by 1990 - 2010. Fortunately, non-English isn't a concern. – Jeremy Apr 16 '10 at 19:08
- *followed by 1990 - 2010*: Is it intentional to have this tight upper limit? There's only 8 months left to 2011. – kennytm Apr 16 '10 at 19:16
4 Answers
54
Something along the lines of:
\b(?:Jan(?:uary)?|Feb(?:ruary)?|...|Dec(?:ember)?) (?:19[7-9]\d|2\d{3})(?=\D|$)
Broken down, that is:
\b                    # a word boundary
(?:                   # non-capturing group
  Jan(?:uary)?        # Jan(uary)
  |Feb(?:ruary)?      # Feb(ruary)
  |...                # and so on
  |Dec(?:ember)?      # Dec(ember)
)                     # end group
                      # a space
(?:                   # non-capturing group
  19[7-9]\d|2\d{3}    # 1970-2999
)                     # end group
(?=\D|$)              # followed by a non-digit or the end of the string
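As a rough sketch of how the pattern above might look once the elided months are filled in, here is a Python version using `re.VERBOSE`. The full month list, the tighter 1990 - 2010 year range (taken from the comments on the question), and the names `MONTH_YEAR` and `find_month_years` are assumptions made for illustration, not part of the original answer.

```python
import re

# Assumptions: English month names, abbreviated or in full, and years limited
# to 1990-2010 as stated in the question's comments.
MONTH_YEAR = re.compile(
    r"""
    \b                      # a word boundary
    (?:                     # non-capturing group of month names
        Jan(?:uary)?
      | Feb(?:ruary)?
      | Mar(?:ch)?
      | Apr(?:il)?
      | May
      | Jun(?:e)?
      | Jul(?:y)?
      | Aug(?:ust)?
      | Sep(?:tember)?
      | Oct(?:ober)?
      | Nov(?:ember)?
      | Dec(?:ember)?
    )
    [ ]                     # a literal space (bare spaces are ignored in verbose mode)
    (?:199\d|200\d|2010)    # 1990-2010
    (?=\D|$)                # followed by a non-digit or the end of the string
    """,
    re.VERBOSE,
)


def find_month_years(text):
    """Return every 'Month YYYY' substring found in text."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return MONTH_YEAR.findall(text)


print(find_month_years("Paid in February 2009 and Dec 2010, not March 2011."))
# ['February 2009', 'Dec 2010']
```

Note that in verbose mode the space between month and year has to be written as `[ ]` (or `\ `), because unescaped whitespace in the pattern is otherwise ignored.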
Tomalak
30
After some work to cover a few fringe cases, I ended up using
(\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?(\d{1,2}\D?)?\D?((19[7-9]\d|20\d{2})|\d{2})
to capture dates in which the month is written out as a word.
Beerswiller
- Just a minor thing: for the months, instead of (Nov|Dec) it should be (?:Nov|Dec). At least I had to change that for it to work with Python; otherwise it was returning an empty [''] match. – Walter R Aug 03 '17 at 19:04
- You can add (?i)(regex_part_to_make_case_insensitive) or (?i)regex_part_to_make_case_insensitive(?-i), depending on the regex processor you are using. – Onyr Apr 22 '21 at 16:04
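The empty-string result mentioned in the comments can be reproduced with a small Python sketch. The two patterns below are shortened, hypothetical stand-ins for the full regex (only October, November and December are kept), and `re.IGNORECASE` is used here as one way to get case-insensitive matching in Python's `re` module.

```python
import re

# Hypothetical, shortened patterns: only three months are kept for brevity.
# CAPTURING keeps the (Nov|Dec) capturing group from the answer;
# NON_CAPTURING uses (?:Nov|Dec) as suggested in the comment.
CAPTURING = re.compile(
    r"\b(?:Oct(?:ober)?|(Nov|Dec)(?:ember)?) \d{4}", re.IGNORECASE
)
NON_CAPTURING = re.compile(
    r"\b(?:Oct(?:ober)?|(?:Nov|Dec)(?:ember)?) \d{4}", re.IGNORECASE
)

text = "october 2009 and November 2010"

# With one capturing group, findall returns that group for each match,
# and an empty string when the group did not take part in the match.
print(CAPTURING.findall(text))      # ['', 'Nov']

# Without capturing groups, findall returns the whole matched text.
print(NON_CAPTURING.findall(text))  # ['october 2009', 'November 2010']
```

If the capturing group is needed for other reasons, `re.finditer` with `match.group(0)` is another way to get at the full matched text.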
4
Modifying Beerswiller's answer, if you also want the ordinal "st"/"nd"/"rd"/"th" variations:
(\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?(\d{1,2}(st|nd|rd|th)?)?(([,.\-\/])\D?)?((19[7-9]\d|20\d{2})|\d{2})*
Pedro Machin
3
This regex allows for optional whitespace around the comma, which is not always where you would expect it.
((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?((\s*[,.\-\/]\s*)\D?)?\s*((19[0-9]\d|20\d{2})|\d{2})*
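To illustrate the idea of tolerating ordinal suffixes and stray whitespace around the comma, here is a deliberately simplified Python sketch. It is not the regex above: it only handles the "Month day, year" order, requires a day and a four-digit year starting with 19 or 20, and the names `ORDINAL_DATE` and `parse_ordinal_date` are hypothetical.

```python
import re

# Simplified sketch: month name, day with optional ordinal suffix,
# optional comma with any whitespace around it, then a four-digit year.
ORDINAL_DATE = re.compile(
    r"\b(?P<month>Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
    r"|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
    r"\s+(?P<day>\d{1,2})(?:st|nd|rd|th)?"   # day, e.g. 3 or 3rd
    r"\s*,?\s*"                              # optional comma, flexible spacing
    r"(?P<year>(?:19|20)\d{2})\b"            # year 1900-2099
)


def parse_ordinal_date(text):
    """Return (month, day, year) for the first date found, or None."""
    match = ORDINAL_DATE.search(text)
    if match is None:
        return None
    return match.group("month"), int(match.group("day")), int(match.group("year"))


print(parse_ordinal_date("Due March 3rd , 2009"))  # ('March', 3, 2009)
print(parse_ordinal_date("Feb 21,2010"))           # ('Feb', 21, 2010)
print(parse_ordinal_date("no date here"))          # None
```

Named groups make it easier to pull the parts out afterwards than counting the nested numbered groups of the longer patterns.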