8

Say I have a list of sentences, many (but not all) of which contain numbers:

mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
    ...
]

I want to know which elements in the list contain a valid year (say, between 1000 and 3000). I know this is a regex issue, and I have found a few posts (e.g., this one) that address detecting digits in strings, but nothing on full years. Any regex wizards out there?

Nolan Conaway
    Or [Regular expression match to test for a valid year](http://stackoverflow.com/q/4374185), but that doesn't contain the right answer: don't use regex. Then there's also [Regular expression numeric range](http://stackoverflow.com/q/1377926), which does. – jscs Nov 26 '15 at 04:24

3 Answers

13

It sounds like you want a regex that finds 4-digit numbers where the first digit is between 1 and 3 and the next three digits are each between 0 and 9, so I think you are looking for something like this:

[1-3][0-9]{3}

If you want to accept whole strings that contain such a number, you could use:

.*([1-3][0-9]{3})
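
As a minimal sketch of applying this pattern to the list from the question: `re.search` scans the whole string, so the leading `.*` is not needed for a simple yes/no test. The helper name `contains_year_like` is made up for illustration. Keep in mind that the pattern accepts anything from 1000 through 3999, and it will also match four such digits inside a longer number.

```python
import re

# Pattern from the answer above: a digit 1-3 followed by any three digits.
YEAR_LIKE = re.compile(r"[1-3][0-9]{3}")

def contains_year_like(sentence):
    """Return True if the sentence contains four digits starting with 1, 2 or 3."""
    return YEAR_LIKE.search(sentence) is not None

# Example sentences taken from the question.
mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
]

# Keep only the sentences in which the pattern is found.
matches = [s for s in mylist if contains_year_like(s)]
print(matches)
```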
pwilmot
11

Here's a simple solution:

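import re

# Example sentences from the question
mylist = ["The current year is 2015 AD.", "I have 2 dogs."]
for sentence in mylist:
    # The greedy .* means group(1) is the last 4-digit run starting with 1-3
    match = re.match(r'.*([1-3][0-9]{3})', sentence)
    if match is not None:
        # Then it found a match!
        print(match.group(1))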

This checks whether each string contains a 4-digit number between 1000 and 3999. Note that it also matches four such digits inside a longer number, such as 12345.

Chrispresso
3

Well, a year can so far be a lot of things. Most commonly it is 4 digits long, yes, but it is just a number. If you want all years from 1000 to 9999, you can use this regex: ([1-9][0-9]{3}). To match only the range in the question (1000 to 3000), you need: ([1-2][0-9]{3}|3000)
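
A rough sketch of how the 1000 to 3000 pattern might be used on the list from the question. The lookarounds `(?<!\d)` and `(?!\d)` are an addition not found in the answer; they are assumed here so that a number like 12345 or 3001 is not partly matched. The function names are hypothetical. The second function illustrates the alternative hinted at in the comment under the question: extract the digits and do the range check numerically rather than in the pattern.

```python
import re

# Range pattern from the answer above (1000-3000), wrapped in lookarounds
# so the match cannot be part of a longer run of digits.
YEAR_RE = re.compile(r"(?<!\d)(?:[12]\d{3}|3000)(?!\d)")

def find_years(sentence):
    """Return every standalone number from 1000 to 3000 found by the regex."""
    return YEAR_RE.findall(sentence)

def find_years_numeric(sentence, low=1000, high=3000):
    """Alternative: pull out each run of digits and compare it as an integer."""
    return [n for n in re.findall(r"\d+", sentence) if low <= int(n) <= high]

# Example sentences taken from the question.
mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
]

# Sentences that contain at least one year in range.
with_years = [s for s in mylist if find_years(s)]
print(with_years)
```

The numeric version makes the bounds easy to change, since `low` and `high` are plain integers rather than character classes.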

aweis