0

I'm trying to write a regex that finds a hostname in a string. It should match all the valid combinations below and reject the invalid ones, but I'm not able to construct one.

import re

names = ['www.google.com.in', 'w.stack.in', 'www.code31ws.com', 'google.com', 'ww.sample.co']

regex = r'(w{3}?\.?)?[\w?-]+\.(com|in|edu|co)'
# Test each name against the pattern and report the result by index.
for i in range(len(names)):
    reg = re.search(regex, names[i])
    if reg:
        print("true {}".format(i))
    else:
        print("false {}".format(i))

Result:

true 0
true 1
true 2
true 3
true 4

I want it not to match:

w.stack.in
ww.sample.co
Avinash Raj
Aniket

2 Answers

3

Your regex works fine; the problem is how you are using it. You used re.search(), which can find a match in w.stack.in because the substring stack.in matches. What you want is for the match to begin at the start of the string. For that, use re.match(), which only matches at the beginning of the string. See search() vs. match(). A second option is to put ^ at the beginning of the expression, which tells re.search() that the match must start at the beginning of the string.
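
As a minimal sketch of the two options, the following runs the question's pattern over the question's list with re.search(), re.match(), and re.search() with a leading ^. The names PATTERN, ANCHORED, and results are illustrative and do not come from the original post.

```python
import re

names = ['www.google.com.in', 'w.stack.in', 'www.code31ws.com', 'google.com', 'ww.sample.co']

# The pattern from the question, written as a raw string.
PATTERN = r'(w{3}?\.?)?[\w?-]+\.(com|in|edu|co)'
# Same pattern with ^ so that re.search() can only match at the start.
ANCHORED = '^' + PATTERN

results = {}
for name in names:
    results[name] = (
        bool(re.search(PATTERN, name)),   # may match anywhere in the string
        bool(re.match(PATTERN, name)),    # must match at the start
        bool(re.search(ANCHORED, name)),  # anchored search, same outcome as match here
    )
    print(name, results[name])
```

On this list, only w.stack.in and ww.sample.co change from matched to unmatched once the match has to begin at the start of the string.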

zondo
1

This happens because the www prefix is optional and search only needs to match part of the string, so it skips the leading w. and matches the rest:

>>> re.search('(w{3}?\.?)?[\w?-]+\.(com|in|edu|co)', 'w.stack.in').group()
'stack.in'

You can fix it by using match, which must match starting at the beginning of the string:

>>> re.match('(w{3}?\.?)?[\w?-]+\.(com|in|edu|co)', 'w.stack.in') is None
True
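
One detail worth checking is that match anchors only the start of the string, not the end. The sketch below, using the question's pattern and sample names, shows that a successful match can stop before the end of the input. The variable names are illustrative.

```python
import re

pattern = r'(w{3}?\.?)?[\w?-]+\.(com|in|edu|co)'

# match() succeeds here, but only consumes the leading part of the string.
m = re.match(pattern, 'www.google.com.in')
matched_text = m.group()
print(matched_text)                       # www.google.com
print(m.end(), len('www.google.com.in'))  # 14 17

# The two names the question wants rejected fail at the start.
rejected = [n for n in ('w.stack.in', 'ww.sample.co') if re.match(pattern, n) is None]
print(rejected)

# Also requiring the end of the string ($) rejects the two-suffix name,
# because this pattern allows only one suffix.
strict = re.match(pattern + '$', 'www.google.com.in')
print(strict)                             # None
```

For the names in the question, anchoring the start is enough. Anchoring the end as well would need a pattern that allows more than one suffix.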

By the way, I would simplify the first part to just (www\.)?.
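
As a quick check of that simplification, this sketch compares the original and simplified patterns with re.match on the question's sample list. It shows agreement on these five names only, not equivalence in general; the names original, simplified, and outcomes are illustrative.

```python
import re

names = ['www.google.com.in', 'w.stack.in', 'www.code31ws.com', 'google.com', 'ww.sample.co']

original = r'(w{3}?\.?)?[\w?-]+\.(com|in|edu|co)'
simplified = r'(www\.)?[\w?-]+\.(com|in|edu|co)'

outcomes = {}
for name in names:
    outcomes[name] = (
        re.match(original, name) is not None,    # pattern from the question
        re.match(simplified, name) is not None,  # pattern with (www\.)? prefix
    )
    print(name, outcomes[name])
```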

Alex Hall