I am trying to extract the name and the number from a string that looks like this:

string = '><span>Name</span></p><div class="info"><span>100 years old<'

The problem is that the following pattern does not capture all of the digits:

re.findall('<span>([a-zA-Z]+)</span>(.*)([0-9]+)',string)

Instead, it captures only the last digit of the number (in the example above, '0'):

[('Name','</p><div class="info"><span>10','0')]

I want it to return [('Name','</p><div class="info"><span>','100')]


I know that the following works:

re.findall('<span>([a-zA-Z]+)</span>(.*)>([0-9]+)',string)

But why does the first regex not capture all of the digits?

zurfyx

2 Answers

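.* is greedy by default: it first matches as much as it can, then gives back only as many characters as the rest of the pattern needs, so ([0-9]+) is left with just the last digit. Changing that quantifier to .*? makes it non-greedy (lazy):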

>>> re.findall('<span>([a-zA-Z]+)</span>(.*?)([0-9]+)',string)
[('Name', '</p><div class="info"><span>', '100')]
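A minimal sketch comparing the two quantifiers on the question's string. The patterns are written as raw strings; they contain no backslashes, so this does not change their meaning.

```python
import re

string = '><span>Name</span></p><div class="info"><span>100 years old<'

# Greedy: (.*) first runs to the end of the string, then backtracks one
# character at a time until ([0-9]+) can match. The first position where
# that succeeds is the last digit, so ([0-9]+) only gets '0'.
greedy = re.findall(r'<span>([a-zA-Z]+)</span>(.*)([0-9]+)', string)

# Lazy: (.*?) starts empty and grows one character at a time, so
# ([0-9]+) gets its first chance at the first digit and, being greedy
# itself, takes all of '100'.
lazy = re.findall(r'<span>([a-zA-Z]+)</span>(.*?)([0-9]+)', string)

print(greedy)  # [('Name', '</p><div class="info"><span>10', '0')]
print(lazy)    # [('Name', '</p><div class="info"><span>', '100')]
```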
Sean Vieira

Because the greedy .* consumes some of the digits before ([0-9]+) gets a chance to match them.

You can try this instead:

"<span>([a-zA-Z]+)</span>(\\D*)(\\d+)"

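NOTE: In a normal Python string literal the backslashes should be doubled, as above; alternatively, write the pattern as a raw string, e.g. r"\D*".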
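A runnable sketch of the \D* approach, using raw strings so the backslashes need no doubling. It keeps the <span>...</span> anchors from the question's pattern; the unanchored form is included for comparison, because without the anchors the first group picks up the tag name instead of the name.

```python
import re

string = '><span>Name</span></p><div class="info"><span>100 years old<'

# \D matches any character that is NOT a digit, so even a greedy (\D*)
# has to stop right before the first digit; (\d+) then takes all of '100'.
anchored = re.findall(r'<span>([a-zA-Z]+)</span>(\D*)(\d+)', string)

# Without the <span> anchors, ([a-zA-Z]+) matches the first run of letters
# in the string, which is the tag name 'span', not 'Name'.
unanchored = re.findall(r'([a-zA-Z]+)(\D*)(\d+)', string)

print(anchored)    # [('Name', '</p><div class="info"><span>', '100')]
print(unanchored)  # [('span', '>Name</span></p><div class="info"><span>', '100')]
```

Unlike the .*? fix, this version does not rely on laziness at all: the character class itself rules out digits, so greedy matching is safe.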

Logan Murphy