
I'm iterating through pages and I'd like to modify lines containing

<span class="font16"></span>

How can I correct the code below?

text = re.sub(r'<span class="font(.*)"></span><span', r'<span class="font\1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </span><span', text)
Ahmad Alfy
MarkF6

1 Answer


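The pattern .* is greedy: it matches as much of the line as it can and only backs off to the last place where the rest of the pattern still fits. If the line contains more than one such place, the captured group will look like this: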

16"></span>....

which isn't what you want. Use a pattern that stops at the first " (a literal " can't appear inside an attribute value that is quoted with "):

r'<span class="font([^"]+)"></span><span'
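To see the difference, here is a minimal sketch that runs both patterns against a made-up line. The sample `line`, the variable names, and the `filler` string are assumptions for illustration; only the two patterns come from this thread.

```python
import re

# Hypothetical sample line: two empty font16 spans, each followed by a filled span.
line = ('<span class="font16"></span><span class="font12">a</span>'
        '<span class="font16"></span><span class="font12">b</span>')

greedy = r'<span class="font(.*)"></span><span'      # pattern from the question
bounded = r'<span class="font([^"]+)"></span><span'  # pattern suggested above

# Greedy: .* grabs everything up to the LAST '"></span><span' on the line,
# so the group contains markup, not just the font size.
greedy_group = re.search(greedy, line).group(1)

# Bounded: [^"]+ cannot cross a double quote, so the group is just the size.
bounded_group = re.search(bounded, line).group(1)  # '16'

# Padding similar to the replacement in the question (assumed length).
filler = '&nbsp; ' * 20

# \1 puts the captured font size back into the class attribute.
fixed = re.sub(bounded, r'<span class="font\1">' + filler + '</span><span', line)

print(repr(greedy_group))
print(repr(bounded_group))
print(fixed)
```

Note that each match also consumes the opening `<span` of the following tag, so two empty spans placed directly next to each other would not both be replaced in a single pass.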
Aaron Digulla
  • Ok, now I've got this: text = re.sub(r'{} ; where I'd like to insert the signs between font16"> and </span>. – MarkF6 Sep 10 '13 at 08:24
  • I'm wondering why the span is empty. You should probably search for `<span class="font16">` without the closing `</span>` or the next span. – Aaron Digulla Sep 10 '13 at 11:53