
How can I extract string1#string2 from the line below?

<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>

The # character and the structure of the line is always the same.

Ciprian Vintea

4 Answers


I would like to refer you to this gem:

In short: a regex is not the appropriate tool for this job.
Also, have you tried an XML parser instead?

EDIT:

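>>> import xml.etree.ElementTree as ET
>>> a = '<html><body><p style="margin:0;">string1#string2</p></body></html>'  # CDATA wrapper already stripped
>>> root = ET.fromstring(a)   # root is the <html> element
>>> c = root[0][0].text       # <html> -> <body> -> <p>
>>> c
'string1#string2'
>>> d = c.split('#')          # split directly on the separator
>>> d
['string1', 'string2']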
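The snippet above starts from the HTML with the CDATA wrapper already removed. A minimal end-to-end sketch, assuming the line is available as a Python string and that the HTML inside the CDATA section is well-formed XML (true for this example, not for HTML in general): wrapping the line in a placeholder element makes the CDATA section legal XML, so ElementTree returns its contents as plain text, which is then parsed a second time. The `<wrapper>` element and `extract_p_text` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def extract_p_text(line):
    """Return the text of the first <p> inside a CDATA-wrapped HTML line."""
    # A bare CDATA section is not a well-formed XML document on its own, so
    # wrap it in a hypothetical placeholder element; the parser then exposes
    # the CDATA contents as that element's .text.
    wrapper = ET.fromstring("<wrapper>" + line + "</wrapper>")
    inner_html = wrapper.text
    # Assumption: the inner HTML is also well-formed XML, so parse it again.
    html_root = ET.fromstring(inner_html)
    p = html_root.find(".//p")  # first <p> at any depth below <html>
    return p.text


line = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
text = extract_p_text(line)
first, second = text.split("#")
print(first, second)
```

Using `find(".//p")` instead of `root[0][0]` means the lookup does not depend on the exact nesting depth of the `<p>` element.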
Community
SerialDev

Simple, buggy, not reliable:

line.replace('<![CDATA[<html><body><p style="margin:0;">', "").replace('</p></body></html>]]>', "").split("#")
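A slightly more defensive sketch of the same idea, assuming (as the question states) that the prefix and suffix around the value never change. Instead of `str.replace`, which would also delete matching text anywhere in the middle of the line, it checks and slices off only the leading and trailing parts, and raises an error when the layout differs. `PREFIX`, `SUFFIX` and `split_fixed_line` are hypothetical names.

```python
PREFIX = '<![CDATA[<html><body><p style="margin:0;">'
SUFFIX = '</p></body></html>]]>'


def split_fixed_line(line):
    """Split the text between PREFIX and SUFFIX on the first '#'."""
    # Fail loudly instead of returning garbage if the layout ever changes.
    if not (line.startswith(PREFIX) and line.endswith(SUFFIX)):
        raise ValueError("line does not have the expected prefix/suffix")
    # Remove only the leading prefix and the trailing suffix.
    inner = line[len(PREFIX):len(line) - len(SUFFIX)]
    head, sep, tail = inner.partition("#")
    if not sep:
        raise ValueError("no '#' separator found")
    return [head, tail]


line = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
print(split_fixed_line(line))
```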
unddoch

import re
re.search(r'[^>]+#[^<]+', s).group()   # s holds the whole line
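The one-liner relies on the value being the only run of text between a `>` and the next `<` that contains a `#`. A short sketch that runs the same pattern on the question's line, with the pattern broken down in comments and an explicit check for a failed match (so a missing value gives a clear error rather than an `AttributeError` on `None`):

```python
import re

s = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'

# [^>]+ : one or more characters that are not '>' (text after a tag closes)
# #     : the literal separator
# [^<]+ : one or more characters that are not '<' (text up to the next tag)
m = re.search(r'[^>]+#[^<]+', s)
if m is None:
    raise ValueError("no 'text#text' run found between tags")
value = m.group()
print(value)
print(value.split('#'))
```

Because the search returns the leftmost match, a `#` appearing earlier in the line, for example inside an attribute value such as a hex colour, would be picked up instead of the paragraph text.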
zxy

If you wish to use a regex:

>>> re.search(r"<p.*?>(.+?)</p>", txt).group(1)
'string1#string2'
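If both halves are needed, a variant of the same regex can capture them separately, so no extra `split` is required. This is a sketch assuming a single `<p>` element with exactly one `#` in its text; the pattern below is an illustration, not a general HTML parser.

```python
import re

txt = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'

# Group 1: text before '#'; group 2: text after it, both inside <p ...>...</p>.
# [^#<]* and [^<]* cannot cross a '<', so the match stays within the element.
pattern = re.compile(r"<p[^>]*>([^#<]*)#([^<]*)</p>")

m = pattern.search(txt)
if m:
    first, second = m.groups()
    print(first, second)
```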
donkopotamus