Python usage of regular expressions

Question

How can I extract string1#string2 from the line below?

<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>

The # character and the structure of the line are always the same.

Quick, dirty, and without regex: `newline = oldline.replace('
','').replace(' — Efferalgan, Oct 06 '16 at 08:57

score 1 · Answer 1 · edited May 23 '17 at 12:19

I would like to refer you to this gem:

In short, a regex is not the appropriate tool for this job.
Also, have you tried an XML parser instead?

EDIT:

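import xml.etree.ElementTree as ET

# The HTML fragment from the question, without the CDATA wrapper
a = "<html><body><p style=\"margin:0;\">string1#string2</p></body></html>"
root = ET.fromstring(a)
c = root[0][0].text  # root is <html>; root[0] is <body>; root[0][0] is <p>
print(c)  # string1#string2

# Split on the separator directly, so whitespace inside either part is kept
d = c.split('#')
print(d)  # ['string1', 'string2']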
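
The snippet above parses the bare HTML fragment, while the line in the question is wrapped in `<![CDATA[ ... ]]>`. A minimal sketch of handling the full line follows; the helper name `extract_paragraph_text` is made up for illustration, and it assumes the fragment inside the wrapper is well-formed XML, as it is in the question.

```python
import xml.etree.ElementTree as ET

def extract_paragraph_text(line):
    """Return the text of the first <p> in a (possibly CDATA-wrapped) line.

    Hypothetical helper: assumes the HTML inside the wrapper is well-formed XML.
    """
    prefix, suffix = "<![CDATA[", "]]>"
    # Strip the CDATA wrapper if present, leaving only the HTML fragment.
    if line.startswith(prefix) and line.endswith(suffix):
        line = line[len(prefix):-len(suffix)]
    root = ET.fromstring(line)
    paragraph = root.find(".//p")  # first <p> anywhere below the root
    return paragraph.text if paragraph is not None else None

line = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
text = extract_paragraph_text(line)
parts = text.split("#")
print(text)
print(parts)
```

Looking up the `<p>` element by name rather than by position (`root[0][0]`) keeps the lookup working if another element is added before the paragraph.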

answered Oct 06 '16 at 08:54

SerialDev

This is not an answer. The linked answer is epic, but that doesn't make this an answer. – unwind Oct 06 '16 at 09:01

score 1 · Answer 2 · answered Oct 06 '16 at 08:55

Simple, buggy, not reliable:

line.replace('<![CDATA[<html><body><p style="margin:0;">', "").replace('</p></body></html>]]>', "").split("#")
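
Since the question states that the structure of the line is always the same, the same idea can be written so that it fails loudly when that assumption is broken. This is only a sketch; `PREFIX`, `SUFFIX` and `split_fixed_line` are illustrative names, not part of the answer above.

```python
# The fixed text surrounding the payload, copied from the line in the question.
PREFIX = '<![CDATA[<html><body><p style="margin:0;">'
SUFFIX = '</p></body></html>]]>'

def split_fixed_line(line):
    """Return the '#'-separated parts, or raise if the wrapper is not as expected."""
    if not (line.startswith(PREFIX) and line.endswith(SUFFIX)):
        raise ValueError("line does not have the expected structure")
    # Slice off the known prefix and suffix instead of calling replace().
    payload = line[len(PREFIX):len(line) - len(SUFFIX)]
    return payload.split("#")

line = PREFIX + "string1#string2" + SUFFIX
result = split_fixed_line(line)
print(result)
```

Unlike chained `replace()` calls, slicing cannot accidentally remove a copy of the prefix or suffix that appears inside the payload.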

unddoch

or just `line[42:57]` – donkopotamus Oct 06 '16 at 09:02

score 1 · Answer 3 · answered Oct 06 '16 at 09:12

re.search(r'[^>]+#[^<]+',s).group()
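
The one-liner above raises `AttributeError` if the line contains no match, because `re.search` then returns `None`. A small sketch that spells out the pattern and guards against that case is shown here; `find_pair` is a hypothetical name, and `s` is assumed to hold the line from the question.

```python
import re

# [^>]+  one or more characters that are not '>'
# #      the literal separator
# [^<]+  one or more characters that are not '<'
PATTERN = re.compile(r"[^>]+#[^<]+")

def find_pair(s):
    """Return the 'left#right' text found between tags, or None if there is none."""
    match = PATTERN.search(s)
    return match.group() if match else None

s = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
found = find_pair(s)
missing = find_pair("<p>no separator here</p>")
print(found)
print(missing)
```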

zxy

148
1
2

score 0 · Accepted Answer · answered Oct 06 '16 at 09:00

If you wish to use a regex:

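>>> import re
>>> txt = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
>>> re.search(r"<p.*?>(.+?)</p>", txt).group(1)
'string1#string2'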
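
For readers less familiar with the pattern, here is the same expression written in verbose form, followed by splitting the captured text into its two parts. This is a sketch under the question's assumption that the line always has this structure; the names `PARAGRAPH`, `left` and `right` are illustrative.

```python
import re

PARAGRAPH = re.compile(
    r"""
    <p.*?>      # opening <p ...> tag, attributes matched lazily
    (.+?)       # group 1: the paragraph text, as short as possible
    </p>        # closing tag
    """,
    re.VERBOSE,
)

txt = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
match = PARAGRAPH.search(txt)
if match is None:
    raise ValueError("no <p>...</p> element found")

# Split only on the first '#', in case the second part contains one too.
left, right = match.group(1).split("#", 1)
print(left, right)
```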

donkopotamus
