Repeating regex groups

Question

I'm trying to get some information from a web site. The information I want is in a table so I made a regex but I don't know the right way to simplify it.

The following are two parts of my regex that I would like to simplify:

<br>(.*)<br>(.*)<br>(.*)

<tr><td>(.+)r>(.+)r>(.+)r>(.+).+</td></tr> # This part should be repeated n times (n = 1 to 10)

I looked through the Python documentation and I can't figure out how to do it. Perhaps you can give me a hint.

Thank you, mF.

Don't use regex for HTML! Use an HTML parser. – Ben James, Jan 01 '10 at 20:06

score 3 · Answer 1 · edited May 23 '17 at 10:32

RegEx match open tags except XHTML self-contained tags

"Have you tried using an XML parser instead?"

EDIT: This is the way to go: Beautiful Soup

answered Jan 01 '10 at 20:06

Isaac

score 3 · Accepted Answer · answered Jan 01 '10 at 20:20

This is the wrong way to go unless you're trying to scrape some data out of a tiny fragment.

It would be much better if you used a tolerant HTML parser. BeautifulSoup, mentioned earlier, is a good one, but it's stagnating and I don't believe it's being actively maintained anymore.

A highly recommended parser for Python is lxml.

There was a long thread discussing parsing XHTML on one of our local mailing lists here which you might find useful too.
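
To make the parser suggestion concrete, here is a minimal sketch. It uses the standard library's html.parser instead of lxml or Beautiful Soup, only so that it runs without installing anything. The sample markup and the RowCollector class are made up for illustration, since the question does not show the real page.

```python
from html.parser import HTMLParser

# Hypothetical sample; the real page's markup is not shown in the question.
SAMPLE = (
    "<table>"
    "<tr><td>a1<br>a2<br>a3</td></tr>"
    "<tr><td>b1<br>b2<br>b3</td></tr>"
    "</table>"
)


class RowCollector(HTMLParser):
    """Collects the text pieces found inside each <tr> into its own list."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        # Each <tr> starts a new, empty row.
        if tag == "tr":
            self.rows.append([])
            self._in_row = True

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        # Text between tags (here: between <br> tags) becomes one field.
        text = data.strip()
        if self._in_row and text:
            self.rows[-1].append(text)


collector = RowCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.rows)
```

The number of rows does not have to be known in advance here, which is the part the regex in the question struggles with.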

score 1 · Answer 3 · answered Jan 01 '10 at 20:12

You just need to put the block in parens and then use the {...} operators, e.g.:

(foo...){1,10}

Matches 1 to 10 instances of the thing inside of there. Given your example above, you can nest those:

((f..)(b..)){1,10}
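
One caveat worth knowing with Python's re module: when a capturing group is repeated, only its last repetition is kept in the match object. A small sketch with made-up input (the key=value text is just an illustration, not the asker's data) shows the difference between repeating the group and using re.findall on the inner pattern:

```python
import re

# Made-up input for illustration only.
text = "k1=v1;k2=v2;k3=v3;"

# One match that spans 1 to 10 repetitions of the block.
m = re.match(r"((\w+)=(\w+);){1,10}", text)
print(m.group(0))  # the whole matched text
print(m.groups())  # the groups hold only the LAST repetition

# To get every repetition, match the inner pattern repeatedly instead.
pairs = re.findall(r"(\w+)=(\w+);", text)
print(pairs)
```

So {1,10} is fine for checking that the block repeats, but findall (or finditer) is the easier way to pull out the contents of each repetition.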

scotchi
