Find text between specific characters regex

Question

data="""
<section class="descr"><a href="https://audioz.download/uploads/posts/2020-04/1588148680_2020-04-29_112330.jpg" rel="highslide" class="highslide thumbnail">
<img src="https://audioz.download/uploads/posts/2020-04/thumbs/1588148680_2020-04-29_112330.jpg" data-src="https://audioz.download/uploads/posts/2020-04/thumbs/1588148680_2020-04-29_112330.jpg" class="thumbnail" itemprop="image" alt="Capella v8.0.16-CRD screenshot"></a>
<br><mark class="subtitle"> CRD | 06.2020 | 129 MB</mark><br>
With capella you can instantly create complete 
scores … No other notation program will take you 
by the hand and gently guide you towards your first own score in the manner in which capella does it. There is no need to be fully computer literate - you just follow your musical imagination and capella does 
the rest. Within no time you will have completed your first score sheet.</section> """

I would like to use a regex to extract the description text at the end of the section. The output should be:

With capella you can instantly create complete 
scores … No other notation program will take you 
by the hand and gently guide you towards your first own score in the manner in which capella does it. There is no need to be fully computer literate - you just follow your musical imagination and capella does 
the rest. Within no time you will have completed your first score sheet.

How would I do that? This is what I tried in Python:

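import re

# No backslash escapes are needed for <, / or > in the lookarounds.
# re.DOTALL lets . match the newlines in data, and the
# non-greedy .*? stops at the first </section>.
reg = re.compile(r'(?<=</mark><br>)(.*?)(?=</section>)', re.DOTALL)
mo = reg.search(data)

print(mo.group(1).strip())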

But is there a more efficient way to do this?
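
Since both markers are fixed strings, a regex may not be needed at all. Below is a minimal sketch using str.partition, assuming the markers '</mark><br>' and '</section>' each appear in data; extract_between is a hypothetical helper name, not part of any library. I haven't benchmarked it against the regex, and on input this small any speed difference is probably negligible. The main gain is that there are no lookarounds or DOTALL flag to get right.

```python
def extract_between(text, start, end):
    """Return the stripped text between the first `start` marker and the next `end` marker.

    Hypothetical helper: raises ValueError if either marker is missing.
    """
    # partition splits on the first occurrence; sep is '' if start is absent
    _, sep, rest = text.partition(start)
    if not sep:
        raise ValueError(f"start marker {start!r} not found")
    # split the remainder at the first end marker after start
    body, sep, _ = rest.partition(end)
    if not sep:
        raise ValueError(f"end marker {end!r} not found")
    # drop the newline right after <br> and any trailing whitespace
    return body.strip()
```

Usage: print(extract_between(data, "</mark><br>", "</section>"))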

As a general rule, don't use regexes to parse HTML — it's not a regular language. — martineau, Jul 04 '21 at 20:14
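
Following that advice, here is a rough sketch using Python's built-in html.parser.HTMLParser instead of a regex. It assumes a single <section class="descr"> whose description is the text after the closing </mark> tag, as in the sample above; DescrTextExtractor and extract_description are hypothetical names. Because the parser reports tags and text separately, it does not depend on the exact whitespace or line breaks between tags.

```python
from html.parser import HTMLParser


class DescrTextExtractor(HTMLParser):
    """Collect the text that follows </mark> inside <section class="descr">.

    Hypothetical helper: assumes the class attribute is exactly "descr"
    and that the description comes after the closing </mark> tag.
    """

    def __init__(self):
        super().__init__()
        self.in_descr = False    # currently inside <section class="descr">
        self.after_mark = False  # already past the </mark> closing tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "section" and ("class", "descr") in attrs:
            self.in_descr = True

    def handle_endtag(self, tag):
        if not self.in_descr:
            return
        if tag == "mark":
            self.after_mark = True
        elif tag == "section":
            self.in_descr = False
            self.after_mark = False

    def handle_data(self, data):
        # Text before </mark> (e.g. the subtitle) is ignored
        if self.in_descr and self.after_mark:
            self.parts.append(data)

    def text(self):
        return "".join(self.parts).strip()


def extract_description(html):
    parser = DescrTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Usage: print(extract_description(data))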
