
Is there a way to grab all matches of a regex query within one specific, non-unique div in Python? I am aware that Beautiful Soup would be useful, but we are specifically not allowed to use it (I don't know why). Essentially, I need every match inside one particular div. Put another way, I want all the matches of a findall query that come from the first div.

e.g.

<div class="text">
   <p class="content1">content1</p>
   <p class="content2">content2</p>
</div>
<div class="text">
   <p class="content3">content3</p>
   <p class="content4">content4</p>
</div>

I can find the contents of a div easily with (?<=<div class="text"><p class="content">)(.+?)(?=</p></div>) (and then .group(0)), and I can find a single piece of content, e.g. content1, without any trouble. However, I need to find both content1 and content2, but not content3 and content4.

This also has to account for a div containing one or more <p>content</p> tags. Any recommendations?
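
To make the goal concrete, here is a minimal sketch of a two-step approach using only the standard re module: first isolate the first div with re.search, then run findall on just that slice. The function name first_div_paragraphs is made up for illustration, and the sketch assumes the divs are not nested and that the markup looks like the example above; regex is not a general HTML parser, so this may well break on real pages.

```python
import re

# Sample markup copied from the example above.
html = """
<div class="text">
   <p class="content1">content1</p>
   <p class="content2">content2</p>
</div>
<div class="text">
   <p class="content3">content3</p>
   <p class="content4">content4</p>
</div>
"""

def first_div_paragraphs(source):
    """Return the text of every <p> inside the first <div class="text">.

    Hypothetical helper; assumes the target divs are not nested.
    """
    # Step 1: re.search stops at the first match, and the non-greedy (.*?)
    # ends at the first closing </div>, so only the first div is captured.
    div_match = re.search(r'<div class="text">(.*?)</div>', source, re.DOTALL)
    if div_match is None:
        return []
    # Step 2: run findall on that div's contents only, so one or more
    # <p> tags are picked up and later divs are never seen.
    return re.findall(r'<p\b[^>]*>(.*?)</p>', div_match.group(1), re.DOTALL)

print(first_div_paragraphs(html))  # ['content1', 'content2']
```

Splitting the work into two passes avoids trying to express "all matches, but only inside the first div" in a single pattern, which Python's re lookbehind cannot do because lookbehinds must be fixed-width.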

  • What is the logic by which you want to find the first div over the second? – Tim Biegeleisen Jun 03 '22 at 03:51
  • The application is scraping a booking site for the next event that is on. This also needs to account for that event having multiple upcoming dates/times. So if each different event is one div, and each has multiple event dates/times, I need to get only the dates and times of the first event. – Elliot Cullen Jun 03 '22 at 04:58
