Python selenium to extract elements with xpath and for loop

Question

I am using Python/Selenium to extract some text from a website to further sort it in Google Sheets.

There are 15 headers whose text I need to extract. In each header, the text sits in an `h5` tag.

Here is the markup of one header:

<tr class="dayHeader">
 <td colspan="7" style="padding:10px 0;">
  <hr>
  <h5>&nbsp;&nbsp;Tuesday - 02 February 2021</h5>
 </td>
</tr>

What I have done is the following:

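from selenium.webdriver.common.by import By

# Selenium 4 style; find_elements_by_tag_name() was removed in Selenium 4.3
headers = driver.find_elements(By.TAG_NAME, 'h5')
results = []

for header in headers:
    result = header.text
    results.append(result)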

I'd prefer to locate each `h5` through the `dayHeader` class of its enclosing `tr`, like so:

headers = driver.find_element(By.XPATH,"//tr[@class='dayHeader']/h5")

and then use it in the `for` loop above, but I can't get this line to work. How can I do this?

If you only want `h5` elements within `tr` elements with the class `dayHeader`, then you can use `//tr[@class='dayHeader']/descendant::h5`. — Justin Ezequiel, Jan 28 '21 at 17:46

score 1 · Answer 1 · answered Jan 28 '21 at 17:46


Try this approach:

from selenium.webdriver.common.by import By
headers = [h.text for h in driver.find_elements(By.XPATH, "//tr[@class='dayHeader']/td/h5")]

This one-liner finds every `h5` inside the `td` of each `dayHeader` row and collects their text into a list.
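The collected strings look like the one in the question's markup, `Tuesday - 02 February 2021`. Since the stated goal is to sort them in Google Sheets, one option is to convert each into an ISO date (`YYYY-MM-DD`), which sorts correctly even as plain text. This is a minimal sketch, assuming every header follows the `Weekday - DD Month YYYY` pattern and that day and month names are parsed under an English locale; `to_iso_date` and the sample `headers` list are hypothetical.

```python
from datetime import datetime

# Hypothetical sample of what the one-liner might return; the real strings
# come from h.text and may or may not still carry the leading &nbsp; characters.
headers = ["Wednesday - 03 February 2021", "\u00a0\u00a0Tuesday - 02 February 2021"]


def to_iso_date(header_text):
    """Turn 'Tuesday - 02 February 2021' into '2021-02-02'."""
    # str.strip() also removes non-breaking spaces (U+00A0).
    cleaned = header_text.strip()
    # %A = weekday name, %d = day, %B = month name, %Y = year (locale-dependent names).
    return datetime.strptime(cleaned, "%A - %d %B %Y").date().isoformat()


print(sorted(to_iso_date(text) for text in headers))
# ['2021-02-02', '2021-02-03']
```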


Alexey R.


undetected Selenium · Accepted Answer · 2021-01-28T21:20:59.293

You were almost there. In XPath, a single `/` selects only direct children, and `<h5>` is not a direct child of `//tr[@class='dayHeader']`: it is nested inside a `<td>`.
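The child-versus-descendant difference can be checked offline with Python's standard-library `xml.etree.ElementTree`, whose `findall()` supports a limited XPath subset. This is a sketch, not Selenium: it assumes the question's markup adapted to well-formed XML (`<hr/>` self-closed, `&nbsp;` written as `&#160;`, and a wrapping `<table>` root added), and ElementTree paths start with `.` relative to that root. The helper `header_texts` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's markup, adapted to well-formed XML so the standard library
# can parse it. The wrapping <table> is an assumption that provides a root.
SAMPLE = """
<table>
  <tr class="dayHeader">
    <td colspan="7" style="padding:10px 0;">
      <hr/>
      <h5>&#160;&#160;Tuesday - 02 February 2021</h5>
    </td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE.strip())


def header_texts(path):
    """Return the stripped text of every element matched by an ElementTree path."""
    return [el.text.strip() for el in root.findall(path)]


# "/" steps only to direct children; <h5> is a grandchild of <tr>, so nothing matches.
print(header_texts(".//tr[@class='dayHeader']/h5"))
# Spelling out the intermediate <td> matches the <h5>.
print(header_texts(".//tr[@class='dayHeader']/td/h5"))
# "//" searches descendants at any depth, so it matches the <h5> as well.
print(header_texts(".//tr[@class='dayHeader']//h5"))
```

The three calls print `[]`, then `['Tuesday - 02 February 2021']` twice.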

Solution

Replace the single forward slash `/` with a double forward slash `//`, which selects descendants at any depth. Your line of code then becomes:

from selenium.webdriver.common.by import By
print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//tr[@class='dayHeader']//h5")])

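Ideally, use `WebDriverWait` with `visibility_of_all_elements_located()` so that the elements are visible before you read their text. With `WebDriverWait` imported from `selenium.webdriver.support.ui`, `expected_conditions` imported as `EC` from `selenium.webdriver.support`, and `By` imported from `selenium.webdriver.common.by`, you can use the following locator strategy: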

print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr[@class='dayHeader']//h5")))])
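For context on what the wait does: `WebDriverWait.until()` repeatedly calls the expected condition with the driver, every 0.5 seconds by default, until it returns a truthy value, and raises `TimeoutException` once the timeout (20 seconds here) expires. `visibility_of_all_elements_located()` returns the list of matching elements once all of them are visible, which is why the comprehension can iterate over the result. The sketch below is a simplified, hypothetical re-implementation of that polling loop in plain Python (`wait_until` and `headers_visible` are illustrative names, not Selenium APIs); it omits Selenium's default ignoring of `NoSuchElementException` and raises the built-in `TimeoutError` instead.

```python
import time


def wait_until(condition, timeout, poll_frequency=0.5):
    """Simplified illustration of explicit-wait polling (not Selenium's code).

    Calls condition() repeatedly until it returns a truthy value, which is then
    returned; raises TimeoutError once `timeout` seconds pass without one.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Hypothetical condition standing in for "all header elements are visible":
# it returns an empty (falsy) list twice, then the header texts on the third call.
calls = {"n": 0}


def headers_visible():
    calls["n"] += 1
    return ["Tuesday - 02 February 2021"] if calls["n"] >= 3 else []


print(wait_until(headers_visible, timeout=2, poll_frequency=0.01))
# ['Tuesday - 02 February 2021']
```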

Excellent, thanks for your detailed response, it's super helpful! — userX, Feb 02 '21 at 10:51
