I am trying to get only the link from the result of find_all().

Here is my code:

    mydivs = soup.find_all("td", {"class": "candidates"})
    for link in mydivs:
        print(link)

But it returns:

<td class="candidates"><div><a data-tn-element="view-unread-candidates" data-tn-link="true" href="/c#candidates?id=a722443b402&amp;ctx=jobs-tab-view-candidates">56 candidates</a><br/><a data-tn-element="view-unread-candidates" data-tn-link="true" href="/c#candidates?id=a7b2a139b402&amp;candidateFilter=4af15d8991a8"><span class="jobs-u-font--bold">(45 awaiting review)</span></a></div></td>

What I want to get:

/c#candidates?id=a722443b402&amp;ctx=jobs-tab-view-candidates

Solal

1 Answer

After converting the bs4 element into a string, you can use a regex to capture everything between the opening quotation mark of the href and the closing one.

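    import re

    # Rest of imports/code up until your script.

    mydivs = soup.find_all("td", {"class": "candidates"})
    for link in mydivs:
        # Convert the bs4 Tag to a string so the regex can search it
        link_text = str(link)
        # \s* tolerates optional spaces around "="
        href_link = re.search(r'href\s*=\s*"(.+?)"', link_text)
        if href_link:  # re.search returns None when nothing matches
            print(href_link.group(1))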

Small Example Shown Below:

    import re

    link_text = '<td class = "candidates" > <div > <a data-tn-element = "view-unread-candidates" data-tn-link = "true" href = "/c#candidates?id=a722443b402&amp;ctx=jobs-tab-view-candidates" > 56 candidates < /a > <br/> < a data-tn-element = "view-unread-candidates" data-tn-link = "true" href = "/c#candidates?id=a7b2a139b402&amp;candidateFilter=4af15d8991a8" > <span class = "jobs-u-font--bold" > (45 awaiting review) < /span > </a > </div > </td >'
    href_link = re.search('href = "(.+?)"', link_text)
    print(href_link.group(1))

Output:

/c#candidates?id=a722443b402&amp;ctx=jobs-tab-view-candidates

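You may need to adjust the spacing around the equals sign in the re.search pattern, since I cannot see exactly what your tag looks like. The output you posted shows href="..." with no spaces, while the string in the small example uses href = "...". The literal pattern 'href = "(.+?)"' from the small example matches only when the spacing is identical, so either copy the exact text from href up to the first character of the link you want, or allow optional whitespace with \s*.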
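The spacing issue can be sidestepped by letting the pattern accept optional whitespace around the equals sign and by checking for a failed match before calling group(). The following is only a sketch under assumptions: the helper name first_href and the two sample strings are made up for illustration (shortened from the markup in the question), and it works on plain strings, so you would still pass it str(link) for each element returned by find_all().

```python
import re

# Hypothetical helper, not part of bs4 or re.
# \s* allows zero or more spaces on either side of "=".
HREF_PATTERN = re.compile(r'href\s*=\s*"(.+?)"')


def first_href(tag_text):
    """Return the first href value found in tag_text, or None if there is none."""
    match = HREF_PATTERN.search(tag_text)
    return match.group(1) if match else None


# Illustrative inputs: one without spaces (as bs4 prints it), one with spaces.
compact = '<td class="candidates"><div><a href="/c#candidates?id=a722443b402&amp;ctx=jobs-tab-view-candidates">56 candidates</a></div></td>'
spaced = '<td class = "candidates" > <a href = "/c#candidates?id=a722443b402&amp;ctx=jobs-tab-view-candidates" > 56 candidates < /a > </td >'

print(first_href(compact))
print(first_href(spaced))
print(first_href('<td class="candidates"></td>'))  # no link in this cell
```

Both sample strings should give the same link, and the cell without an anchor gives None instead of raising an AttributeError.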

Edeki Okoh
  • See my comment above. – daka May 10 '19 at 18:21
  • No, because it's unnecessarily complicated, which makes it a bad answer, and worthy of a down vote. – daka May 10 '19 at 18:23
  • Seeing as the user tried the post you marked as a duplicate and it returned None, I would not call this overly complicated, but rather a solution that works. – Edeki Okoh May 10 '19 at 18:32