I'm trying to retrieve the sale price of a house from Redfin.

Here's the relevant part of the HTML:

<div class="timeline"><div class="property-history-content-container"><div class="timeline-content"><h4 class="section-header col-12">Today</h4><div class="sold-row row PropertyHistoryEventRow" id="propertyHistory-0"><div class="col-4"><p>Oct 15, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Sold (MLS) (Closed)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">$302,000<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class=" row PropertyHistoryEventRow" id="propertyHistory-1"><div class="col-4"><p>Sep 16, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Contingent (Active Under Contract)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">—<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class=" row PropertyHistoryEventRow" id="propertyHistory-2"><div class="col-4"><p>Sep 8, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Pending (Pending - Taking Backups)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">—<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class=" row PropertyHistoryEventRow" id="propertyHistory-3"><div class="col-4"><p>Sep 5, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Listed (Active)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">$294,900<span class="number empty"> </span></div><p class="subtext">Price</p></div></div></div></div></div>

Here's the relevant part of my code:

url = 'https://www.redfin.com/TX/Cedar-Park/615-Fence-Post-Pass-78613/home/32939011'
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
soup.find_all(class_='timeline-content')

However, the code does not return the "Sold" event, only the three events before it.

Below is the result of soup.find_all(class_='timeline-content'):

[<div class="timeline-content"><h4 class="section-header col-12">Today</h4><div class="row PropertyHistoryEventRow" id="propertyHistory-0"><div class="col-4"><p>Sep 16, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Contingent (Active Under Contract)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">—<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class="row PropertyHistoryEventRow" id="propertyHistory-1"><div class="col-4"><p>Sep 8, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Pending (Pending - Taking Backups)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">—<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class="row PropertyHistoryEventRow" id="propertyHistory-2"><div class="col-4"><p>Sep 5, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Listed (Active)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number empty">**<span class="number empty"> </span></div><p class="subtext">Price</p></div></div></div>]
HedgeHog
jxariel

1 Answer

Add the tag you want to find as well: soup.find('div', class_='timeline-content') works.

Example based on the provided HTML:

from bs4 import BeautifulSoup

html = """
<div class="timeline"><div class="property-history-content-container"><div class="timeline-content"><h4 class="section-header col-12">Today</h4><div class="sold-row row PropertyHistoryEventRow" id="propertyHistory-0"><div class="col-4"><p>Oct 15, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Sold (MLS) (Closed)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">$302,000<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class=" row PropertyHistoryEventRow" id="propertyHistory-1"><div class="col-4"><p>Sep 16, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Contingent (Active Under Contract)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">—<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class=" row PropertyHistoryEventRow" id="propertyHistory-2"><div class="col-4"><p>Sep 8, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Pending (Pending - Taking Backups)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">—<span class="number empty"> </span></div><p class="subtext">Price</p></div></div><div class=" row PropertyHistoryEventRow" id="propertyHistory-3"><div class="col-4"><p>Sep 5, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Listed (Active)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">$294,900<span class="number empty"> </span></div><p class="subtext">Price</p></div></div></div></div></div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Take the first price cell inside the timeline (the "Sold" row comes first).
print(soup.find('div', class_='timeline-content').find('div', class_='price-col number').text)

Output

$302,000
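
If you want every event rather than only the first price, something along these lines might work. It is a minimal sketch that assumes the class names seen in the question's HTML (`PropertyHistoryEventRow`, `description-col`, `price-col`, `sold-row`) stay the same. The `parse_history` helper and the trimmed `html` sample are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup: only the "Sold" and "Listed" rows.
html = """
<div class="timeline-content">
<div class="sold-row row PropertyHistoryEventRow" id="propertyHistory-0"><div class="col-4"><p>Oct 15, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Sold (MLS) (Closed)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">$302,000<span class="number empty"> </span></div><p class="subtext">Price</p></div></div>
<div class=" row PropertyHistoryEventRow" id="propertyHistory-3"><div class="col-4"><p>Sep 5, 2020</p><p class="subtext">Date</p></div><div class="description-col col-4"><div>Listed (Active)</div><div></div><p class="subtext">ACTRIS #5085856</p></div><div class="col-4"><div class="price-col number">$294,900<span class="number empty"> </span></div><p class="subtext">Price</p></div></div>
</div>
"""


def parse_history(page_source):
    """Return one dict per history row (hypothetical helper, not a Redfin API)."""
    soup = BeautifulSoup(page_source, "html.parser")
    events = []
    for row in soup.select("div.PropertyHistoryEventRow"):
        date = row.select_one("div.col-4 > p")               # first <p> is the date
        desc = row.select_one("div.description-col > div")   # first <div> is the event text
        price = row.select_one("div.price-col")
        events.append({
            "date": date.get_text(strip=True) if date else None,
            "event": desc.get_text(strip=True) if desc else None,
            "price": price.get_text(strip=True) if price else None,
            # Assumption: the sold event carries the "sold-row" class.
            "sold": "sold-row" in row.get("class", []),
        })
    return events


events = parse_history(html)
for event in events:
    print(event)
```

Filtering on the `sold` flag then picks out the sale regardless of where the row sits in the timeline.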

Note: You have to log in to get the HTML you provided; otherwise the page will not contain all the information you want to scrape, e.g. the sold price.
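
Since the sold row can be missing from the page source, it may help to check for it before reading the price. The sketch below assumes the `sold-row` and `price-col` classes from the question's HTML. `sold_price` is a hypothetical helper, and the two short snippets are cut-down stand-ins for a logged-in and a logged-out page.

```python
from bs4 import BeautifulSoup


def sold_price(page_source):
    """Return the sold price as an int, or None if no sold row is present."""
    soup = BeautifulSoup(page_source, "html.parser")
    row = soup.select_one("div.sold-row")  # assumption: class marks the sold event
    if row is None:
        return None  # sold row absent from this page source
    cell = row.select_one("div.price-col")
    if cell is None:
        return None
    digits = cell.get_text(strip=True).replace("$", "").replace(",", "")
    return int(digits) if digits.isdigit() else None


# Cut-down stand-ins for the two variants of the page shown in the question.
with_sold = (
    '<div class="sold-row row PropertyHistoryEventRow">'
    '<div class="price-col number">$302,000<span class="number empty"> </span></div>'
    "</div>"
)
without_sold = (
    '<div class="row PropertyHistoryEventRow">'
    '<div class="price-col number empty">**<span class="number empty"> </span></div>'
    "</div>"
)

print(sold_price(with_sold))
print(sold_price(without_sold))
```

A `None` result is a hint that `driver.page_source` was captured without the sold row, as in the output shown in the question.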

HedgeHog
  • Thanks! Do you have sample code that can log in to Redfin from Python? – jxariel Jan 06 '21 at 16:01
  • Happy to help, and welcome to Stack Overflow. If this answer or any other one solved your issue, please mark it as accepted - [someone-answers](https://stackoverflow.com/help/someone-answers) - Concerning the login you should open another question and reference with a link. [Simple Solution](https://stackoverflow.com/a/21186465/14460824) – HedgeHog Jan 06 '21 at 16:56