I want to scrape data from https://angel.co/companies?locations[]=1688-United+States. Can anyone please guide me on what I should do?

I know I should use BeautifulSoup or Selenium, but I found out that this web page is not static: its data changes over time. Can anyone please guide me through it?

I think the AngelList API web page is not working anymore.

dspencer
vish
  • Can you add what you want to extract and the data format? – Tek Nath Feb 05 '20 at 10:47
  • @vish use `Selenium` – Zaraki Kenpachi Feb 05 '20 at 10:55
  • Hello @Tek Nath, if you open https://angel.co/companies?locations[]=1688-United+States you will see a table. I want to extract these 6 columns from it: Company, Location, Market, Website, Employees, Total Raised. – vish Feb 05 '20 at 11:05

1 Answer

You need to wait a few seconds until the table on the page is generated:

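import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# chromedriver is expected to sit next to this script (Selenium 4 style setup)
chrome_driver = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'chromedriver')
browser = webdriver.Chrome(service=Service(chrome_driver))
browser.get("https://angel.co/companies?locations[]=1688-United+States")
time.sleep(3)  # give the page's JavaScript time to build the table

# Each table row carries both the "base" and "startup" classes
data_row = browser.find_elements(By.CSS_SELECTOR, '.base.startup')

for item in data_row:
    print('-' * 100)
    company = item.find_element(By.CLASS_NAME, 'name').text
    location = item.find_element(By.CSS_SELECTOR, '.column.location').text
    print(company)
    print(location)

browser.quit()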

Output:

----------------------------------------------------------------------------------------------------
WP Engine
Austin
----------------------------------------------------------------------------------------------------
Kissmetrics
San Francisco
----------------------------------------------------------------------------------------------------
Bluesmart
San Francisco
----------------------------------------------------------------------------------------------------
Star.me
Los Angeles
...
...
Zaraki Kenpachi
  • @vish read this: https://stackoverflow.com/questions/2953834/windows-path-in-python – Zaraki Kenpachi Feb 06 '20 at 06:52
  • Hello, when I use the path "\Users\Dell User\Downloads\Compressed\chromedriver_win32" it shows an error: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \UXXXXXXXX escape. Please help. – vish Feb 06 '20 at 06:54
  • @Zaraki Kenpachi can you please write the correct path for C:\Users\Dell User\Downloads\Compressed\chromedriver_win32? I have tried chrome_driver = "C:\\Users\\Dell User\\Downloads\\Compressed\\chromedriver_win32" but it is still not working; it says the path is not found. – vish Feb 06 '20 at 07:00
  • @vish sorry, can't test that. I'm a Linux user. – Zaraki Kenpachi Feb 06 '20 at 07:02
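
On the Windows path question in the comments: the 'unicodeescape' error appears because "\U" inside a normal string literal starts a Unicode escape, and a raw string (r"...") or doubled backslashes avoids that. The later "path not found" message may be because the path names the extracted folder rather than the driver executable inside it. That is an assumption on my part, as is the chromedriver.exe file name. A small sketch of building the path:

```python
from pathlib import PureWindowsPath

# A raw string keeps backslashes literal, so "\U" is not parsed as an escape.
driver_dir = PureWindowsPath(r"C:\Users\Dell User\Downloads\Compressed\chromedriver_win32")

# Doubling every backslash in a normal string gives the same path.
escaped = "C:\\Users\\Dell User\\Downloads\\Compressed\\chromedriver_win32"
assert str(driver_dir) == escaped

# Assumption: the archive was extracted into this folder and contains
# chromedriver.exe; the driver path should point at that file, not the folder.
chrome_driver = str(driver_dir / "chromedriver.exe")
print(chrome_driver)
```

The resulting chrome_driver string can then be used in place of the path built from __file__ in the answer's script.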