4

I need to parse HTML tables to do things like get all the cells in a column above/below, or in a row to the left/right of, a certain cell. Is there a Python library that can do this easily?

myahya

4 Answers

2

BeautifulSoup
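A minimal sketch of how this could look with BeautifulSoup, assuming a simple table with no rowspan/colspan and no nested tables. The sample HTML and the helper names `table_to_grid` and `neighbours` are made up for illustration and are not part of the library.

```python
from bs4 import BeautifulSoup

# Made-up sample table, for illustration only.
HTML = """
<table>
  <tr><td>a1</td><td>b1</td><td>c1</td></tr>
  <tr><td>a2</td><td>b2</td><td>c2</td></tr>
  <tr><td>a3</td><td>b3</td><td>c3</td></tr>
</table>
"""

def table_to_grid(html):
    """Return the first table as a list of rows, each a list of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in table.find_all("tr")
    ]

def neighbours(grid, row, col):
    """Collect the cells above, below, left and right of grid[row][col]."""
    return {
        "above": [r[col] for r in grid[:row]],
        "below": [r[col] for r in grid[row + 1:]],
        "left": grid[row][:col],
        "right": grid[row][col + 1:],
    }

grid = table_to_grid(HTML)
result = neighbours(grid, 1, 1)  # the cell containing "b2"
print(result)
```

Once the table is a plain list of lists, the above/below and left/right lookups are just slicing. Tables with merged cells would need extra handling that this sketch does not attempt.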

KurzedMetal
1

You can use lxml, a Python library for processing XML and HTML, to parse a table. A simple thing you can do with a table is load the document and iterate through its rows.
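As a rough sketch of the load-and-iterate idea, assuming a plain table without merged cells: the sample HTML below is made up, and the XPath sibling axes are one possible way to reach the left/right cells, not necessarily what the answer had in mind.

```python
import lxml.html

# Made-up sample table, for illustration only.
HTML = """
<table>
  <tr><td>a1</td><td>b1</td><td>c1</td></tr>
  <tr><td>a2</td><td>b2</td><td>c2</td></tr>
  <tr><td>a3</td><td>b3</td><td>c3</td></tr>
</table>
"""

root = lxml.html.fromstring(HTML)

# Load and iterate through rows.
rows = [[td.text_content() for td in tr.xpath("./td")] for tr in root.xpath(".//tr")]

# Locate one cell, then walk its siblings with XPath axes.
cell = root.xpath(".//td[text()='b2']")[0]
left = [td.text_content() for td in cell.xpath("preceding-sibling::td")]
right = [td.text_content() for td in cell.xpath("following-sibling::td")]

# XPath positions are 1-based, so the cell's column is len(left) + 1.
col = len(left) + 1
column = [td.text_content() for td in root.xpath(f".//tr/td[{col}]")]

print(rows, left, right, column)
```

Here `column` holds the whole column, so the cells above and below are the entries before and after the target row.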

Sergei Danielian
0

Take a look at pyquery. It lets you run jQuery-style queries on XML and HTML documents. From a quick look at the API, it seems that prevAll and nextAll can find the cells to the left and right of a given cell. I think it should not be that difficult to get the cells above and below as well.

Can't Tell
0

This code converts every table on the page into a pandas DataFrame and returns them as a list.

import pandas as pd
url = r'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url) # Returns list of all tables on page
sp500_table = tables[0] # Select table of interest
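Once a table is a DataFrame, the neighbouring cells asked about in the question can be sliced out with `iloc`. This is only a sketch on a made-up inline table (passed through `StringIO` so it runs offline); the column labels and the chosen cell are assumptions for illustration, and `read_html` needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Made-up sample table, for illustration only.
HTML = """
<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td>a1</td><td>b1</td><td>c1</td></tr>
  <tr><td>a2</td><td>b2</td><td>c2</td></tr>
  <tr><td>a3</td><td>b3</td><td>c3</td></tr>
</table>
"""

df = pd.read_html(StringIO(HTML))[0]

row, col = 1, "B"                    # the cell containing "b2"
col_pos = df.columns.get_loc(col)    # integer position of column "B"

above = df[col].iloc[:row].tolist()
below = df[col].iloc[row + 1:].tolist()
left = df.iloc[row, :col_pos].tolist()
right = df.iloc[row, col_pos + 1:].tolist()

print(above, below, left, right)
```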