Expressive python library for parsing HTML tables

Question

I need to parse HTML tables to do things like get all cells above/below a given cell in its column, or to its left/right in its row. Is there a Python library that can do this easily?

score 2 · Answer 1 · answered Apr 26 '12 at 14:13

BeautifulSoup
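A minimal sketch of the BeautifulSoup approach, assuming the beautifulsoup4 package is installed and the table has no rowspan/colspan. The sample table and the helper names table_grid and neighbours are hypothetical. The table is flattened into a list of rows, after which "above/below" and "left/right" become plain list indexing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table (assumes no rowspan/colspan).
HTML = """
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>0.50</td></tr>
  <tr><td>pear</td><td>5</td><td>0.75</td></tr>
  <tr><td>plum</td><td>2</td><td>1.10</td></tr>
</table>
"""


def table_grid(html):
    """Return the first <table> as a list of rows, each a list of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in table.find_all("tr")
    ]


def neighbours(grid, r, c):
    """Cells above/below grid[r][c] in its column, and left/right of it in its row."""
    return {
        "above": [grid[i][c] for i in range(r) if c < len(grid[i])],
        "below": [grid[i][c] for i in range(r + 1, len(grid)) if c < len(grid[i])],
        "left": grid[r][:c],
        "right": grid[r][c + 1:],
    }


grid = table_grid(HTML)
print(neighbours(grid, 2, 1))  # the "5" in the "pear" row
# {'above': ['Qty', '3'], 'below': ['2'], 'left': ['pear'], 'right': ['0.75']}
```

Tables that use rowspan or colspan would need to be expanded into a rectangular grid first, and find_all is recursive, so nested tables would also need extra care.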

answered Apr 26 '12 at 14:13

KurzedMetal

score 1 · Answer 2 · edited May 23 '17 at 12:12

You can use lxml (XML and HTML with Python) to parse a table. Here is a simple example of what you can do with a table (loading it and iterating through its rows).
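A minimal sketch of loading a table with lxml.html and iterating through its rows, assuming lxml is installed and the table has no rowspan/colspan. The sample table and the table_rows helper are hypothetical.

```python
import lxml.html

# Hypothetical sample table (assumes no rowspan/colspan).
HTML = """
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>0.50</td></tr>
  <tr><td>pear</td><td>5</td><td>0.75</td></tr>
  <tr><td>plum</td><td>2</td><td>1.10</td></tr>
</table>
"""


def table_rows(html):
    """Yield each <tr> of the first <table> as a list of stripped cell texts."""
    root = lxml.html.fromstring(html)
    table = next(root.iter("table"))  # iter() also checks root itself
    for tr in table.iter("tr"):
        # Direct <td>/<th> children only, in document order.
        yield [cell.text_content().strip() for cell in tr.xpath("./td | ./th")]


grid = list(table_rows(HTML))
qty_column = [row[1] for row in grid[1:]]  # skip the header row
print(qty_column)  # ['3', '5', '2']
```

Once the rows are in a list like this, cells above/below or left/right of a given cell are simple index lookups.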

edited May 23 '17 at 12:12

Community

answered Apr 26 '12 at 14:16

Sergei Danielian

score 0 · Answer 3 · answered Apr 26 '12 at 14:22

Take a look at pyquery. It lets you run jQuery-style queries on XML and HTML documents. From a quick look at the API, prevAll and nextAll should find the cells to the left/right. I think it will not be that difficult to get the cells above/below as well.

answered Apr 26 '12 at 14:22

Can't Tell

score 0 · Answer 4 · answered Mar 07 '19 at 22:10

This code converts every table on the page into a pandas DataFrame (pd.read_html returns a list of them).

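```python
import pandas as pd

# pd.read_html needs an HTML parser backend: lxml, or beautifulsoup4 + html5lib.
url = r'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url)  # Returns a list of DataFrames, one per <table> on the page
sp500_table = tables[0]  # Select the table of interest
```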
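Once a table is a DataFrame, the cells above/below or left/right of a given cell are positional slices with iloc. A minimal sketch, assuming lxml (or beautifulsoup4 + html5lib) is installed for read_html; the sample table and the neighbours helper are hypothetical. The all-<th> first row becomes df.columns, so it is not counted as a data row.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table; its all-<th> first row becomes the column header.
HTML = """
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>0.50</td></tr>
  <tr><td>pear</td><td>5</td><td>0.75</td></tr>
  <tr><td>plum</td><td>2</td><td>1.10</td></tr>
</table>
"""

# Wrapping literal HTML in StringIO avoids the deprecation warning that newer
# pandas versions emit when a raw HTML string is passed to read_html.
df = pd.read_html(StringIO(HTML))[0]


def neighbours(df, r, c):
    """Values above/below df.iloc[r, c] in its column, and left/right in its row."""
    return {
        "above": df.iloc[:r, c].tolist(),
        "below": df.iloc[r + 1:, c].tolist(),
        "left": df.iloc[r, :c].tolist(),
        "right": df.iloc[r, c + 1:].tolist(),
    }


print(neighbours(df, 1, 1))  # neighbours of Qty in the "pear" row
```

read_html also parses numeric columns, so Qty and Price come back as numbers rather than strings.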

answered Mar 07 '19 at 22:10

Alexandr Ovdienko