
This is the site I want to gather data from but I'm really not sure where to start.

http://www.espn.co.uk/rugby/playerstats?gameId=271423&league=271937

Rugby_fan
  • what data are you looking for, team or player data? – magdmartin Jan 13 '16 at 13:03
  • Player data, they often list all players in a table containing various stats i.e. tackles/meters run etc, from the team that played, that's the table I want. – Rugby_fan Jan 13 '16 at 17:24
  • Are you looking for just the displayed data or a lot more? Are you trying to follow the advice from http://opendata.stackexchange.com/questions/5676/rugby-union-data –  Jan 15 '16 at 16:25
  • I'm trying to scrape the data from the espn site using a script rather than manually copy and pasting the tables presented in the URL I linked. I've done something similar before with curl to download the html source and then with some clever regexp I can get the table. However this site uses something else to display the table, so I was asking if anyone knows how I would be able to just get a text based form of the table. – Rugby_fan Jan 18 '16 at 11:57

2 Answers


You can use Python, R, or another language of your choice:

Example for python: https://stackoverflow.com/questions/6325216/parse-html-table-to-python-list

Example for R: https://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package

These are just two short examples from Stack Overflow. If you want to repeat the process for several tables, you can use a for loop.
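As a minimal sketch of the Python approach, here is a standard-library-only parser (no third-party packages; the sample HTML is made up, not ESPN's actual markup) that collects each table row into a list of cell strings:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# Hypothetical sample; a real script would fetch the page HTML first.
html = """
<table>
  <tr><th>Player</th><th>Tackles</th><th>Metres</th></tr>
  <tr><td>Smith</td><td>12</td><td>45</td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
print(parser.rows)  # → [['Player', 'Tackles', 'Metres'], ['Smith', '12', '45']]
```

From there it is a short step to writing the rows out as CSV, which answers the "text based form of the table" part of the question.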

GGA
  • If I download the page using urllib2.urlopen for instance, it doesn't contain the tables. I think that there is some interaction required on the page i.e. selecting the correct tab for the table. How do I emulate that interaction? – Rugby_fan Jan 25 '16 at 17:37
  • You should check out python's mechanize and scrapy. They're much more advanced than scraping HTML manually. – philshem Apr 03 '16 at 16:34
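On the comment above about the tables not appearing in the downloaded HTML: pages like this often load their stats via a background JSON request, which you can spot in your browser's developer-tools network tab and fetch directly. A hedged sketch of the idea, using a hypothetical payload shape (the real ESPN response will differ):

```python
import json

# Hypothetical payload; inspect the actual XHR response in your browser's
# developer tools to learn the real field names and structure.
payload = '''
{"players": [
    {"name": "Smith", "tackles": 12, "metres": 45},
    {"name": "Jones", "tackles": 9,  "metres": 80}
]}
'''
data = json.loads(payload)
rows = [(p["name"], p["tackles"], p["metres"]) for p in data["players"]]
print(rows)  # → [('Smith', 12, 45), ('Jones', 9, 80)]
```

Fetching the JSON endpoint directly avoids emulating clicks entirely; a browser-automation tool is only needed when no such endpoint can be found.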

In Python there are many libraries available for web scraping; some of the important ones are Scrapy and Beautiful Soup. In PHP there is simple_html_dom.php, which can parse data from websites. You need good programming skills to scrape websites with these tools.

The best option for non-programmers is an online web scraper such as import.io, but be warned that not many sites work perfectly with such tools.
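If a page really is plain static HTML, the curl-and-regex route the question mentions can also work, though it is brittle compared with the parsers above. A small illustrative sketch on made-up markup:

```python
import re

# Made-up static HTML; real pages are messier, so prefer a proper parser.
html = "<tr><td>Smith</td><td>12</td></tr><tr><td>Jones</td><td>9</td></tr>"

rows = [re.findall(r"<td>(.*?)</td>", tr)          # cells within each row
        for tr in re.findall(r"<tr>(.*?)</tr>", html)]  # each <tr>...</tr>
print(rows)  # → [['Smith', '12'], ['Jones', '9']]
```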

Eka