
This is the site I want to gather data from but I'm really not sure where to start.

http://www.espn.co.uk/rugby/playerstats?gameId=271423&league=271937

Rugby_fan
  • what data are you looking for, team or player data? – magdmartin Jan 13 '16 at 13:03
  • Player data, they often list all players in a table containing various stats i.e. tackles/meters run etc, from the team that played, that's the table I want. – Rugby_fan Jan 13 '16 at 17:24
  • Are you looking for just the displayed data or a lot more? Are you trying to follow the advice from http://opendata.stackexchange.com/questions/5676/rugby-union-data –  Jan 15 '16 at 16:25
  • I'm trying to scrape the data from the espn site using a script rather than manually copy and pasting the tables presented in the URL I linked. I've done something similar before with curl to download the html source and then with some clever regexp I can get the table. However this site uses something else to display the table, so I was asking if anyone knows how I would be able to just get a text based form of the table. – Rugby_fan Jan 18 '16 at 11:57

2 Answers


You can use Python, R, or another language of your choice:

Example for python: https://stackoverflow.com/questions/6325216/parse-html-table-to-python-list

Example for R: https://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package

These are just two short examples from Stack Overflow. If you want to repeat the process for several tables, you can use a for loop.
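As a minimal sketch of the Python approach, here is a standard-library-only parser (no third-party packages; the sample HTML is made up, not ESPN's actual markup) that collects each table row into a list of cell strings:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# Hypothetical sample; a real script would fetch the page HTML first.
html = """
<table>
  <tr><th>Player</th><th>Tackles</th><th>Metres</th></tr>
  <tr><td>Smith</td><td>12</td><td>45</td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
print(parser.rows)  # → [['Player', 'Tackles', 'Metres'], ['Smith', '12', '45']]
```

From there it is a short step to writing the rows out as CSV, which answers the "text based form of the table" part of the question.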

GGA
  • If I download the page using urllib2.urlopen for instance, it doesn't contain the tables. I think that there is some interaction required on the page i.e. selecting the correct tab for the table. How do I emulate that interaction? – Rugby_fan Jan 25 '16 at 17:37
  • You should check out python's mechanize and scrapy. They're much more advanced than scraping HTML manually. – philshem Apr 03 '16 at 16:34
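On the comment above about the tables not appearing in the downloaded HTML: pages like this often load their stats via a background JSON request, which you can spot in your browser's developer-tools network tab and fetch directly. A hedged sketch of the idea, using a hypothetical payload shape (the real ESPN response will differ):

```python
import json

# Hypothetical payload; inspect the actual XHR response in your browser's
# developer tools to learn the real field names and structure.
payload = '''
{"players": [
    {"name": "Smith", "tackles": 12, "metres": 45},
    {"name": "Jones", "tackles": 9,  "metres": 80}
]}
'''
data = json.loads(payload)
rows = [(p["name"], p["tackles"], p["metres"]) for p in data["players"]]
print(rows)  # → [('Smith', 12, 45), ('Jones', 9, 80)]
```

Fetching the JSON endpoint directly avoids emulating clicks entirely; a browser-automation tool is only needed when no such endpoint can be found.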

In Python there are many libraries available for web scraping; some of the important ones are Scrapy and Beautiful Soup. In PHP there is simple_html_dom.php, which can parse data from websites. You need good programming skills to scrape websites with these tools.

The best option for non-programmers is an online web scraper such as import.io, but be warned that not many sites work perfectly with such tools.
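If a page really is plain static HTML, the curl-and-regex route the question mentions can also work, though it is brittle compared with the parsers above. A small illustrative sketch on made-up markup:

```python
import re

# Made-up static HTML; real pages are messier, so prefer a proper parser.
html = "<tr><td>Smith</td><td>12</td></tr><tr><td>Jones</td><td>9</td></tr>"

rows = [re.findall(r"<td>(.*?)</td>", tr)          # cells within each row
        for tr in re.findall(r"<tr>(.*?)</tr>", html)]  # each <tr>...</tr>
print(rows)  # → [['Smith', '12'], ['Jones', '9']]
```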

Eka