I'm pretty new to data science and want to get my hands dirty. What are some good publicly available data sets to play with?
7 Answers
I found hundreds of links to public data sets here:
https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
There is a public chess data set at http://www.top-5000.nl/pgn.htm. You can use it to predict the relative piece values in chess, and more.
There are multiple datasets available in the UCI Machine Learning Repository. You can find them here.
It depends on your interests. As others already suggested, www.kaggle.com is a good place to start if you want to solve well-defined problems. There is also a good community there that you can learn from. But the data there is mostly clean, which is usually not the case in real life. I would also suggest http://www.drivendata.org/ and https://www.crowdanalytix.com/ if you like competitions.
If you just want to play around with your new skills, find any data source on a topic you are interested in (like https://www.quandl.com/ for financial and economic data) and apply what you learn to the data there.
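As a rough illustration of "applying what you learn" to a downloaded file, here is a minimal sketch that summarizes one numeric column of a CSV using only the Python standard library. The sample rows, the `price` column name, and the `summarize_column` helper are made up for illustration and do not come from any of the sites above.

```python
import csv
import io
import statistics

# Made-up sample rows standing in for a downloaded CSV file.
# For real data, pass open("your_file.csv", newline="") instead.
SAMPLE_CSV = """date,price
2015-01-01,10.0
2015-01-02,12.0
2015-01-03,11.0
2015-01-04,15.0
"""


def summarize_column(csv_file, column):
    """Return count, mean, min, and max for one numeric column."""
    values = [float(row[column]) for row in csv.DictReader(csv_file)]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize_column(io.StringIO(SAMPLE_CSV), "price")
print(summary)
```

Real-world files usually need more care than this (missing values, odd encodings, non-numeric entries), which is part of what makes them good practice.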
As another user already said, this is entirely relative. On that note, browsing is still fun and a great way to learn about new datasets. Below are some of the more ridiculous datasets that I've come across, including my personal favorite, rat mapping in NYC:
Rat Information Portal:
http://www.nyc.gov/html/doh/html/environmental/disclaimer.shtml
Percentage of Adults 65+ Who Have Had All of Their Natural Teeth Extracted (by State):
http://kff.org/other/state-indicator/percent-who-had-all-teeth-extracted/
Using R or datasets, but maybe you need to investigate some topics of your interest: economy, stock market, weather, public data, medical data... – Sep 01 '15 at 18:47