Most Popular

1500 questions
7
votes
3 answers

Where can I find datasets of mailing list archives of open source software?

I plan to mine the mailing list archives of any open source software to answer interesting research questions. How can I request the data? What is the procedure? Are any small datasets of the mailing list archives available to perform a test…
Hemaa mathavan
  • 315
  • 1
  • 10
7
votes
2 answers

Medical Terminology in Patient Medical Records - Public Data Sets

I am interested in sample data of real patient medical records (anonymized or with demographics removed completely) to run through an NLP system, specifically diagnoses, admissions and progress notes - anything where medical terminology…
DataMania
  • 173
  • 5
7
votes
1 answer

Where can I find data for Formula 1 races and race cars?

I am looking for a dataset on the outcomes of Formula 1 races, which cars took part in each race, and the specifications of these cars (such as type of tire used; type of engine; width, length and other parameters that describe the shape of the car). If…
Ragnar
  • 235
  • 1
  • 4
7
votes
5 answers

Which format (CSV, JSON, Atom, RSS?) should events data be published in?

I'm developing recommendations for local councils publishing event listings. Compared to other kinds of data, events data seems very likely to be used by web and mobile apps (as opposed to downloaded for analysis), and is inherently chronologically…
Steve Bennett
  • 850
  • 5
  • 12
7
votes
1 answer

19th Century Patent Data

Is there any 19th-century patent data with geo-referenced locations of the submitter, so that for each city or county it lists the number of patents filed in a given year or month?
LJB
  • 639
  • 1
  • 4
  • 13
7
votes
4 answers

Best practices for huge explorable linked data directories

The main requirements for our open data directory are: XML-based by default, with the ability to switch to JSON. It is important to have an easy way to make it human-readable just by linking to an XSL stylesheet. All the data must be reachable by a robot from the single…
7
votes
4 answers

Dataset for emotion classification

I'm looking for a dataset for mood or emotion classification (Happy, Angry, Sad), that is, for classifying the sentiment of a given text. I would like to use a Naive Bayes classifier for this analysis. Not only to train and test the model with the…
SOURAV
  • 193
  • 2
  • 6
7
votes
3 answers

Any APIs available that provide data of Indian vehicles?

I was looking for APIs that provide up-to-date data on vehicles (2-wheelers/4-wheelers) in India. I found quite a few, but none had data on Indian vehicles. I looked at this question, which is almost the same as mine, but couldn't get any help…
Amogh Natu
  • 171
  • 1
  • 1
  • 3
7
votes
2 answers

Batch conversions of lat, lon to US census tract?

I have 700,000 latitude/longitude pairs I need to convert to US Census tracts. Is there a free API that offers this in batches? So far the only option I have found is from the FCC; it does not state a rate limit but has the form of a 1-1 call to…
sunny
  • 292
  • 3
  • 5
7
votes
2 answers

Can I get 1000 images from any image search engine for education/research purposes?

I'm researching a machine learning system that learns to recognize items based on image search results from a search engine. After searching around, I found that the Google and Bing Image Search APIs allow only a small number of images and don't allow…
PtLearner
  • 73
  • 2
7
votes
2 answers

GitHub to share a set of SPARQL queries

I am using GitHub to share a set of SPARQL queries: http://www.boisvert.me.uk/opendata/sparql_aq+.html?file=specific%20sensor.txt Currently this simple setup allows end-users to access queries stored in the GitHub repository, but ultimately I want to…
boisvert
  • 209
  • 1
  • 7
7
votes
1 answer

Database of adult sites

I would like to block all adult content in my DNS/VPN service, and I would rather not outsource this. Is there a list of URLs I can use as a blacklist in my routers/servers? The format doesn't matter, and if it were actively maintained that would be…
CleanTheWeb
  • 71
  • 1
  • 4
7
votes
2 answers

Are there any open datasets with technical specifications for photographic equipment?

I want to set up a free service where photographers buying or searching for discontinued photographic equipment can look up technical specifications. For instance, most eBay listings of second-hand equipment are without any tech. specs…
user135
7
votes
2 answers

Creating data from web tables with import.io failed - other tools?

I found this site with solar system moon orbit data: Table of moons in solar system. I ran that through the http://import.io site and it only came up with Jupiter data. Is there a more comprehensive tool that will identify multiple tables and…
John Carlson
  • 231
  • 1
  • 3
7
votes
2 answers

Database of smartphone sensor data

I'm working on a machine learning project for classifying activity level (walking, running, sitting, etc.) based on smartphone accelerometer, gyroscope, and GPS data. Of course I can just collect this data myself, but this is very time-consuming. I'm…
Simon
  • 171
  • 4