Most Popular
1500 questions
10
votes
2 answers
What are the most common issues with data cleaning (e.g. outliers, duplicates)? Who has data sets that need to be prepared for analysis?
When preparing data for analysis, I often encounter issues such as outliers, logically inconsistent entries (e.g. age=150 or age=-2), near-duplicates (records that are not exactly equal), etc. When integrating data sets from different sources, there…
Elisabeth
- 101
- 5
10
votes
5 answers
Open alternative to weatherbase.com
Does anyone know of an open alternative to weatherbase? I'm looking for what's listed here as "Average Temperature". I'm looking for monthly averages for a city, hopefully for as many cities as possible. This from NOAA is perfect but only for US…
Chris
- 255
- 2
- 7
10
votes
4 answers
Corpus of tagged text (English newspapers or any tagged text)
I'm developing a system to extract tags from English text, and I currently have no dataset to test and evaluate it. Could someone point me to a source (preferably a free one)? Thanks.
NOTE:
By Tags I mean if there's an article about let's…
user10492
- 101
- 1
- 3
10
votes
3 answers
Is there a difference between open data and public data?
I was wondering if there were differences between "open data" and "public data"? In this SE group, the two terms seem to be used interchangeably, but I feel that they are not interchangeable.
This excellent page gives a very good definition of open…
Marcus D
- 1,119
- 1
- 9
- 26
10
votes
2 answers
Where can I find U.S. train traffic data?
I'm trying to get a general idea of railroad traffic along particular segments of track—whether it's abandoned, whether it gets one train a month, or 100 trains a day. Is there a source of such data? I've had no luck on the Department of…
Waldo Jaquith
- 363
- 2
- 7
10
votes
3 answers
Where can I find Historical GIS datasets?
I've been looking for historical GIS datasets for a while. In a lot of places (including answers here at StackExchange), people refer to a great dataset at ThinkQuest, which contains detailed shapefiles for many years between 2000 BC and…
carelfransen
- 113
- 4
10
votes
2 answers
Publishing Weather Data under Creative Commons / Peer Production License
I am a volunteer at FSHM (Free Software Hardware Movement, Puducherry, India).
As part of our community project, we have been working on building a weather station using Freedom Hardware. We are experimenting with various things, and are building…
VoidSpaceXYZ
- 101
- 4
10
votes
4 answers
API to get Wikimedia Commons categories that are near a particular latitude/longitude
I have a coordinate, and I want to know what Commons categories are nearby.
For instance, for 40.7576,-73.9857 I would get Category:Times Square and probably Category:Broadway and a few others nearby.
Is there an API that gives this?
If not, is…
Nicolas Raoul
- 8,426
- 5
- 28
- 61
10
votes
3 answers
European crime data with spatial coordinates
I am looking for prostitution arrest and drug arrest data for a couple of major cities in Western Europe. So cities such as Paris, Rome, Rotterdam, Amsterdam, Berlin, etc. Ideally the data would have dates for each arrest, as well as latitude and…
krishnab
- 459
- 2
- 12
10
votes
2 answers
ISO 3166-2 codes to Olson Time Zone Codes
Just curious: is it possible to link ISO 3166-2 codes to Olson Time Zone Codes? For example, the US has these Olson Time Zone Codes:
US United States America/Adak
US United States America/Anchorage
US United States America/Boise
US United…
cs0815
- 507
- 3
- 15
10
votes
1 answer
Data about completed prison term by country
I'm looking for data (preferably free and open, but I can afford small payments), by country worldwide, on the percentage of the population that has completed a prison sentence or been sentenced.
I only found statistics about people who actually are in…
nelruk
- 323
- 1
- 10
10
votes
1 answer
Airline check for availability data
Where can I find a free web service, or data available in XML format, to check flight availability? Something like OpenFlights, which provides airport data.
Lola Loulita
- 201
- 1
- 3
10
votes
6 answers
Searching for list(s) of baby names containing huge (10k+) numbers of unique names
I am looking for datasets or huge lists of human forenames. There are plenty of websites that curate lists of names, but none of them seems to offer a way to export raw data or lists of names, or to list more than a few dozen names per…
dot_Sp0T
- 203
- 2
- 6
10
votes
2 answers
What is the status of OKFN's Open Product Data project?
Open Product Data (also known as Product Open Data) is a project run by OKFN. Its main goal is to build a public database of product data. There are already several questions and answers related to this project here on Open Data Stack Exchange.…
Patrick Hoefler
- 5,790
- 4
- 31
- 47
10
votes
5 answers
Daily electricity usage dataset
Where can I find a dataset of daily electricity usage from a zone or country anywhere in the world?
Anywhere = I accept any dataset from anywhere in the world, as long as it has at least daily usage figures.
andrepcg
- 201
- 2
- 4