Most Popular

1500 questions
10 votes, 2 answers

What are the most common issues with data cleaning (e.g. outliers, duplicates)? Who has data sets that need to be prepared for analysis?

When preparing data for analysis, I often encounter issues such as outliers, logically inconsistent entries (e.g. age=150 or age=-2), near-duplicates (records that are not exactly equal), etc. When integrating data sets from different sources, there…
Elisabeth
10 votes, 5 answers

Open alternative to weatherbase.com

Does anyone know of an open alternative to weatherbase? I'm looking for what it lists as "Average Temperature": monthly averages for a city, ideally for as many cities as possible. This dataset from NOAA is perfect but only for US…
Chris
10 votes, 4 answers

Corpus of tagged text (English newspapers or any tagged text)

I'm developing a system to extract tags from English text, but I currently have no dataset to test and evaluate it. Could someone point me to a source (preferably a free one)? Thanks. NOTE: By tags I mean if there's an article about let's…
user10492
10 votes, 3 answers

Is there a difference between open data and public data?

I was wondering whether there are differences between "open data" and "public data". In this SE group, the two terms seem to be used interchangeably, but I feel that they are not interchangeable. This excellent page gives a very good definition of open…
Marcus D
10 votes, 2 answers

Where can I find U.S. train traffic data?

I'm trying to get a general idea of railroad traffic along particular segments of track: whether a segment is abandoned, gets one train a month, or 100 trains a day. Is there a source of such data? I've had no luck on the Department of…
Waldo Jaquith
10 votes, 3 answers

Where can I find Historical GIS datasets?

I've been looking for Historical GIS datasets for a while. In a lot of places (including answers here on Stack Exchange), people refer to a great dataset at ThinkQuest, which contains detailed shapefiles for many years in between 2000 BC and…
carelfransen
10 votes, 2 answers

Publishing Weather Data under Creative Commons / Peer Production License

I am a volunteer at FSHM (Free Software Hardware Movement, Puducherry, India). As part of our community project, we have been building a weather station using Freedom Hardware. We are experimenting with various things, and are building…
VoidSpaceXYZ
10 votes, 4 answers

API to get Wikimedia Commons categories that are near a particular latitude/longitude

I have a coordinate, and I want to know what Commons categories are nearby. For instance, for 40.7576,-73.9857 I would get Category:Times Square and probably Category:Broadway and a few others nearby. Is there an API that gives this? If not, is…
Nicolas Raoul
10 votes, 3 answers

European crime data with spatial coordinates

I am looking for prostitution arrest and drug arrest data for a few major cities in Western Europe, such as Paris, Rome, Rotterdam, Amsterdam, Berlin, etc. Ideally the data would have dates for each arrest, as well as latitude and…
krishnab
10 votes, 2 answers

ISO 3166-2 codes to Olson Time Zone Codes

Just curious: is it possible to link ISO 3166-2 codes to Olson Time Zone Codes? For example, the US has these Olson Time Zone Codes: US United States America/Adak US United States America/Anchorage US United States America/Boise US United…
cs0815
10 votes, 1 answer

Data about completed prison term by country

I'm looking for data (preferably free and open, but I can afford small payments), by country worldwide, on the percentage of the population that has completed a prison sentence or has been sentenced. I have only found statistics about people who are actually in…
nelruk
10 votes, 1 answer

Airline flight availability data

Where can I find a free web service, or data available in XML format, to check flight availability? Something like OpenFlights, which provides airport data.
Lola Loulita
10 votes, 6 answers

Searching for list(s) of baby names containing a large number (10k+) of unique names

I am looking for datasets or huge lists of human forenames. There are plenty of websites that curate lists of names, but none of them seems to offer a way to export raw data or lists of names, or to list more than a few dozen names per…
dot_Sp0T
10 votes, 2 answers

What is the status of OKFN's Open Product Data project?

Open Product Data (also known as Product Open Data) is a project run by OKFN. Its main goal is to build a public database of product data. There are already several questions and answers related to this project here on Open Data Stack Exchange.…
Patrick Hoefler
10 votes, 5 answers

Daily electricity usage dataset

Where can I find a dataset of daily electricity usage for a zone or country anywhere in the world? By "anywhere" I mean I'll accept a dataset from any part of the world, as long as it has at least daily resolution.
andrepcg