Most Popular

1500 questions
13
votes
3 answers

What does OpenRefine offer that other data-parsing tools don't?

I see OpenRefine mentioned a lot here, but I don't see it doing much that R and other tools can't. What capabilities does it offer, beyond what I'm seeing on the promo page, that R or other data packages do not?
Ari B. Friedman
  • 295
  • 2
  • 8
13
votes
2 answers

List of abbreviations and acronyms

I am searching for a list of abbreviations and acronyms that can be downloaded as an SQL table, JSON, or something similar. I don't want an API, because it might not be fast enough or might have a limit like this one: http://www.abbreviations.com/abbr_api.php Of course the…
Wikunia
  • 335
  • 2
  • 9
13
votes
1 answer

Where can I find massive and high dimensional survival datasets

I am developing some high-dimensional survival analysis methods with R, but I do not know where to find high-dimensional survival datasets. Could anyone tell me where to find such datasets, for example the data used in: "Predicting…
floodking
  • 231
  • 1
  • 2
13
votes
1 answer

Searching for Open Data Dataset That is No Longer Online

Many questions are posted here in search of a specific dataset that is either no longer online, or has died from linkrot, or a combination of the two. What is the best way to get around this? Or rather, can these datasets be recovered?
albert
  • 11,885
  • 4
  • 30
  • 57
13
votes
2 answers

Dump of WikiLeaks

Does a dump or scrape of WikiLeaks exist? I'm thinking of an equivalent to Wikipedia's database download: http://en.wikipedia.org/wiki/Wikipedia:Database_download So far, I haven't found direct access to its publicly released data. It seems…
szxk
  • 810
  • 6
  • 13
13
votes
7 answers

Is there a free downloadable administrative division database of Germany?

Is there a downloadable and freely available database of the administrative units of Germany (lands, cities, and if available, streets with zip codes)? In many countries such databases are provided freely by central statistical offices, but for example,…
user139
13
votes
5 answers

Cost of living dataset

I'm trying to compare what the equivalent salary would be between two cities based upon the cost of living in each city. I want to be able to build something like CNN's cost of living…
greenJavaDev
  • 233
  • 1
  • 2
  • 5
13
votes
4 answers

Dataset of sentences translated into many languages

I'm looking for a dataset of human translated sentences. The ideal dataset would look like this: 1, en, The weather is nice today. 1, de, Das Wetter ist heute schön. 1, es, El clima es agradable hoy. 1, el, Ο καιρός είναι καλός σήμερα. ... for as…
philshem
  • 17,647
  • 7
  • 68
  • 170
13
votes
5 answers

Airport / airline data from all over the world

Where can I get a database of airports, possibly with (available / closed) runways, from all over the world? I am also looking for airlines and contact info for managers in decision-making positions at airlines.
János
  • 899
  • 8
  • 20
13
votes
2 answers

Rocket attacks dataset in Israel and State of Palestine

I'm looking for a dataset listing the rocket attacks in Israel and the State of Palestine with as many of the following fields as possible: timestamp, GPS, number of casualties, reason for attack (e.g. a pointer to a previous attack), number of articles…
Franck Dernoncourt
  • 7,780
  • 9
  • 39
  • 86
13
votes
4 answers

Releasing old historical/genealogical datasets as open data

I work with a couple of small non-profit genealogical and historical groups, and we are interested in releasing some of the datasets we've compiled over the years as open data. This information is already freely searchable through our online…
Asparagirl
  • 486
  • 4
  • 7
13
votes
1 answer

How to construct a database with the underlying real estate data displayed by Redfin, Zillow, or Trulia?

Regardless of whether the home is for sale, if you type any street address into Zillow, Redfin, or Trulia, they will often tell you the square footage, the last-sold-date, the taxable value, and often some other official information. Here is one…
Anthony Damico
  • 1,480
  • 10
  • 16
13
votes
7 answers

Dataset of domain names

There are many web resources for looking up domain names (whois.com), and there are some APIs that use the WHOIS protocol. Some examples are the Unix command-line tool jwhois and the Python library pywhois. These tools return the full WHOIS record, which…
philshem
  • 17,647
  • 7
  • 68
  • 170
13
votes
4 answers

A dataset of resumes

This is a question I found on /r/datasets. Does OpenData have any answers to add? I'm looking for a large collection of resumes, preferably with an indication of whether each person is employed or not. Does such a dataset exist? Link to reddit post
philshem
  • 17,647
  • 7
  • 68
  • 170
13
votes
2 answers

Clickstream sample dataset

I am looking for a web traffic or clickstream dataset, ideally from an e-commerce website. I would like to analyze purchasing patterns if possible. For example: visit duration, conversion, shopping cart abandonment, cross-category shopping,…
Hawk
  • 131
  • 1
  • 1
  • 3