5

We are looking for data sets which are divided into a treatment and control group and where a "treatment effect" can be identified.

The only hard requirement is that the sample is "large", since we want to be able to run computations on sub-samples. "Large" is in this context simply defined as "even with a sub-sample of the data, the main treatment effect can still be identified."

  • The data can come from any field (e.g. medicine, economics, biology, political science).
  • "Treatment" should be random, binary and ideally without any stratification (simply i.i.d. Bernoulli).
  • It is fine (and even desired) if this data has been studied extensively in previous studies.
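
To make the "large" criterion concrete, here is a minimal sketch of the kind of sub-sample check I have in mind. It runs on simulated data only, and everything in it is an assumption made for illustration: the function names, the effect size of 0.3 standard deviations, the normal outcomes, and the use of a simple difference in means with a 1.96 cutoff instead of a full regression.

```python
import math
import random
import statistics


def simulate_experiment(n, effect, p_treat=0.5, seed=0):
    """Simulate n units with i.i.d. Bernoulli(p_treat) assignment (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        treated = rng.random() < p_treat
        # Standard normal outcome, shifted by `effect` for treated units.
        outcome = rng.gauss(0.0, 1.0) + (effect if treated else 0.0)
        data.append((treated, outcome))
    return data


def diff_in_means(data):
    """Return (estimate, standard error) of the treated-minus-control mean difference."""
    treated = [y for t, y in data if t]
    control = [y for t, y in data if not t]
    est = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return est, se


def detection_rate(data, subsample_size, n_draws=200, z_crit=1.96, seed=1):
    """Share of random sub-samples in which |estimate / se| exceeds z_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Draw a sub-sample without replacement and re-estimate the effect.
        sub = rng.sample(data, subsample_size)
        est, se = diff_in_means(sub)
        if abs(est / se) > z_crit:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    data = simulate_experiment(n=5000, effect=0.3)
    for size in (50, 250, 1000):
        print(size, detection_rate(data, size))
```

A data set would count as "large" for our purposes if the detection rate stays high for sub-samples well below the full sample size; with a real data set, `simulate_experiment` would be replaced by loading the actual treatment and outcome columns.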

The more independent data sets we can get, the better.

Update/Edit: I feel I need to add that I placed the bounty after the first two answers were given, with the intention of rewarding additional answers!

sheß
  • Browse your favorite economics journal for RCTs. There've been plenty over the last 15 years or so and most top econ journals have open data policies. – RoyalTS Nov 18 '15 at 00:30
  • Not really, they are hardly ever i.i.d. Bernoulli and usually employ some form of stratification. Also sample sizes tend to be rather small. If you know a counter-example, please let me know. – sheß Nov 18 '15 at 00:50

2 Answers

6

If you can relax your i.i.d. Bernoulli assumption, you might check out the Tennessee STAR dataset. An R version is available in the mlmRev package. Further details: https://cran.r-project.org/web/packages/mlmRev/

4

The ReplicationWiki (which I founded) lists 3 RCTs with accessible data, and you can find some further studies if you search for "randomized".

What is large for you?

Jan Höffler
  • "Large" for our purpose is loosely defined as "being able (power-wise) to run treatment effect regressions on randomly drawn subsets of the data". How exactly this will be done is still to be determined. Off the top of my head I'd say at least 1,000 units of treatment assignment, but obviously that hinges on several factors. Great resource btw – sheß Feb 08 '16 at 14:00
  • It would be greatly appreciated if you could register and add the information you find when you try out the datasets. The size and a description of each dataset would be a valuable contribution, but simply endorsing the project by joining is also helpful, so that we can expand the database. – Jan Höffler Feb 08 '16 at 19:24
  • Glad to see that the W4G Rating Bar is still useful somewhere :-) – Franck Dernoncourt May 08 '16 at 17:41
  • It's only useful if it is used, so please just do it! – Jan Höffler May 08 '16 at 17:43