5

I have been trying to learn more about "Regression Discontinuity". This appears to be a statistical method designed to test the effectiveness of some sort of intervention. The Wikipedia article (https://en.wikipedia.org/wiki/Regression_discontinuity_design) states that "regression discontinuity design (RDD) is a quasi-experimental pretest-posttest design that aims to determine the causal effects of interventions by assigning a cutoff or threshold above or below which an intervention is assigned."

I had the following question: For the Regression Discontinuity framework to be valid - does an intervention have to be present?

As an example - suppose you are interested in studying whether neighborhoods with higher median incomes have higher hospital-to-population ratios. At first glance, the following analysis options seem possible:

  • Decide some arbitrary cut-off (e.g. median income less than 50k and median income greater than 50k) and perform a hypothesis test to examine if the average hospital-to-population ratios in neighborhoods above and below this cut-off are statistically similar.

  • Assuming linear correlation, you can calculate Pearson's Correlation Coefficient to study the correlation between both variables.

  • You can also fit a standard regression model to study the general effect of median income on the hospital-to-population ratio.
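A minimal sketch of these three options on made-up data (the income range, the smooth linear relationship, and all variable names are assumptions for illustration, not real neighborhood data):

```python
import numpy as np

# Hypothetical data: the ratio rises smoothly with income, with NO jump at 50k.
rng = np.random.default_rng(1)
n = 2000
income = rng.uniform(20_000, 80_000, n)              # assumed median incomes
ratio = 1.0 + 2e-5 * income + rng.normal(0, 0.3, n)  # assumed hospital-to-population ratio

# Option 1: compare group means above and below an arbitrary 50k cut-off
# using a Welch t-statistic (unequal variances).
above = ratio[income >= 50_000]
below = ratio[income < 50_000]
t_stat = (above.mean() - below.mean()) / np.sqrt(
    above.var(ddof=1) / above.size + below.var(ddof=1) / below.size
)

# Option 2: Pearson's correlation coefficient.
pearson_r = np.corrcoef(income, ratio)[0, 1]

# Option 3: simple linear regression of ratio on income.
slope, intercept = np.polyfit(income, ratio, 1)

print(f"Welch t: {t_stat:.1f}, Pearson r: {pearson_r:.2f}, slope: {slope:.2e}")
```

In this simulated setup the above/below comparison comes out strongly significant even though the ratio is smooth through 50k, so a difference in group means by itself says nothing about a "jump" at the cut-off.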

However, I am interested in performing an analysis around some cut-off value (e.g. $50,000).

I have heard many professors in YouTube lectures mention that in real-world examples, individuals just below some arbitrary cut-off are likely to be similar to individuals just above this same cut-off (in all other observable aspects).

Using a similar thought process, it could be thought that neighborhoods with a median income of 49k are likely very similar to neighborhoods with a median income of 51k. Thus, if the hospital-to-population ratio were to suddenly "jump" around the 50k mark - we might have reasons to believe that something special happens at the 50k median income mark that could be worth further investigating.

However, in this example, there does not seem to be an active intervention. I have usually seen Regression Discontinuity presented in examples where students who scored above 90% on an exam were (deliberately) awarded a scholarship (i.e. an intervention) and those who scored below 90% were not, and the researchers were interested in the effect of receiving this scholarship on the graduation rates of the two groups of students. In my example, the people in charge of deciding whether a hospital should be built in a specific neighborhood were likely not making this decision by explicitly consulting the median income of each neighborhood. Thus, I am not sure if this can be considered an active intervention, or if Regression Discontinuity is appropriate for a problem such as mine (i.e. median income vs hospital-to-population ratio around some arbitrary cutoff).

Can someone please comment on this? In the example I have outlined - is the Regression Discontinuity design a suitable approach, given this "indirect intervention"?

PS: I found this R package online (https://cran.r-project.org/web/packages/rdmulti/rdmulti.pdf) - apparently Regression Discontinuity can be used to analyze the "jumps" in multiple cut-offs?

Dave
stats_noob
  • 1
    It is hard to see anything in the theory which supposes an intervention. – mdewey Nov 26 '22 at 17:06
  • @mdewey: thank you for your reply! Can you please elaborate on this comment? Thanks! – stats_noob Nov 26 '22 at 23:55
  • 2
    I share your skepticism about arbitrary cutoffs based on convenient or “round” numbers. If there are real threshold effects (e.g., a company over a certain size has to report certain data), then the approach seems more reasonable. – Dave Nov 30 '22 at 03:35
  • @Dave: Thank you for your reply! In general, do you think Regression Discontinuity is not well suited for this problem? – stats_noob Nov 30 '22 at 03:50
  • I am used to thinking of the regression discontinuity design as a longitudinal one. It is, to my understanding, a test of an outcome before and after a key point in time when an intervention occurred. The assumption is that, absent the intervention, levels of the outcome would have been on average the same before and after that point, but that the intervention may have made a difference. Dichotomizing by other model variables would not be applicable in that case, and would figure to hurt precision and power. I am seeing now that the longitudinal type is not the only type of RD design. – rolando2 Dec 01 '22 at 17:10
  • 1
    "Assuming linear correlation, you can calculate Pearson's Correlation Coefficient to study the correlation between both variables." You do not have to assume linear correlation for linear correlation to be well defined. You could assume a linear relationship / a linear model, but again that is not necessary. – Richard Hardy Dec 05 '22 at 10:08
  • 1
    @stats_noob I notice that you often offer bounties on questions, but sometimes do not award them, even when there are answers that have received upvotes. How should prospective answer-writers understand this? It seems that it will create situations where an answer-writer might spend some hours writing a good answer in anticipation of receiving a bounty, only to find that it is not awarded. – Sycorax Dec 07 '22 at 20:06
  • @Sycorax: Isn't the bounty automatically awarded to the answer with the most votes? – stats_noob Dec 07 '22 at 20:11
  • Not always. Please familiarize yourself with how bounties work. https://meta.stackexchange.com/questions/16065/how-does-the-bounty-system-work In particular, please read the section "What happens if I feel my question is still unanswered? / What is automatic awarding?" In some cases, not awarding a bounty will cause automatic awarding of half the bounty. In some cases, no bounty will be awarded. – Sycorax Dec 07 '22 at 20:25
  • I've bountied questions and gotten responses that I did not feel completely addressed my concerns. Nonetheless, for someone who takes the time to write a decent answer, I am willing to award the bounty... it just seems classy to reward the effort. – Dave Dec 07 '22 at 20:51
  • I always upvote everyone who leaves a comment/answer to my questions! – stats_noob Dec 07 '22 at 20:52
  • @Sycorax Since the answer has been accepted, won’t the bounty be awarded in full once the grace period expires in a few hours? – Dave Dec 07 '22 at 21:08
  • 2
    @Dave Yes, if the answer is still marked as accepted when the grace period expires, half the bounty will be awarded. But if you look at the post history, you can see that the answer was accepted after I wrote my initial comment, but before you wrote your most recent comment. – Sycorax Dec 07 '22 at 21:10
  • 2
    @stats_noob Upvoting answers that are helpful is great. I'm asking why you haven't awarded a bounty on several of your recently bountied questions. A person might make the inference that it is not worth the effort to write a high-quality answer to your bountied questions, because they may not be rewarded with the bounty. For instance, dimitry has not yet received the bounty, and only 6 hours of grace period remain. Why is that? – Sycorax Dec 07 '22 at 21:12
  • please check if the bounty has been awarded - thanks! – stats_noob Dec 08 '22 at 02:15
  • @stats_noob Was it a bounty for $50$ or $100$ points? // Even though the bounty has been awarded, Sycorax raises a legitimate point that it’s not our etiquette to offer a bounty to entice members into responding, receive an answer, and never award the bounty. This is worth keeping in mind in the future. – Dave Dec 08 '22 at 02:24
  • @Dave If you look at the edit history, you can see that a 50-point bounty was offered. (Why that's not visible from the post activity is mysterious...) – Sycorax Dec 08 '22 at 05:17

1 Answer

10

You need an intervention/treatment for RD to make sense. The basic idea is that by looking around the cutoff, you are comparing people with similar unobservables, with the only difference coming from the intervention. This is why you see people run various placebo (falsification) tests with RD.
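To illustrate the idea of comparing units near the cutoff, here is a minimal sketch of a sharp RD estimate on simulated data. The data-generating process, the bandwidth of 0.25, and the helper name `sharp_rd_estimate` are all assumptions made for illustration; real applications typically use data-driven bandwidths and robust inference from dedicated RD software rather than this plain least-squares fit.

```python
import numpy as np

def sharp_rd_estimate(x, y, cutoff, bandwidth):
    """Hypothetical helper: local linear fit on each side of the cutoff,
    returning the estimated jump in the outcome at the cutoff."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(x - cutoff) <= bandwidth        # only units near the cutoff
    xc = x[keep] - cutoff                         # center the score at the cutoff
    treated = (xc >= 0).astype(float)             # treatment assigned at/above cutoff
    # Columns: intercept, jump at cutoff, slope below, change in slope above.
    design = np.column_stack([np.ones_like(xc), treated, xc, treated * xc])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return coef[1]

# Simulated example (assumed): a treatment adds 2.0 to the outcome at score >= 0.
rng = np.random.default_rng(0)
n = 20_000
score = rng.uniform(-1, 1, n)
true_jump = 2.0
outcome = 1.0 + 0.5 * score + true_jump * (score >= 0) + rng.normal(0, 1, n)

estimate = sharp_rd_estimate(score, outcome, cutoff=0.0, bandwidth=0.25)

# Placebo test: a fake cutoff at -0.5, using only untreated units,
# where no intervention happens and so no jump is expected.
untreated = score < 0
placebo = sharp_rd_estimate(score[untreated], outcome[untreated],
                            cutoff=-0.5, bandwidth=0.25)

print(f"estimated jump at true cutoff: {estimate:.2f}")
print(f"estimated jump at placebo cutoff: {placebo:.2f}")
```

The placebo estimate should hover near zero because nothing changes at the fake cutoff, which is what makes a jump at the real cutoff interpretable as the effect of the treatment.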

Since you are looking for an authoritative source, I recommend A Practical Introduction to Regression Discontinuity Designs: Foundations. On page 3 of the draft version, they write:

There are three fundamental components in the RD design—a score, a cutoff, and a treatment.
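For the sharp version of the design, these three components are commonly written along the following lines. The notation here is my own assumption for illustration and is not quoted from the book: $X_i$ is the score, $c$ the cutoff, $D_i$ the treatment indicator, and $Y_i$ the outcome.

```latex
D_i = \mathbf{1}\{X_i \ge c\},
\qquad
\tau_{\mathrm{RD}}
  = \lim_{x \downarrow c} \mathbb{E}[\,Y_i \mid X_i = x\,]
  - \lim_{x \uparrow c} \mathbb{E}[\,Y_i \mid X_i = x\,]
```

Without a treatment $D_i$ that switches at $c$, the difference of the two limits has nothing to be the effect of, which is why an arbitrary cutoff does not give an RD design.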

To use your example, suppose you want to figure out the causal effect of local income on the hospital beds to population ratio. The concern is that people must be paid more to take on riskier or more stressful work because it makes them sicker. Hospitals choose to locate near sicker and wealthier people. So some of the observed relationship between income and hospitals may be because of this omitted job risk-health factor. If you went out and just gave people bags of money, it would have a smaller effect on hospital construction and availability than the richer vs. poorer neighborhood comparison you propose.

There is something called Geographic RD, which is similar in spirit to what you have in mind. Here the running variable is two-dimensional (latitude and longitude). For example, some counties in the US state of Colorado have all-mail elections where voting can only be conducted by mail, and in-person voting is not allowed. In contrast, other counties have traditional in-person voting. Where the two types of counties are adjacent, the administrative border between the counties induces a discontinuous treatment assignment between in-person and all-mail voting. A Geographic RD design can be used to estimate the effect of adopting all-mail elections on voter turnout. This is covered in A Practical Introduction to Regression Discontinuity Designs: Extensions (the second, unpublished volume of the published book I mentioned above). Note that there is still a treatment here: the different laws.

dimitriy
  • @Dimitriy: thank you for your answer! Just to reiterate - in my original question, the way I have described the problem: Regression Discontinuity is not appropriate because there is no "direct intervention" - correct? – stats_noob Nov 30 '22 at 08:05
  • Yes, that’s right. – dimitriy Nov 30 '22 at 08:07
  • 1
    This is a good answer, but it suggests that the theory requires an intervention to make sense. Regression discontinuity is just fitting different lines to different regions of the data. That idea makes sense without an intervention. It can be used as an interpretable approximation to a more complex function. Piecewise linear splines are essentially the same model with more knots/lines, and no one would say you need interventions corresponding to each knot point. – Eli Dec 04 '22 at 17:57
  • @Eli The point of RD is not to approximate a complex function, but to estimate an effect from the gap between the lines at the threshold. We don’t even care about the whole line, just the neighborhood around the threshold. Piecewise linear splines would not work here since they force the pieces to connect, though that could be useful in a kink RD design where we are interested in a change in slope at the threshold rather than in the level. – dimitriy Dec 04 '22 at 18:21
  • To make it concrete, suppose we are interested in how unions change wages. RD compares post-unionization wages in firms with a 49% pro vote with wages in firms that had a 51% pro vote. We are not just fitting curves on two regions of the data. The threshold is meaningful because the probability of intervention/unionization changes when there is a majority in favor. We are also not trying to approximate a whole curve that speaks to what would happen to wages had a firm with a pro vote of 5% unionized. The two curves are necessary, but you need lots of other stuff before it becomes RD. – dimitriy Dec 04 '22 at 19:00
  • @Eli I get the same impression - regression discontinuity models are simply fitting piecewise linear or piecewise polynomial regression lines to longitudinal data (for example) with the knot of interest at time $t_{knot}$. But as dimitriy stated we're not imposing the continuous restriction on $t_{knot}$, so that we can estimate if there's a "jump" (different intercepts and/or slope) at time $t_{knot}$. – RobertF Oct 17 '23 at 15:17