
As time goes by I have learned of more and more ways that correlations can be spurious, and of more and more tests and correction procedures intended to avoid taking such correlations as meaningful. My question concerns whether either of two common correction procedures is sufficient when applied to economic time series with the usual characteristics of such series.

Suppose I have two highly correlated economic time series, each of which is approximately stationary (after differencing if need be) but has some internal time structure such as autocorrelation. Suppose, moreover, that there is a plausible story suggesting a causal relationship between the two, but, unknown to me, there is no real causal relationship between these series, direct or indirect. Will either of the following procedures, without more, generally reveal the spurious nature of the relationship?

  1. If I estimate a simple linear model of one on the other, but subject it to a lasso penalty with a cross-validated shrinkage coefficient, will the penalty usually shrink the coefficient to roughly zero?

  2. If I run a standard error-correction model of one variable on the other, can I assume that the coefficients on the level and change of the dependent variable will show up as insignificant?
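Question 1 can be checked directly by simulation. The sketch below rests on assumptions that are not part of the question: two independent Gaussian AR(1) series (so any slope that survives is spurious by construction), n = 100, a hand-rolled closed-form lasso for a single predictor rather than any library's implementation, 5 folds, and a hypothetical 20-point penalty grid running from the penalty that zeroes the slope down to 1% of it. It reports how often cross-validation keeps a nonzero slope, with folds either shuffled (ignoring time order) or taken as contiguous blocks of time.

```python
import math
import random


def ar1(n, phi, rng, burn=200):
    """Stationary AR(1) path x_t = phi * x_{t-1} + e_t, e_t ~ N(0, 1), after a burn-in."""
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


def moments(x, y):
    """Means of x and y, cross-moment c = cov(x, y) and v = var(x), both with divisor m."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    c = sum((a - mx) * (b - my) for a, b in zip(x, y)) / m
    v = sum((a - mx) ** 2 for a in x) / m
    return mx, my, c, v


def fit_from_moments(mom, lam):
    """One-predictor lasso with an unpenalised intercept, solved by soft-thresholding.

    Minimises (1/(2m)) * sum((y - a - b*x)^2) + lam * |b| and returns (a, b).
    """
    mx, my, c, v = mom
    b = math.copysign(max(abs(c) - lam, 0.0), c) / v
    return my - b * mx, b


def cv_lasso_1d(x, y, n_folds=5, n_lams=20, shuffle=True, rng=None):
    """Choose lam by K-fold CV on squared prediction error, then refit on all data.

    shuffle=True assigns time points to folds at random; shuffle=False uses
    contiguous blocks of time. Returns (chosen_lam, full-data slope).
    """
    n = len(x)
    full = moments(x, y)
    lam_max = abs(full[2])  # smallest penalty that sets the full-data slope to exactly 0
    lams = [lam_max * 0.01 ** (i / (n_lams - 1)) for i in range(n_lams)]
    idx = list(range(n))
    if shuffle:
        (rng or random.Random(0)).shuffle(idx)
    folds = []
    for k in range(n_folds):
        test = idx[k * n // n_folds:(k + 1) * n // n_folds]
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        folds.append((moments([x[i] for i in train], [y[i] for i in train]), test))
    best_lam, best_err = lams[0], math.inf
    for lam in lams:  # largest penalty first, so ties favour the sparser model
        err = 0.0
        for mom, test in folds:
            a, b = fit_from_moments(mom, lam)
            err += sum((y[i] - a - b * x[i]) ** 2 for i in test)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, fit_from_moments(full, best_lam)[1]


def share_kept(phi, shuffle, n=100, reps=200, seed=2):
    """Fraction of independent AR(1) pairs for which CV-lasso keeps a nonzero slope."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(reps):
        x, y = ar1(n, phi, rng), ar1(n, phi, rng)  # independent by construction
        _, b = cv_lasso_1d(x, y, shuffle=shuffle, rng=rng)
        kept += int(b != 0.0)
    return kept / reps


if __name__ == "__main__":
    for phi in (0.0, 0.9):
        for shuffle in (True, False):
            print(f"phi={phi}, shuffled folds={shuffle}: slope kept in "
                  f"{share_kept(phi, shuffle):.0%} of independent pairs")
```

Shuffled folds ignore time order, so with autocorrelated data each held-out point has training neighbours carrying nearly the same information, and a spurious in-sample slope can look like genuine predictive power. Contiguous blocks reduce that leakage but do not remove the deeper issue that cross-validation scores prediction, not causation.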

I am not asking about pathological cases. Obviously any test can be defeated by a sufficient coincidence in the random components of the variables. My question is, can I trust such results to the (admittedly limited) extent that I should generally accept significance levels as evidence of a true relationship? Or are there additional tests beyond these that are required before I should take an apparent relationship between two time series seriously?
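On the significance-level part of the question, a quick simulation shows how far autocorrelation alone can push a naive correlation test. Under Bartlett's large-sample approximation, the sample correlation r of two independent AR(1) series sharing the coefficient φ has Var(r) ≈ (1/n)(1 + φ²)/(1 - φ²), about 9.5/n at φ = 0.9 instead of the 1/n the usual iid test assumes. The sketch assumes Gaussian AR(1) series, n = 100 and a 200-step burn-in (illustrative choices, not from the question), and counts how often |r| exceeds the iid 5% cutoff 1.96/√n even though the two series are independent.

```python
import math
import random


def ar1(n, phi, rng, burn=200):
    """Simulate a stationary AR(1) path x_t = phi * x_{t-1} + e_t, e_t ~ N(0, 1).

    The first `burn` draws are discarded so the start value 0 does not matter.
    """
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


def corr(a, b):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def naive_rejection_rate(phi, n=100, reps=1000, seed=1):
    """Share of independent AR(1) pairs with |r| above the iid 5% cutoff 1.96/sqrt(n)."""
    rng = random.Random(seed)
    cutoff = 1.96 / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x = ar1(n, phi, rng)
        y = ar1(n, phi, rng)  # drawn independently of x: no link of any kind
        hits += int(abs(corr(x, y)) > cutoff)
    return hits / reps


def bartlett_sd(phi, n):
    """Bartlett's large-sample sd of r for two independent AR(1) series with equal phi."""
    return math.sqrt((1 + phi ** 2) / (1 - phi ** 2) / n)


if __name__ == "__main__":
    for phi in (0.0, 0.5, 0.9):
        print(f"phi={phi}: naive 5% test rejects {naive_rejection_rate(phi):.1%} "
              f"(Bartlett sd of r = {bartlett_sd(phi, 100):.3f} vs iid 0.100)")
```

At φ = 0 the rate should sit near the nominal 5%; as φ grows it rises well above it, even though nothing links the two series.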

andrewH



The qualifier "spurious" in "spurious correlation" comes from the subject-matter interpretation of the observed relationship, not the probabilistic one. Probabilistically, spurious correlation is as good as nonspurious correlation. As Ben writes in this thread,

it is not the correlation that is spurious, but the inference of an underlying (false) causal relationship. So-called "spurious correlation" arises when there is evidence of correlation between variables, but the correlation does not reflect a causal effect from one variable to the other. If it were up to me, this would be called "spurious inference of cause", which is how I think of it.

Within a given non-causal model, you will not be able to distinguish one type of correlation from another. Thus attempting to get rid of it via penalized estimation or a non-causal modification of the model does not make sense. What could make sense is building a causal model and making inference on causal relationships rather than probabilistic ones.
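To make the indistinguishability point concrete, consider two hypothetical data-generating models chosen for illustration: in one, x causes y directly; in the other, a hidden common cause z drives both and x has no effect on y at all. With the parameters below, (x, y) is bivariate normal with unit variances and correlation ρ under both models, so the OLS slope (or any other statistic computed from x and y alone) has exactly the same distribution either way.

```python
import math
import random


def direct_cause(n, rho, rng):
    """x causes y: y = rho*x + sqrt(1 - rho^2)*e, with x, e ~ N(0, 1)."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0))
    return xs, ys


def common_cause(n, rho, rng):
    """Hidden z drives both; x has no effect on y. Requires 0 <= rho <= 1.

    x = sqrt(rho)*z + sqrt(1-rho)*u1 and y = sqrt(rho)*z + sqrt(1-rho)*u2 give
    Var(x) = Var(y) = 1 and Cov(x, y) = rho, the same as direct_cause.
    """
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        xs.append(math.sqrt(rho) * z + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
        ys.append(math.sqrt(rho) * z + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
    return xs, ys


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


if __name__ == "__main__":
    rng = random.Random(0)
    for name, model in (("x -> y", direct_cause), ("x <- z -> y", common_cause)):
        x, y = model(100_000, 0.6, rng)
        print(name, "slope:", round(ols_slope(x, y), 3))
```

Both slopes come out close to ρ = 0.6; nothing in the (x, y) data records which model produced it.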

See also "Spurious relationships: flavours, terminology" for a brief overview of types of spurious regressions.

Richard Hardy
  • Thanks Richard. This is a good answer to my question, though it does not help much with my underlying confusion. When I am estimating statistical models, it is pretty much always because I hope they will shed light on a causal relationship. And I think that is quite generally, if not absolutely universally, true. But the only really valid things I feel I have learned about the inference of causal relationships from statistical ones are the ways that they fail to be true. Yet, if this inference is never valid, why does it seem to be a primary, maybe the primary, method of science? – andrewH May 01 '20 at 20:20
  • Thank you. The fundamental problem in causal analysis is that one can never infer causal relationships from data alone. One necessarily needs causal assumptions. So causal assumptions (and data) in, causal inference out. Then the question is, how justified the assumptions are. If the data is coming from a randomized experiment where the effect of interest is unlikely to be confounded by other variables, the assumptions are relatively plausible. Alternatively, one can sometimes take advantage of natural experiments. – Richard Hardy May 02 '20 at 06:11
  • There are also other tricks to achieve a setting in which the assumptions are plausible. The answer to the above-mentioned question informs about the degree of validity of causal inference. So perhaps you are right that inference is never valid in the sense of 100% valid, but sometimes we can achieve results that are a sufficiently close approximation of the truth. – Richard Hardy May 02 '20 at 06:13
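The randomized-experiment case from the comments can be sketched with a hypothetical setup (a direct effect of x on y with slope ρ, or a hidden common cause z and no effect of x at all). If x is assigned at random, independently of z, the regression slope of y on x estimates the causal effect: about ρ under the direct model and about 0 under the common-cause one. The causal assumption doing the work is that the assignment really is independent of z, which the data alone cannot confirm.

```python
import math
import random


def randomized_experiment(n, rho, x_causes_y, rng):
    """Assign x at random (independent of the hidden z), then generate y.

    x_causes_y=True : y = rho*x + sqrt(1 - rho^2)*e      (direct effect of x)
    x_causes_y=False: y = sqrt(rho)*z + sqrt(1 - rho)*e  (z drives y; x has no effect)
    Requires 0 <= rho <= 1.
    """
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hidden common cause
        x = rng.gauss(0.0, 1.0)  # set by randomisation, so it cannot depend on z
        e = rng.gauss(0.0, 1.0)
        if x_causes_y:
            y = rho * x + math.sqrt(1 - rho ** 2) * e
        else:
            y = math.sqrt(rho) * z + math.sqrt(1 - rho) * e
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


if __name__ == "__main__":
    rng = random.Random(0)
    for causal in (True, False):
        x, y = randomized_experiment(100_000, 0.6, causal, rng)
        label = "x causes y" if causal else "common cause only"
        print(label, "slope:", round(ols_slope(x, y), 3))
```

Observationally the two models cannot be told apart; once x is set by randomization, their slopes separate.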