
Hi guys, I have a very strange result.

My Spearman correlation says there is no correlation between temperature and number of users (alpha = 5%). The data are autocorrelated, non-normal, and heteroskedastic, which is why I went for non-parametric methods.

Yet if I do the Theil-Sen estimator, I get a significant slope.


Which test should I trust?

Output Spearman

   cor.test(Anzahl_Nutzer,Niederschlagshöhe,method = c("spearman"))


   S = 167870000, p-value = 0.1253
   alternative hypothesis: true rho is not equal to 0
   sample estimates:
          rho 
    0.04805172

Output Theil-Sen

fit = mblm(residuals_differenced~Diff_Niederschlagshöhe_7Tage,total, repeated = TRUE)
Coefficients:
                             Estimate     MAD V value Pr(>|V|)   
(Intercept)                    -8.299 303.344  236866  0.08426 . 
Diff_Niederschlagshöhe_7Tage   17.610 111.736  290750  0.00332 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1402 on 1024 degrees of freedom
Neon67
  • Both tests ask different questions of the data. Additionally, it is not strange for one statistical test to have more power than another even when they ask the same question of the data. – Heteroskedastic Jim May 13 '18 at 22:36
  • You should specify exactly which function (from which package) you're calling so we can see what test is actually being applied. A small reproducible set of data would make for a comparison everyone can discuss in detail. – Glen_b May 14 '18 at 05:07
  • mblm for Theil-Sen, and cor.test for Spearman – Neon67 May 14 '18 at 09:49
  • Theil's regression is linked to Kendall's tau. The implication would be that Kendall's tau and Spearman's rho would provide differing results, where Kendall's tau would be significant and Spearman's rho would not. Generally, Kendall's tau has somewhat better properties than Spearman rho. Tau is somewhat more efficient and robust. Rho is a biased estimator of the population correlation. However, Spearman's rho has a similar interpretation to Pearson's rho, while Kendall's tau does not. You should look at both rho and tau and how they are constructed to determine which is more appropriate. – Dave Harris May 17 '18 at 21:09

1 Answer


Because the data are autocorrelated, neither of the tests you ran has its nominal properties. That is, your p-values will be wrong in either case.
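This is easy to see by simulation (a hypothetical sketch, not the OP's data; the AR coefficient and sample size are chosen purely for illustration): with two independent AR(1) series, the Spearman test rejects far more often than its nominal 5%.

```r
# Hypothetical illustration: two INDEPENDENT AR(1) series.
# If the Spearman test held its level here, it would reject ~5% of the
# time; autocorrelation inflates the rejection rate well above that.
set.seed(1)
n_sim <- 2000
n     <- 200
phi   <- 0.9   # AR(1) coefficient (assumed for illustration)

rejections <- replicate(n_sim, {
  x <- as.numeric(arima.sim(list(ar = phi), n = n))
  y <- as.numeric(arima.sim(list(ar = phi), n = n))
  cor.test(x, y, method = "spearman")$p.value < 0.05
})
mean(rejections)  # substantially above the nominal 0.05
```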

However, the issue you raise about the difference would remain even if the autocorrelation issue were not present, so it's worth answering as if that were not at issue.

Theil-Sen regression corresponds to the slope estimate that would leave the Kendall correlation between residuals and x at 0.

That is, normally I'd expect to test whether the population version of a Theil-Sen regression coefficient is different from 0 using a test of Kendall's tau on (x,y).

Without knowing what function you called (as the post stands at this moment), I can't know for sure that's what test was used but it would be the obvious candidate.
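The correspondence can be checked directly. Here is a hypothetical sketch on simulated data (the single-median Theil-Sen estimate; mblm's repeated-medians variant may differ slightly):

```r
# Sketch: the Theil-Sen slope is the median of all pairwise slopes,
# and it (approximately) zeroes the Kendall correlation between the
# residuals and x. Simulated data, for illustration only.
set.seed(42)
n <- 60
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# All pairwise slopes (y_j - y_i) / (x_j - x_i), i < j
idx    <- combn(n, 2)
slopes <- (y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]])
b_ts   <- median(slopes)         # Theil-Sen slope estimate

res <- y - b_ts * x
cor(res, x, method = "kendall")  # essentially 0

# The natural test of "population slope = 0" is then Kendall's tau on (x, y):
cor.test(x, y, method = "kendall")$p.value
```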

A test of the Kendall correlation will not generally yield the same p-value as a test of the Spearman correlation and it's easily possible to get samples that will reject for one and not for the other:

Plot of p-values for Kendall and Spearman correlations

This simulation was for samples with n=100.

The plot has been truncated to only show p-values between 1% and 10% on both axes. Axes are log-scaled. The plot shows a tight spread about a straight diagonal upsloping line -- indicating that p-values tend to be pretty similar for both tests for the sort of data I generated -- but there are quite a few samples that are significant at 5% for one test and not for the other, in both directions.
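A minimal version of such a simulation (hypothetical setup, not the exact parameters behind the plot above):

```r
# Sketch: weakly correlated bivariate normal samples with n = 100,
# comparing the Spearman and Kendall p-values on each sample.
# The effect size rho is assumed, chosen so many p-values straddle 5%.
set.seed(7)
n_sim <- 2000
n     <- 100
rho   <- 0.18

p_sp <- p_kd <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  p_sp[i] <- cor.test(x, y, method = "spearman")$p.value
  p_kd[i] <- cor.test(x, y, method = "kendall")$p.value
}

# Samples significant at 5% for one test but not the other:
sum(p_kd < 0.05 & p_sp >= 0.05)
sum(p_sp < 0.05 & p_kd >= 0.05)
```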

Particular kinds of pattern in the relationship between x and y can produce substantial differences in the values of the two correlations, and their p-values can sometimes differ quite a bit - the two tests are more or less sensitive to different kinds of pattern.

Consequently the result you see is not necessarily surprising.

If you want a slope estimate (and thereby a test and CI) that directly corresponds to the Spearman correlation, this is relatively straightforward to obtain with a root-finding algorithm.
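One way to sketch that root-finding idea (hypothetical, on simulated data): search for the slope b at which the Spearman correlation between y - b*x and x crosses zero.

```r
# Sketch: the slope whose residuals have zero Spearman correlation with x.
# uniroot finds where the (step-function) Spearman correlation of
# (y - b*x) with x changes sign. Simulated data, for illustration only.
set.seed(1)
x <- rnorm(80)
y <- 1.5 * x + rnorm(80)

spearman_resid <- function(b) cor(y - b * x, x, method = "spearman")

# Bracket the root with a generous interval around the OLS slope
b_ols <- unname(coef(lm(y ~ x))[2])
fit   <- uniroot(spearman_resid, interval = b_ols + c(-5, 5))
fit$root  # slope estimate tied to the Spearman correlation
```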

Glen_b
  • I edited the first post, I can't give the data though – Neon67 May 14 '18 at 09:54
  • You still don't seem to say what package the function is from. Is the mblm function you call the one in the package called mblm or some other function called mblm? Note that I didn't ask for the same data you're trying to do this on -- that's too large in any case. Any small data set with the same issue would suffice; it typically helps get more/better answers more quickly. If you're not able to produce one, don't worry. – Glen_b May 14 '18 at 17:48
  • It's from the package mblm. Unfortunately I can't give the data, even a sample – Neon67 May 15 '18 at 08:21
  • I thought Spearman correlation didn't care about autocorrelated data, as it is non-parametric? – Neon67 May 15 '18 at 08:24
  • The distribution of the Spearman correlation coefficient is certainly affected by autocorrelation in the two series. More generally "nonparametric" doesn't suggest that a procedure is unaffected by dependence - one needn't even be very mathematically inclined to see it, since this is easy to check via simulation. (If you have a book that suggests that it is unaffected by autocorrelation, I'd love to know what it actually says about it, such as whether it suggests some particular conditions are involved -- for example, it may not matter if only one of the two series have autocorrelation) ... – Glen_b May 15 '18 at 10:26
  • ... ctd, however I don't think that would have much to do with it being nonparametric – Glen_b May 15 '18 at 10:44
  • I thought non parametric means the data don't have to follow assumptions at all ? – Neon67 May 15 '18 at 11:35
  • Not so. In effect nonparametric relates to the number of parameters in some model. e.g. in the case of procedures like Spearman correlation or Theil regression it refers to the number of parameters in the model for a distribution (of the data or the errors). If you can count how many parameters are in your full model before you see the data (i.e. if it's finite-parametric) then it's called parametric; if you can't (i.e. if the count is not fixed and may be potentially infinite) then it's non-parametric. ... ctd – Glen_b May 16 '18 at 00:00
  • ctd... For Theil regression, we don't have a fixed assumption about the distribution of values about the line - the test should work the same (in terms of significance level, not power) for any continuous distribution. However, we certainly have assumptions in Theil regression, and in some cases the remaining assumptions may matter a lot. For a simpler example, both t-tests and Wilcoxon-Mann-Whitney tests are affected by autocorrelation in the data, and across a range of cases the WMW is a bit more sensitive to that than the t-test is. – Glen_b May 16 '18 at 00:01
  • This is not a matter in which you need to trust any person's word or even your own thoughts; simple simulation resolves such questions rapidly without any (potentially complicated) algebraic manipulations. It's good practice to doubt all your own ideas and check which conceptions hold up to a serious attempt to make them fail (since the ones that fail were clearly misconceptions; the others might still be, but a vigorous and wide-ranging investigation may at least establish that there's a lot of situations where an idea is at least close to right). – Glen_b Aug 03 '23 at 03:15