
Cross-reference overlapping question: "Interpretation of a scatter plot: an unclear correlation"

I have 3 scatter plots, where 3 variables are plotted against each other as follows:

  • Plot 1: Distance vs. Time
  • Plot 2: Distance vs. A time-related quantity
  • Plot 3: Time vs. A time-related quantity

In the top row, the plots are shown on a lin-lin scale, whilst in the bottom row they are shown on a log-log scale:

[Figure: the three scatter plots, on a lin-lin scale (top row) and a log-log scale (bottom row)]

By eye, the three variables appear only weakly correlated with one another, with the possible exception of "Time vs. A time-related quantity" in Plot 3. To support my observations I calculated 3 correlation coefficients, namely Pearson's, Spearman's and Kendall's, with the following results:

  • Plot 1: $\rho_P = 0.26, \rho_S = 0.77, \rho_K = 0.66$
  • Plot 2: $\rho_P = 0.26, \rho_S = 0.89, \rho_K = 0.77$
  • Plot 3: $\rho_P = 0.90, \rho_S = 0.86, \rho_K = 0.74$
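
As a rough illustration of how the rank-based coefficients can be high while Pearson's stays low, here is a minimal sketch in plain Python. The data are made up for illustration (they are not the values behind the plots), and the helper names `pearson`, `average_ranks`, `spearman` and `kendall_tau_b` are hypothetical; in practice `scipy.stats.pearsonr`, `spearmanr` and `kendalltau` compute the same quantities.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b, counting concordant and discordant pairs."""
    conc = disc = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: contributes nothing
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / sqrt((conc + disc + ties_x) * (conc + disc + ties_y))

# Made-up data: perfectly monotone, with one extreme x value (an "outlier").
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5]

r_p = pearson(x, y)
r_s = spearman(x, y)
r_k = kendall_tau_b(x, y)
print(f"Pearson={r_p:.2f}  Spearman={r_s:.2f}  Kendall={r_k:.2f}")
```

With these made-up values the two rank coefficients equal 1 because the relationship is perfectly monotone, whereas Pearson's coefficient is pulled well below 1 by the single extreme x value. A high Spearman or Kendall value therefore indicates a monotone association, not necessarily a linear one.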

According to References (1) and (2), the Pearson correlation (at least) is highly sensitive to outliers, such as those in the bottom right corners of Plot 1 and Plot 2 and in the top right corner of Plot 3, which can make its value misleading. Therefore, I would not rely much on the Pearson correlation, but mostly on the Spearman and Kendall ones. However, despite my initial impression of a weak correlation among the 3 variables, the Spearman and Kendall correlations take moderate to high values. So, my questions are:

  1. Am I interpreting these results wrongly?
  2. Should I keep the outliers when calculating the correlation coefficients, or remove them? (I would like to keep them if possible.)
  3. If the correlation among the three variables is very weak (as I suspect), what kind of correlation do I have for Plots 1 and 2? For example, in Plot 1, where the points are arranged around a vertical line, the correlation should be negligible, i.e. of the "no correlation" type. (Plot 3, to me, looks like a positive correlation, although most of the points are concentrated around the bottom left corner.)

References:

(1) Susan J. Devlin, R. Gnanadesikan, J. R. Kettenring (1975) Robust estimation and outlier detection with correlation coefficients, Biometrika, Volume 62, Issue 3, Pages 531–545, https://doi.org/10.1093/biomet/62.3.531

(2) Rand Wilcox (2012) Chapter 9: Correlation and Tests of Independence, in Introduction to Robust Estimation and Hypothesis Testing (Third Edition), Statistical Modeling and Decision Science, Academic Press, https://doi.org/10.1016/B978-0-12-386983-8.00009-3

paul
  • Should cross-reference overlapping question https://stats.stackexchange.com/questions/491901/interpretation-of-a-scatter-plot-an-unclear-correlation I have the same reaction as in the previous question. Logarithmic scales will help. – Nick Cox Oct 14 '20 at 17:56
  • Yes, true, question correctly edited :) – paul Oct 14 '20 at 18:19
  • Could you explain more about the axis "a time related quantity"? Is this a measure that is identical across all the graphs shown? What is the measure exactly? – develarist Oct 14 '20 at 18:25
  • Hi @develarist :) unfortunately, I cannot say too much about that "a time related quantity". However, (i) it is a time-dependent variable, (ii) it is dimensionless and (iii) yes, correct, it is identical across all the pictures shown. Moreover, all three variables considered here are one-dimensional arrays of the same length. Actually they have around 50 values each, but since many are zero, we do not see them in the plots shown. Hope this is useful :) – paul Oct 14 '20 at 18:38
  • If you have some zeros, how did you manage to plot on logarithmic scales in your previous answer? Or this one? Are you just ignoring the zeros? – Nick Cox Oct 14 '20 at 19:10
  • Yes, I ignored the (0,0) pairs when they occurred. – paul Oct 14 '20 at 19:20
  • Not good to hear. People disagree on what to do instead, but log scales are suspect when zeros are present and genuine. I don't much like log(1 + measure) but it's a possibility. – Nick Cox Oct 15 '20 at 08:47
  • Thanks a lot Nick for your comment! However, I think I did not understand exactly the meaning of your statements (i) "People disagree on what to do instead" and (ii) "but log scales are suspect when zeros are present and genuine". About (i), did you mean that people disagree on what to do with (0,0) pairs, i.e. whether to ignore them or not? And about (ii), what do you mean when you say that log scales are suspect if there are zeros? So, without zeros, are the log scales fine and therefore not suspect? What is the issue with the presence or absence of zeros? Thanks & sorry for my long comment! – paul Oct 19 '20 at 15:20
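
On the zeros discussed in the comments above, one way to see why ignoring (0,0) pairs matters is to compute a rank correlation with and without them. The following is only a sketch on made-up data (not the asker's values), with hypothetical helper names; it also applies the log(1 + measure) transform mentioned above via `math.log1p`.

```python
from math import log1p, sqrt

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: five genuine (0, 0) pairs plus five positive pairs.
x = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 0, 0, 5, 3, 4, 1, 2]

# All pairs, on the raw scale and after log(1 + value).
rho_all = spearman(x, y)
rho_log1p = spearman([log1p(v) for v in x], [log1p(v) for v in y])

# Same data with the (0, 0) pairs dropped, as a log-log plot would force.
kept = [(a, b) for a, b in zip(x, y) if not (a == 0 and b == 0)]
rho_dropped = spearman([a for a, _ in kept], [b for _, b in kept])

print(f"all pairs: {rho_all:.2f}  log1p: {rho_log1p:.2f}  zeros dropped: {rho_dropped:.2f}")
```

In this made-up example the Spearman coefficient is positive with the (0,0) pairs included and negative once they are dropped, while `log1p` leaves it unchanged because ranks are invariant under monotone increasing transformations. The same invariance means that Spearman's and Kendall's coefficients do not depend on whether the axes are linear or logarithmic, provided the same points are used.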
