I have two dependent variables (soccer dataset) that I'm interested in. They have the following skewness and kurtosis:

  • Variable A: % of minutes played --> Skewness: 0.145 | Kurtosis: -1.03
  • Variable B: Market value development --> Skewness: 7.1 | Kurtosis: 156.76

How can I deal with these high results? I want to analyze correlations and perhaps also conduct regression analyses.

Predictive variables are, for example, time spent at a club or position.

[image]

[image]

  • Using robust metrics would be one option. – user2974951 Oct 03 '22 at 11:53
  • Could you elaborate on that please? – Lasnik23 Oct 03 '22 at 11:55
  • Please add more information: What are these variables A and B? What are the predictor variables? What is the domain? What is the goal of your analysis? Plots would be very helpful too. – dipetkov Oct 03 '22 at 12:00
  • Thanks for pointing that out. I added some information. Does that help? – Lasnik23 Oct 03 '22 at 12:06
  • Yes: Variable A (% of minutes played) is a proportion and so linear regression is not appropriate for it. For variable B (market value development) linear regression might be an appropriate model. We can't tell from the histogram. Regression makes assumptions about the distribution of the residuals, not about the (marginal) distribution of the outcome. – dipetkov Oct 03 '22 at 12:11
  • Thanks. Does this influence my correlation analyses in any way? – Lasnik23 Oct 03 '22 at 12:24
  • You'll need to provide more details about your correlation analysis. – dipetkov Oct 03 '22 at 12:40
  • what kind of details? – Lasnik23 Oct 03 '22 at 12:41
  • Please add the new information from the comments as an edit to the post. We want posts to be self-contained; comments are ephemeral and not read by many. – kjetil b halvorsen Oct 04 '22 at 16:12

1 Answer

A [slightly facile] answer is that correlation is a number. If you want that number, you can calculate it with these data, or other data. (It may benefit you to read our related thread: Pearson's or Spearman's correlation with non-normal data.) A different question is whether the assumptions are met to test the point estimate against some null value (e.g., $0$), or more generally to form confidence intervals. There, the answer is "not necessarily", but there are many ways to test correlations, and some may be fine. On the other hand, it may be that you actually want, or should want, some measure other than correlation. If so, what you should do depends on what you want and why.
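To make "there are many ways to test correlations" a bit more concrete, here is a minimal sketch of one such way: a permutation test, which does not rely on normality. It uses only the Python standard library. The data are simulated and the helper names (`pearson`, `ranks`, `spearman`, `permutation_p_value`) are made up for illustration; nothing here comes from the soccer dataset.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, stat, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the null of no association."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between x and y
        if abs(stat(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated (hypothetical) data: a monotone but strongly right-skewed outcome.
rng = random.Random(1)
x = [rng.uniform(0, 1) for _ in range(100)]
y = [math.exp(3 * a + rng.gauss(0, 1)) for a in x]

print("Pearson :", pearson(x, y), permutation_p_value(x, y, pearson))
print("Spearman:", spearman(x, y), permutation_p_value(x, y, spearman))
```

Both coefficients are "just numbers" computed from the same data; they answer different questions (linear versus monotone association), which is why the choice should follow from what you want to know rather than from the shape of the marginal distributions alone.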

If you want to conduct regression analyses, you'll need a model that is appropriate for the data. As @dipetkov notes, "Regression makes assumptions about the distribution of the residuals, not about the (marginal) distribution of the outcome" (cf. my answer to: What if residuals are normally distributed, but y is not?). For percent of minutes played, a beta regression is presumably appropriate; for market value change, I don't know.
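A small simulation may help illustrate the point about residuals versus the marginal distribution. This is only a sketch with made-up data, using the standard library: the predictor is heavily right-skewed, so the outcome is too, yet the errors are normal by construction and the residuals from a straight-line fit look unremarkable. The function names and the particular simulated model are assumptions chosen for illustration.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Sample excess kurtosis (fourth standardized moment minus 3)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3.0


def ols_fit(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(42)
# Hypothetical predictor with a long right tail.
x = [rng.expovariate(1.0) ** 2 for _ in range(5000)]
# Outcome: linear in x, with normally distributed errors.
y = [2.0 + 3.0 * a + rng.gauss(0, 1) for a in x]

intercept, slope = ols_fit(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print("outcome   skew / excess kurtosis:", skewness(y), excess_kurtosis(y))
print("residuals skew / excess kurtosis:", skewness(residuals), excess_kurtosis(residuals))
```

The skewness and kurtosis of the outcome are large here, while those of the residuals are close to zero. So large values for the outcome alone do not tell you whether a linear model is reasonable; you have to fit the model and look at the residuals.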

  • Thanks. I learned I should use Spearman's correlation as my data is not normally distributed. After plotting some scatterplots, I realized that some variables also show a monotonic relationship.

    You mentioned that I might want some other measure than correlation. What measure would I need for the following: I would like to find the min value for "% played" (IV in this case) so that the market value (DV) increases?

    You also mention regression analysis... Does that make sense if I couldn't find any correlation?

    – Lasnik23 Oct 04 '22 at 10:04
  • I read more about the topic and came across the distinction between parametric and non-parametric statistics. In my understanding, I clearly have non-parametric data as it is not normally distributed and there are no linear relationships between variables.

    Is this as uncommon as it seems to me? I have looked into a bunch of studies but they all seem to use parametric statistics. Is it reasonable for me to pursue non-parametric tests like Kruskal-Wallis instead of ANOVA? Or am I missing some important consideration here?

    – Lasnik23 Oct 04 '22 at 10:05
  • @Lasnik23, there are too many questions here & they are too fundamental. You should work with a statistical consultant. It is not necessarily true that you "should use Spearman's correlation as my data is not normally distributed". There is no such thing as nonparametric data. Etc. – gung - Reinstate Monica Oct 04 '22 at 13:08
  • Understood. Do you have any recommendation where I can read up on that myself? – Lasnik23 Oct 04 '22 at 14:31
  • @Lasnik23, there's lots of good information on the site. Pretty much all of those issues have been covered well multiple times. But you'll have to read a lot & you'll need to search around to find stuff. – gung - Reinstate Monica Oct 04 '22 at 16:32