[Figure: distribution plot of total revenue per seller, z-score scaled x-axis]

I'm modeling the total revenue of sellers over a one-year period. The distribution plot below shows quantity*price for each seller, with outliers removed using 1.5 * IQR fences.

What you see below on the x-axis are z-score scaled values.
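For readers who want to reproduce the preprocessing described above, here is a minimal sketch of 1.5 * IQR fencing followed by z-score scaling. The revenue values are synthetic (a lognormal sample), and the names `iqr_filter` and `zscore` are hypothetical helpers, not the code actually used for the plot.

```python
import numpy as np

def iqr_filter(values, k=1.5):
    """Keep only values inside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    mask = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[mask]

def zscore(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Synthetic stand-in for per-seller revenue (quantity * price); an assumption.
rng = np.random.default_rng(0)
revenue = rng.lognormal(mean=8.0, sigma=1.0, size=1000)

kept = iqr_filter(revenue)   # outliers outside the fences are dropped
scaled = zscore(kept)        # values as they would appear on the x-axis

print(f"kept {kept.size} of {revenue.size} sellers")
print(f"scaled range: [{scaled.min():.2f}, {scaled.max():.2f}]")
```

Because revenue cannot be negative, the scaled values have a hard lower bound, which is one reason a plot like this can cut off sharply on the left.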

What is the best way to analyze this curve, and how should I go about fitting a regression model?

  • Welcome to Cross Validated! In terms of "inferences," the reason why you need to evaluate whether it's a normal distribution and why you felt OK about removing outliers would matter. The distribution of a single variable's values doesn't matter in many situations; see "Is normality testing essentially useless?", for example. Consider editing your question to specify the nature of your data and the hypothesis you are trying to test. – EdM Dec 09 '22 at 21:06
  • @EdM thanks for that reply, definitely right. Will edit the question when I find the proper way and response wanted. Thanks for the welcoming! – julian lagier Dec 09 '22 at 21:24
  • For building a regression model to predict total revenue, the distribution of total revenue values isn't of primary importance, and you probably shouldn't have removed apparent "outliers" so soon. Skew--even outliers--in the distribution of total revenue might be due to comparable skew/outliers in the predictors you would include in the regression model. Consider editing this question to describe your data in detail and what you hope to accomplish with your model. That should get you where you want to go faster than will trying to figure out a distribution that happens to fit total revenue. – EdM Dec 10 '22 at 16:24
  • No Gamma distribution, even when affinely transformed, looks like this. – whuber Dec 10 '22 at 17:14

1 Answer

You can use statistical normality tests to check whether your data follows a normal distribution.
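As a rough illustration, assuming SciPy is available, two common normality tests can be run as follows. The sample here is synthetic and right-skewed, standing in for the real revenue data.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample; a stand-in for the real data (assumption).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=500)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
shapiro_stat, shapiro_p = stats.shapiro(sample)

# D'Agostino-Pearson test: combines skewness and kurtosis.
dagostino_stat, dagostino_p = stats.normaltest(sample)

print(f"Shapiro-Wilk:       W  = {shapiro_stat:.3f}, p = {shapiro_p:.3g}")
print(f"D'Agostino-Pearson: K2 = {dagostino_stat:.3f}, p = {dagostino_p:.3g}")
```

A small p-value is evidence against normality. With large samples these tests reject for even trivial departures, so the result is best read alongside the plot rather than on its own.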

The density on the plot doesn't look normal at all: the right tail is too heavy, and the curve cuts off abruptly at $-0.1$, whereas a normal distribution is symmetric about its mean and extends in both directions. This looks like a Gamma distribution, or some other distribution with positive support, that was shifted by $0.1$ into negative values.
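One way to check a shifted-Gamma guess rather than assume it is to fit the three parameters and measure how far the fitted distribution is from the data. The sketch below uses `scipy.stats.gamma.fit` on synthetic data that really is a shifted Gamma; on the actual revenue data the fit may well be poor, as the comments suggest.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in: Gamma(shape=2, scale=1) shifted left by 0.1 (assumption).
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.0, size=1000) - 0.1

# Maximum-likelihood estimates of shape, location (the shift), and scale.
shape_hat, loc_hat, scale_hat = stats.gamma.fit(data)

# Kolmogorov-Smirnov distance between the data and the fitted Gamma.
# The p-value is optimistic because the parameters were estimated from
# the same data, so treat it as a rough diagnostic only.
result = stats.kstest(data, "gamma", args=(shape_hat, loc_hat, scale_hat))

print(f"shape = {shape_hat:.3f}, loc = {loc_hat:.3f}, scale = {scale_hat:.3f}")
print(f"KS statistic = {result.statistic:.4f}, p = {result.pvalue:.3g}")
```

A large KS statistic, or a visibly poor Q-Q plot against the fitted distribution, would indicate that the Gamma family is not a good description of the data.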

It could also be a mixture distribution: the peak to the left could be a normal distribution (as part of the mixture), while the tail could be a couple of "flat", high-variance distributions.

ForceBru
  • Thanks for the answer, that's actually a great inference! I will look into gamma distributions. I ran normality tests and the distribution is not normal! – julian lagier Dec 09 '22 at 20:55
  • This doesn't resemble any Gamma distribution, either. Take to heart EdM's comment and please share with us how you compute this curve and why you are doing so. – whuber Dec 09 '22 at 22:09
  • Well, it might actually be a gamma distribution. These are sales of e-commerce retailers over a period of 1 year. I will change the question now, but your comments have stirred me to the answer! – julian lagier Dec 10 '22 at 01:58