Finding the parameters of a (possibly) Rayleigh distributed data set

Question

An object is tracked in an experiment I ran. The object's velocities over a 100 time steps are recorded. A model for this object says the velocities should be Rayleigh distributed.

Question 1: For the given data, how can one estimate the parameters of the Rayleigh distribution.

Question 2: After estimating the parameters, how can we calculate the maximum percentage difference between the observed data and estimated model?

The data:

A hack that I tried was to find the mean and standard deviation assuming our data was normal. Knowing scale is not the same thing as standard deviation squared, I used the standard deviation squared as "scale".

Then, I ran the K-S test with two samples: (1) observed data, and (2) the expected values of a Rayleigh distribution with mean and scale (incorrectly as standard deviation) to find the D-max. However, while the D-max is acceptable, the p-values is low. So, I hope that you all can help me find a statistically robust method to find the scale.

Now, it is likely that my data will result in a good fit. However, even if it's just out of academic interest, can we try and solve this problem anyway?

Not critical, but if any of you have an approach to answer the aforementioned questions in Python, that would be of interest to me as well.

P.S.: Just to be sure I wasn't violating any rule, I went through the overview of the site , and it seems like my question is within the rules of the site. However, if there is something wrong, please assist me by making appropriate suggestions.

@Glen_b : Thanks for your reply.
A2: For data that's not normally distributed, there aren't different kinds of QQ-plots, right ( ref1 , ref2 ) ? So, if we want to examine how close our data comes to a Rayleigh distribution we would simply use the QQ-plots, correct? — troisquatre, Jul 01 '18 at 11:03

COOLSerdash · Accepted Answer · 2018-07-01T14:29:43.677

You can use maximum likelihood estimation to estimate the scale parameter $\sigma$ of the Rayleigh distribution.

The maximum likelihood estimator of $\sigma$ is: $$ \hat{\sigma}=\sqrt{\frac{1}{2n}\sum_{i=1}^{n}x_{i}^2} $$

This estimator is biased, however (see Siddiqui 1964). An unbiased estimator is given by:

$$ \hat{\sigma}=\sqrt{\frac{1}{2n}\sum_{i=1}^{n}x_{i}^2}\cdot\frac{\Gamma(n)\sqrt{n}}{\Gamma(n + 1/2)} $$

where $\Gamma$ denotes the gamma function.

Here is how to estimate the scale parameter in R using maximum likelihood:

library(fitdistrplus)
library(extraDistr) # implements the Rayleigh distribution

# The data

x <- c(4.1278, 3, 4.3681, 3.4994, 3.5099, 3.4568, 3.7245, 3.1743, 
2.7477, 4.1865, 3.1257, 4.6311, 4.4171, 3.8296, 3.9558, 4.4084, 
4.2123, 4.4592, 4.3426, 3.7141, 3.2105, 2.5693, 2.0113, 1.9839, 
2.7958, 2.4361, 2.4558, 1.0498, 0.57897, 0.4486, 1.4811, 1.5466, 
1.2266, 1.627, 1.9165, 1.3498, 2.4598, 1.706, 2.3715, 2.3808, 
3.9232, 2.2745, 1.5666, 1.7846, 2.5793, 1.2635, 1.7393, 1.9063, 
2.1778, 2.8, 1.8491, 3.7035, 2.7329, 3.2436, 2.7708, 2.4044, 
2.5396, 2.7072, 2.0353, 3.6457, 2.8089, 2.5456, 2.4234, 1.6, 
2.4024, 2.9129, 4.6792, 3.7958, 2.8101, 1.9567, 1.649, 1.3687, 
1.1426, 1.2341, 0.37162, 0.81565, 1.5374, 1.1496, 1.6362, 2.0498, 
1.6581, 0.32883, 0.4273, 0.89617, 1.4783, 1.2486, 1.783, 3.52, 
4.486, 3.1935, 3.2815, 1.2388, 1.5428, 1.7168, 1.4661, 1.6411, 
3.4185, 2.3762, 3.2665, 3.6832)

fit <- fitdist(x, "rayleigh", start = list(sigma = 1))

summary(fit) # print estimated parameters

Fitting of the distribution ' rayleigh ' by maximum likelihood 
Parameters : 
      estimate Std. Error
sigma 1.919764 0.09598814
Loglikelihood:  -152.7544   AIC:  307.5087   BIC:  310.1139

plot(fit) # inspect fit

The estimated scale parameter is $\hat{\sigma}=1.918$. Inspecting the Q-Q plot of the fit, I'd say that the Rayleigh distribution is probably a suboptimal fit to these data (there are deviations at the upper and lower end of the Q-Q plot).

The unbiased maximum likelihood estimator is 1.922.

Siddiqui, M. M. (1964) "Statistical inference for Rayleigh distributions", The Journal of Research of the National Bureau of Standards, Sec. D: Radio Science, Vol. 68D, No. 9, p. 1007

@COOLSerdash thanks a ton for your help. Providing a reference is particularly useful. I need time to digest and understand your proposed approach, after which I will appropriately vote and provide comments if necessary. — troisquatre, Jul 01 '18 at 12:53
Feel free to delete if this comment ought to be in the Meta site. I wasn't able to find resources to my problem that I could understand. Even the provided Siddiqui reference is not easy to understand as I don't have a background in stats. There are plenty of straightforward resources for normally distributed data, but I wasn't able to find much for Rayleigh. Further, my "hack" approach was clearly wrong. I believe this is a question that could benefit not just me. Lastly, framing a question such that it clearly expresses the objectives and satisfies the not-homework criteria, isn't trivial. — troisquatre, Jul 01 '18 at 13:21
@COOLSerdash you have both $N$ and $n$ in your formula for the unbiased version of the estimator. You probably want to make them all $n$'s. — Glen_b, Jul 01 '18 at 13:26

Finding the parameters of a (possibly) Rayleigh distributed data set

1 Answers1