Test whether two separate Weibull distributions describe the data better than a single Weibull distribution

Question

My question is very similar to this one, but with the Weibull distribution replacing the Poisson distribution.

Let's say I am analyzing the distribution of times between failures for an engine, with data over an entire calendar year. From past experience, I know that the Weibull distribution is appropriate here. I want to test whether or not the distribution in December is different from the distribution earlier in the year.

In other words, is it more likely that there is a single set of Weibull parameters that generated the entire year's data; or is it more likely that there are two sets of parameters, one for January-November, and another for December? How can I answer this question?

Option 1:

1. Fit a Weibull distribution to the entire year's data, and find the likelihood. Call this Model 1.
2. Fit one Weibull distribution to the data from January to November, and fit another one to the data from December. This is similar to a regression model with an indicator variable for "December". Call this Model 2.
3. Somehow compare the log-likelihood/AIC values for Model 1 vs Model 2. From this post it seems like I could get the log-likelihood for Model 2 by just adding the individual log-likelihoods of each Weibull fit.

Option 2:

1. Fit a Weibull distribution to January-November data.
2. Do a likelihood ratio test to compare two separate fits on December data. The first fit will be unrestricted, while the second will be restricted to have the same parameter values as those found in Step 1.
3. If the likelihood ratio test is not significant, conclude that there is no need for a separate fit for December data.
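For concreteness, the steps of Option 2 could be sketched roughly as follows. This is only an illustration under several assumptions that are not part of the question: it uses SciPy's `weibull_min` with the location fixed at 0 (a two-parameter Weibull), the function name `december_vs_fixed_lrt` is hypothetical, and the $\chi^2$ reference with 2 degrees of freedom treats the January-November estimates as if they were known constants, ignoring their sampling error.

```python
import numpy as np
from scipy import stats


def december_vs_fixed_lrt(jan_nov, december):
    """Option 2 sketch: test December data against the Jan-Nov Weibull fit.

    Hypothetical helper; assumes strictly positive times between failures.
    """
    jan_nov = np.asarray(jan_nov, dtype=float)
    december = np.asarray(december, dtype=float)

    # Step 1: fit a two-parameter Weibull (location fixed at 0) to Jan-Nov.
    shape_ref, _, scale_ref = stats.weibull_min.fit(jan_nov, floc=0)

    # Step 2a: unrestricted fit on December data only.
    shape_dec, _, scale_dec = stats.weibull_min.fit(december, floc=0)
    ll_unrestricted = stats.weibull_min.logpdf(
        december, shape_dec, loc=0, scale=scale_dec
    ).sum()

    # Step 2b: December log-likelihood with parameters pinned to Step 1.
    ll_restricted = stats.weibull_min.logpdf(
        december, shape_ref, loc=0, scale=scale_ref
    ).sum()

    # Likelihood ratio statistic; clamp tiny negative values from optimizer noise.
    stat = max(2.0 * (ll_unrestricted - ll_restricted), 0.0)

    # Assumption: 2 degrees of freedom, treating the Step 1 estimates as fixed.
    p_value = stats.chi2.sf(stat, df=2)
    return stat, p_value


if __name__ == "__main__":
    # Simulated data purely for illustration (not real engine data).
    rng = np.random.default_rng(0)
    jan_nov = 100.0 * rng.weibull(1.5, size=300)
    december = 60.0 * rng.weibull(1.5, size=30)
    stat, p_value = december_vs_fixed_lrt(jan_nov, december)
    print(f"LR statistic = {stat:.2f}, p-value = {p_value:.4f}")
```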

Would either of these approaches work?

I suspect that Option 1 is better, because it is asking whether there is some common vector of parameters such that the estimated parameters in each time period could be different solely due to random variation. On the other hand, Option 2 seems to ask, "Is the estimated parameter vector in the first time period also the best-fit parameter vector for the second time period?", which is a bit too specific.

score 2 · Answer 1 · answered Sep 14 '21 at 14:30

Let's call the MLEs $\theta_1$ (best fit to Jan-Nov), $\theta_2$ (best fit to December), $\theta_{12}$ (best fit to full year).

Your option 1 is indeed the classic formulation of the likelihood ratio test. The restricted model (Model 1) $\{\theta_{12}, \theta_{12}\}$ (i.e. using the same parameters for the whole year) is nested within the full model (Model 2) $\{\theta_1, \theta_2\}$ (best parameters for each time period). Writing $L_1$ and $L_2$ for the maximized log-likelihoods of Model 1 and Model 2, you can test $-2 \Delta L = 2(L_2 - L_1)$ (twice the log-likelihood gain of Model 2 over Model 1) against a $\chi^2_2$ distribution: the unrestricted model has 4 parameters (shape and scale for each period) and the restricted model has only 2, which gives 2 degrees of freedom. The $\chi^2$ reference distribution is an asymptotic approximation, so it is most trustworthy when both periods contain a reasonable number of observations. (You are correct that the log-likelihood for Model 2 is indeed the sum of the log-likelihoods for the sub-models, assuming that the observations from each period are independent.)
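A minimal sketch of this test in Python, assuming SciPy's `weibull_min` with the location fixed at 0 (so only shape and scale are estimated) and strictly positive times between failures. The helper names `weibull_loglik` and `weibull_lrt` and the simulated data are illustrative, not part of the original question.

```python
import numpy as np
from scipy import stats


def weibull_loglik(data):
    """Maximized log-likelihood of a two-parameter Weibull fit (location = 0)."""
    shape, _, scale = stats.weibull_min.fit(data, floc=0)
    return stats.weibull_min.logpdf(data, shape, loc=0, scale=scale).sum()


def weibull_lrt(jan_nov, december):
    """Likelihood ratio test: one common Weibull vs. one Weibull per period."""
    jan_nov = np.asarray(jan_nov, dtype=float)
    december = np.asarray(december, dtype=float)

    # Model 1 (restricted): a single fit to the pooled full-year data.
    ll_model1 = weibull_loglik(np.concatenate([jan_nov, december]))

    # Model 2 (full): separate fits; log-likelihoods add under independence.
    ll_model2 = weibull_loglik(jan_nov) + weibull_loglik(december)

    # Test statistic 2 * (L2 - L1); clamp tiny negatives from optimizer noise.
    stat = max(2.0 * (ll_model2 - ll_model1), 0.0)

    # 4 parameters vs. 2 parameters gives 2 degrees of freedom.
    p_value = stats.chi2.sf(stat, df=2)
    return stat, p_value


if __name__ == "__main__":
    # Simulated data purely for illustration (not real engine data).
    rng = np.random.default_rng(0)
    jan_nov = 100.0 * rng.weibull(1.5, size=300)
    december = 60.0 * rng.weibull(1.5, size=30)
    stat, p_value = weibull_lrt(jan_nov, december)
    print(f"LR statistic = {stat:.2f}, p-value = {p_value:.4f}")
```

A small p-value would suggest that separate December parameters fit noticeably better than a single set for the whole year; a large one gives no evidence against the common-parameter model.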

Do you have a reference/background material on how the $\chi^2$ statistics arise from this kind of nested model? — Dave, Sep 14 '21 at 15:31
Thank you! Would you expect Option 2 to give a different result from Option 1? — Nayef, Sep 14 '21 at 15:46
