Changing the scale in a hypothesis test of proportions

Question

When performing a hypothesis test of a difference of two proportions, does one change the integrity of the problem when changing the scale?

To clarify: I want to test the difference in proportions of maintenance errors per hour of maintenance for two sets of years, say A = [1990-2000] and B = [2000-2010]. When I did this, I found that I had 4 errors from set A and 4 errors from set B. I also found that I had about 2000 maintenance hours in set A and about 2500 maintenance hours in set B. This does not meet the "rule of thumb" for using the normal distribution in a CI of proportions. Also, I've been told that n = 2000 is far too large relative to x = 4 for the proportion x/n (and the same for y and m).

Are there any general methods to combat this? Does it change the integrity of the problem if I just change the scale from maintenance hours, to days of maintenance? (2000/24 = 83.3 so use 4/83.3 instead of 4/2000)

Sorry if this is unclear, I tried to explain it the best I could. I can't find any literature on this.

What are your numerator & denominator in these ratios? It seems like your denominator is hours, is your numerator ("4 errors") measured in hours or is it an event (the machine broke down)? — gung - Reinstate Monica, Nov 01 '17 at 15:09
In my answer I assumed an event. If the 4 errors were measured in hours, the transformation of the time measure would be inappropriate: 4 error-hours would definitely be something totally different than 4 error-days. — Bernhard, Nov 01 '17 at 15:23
@gung, My idea was to use maintenance errors / hour of maintenance. I wanted to use maintenance errors / maintenance "session" for lack of a better term, but that kind of data is not available. The best I could do was an approximation of the maintenance hours — ajmullins, Nov 01 '17 at 15:35
What is the nature of an "error" here? Is it an hour? Is it a physical malfunction? Can you describe what happens when there is an error? — gung - Reinstate Monica, Nov 01 '17 at 15:39
@gung yes, my apologies. An error here is considered doing something not described in the maintenance procedures that results in damaging the integrity of the mechanical structure. For example, a technician stripping a hex screw, or creating an arc that damaged the electronics, or using the wrong washer size that resulted in a leak, etc. — ajmullins, Nov 01 '17 at 16:00

gung - Reinstate Monica · Answer 1 · 2017-11-01T18:14:41.513

You don't really have proportions, you have ratios. A binomial is a certain number of 'successes' out of a fixed number of trials. The easiest way to think about this is a number of heads out of a fixed number of coin flips. Your numerator is an event—analogous to getting a heads—but your denominator isn't a trial. Thus, the binomial distribution is not appropriate for your data.

Your data are counts of events. The simplest distribution for counts is the Poisson. The Poisson distribution is generally taken to be a default distribution for counts, but it is very restrictive and is not actually typically used for that reason. It is much more common to use a distribution that allows for greater flexibility (specifically, greater variance) like the negative binomial. However, you don't have the ability to use the NB, because you have, in essence, only two data. Thus, you are in the unfortunate position of making a very strong (and likely incorrect) assumption, or not being able to test your data.

We also need to account for your denominator, which is an opportunity for an event to occur in some sense, even if it isn't quite a trial. This can be done by using an offset (see here and here).
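As a rough illustration of what an offset does, here is a minimal sketch of a Poisson regression on the two counts from the question. The data frame and its column names (`period`, `errors`, `hours`) are made up for this example. With only two observations and two parameters the model is saturated, so it reproduces the observed rates exactly, and its standard errors should not be leaned on.

```r
# Two observations: error counts and exposure (maintenance hours) per period.
# Column names are hypothetical, chosen only for this illustration.
d <- data.frame(
  period = factor(c("A", "B"), levels = c("B", "A")),  # B is the reference level
  errors = c(4, 4),
  hours  = c(2000, 2500)
)

# log(hours) enters as an offset (coefficient fixed at 1),
# so the model describes errors per hour rather than raw counts.
fit <- glm(errors ~ period + offset(log(hours)), family = poisson, data = d)

# exp(coefficient) for period A is the estimated rate ratio of A versus B.
rate_ratio <- unname(exp(coef(fit)["periodA"]))
print(rate_ratio)
```

The estimate should agree with the simple ratio of the two observed rates, (4/2000)/(4/2500).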

At this point then, you can conduct a low-powered, possibly invalid test that provides a lower bound on the p-value. Using your data, here is that test (conducted with R):

poisson.test(x=c(4, 4), T=c(2000, 2500))
#   Comparison of Poisson rates
# 
# data:  c(4, 4) time base: c(2000, 2500)
# count1 = 4, expected count1 = 3.5556, p-value = 1
# alternative hypothesis: true rate ratio is not equal to 1
# 95 percent confidence interval:
#  0.232822 6.711136
# sample estimates:
# rate ratio 
#       1.25

This result shouldn't be that surprising. You have the same number of errors in each case, and nearly the same number of hours.

This is so comprehensive. Thank you. I knew something felt off about doing a hypothesis test of proportions when the denominator wasn't an event. Can you explain some of the fields in your code? What is the expected count1, what is the 95% CI for, and what exactly is the rate ratio? I read the questions and answers you linked, but I have no experience with GLM so I'm not sure I followed them entirely. — ajmullins, Nov 01 '17 at 18:14
@ajmullins, the documentation for the function is here: ?poisson.test. The output is putting the 2nd datum in terms of the 1st: the rates are the number of errors per 2k hours. The expected count1 is the count you would expect in the 1st period if both rates were equal, 8 × 2000/4500 = 3.5556. The CI is the same as any other CI, just for the rate ratio, which is (4/2000)/(4/2500) = 1.25. — gung - Reinstate Monica, Nov 01 '17 at 18:19
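For readers who want to see where the output fields come from, here is a sketch that rebuilds them by hand. It assumes the two-sample comparison is done as an exact binomial test conditional on the total count (8 errors), which is how I understand `poisson.test` to work. The variable names are illustrative only.

```r
x     <- c(4, 4)        # error counts in periods A and B
hours <- c(2000, 2500)  # exposure in maintenance hours

# Under H0 (equal rates), each of the 8 errors falls in period A
# with probability proportional to A's share of the exposure.
p0 <- hours[1] / sum(hours)
expected_count1 <- sum(x) * p0

# The rate ratio is the observed rate in A divided by the observed rate in B.
rate_ratio <- (x[1] / hours[1]) / (x[2] / hours[2])

# Exact binomial test of "4 of 8 errors in A" against p0; its confidence
# interval for the proportion is converted to odds and rescaled by exposure.
bt <- binom.test(x[1], sum(x), p = p0)
ci <- as.numeric(bt$conf.int / (1 - bt$conf.int) * hours[2] / hours[1])

print(c(expected_count1 = expected_count1, rate_ratio = rate_ratio))
print(ci)
```

The interval obtained this way should match the 95 percent confidence interval printed in the answer, so the CI is a statement about the ratio of the two error rates, not about either rate alone.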

Bernhard · Answer 2 · 2017-11-01T15:16:33.863

If you cannot use the normal approximation, then don't use it. Use some other sensible model. Maintenance errors per hour sound like they could reasonably be regarded as Poisson distributed. Please check whether this applies.

If so, we have one single parameter $\lambda$ for each group; let's call them $\lambda_A$ and $\lambda_B$. Assuming a flat prior (which is not that reasonable, but is a good first approach), we can model these with a conjugate prior as $\mathrm{Gamma}$ distributed: https://en.wikipedia.org/wiki/Conjugate_prior#Discrete_distributions

$\lambda_A$ as being $\mathrm{Gamma}(4, 2000)$ and $\lambda_B$ as being $\mathrm{Gamma}(4, 2500)$ distributed.
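To make the conjugate update concrete, here is a small sketch. The helper `posterior()` and the hyperparameter names `a0` and `b0` are invented for this illustration. The standard Gamma-Poisson result is that a Gamma(a0, b0) prior combined with x events over exposure t gives a Gamma(a0 + x, b0 + t) posterior. Note that Gamma(4, 2000) corresponds to the limiting case a0 = 0, b0 = 0; a prior that is literally flat in $\lambda$ (a0 = 1, b0 = 0) would give shape 5 instead.

```r
# Gamma(shape = a0, rate = b0) prior plus x events over exposure t
# yields a Gamma(a0 + x, b0 + t) posterior for the Poisson rate.
# (Hypothetical helper, written only for this example.)
posterior <- function(x, t, a0 = 0, b0 = 0) {
  c(shape = a0 + x, rate = b0 + t)
}

post_A <- posterior(4, 2000)
post_B <- posterior(4, 2500)

# Posterior mean and a 95% equal-tailed credible interval for lambda_A
mean_A <- post_A[["shape"]] / post_A[["rate"]]
ci_A   <- qgamma(c(0.025, 0.975),
                 shape = post_A[["shape"]], rate = post_A[["rate"]])

print(mean_A)
print(ci_A)
```

The posterior mean for $\lambda_A$ is simply 4/2000 errors per hour, and the width of the credible interval shows how little four events pin the rate down.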

Let's sample 1 million possible $\lambda_A$ and one million possible $\lambda_B$:

lambda_A <- rgamma(1000000, 4, 2000)
lambda_B <- rgamma(1000000, 4, 2500)

plot(density(lambda_B), col="blue")
points(density(lambda_A), col="red", type="l")

Now let's see how many times, in one million samples, $\lambda_A$ was larger than $\lambda_B$:

> sum(lambda_A < lambda_B)/1000000
[1] 0.379699
> sum(lambda_A == lambda_B)/1000000
[1] 0
> sum(lambda_A > lambda_B)/1000000
[1] 0.620301

So $\lambda_A$ was smaller in 38% of the samples and larger in 62%. This gives you a good estimate of the probability that each rate is larger than the other.
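For these two Gamma distributions the probability can also be worked out without simulation, which gives a check on the sampled 62%. The sketch below relies on a standard fact: if X and Y are independent Gamma(4, 1) variables, then X/(X + Y) follows a Beta(4, 4) distribution. Rescaling $\lambda_A$ and $\lambda_B$ by their rates turns the event $\lambda_A > \lambda_B$ into X/(X + Y) > 2000/4500. Variable names are illustrative.

```r
shape  <- 4
rate_A <- 2000
rate_B <- 2500

# P(lambda_A > lambda_B) = P(Beta(4, 4) > rate_A / (rate_A + rate_B))
p_A_larger <- pbeta(rate_A / (rate_A + rate_B), shape, shape,
                    lower.tail = FALSE)
print(p_A_larger)  # about 0.62
```

The result depends on the exposures only through their ratio, which is another way to see that the unit of time does not matter.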

How much larger?

> summary(lambda_A - lambda_B)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.0073480 -0.0004217  0.0003514  0.0004001  0.0011750  0.0095330 
> plot(density(lambda_A - lambda_B))

The Poisson distribution seems more appropriate than the normal approximation. If you have more knowledge about the data-generating process, though, something different might be superior.

You might want to check whether one million was a large enough sample to meet your precision needs.
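One way to check this is the Monte Carlo standard error of the estimated probability, which for a simple proportion of n independent draws is sqrt(p(1 - p)/n). The sketch below repeats the simulation with a fixed seed (the seed and variable names are arbitrary choices for this example), so its estimate will differ slightly from the one printed above.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
n <- 1e6

lambda_A <- rgamma(n, shape = 4, rate = 2000)
lambda_B <- rgamma(n, shape = 4, rate = 2500)

# Estimated probability and its Monte Carlo standard error
p_hat <- mean(lambda_A > lambda_B)
mc_se <- sqrt(p_hat * (1 - p_hat) / n)

print(c(estimate = p_hat, mc_se = mc_se))
```

With one million draws the standard error is roughly 0.0005, so the second decimal place of the 62% is reliable, while the third is not fully settled.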

Note: The result does not depend on whether the exposure is measured in hours or in days:

> lambda_A <- rgamma(1000000, 4, 2000/24)
> lambda_B <- rgamma(1000000, 4, 2500/24)
> sum(lambda_A < lambda_B)/1000000
[1] 0.380063

-- still 38% as you'd expect from a sensible computation.
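The same invariance can be checked for the exact frequentist comparison of two Poisson rates. This is a sketch using `poisson.test` from base R's stats package on the numbers from the question; the object names are illustrative.

```r
errors <- c(4, 4)
hours  <- c(2000, 2500)

# Same data, exposure expressed in hours and in 24-hour days
by_hour <- poisson.test(errors, T = hours)
by_day  <- poisson.test(errors, T = hours / 24)

print(c(p_hour = by_hour$p.value, p_day = by_day$p.value))
print(rbind(hour = by_hour$conf.int, day = by_day$conf.int))
```

Both the p-value and the confidence interval for the rate ratio should be the same in the two versions, because rescaling both exposures by the same factor leaves the ratio of rates unchanged. What would change the answer is treating the hours (or days) as binomial trials, since then the number of "trials" itself depends on the unit.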

edited Nov 01 '17 at 15:16

answered Nov 01 '17 at 14:57

Bernhard


Wow, this is an awesome answer! Thank you so much for putting the time into helping me. I have a bachelor's degree in Mathematics (just earned), so my statistics knowledge is at an undergraduate level. I have a couple of questions:
1. Can you explain the use of a "flat prior" and a "conjugate prior"?
2. What software did you use to perform your calculations?
Thank you!
– ajmullins Nov 01 '17 at 15:32
Sorry for leaving that out. I have a medical degree and no formal statistics education, so things can be learned... The software I used was R from http://www.r-project.org , which is free software and highly recommendable for a lot of different statistics applications. The approach to your problem taken here is called "Bayesian". Read more about it at https://en.wikipedia.org/wiki/Bayesian_inference . There it says: (to be ctd.) – Bernhard Nov 01 '17 at 16:36
"Bayesian inference derives the posterior probability as a consequence of two antecedents, a prior probability and a 'likelihood function' derived from a statistical model for the observed data." Prior probability coming into the computation is an important feature of Bayesian statistics. A 'flat prior' basically approaches the problem at hand as if we had no prior knowledge of how frequent maintenance problems are. I chose this deliberately even though it is probably wrong. A conjugate prior is a convenient way to express the prior probability (to be continued) – Bernhard Nov 01 '17 at 16:38
in a way that makes further computations easy (i.e. choosing a gamma prior for a Poisson rate). Explaining these concepts in detail is asking too much for a post here, but this page will help in finding introductory information: https://stats.stackexchange.com/questions/125/what-is-the-best-introductory-bayesian-statistics-textbook . Please consider upvoting or accepting the answer, if it is helpful. – Bernhard Nov 01 '17 at 16:42
I absolutely did! Appreciate it! – ajmullins Nov 01 '17 at 17:56
No, you did not yet. You have a button to "accept" the best answer and you can up- and downvote as many answers as you want https://meta.stackexchange.com/questions/173399/how-to-upvote-on-stack-overflow – Bernhard Nov 01 '17 at 19:14
