The hypergeometric distribution arises from sampling without replacement, whereas the similar binomial distribution assumes sampling with replacement. Hypergeometric distributions are commonly used in quality assurance to determine the acceptable quality limit/level (AQL) given a sampling rate. Given that the inverse CDF is indispensable in computing confidence intervals and critical values, and given the ubiquity of sampling without replacement in the real world, I was surprised that I could not find a function for the inverse cumulative distribution function of the hypergeometric distribution. I suppose it is because the exact expression for the hypergeometric CDF is very complex, relying on the generalized hypergeometric function $_3F_2$.
- MathWorks has a built-in function, `hygeinv`, but does not document the method behind it.
- In Wolfram Mathematica, `InverseCDF[HypergeometricDistribution[n, K, N], q]` provides no deeper resolution.
- Even the statistical literature, Exact Optimal Confidence Intervals for Hypergeometric Parameters, relies on tables.
While the normal and binomial distributions can be used to approximate confidence intervals and critical values of the hypergeometric distribution, they may yield misleading thresholds, especially for small sample sizes and small p-values.
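To make the concern about approximations concrete, here is a minimal Python sketch (standard library only) that compares an exact quantile with a normal approximation using the finite population correction. The function names and the argument order `(k, n, K, N)` are my own choices for illustration, with `n` draws, `K` successes in the population, and `N` items in total; the example lot (90 good, 10 defective, sample of 3) is the one used in the comments on this question.

```python
from math import comb, sqrt
from statistics import NormalDist


def hypergeom_cdf(k, n, K, N):
    """P(X <= k) for n draws without replacement from N items, K of them successes."""
    lo = max(0, n - (N - K))   # smallest attainable number of successes
    hi = min(k, n, K)          # largest value that contributes to P(X <= k)
    total = sum(comb(K, j) * comb(N - K, n - j) for j in range(lo, hi + 1))
    return total / comb(N, n)


def exact_quantile(q, n, K, N):
    """Smallest k in the support with P(X <= k) >= q (plain sequential scan)."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    for k in range(lo, hi + 1):
        if hypergeom_cdf(k, n, K, N) >= q:
            return k
    return hi


def normal_quantile(q, n, K, N):
    """Normal approximation with the finite population correction (0 < q < 1)."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return NormalDist(mean, sqrt(var)).inv_cdf(q)


if __name__ == "__main__":
    n, K, N = 3, 90, 100
    for q in (0.02, 0.98):
        print(q, exact_quantile(q, n, K, N), round(normal_quantile(q, n, K, N), 3))
```

For this small sample, the approximate upper quantile at `q = 0.98` lands above the sample size of 3, which is not an attainable value, while the exact quantile stays inside the support.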
What is the “good” analytical approach (no tables) to finding critical values and/or confidence intervals for hypergeometrically distributed variables?
I define “good” here as either a discrete or a continuous method that converges to the true solution.
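For reference, one way to pin down the "true solution" is the standard generalized-inverse definition of a quantile. The notation below follows the Mathematica call quoted earlier ($n$ draws, $K$ successes in the population, $N$ items in total); taking the smallest $k$ is the usual convention for discrete distributions, and I am assuming that is the target here.

$$
P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}, \qquad \max(0,\, n - (N - K)) \le k \le \min(n, K),
$$

$$
F(k) = P(X \le k) = \sum_{j \le k} P(X = j), \qquad F^{-1}(q) = \min\{\, k : F(k) \ge q \,\}.
$$

Because $F$ is a nondecreasing step function, $F^{-1}(q)$ is always an integer in the support, so any search that finds the first $k$ with $F(k) \ge q$ recovers it exactly.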
Also, there’s extra credit if you can improve the root-finding algorithm given in the following rough VBA function:
    Function inverseHyperGeom_exact(ByVal probability As Double, ByVal Number_sample As Long, _
            ByVal Population_s As Long, ByVal Number_pop As Long, _
            Optional ByVal fromLeft As Boolean = True, Optional ByVal maxIter As Long = 500) As Long
        Dim trial_prob As Double
        Dim i As Long
        ' HYPGEOM.DIST(sample_s,number_sample,population_s,number_pop,cumulative)
        '   Sample_s       Required. The number of successes in the sample.
        '   Number_sample  Required. The size of the sample.
        '   Population_s   Required. The number of successes in the population.
        '   Number_pop     Required. The population size.
        '   Cumulative     Required...
        ' Sequential search: step i upward until the CDF first reaches the target probability.
        i = 0
        Do While trial_prob < probability And i <= maxIter
            trial_prob = WorksheetFunction.HypGeom_Dist(i, Number_sample, Population_s, Number_pop, True)
            i = i + 1
        Loop
        ' On exit, i - 1 is the last value at which the CDF was evaluated.
        If fromLeft Then
            inverseHyperGeom_exact = i - 1
        Else
            inverseHyperGeom_exact = i
        End If
    End Function
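As one possible improvement on the sequential search, the CDF is nondecreasing in its first argument, so the smallest value at which it reaches the target probability can be found by bisection over the integer support. The sketch below is in Python rather than VBA, and the names `hypergeom_cdf` and `inverse_hypergeom_bisect` as well as the argument order `(k, n, K, N)` are hypothetical choices of mine; the `cdf` parameter stands in for whatever CDF routine is available (for example `HypGeom_Dist` in the VBA setting).

```python
from math import comb


def hypergeom_cdf(k, n, K, N):
    """P(X <= k) for n draws without replacement from N items, K of them successes."""
    lo = max(0, n - (N - K))
    hi = min(k, n, K)
    total = sum(comb(K, j) * comb(N - K, n - j) for j in range(lo, hi + 1))
    return total / comb(N, n)


def inverse_hypergeom_bisect(q, n, K, N, cdf=hypergeom_cdf):
    """Smallest k in the support with cdf(k) >= q, found by integer bisection."""
    lo = max(0, n - (N - K))   # lower end of the support
    hi = min(n, K)             # upper end of the support, where the CDF equals 1
    # Invariant: the answer lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if cdf(mid, n, K, N) >= q:
            hi = mid           # mid satisfies the condition; the answer is mid or smaller
        else:
            lo = mid + 1       # mid falls short; the answer is strictly larger
    return lo


if __name__ == "__main__":
    # Lot with 90 good and 10 defective items, sample of size 3.
    print(inverse_hypergeom_bisect(0.02, 3, 90, 100))
    print(inverse_hypergeom_bisect(0.03, 3, 90, 100))
```

This needs on the order of log2 of the support size CDF evaluations instead of one per candidate value, and it does not depend on an iteration cap such as `maxIter`. It returns the same value as the `fromLeft = True` branch of the VBA function whenever that loop is not cut short by `maxIter`.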
`qhyper` is the inverse of the CDF `phyper` of a hypergeometric distribution. If a lot has 90 good items and 10 defective ones, the probability that a sample of size 3 will have $\le 1$ good item is `phyper(1, 90, 10, 3)`, which returns 0.0257885. So `qhyper(.02, 90, 10, 3)` returns 1, while `qhyper(.03, 90, 10, 3)` returns 2. // Because intermediate computations can overflow, the programming of such functions has to be done carefully. There are 'log' options for extreme cases; see the documentation for details. – BruceET Jun 10 '19 at 02:16
(Is there a reason you're not using a better root finder than this sequential search?) Would good approximations qualify as "correct" or not? – whuber Jun 10 '19 at 16:44
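The overflow point in the comment above can be illustrated with a small Python sketch that evaluates the PMF and CDF on the log scale, using `math.lgamma` for the binomial coefficients and a log-sum-exp step for the sum. The helper names (`log_comb`, `log_pmf`, `log_cdf`) and the argument order `(k, n, K, N)` are assumptions of mine for illustration; this is not how `phyper` or any particular library is implemented.

```python
from math import exp, isfinite, lgamma, log


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b), for 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def log_pmf(k, n, K, N):
    """log P(X = k); k must lie in the support of the distribution."""
    return log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)


def log_cdf(k, n, K, N):
    """log P(X <= k), summing the PMF terms with the log-sum-exp trick."""
    lo = max(0, n - (N - K))
    hi = min(k, n, K)
    if hi < lo:
        return float("-inf")   # k is below the support, so the probability is 0
    terms = [log_pmf(j, n, K, N) for j in range(lo, hi + 1)]
    m = max(terms)             # factor out the largest term to avoid underflow
    return m + log(sum(exp(t - m) for t in terms))


if __name__ == "__main__":
    # Same lot as in the comment: 90 good, 10 defective, sample of size 3.
    print(exp(log_cdf(1, 3, 90, 100)))            # about 0.0257885
    # A much larger population stays finite on the log scale.
    print(log_cdf(10, 1000, 50000, 1000000))
```

Working with logs keeps the intermediate quantities in a safe floating-point range even when the binomial coefficients themselves would be astronomically large.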