4

Please read the problem to the end. It may appear at first that this problem was answered in earlier posts, but that is not so; I have read all the related posts.

Problem: Suppose I have two data sets (for two treatments), G and A. I run two logistic regressions for G and A: \begin{eqnarray*} \log \left[ \frac{\Pr (R)}{1-\Pr (R)}\right] _{G} &=&\beta _{0G}+\beta _{1G}X+\beta _{2G}Y \\ \log \left[ \frac{\Pr (R)}{1-\Pr (R)}\right] _{A} &=&\beta _{0A}+\beta _{1A}X+\beta _{2A}Y. \end{eqnarray*}

Based on the estimates of the logistic regressions, I have two lines: \begin{eqnarray*} x_{G}^{\ast } &=&-\frac{\hat{\beta}_{0G}}{\hat{\beta}_{1G}}-\frac{\hat{\beta}_{2G}}{\hat{\beta}_{1G}}Y \\ x_{A}^{\ast } &=&-\frac{\hat{\beta}_{0A}}{\hat{\beta}_{1A}}-\frac{\hat{\beta}_{2A}}{\hat{\beta}_{1A}}Y. \end{eqnarray*}

QUESTION: How do I test that $\left|\frac{\hat{\beta}_{2G}}{\hat{\beta}_{1G}}\right|>\left|\frac{\hat{\beta}_{2A}}{\hat{\beta}_{1A}}\right|$, i.e., that the slope of $x_{G}^{\ast }$ is greater in absolute value than the slope of $x_{A}^{\ast }$?

Progress so far (Jan 26, 2016): I came across a document, "Ratios: A short guide to confidence limits and proper use" by Franz (2007), which discusses methods such as Fieller, Taylor (or Delta), bootstrap, and regression. However, all these methods are framed in terms of, say, $\rho =\frac{E[Z]}{E[W]}$, where $Z$ and $W$ are random variables, and a test statistic is derived from a sample of $N$ paired measurements $(z_{i},w_{i})$, $i=1,2,\ldots,N$. Applied to my problem, $Z=\hat{\beta}_{2}$ and $W=\hat{\beta}_{1}$, where $\hat{\beta}_{1}\sim N(\beta _{1},\mathrm{s.e.}(\hat{\beta}_{1})^{2})$ and $\hat{\beta}_{2}\sim N(\beta _{2},\mathrm{s.e.}(\hat{\beta}_{2})^{2})$ (asymptotically; I have a large number of data points). However, I don't have paired measurements such as $\left(\hat{\beta}_{11},\hat{\beta}_{21}\right),\ldots,\left(\hat{\beta}_{1N},\hat{\beta}_{2N}\right)$. I am somewhat stuck here and will appreciate any help.
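For concreteness, the setup can be sketched in Python with simulated data (the coefficient values below are hypothetical, chosen only for illustration, and for simplicity the two treatments share covariates); each regression is fit by Newton–Raphson, and the ratio estimate $\hat{\beta}_{2}/\hat{\beta}_{1}$ for each treatment is the quantity being compared:

```python
import numpy as np

def fit_logit(X, y, iters=30):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        H = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information matrix
        beta += np.linalg.solve(H, X.T @ (y - p))  # Newton step on the score
    return beta

rng = np.random.default_rng(0)
n = 5000
x, yv = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x, yv])

# Hypothetical true coefficients for treatments G and A
rG = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + 0.8 * yv))))
rA = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x + 0.6 * yv))))

bG, bA = fit_logit(X, rG), fit_logit(X, rA)
ratio_G = bG[2] / bG[1]   # estimate of beta2G / beta1G (true value 0.8 here)
ratio_A = bA[2] / bA[1]   # estimate of beta2A / beta1A (true value 0.4 here)
print(ratio_G, ratio_A)
```

With a large $n$ the two ratio estimates land close to their true values; the open question is how to attach a standard error or test to their difference.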

  • Sounds like a good place for bootstrap. – StatsStudent Jan 27 '16 at 03:21
  • @StatsStudent. Thanks, but don't I need an initial sample to start with for bootstrapping? I may be wrong. – highCuriosity Jan 27 '16 at 03:23
  • I'm not sure what you mean by an "initial sample." To use a non-parametric bootstrap, you simply need to take resamples of size N from your original dataset. You could also use the delta/Taylor Series method -- it does not require paired samples. – StatsStudent Jan 27 '16 at 03:52
  • Re. the bootstrap: by "initial sample," I meant paired measurements such as $(\hat{\beta}_{11},\hat{\beta}_{21}),\ldots,(\hat{\beta}_{1N},\hat{\beta}_{2N})$, which I don't have. All I have is each beta's distribution. When you say N paired measurements from the original dataset, do you mean that I bootstrap datasets G and A to generate N bootstrap samples of G and A, and for every sample I estimate $\hat{\beta}_{2}$ and $\hat{\beta}_{1}$? Then, for each of the N bootstrap resamples, I calculate $\hat{\beta}_{2G}/\hat{\beta}_{1G}$ and $\hat{\beta}_{2A}/\hat{\beta}_{1A}$? However, since I don't know the distribution of these ratios, how do I do hypothesis testing? (Disclosure: I have never done a bootstrap before.) Thanks. – highCuriosity Jan 27 '16 at 05:55
  • All you need for a bootstrap is $\hat{\beta}_{1r}, \hat{\beta}_{2r}$, etc., from bootstrap realization number $r$. – kjetil b halvorsen Jan 27 '16 at 09:56
  • @kjetilbhalvorsen, could you please elaborate in a bit more detail? As I mentioned, I am very new to bootstrapping. What exactly do I bootstrap here? What is my starting point? Thanks. – highCuriosity Jan 27 '16 at 16:08
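The nonparametric bootstrap described in these comments can be sketched as follows (in Python, on simulated data with hypothetical coefficient values): resample rows of each dataset with replacement, refit both logistic regressions on each resample, and collect the difference of the absolute ratios across resamples.

```python
import numpy as np

def fit_logit(X, y, iters=30):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]), X.T @ (y - p))
    return beta

rng = np.random.default_rng(1)
n = 2000

def simulate(b0, b1, b2):
    x, yv = rng.normal(size=n), rng.normal(size=n)
    X = np.column_stack([np.ones(n), x, yv])
    return X, rng.binomial(1, 1.0 / (1.0 + np.exp(-(b0 + b1 * x + b2 * yv))))

XG, rG = simulate(-0.5, 1.0, 0.8)   # hypothetical treatment-G data (true ratio 0.8)
XA, rA = simulate(-0.5, 1.5, 0.6)   # hypothetical treatment-A data (true ratio 0.4)

B = 200
diffs = np.empty(B)
for b in range(B):
    iG = rng.integers(0, n, size=n)   # resample rows with replacement
    iA = rng.integers(0, n, size=n)
    bG = fit_logit(XG[iG], rG[iG])
    bA = fit_logit(XA[iA], rA[iA])
    diffs[b] = abs(bG[2] / bG[1]) - abs(bA[2] / bA[1])

lo, hi = np.percentile(diffs, [2.5, 97.5])   # percentile interval for the difference
print(lo, hi)
```

If the percentile interval for the difference lies entirely above zero, that is evidence that the G slope is larger in absolute value; no distributional form for the ratios needs to be assumed.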

1 Answer

4

If $\beta_{1} \ne 0$, a standard limiting distribution theory applies, and you can find the limiting distribution using the Delta method.

Letting the vector of relevant coefficients be $\beta$, you have:

$\sqrt{n}\left(\hat{\beta} - \beta\right) \overset{d}\rightarrow N\left(0,\Sigma\right)$

for a covariance matrix $\Sigma$ (here, the joint covariance matrix of all $4$ coefficients appearing in the two ratios).

Define the function $g\left(x_{1},y_{1},x_{2},y_{2}\right) = \frac{x_{1}}{y_{1}} - \frac{x_{2}}{y_{2}}$.

Assuming that $\beta_{1G}$ and $\beta_{1A}$ are non-zero,

$\sqrt{n}\left(g\left(\hat{\beta}_{2G},\hat{\beta}_{1G}, \hat{\beta}_{2A}, \hat{\beta}_{1A}\right) - g\left(\beta_{2G}, \beta_{1G}, \beta_{2A}, \beta_{1A}\right)\right) \overset{d}\rightarrow N\left(0, \nabla g^{\top} \Sigma\, \nabla g\right)$,

where $\nabla g$ is the gradient of $g$, evaluated at $\left(\beta_{2G}, \beta_{1G}, \beta_{2A}, \beta_{1A}\right)$.

Now that you have the asymptotic distribution of the difference of the ratios, you can form hypothesis tests using standard techniques.
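A minimal sketch of this computation in Python, on simulated data with hypothetical coefficient values, assuming the G and A samples are independent (so $\Sigma$ is block-diagonal across the two fits). The true coefficients are positive here, so the absolute values in the original question can be dropped; the inverse Hessian from each fit already estimates the covariance of $\hat{\beta}$, so no explicit $\sqrt{n}$ scaling is needed.

```python
import numpy as np

def fit_logit(X, y, iters=30):
    """Logistic regression by Newton-Raphson; returns beta-hat and its estimated covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta, np.linalg.inv(H)                   # inv(H) estimates Var(beta-hat)

rng = np.random.default_rng(2)
n = 5000

def simulate(b0, b1, b2):
    x, yv = rng.normal(size=n), rng.normal(size=n)
    X = np.column_stack([np.ones(n), x, yv])
    return X, rng.binomial(1, 1.0 / (1.0 + np.exp(-(b0 + b1 * x + b2 * yv))))

XG, rG = simulate(-0.5, 1.0, 0.8)   # hypothetical treatment-G data (true ratio 0.8)
XA, rA = simulate(-0.5, 1.5, 0.6)   # hypothetical treatment-A data (true ratio 0.4)
(bG, covG), (bA, covA) = fit_logit(XG, rG), fit_logit(XA, rA)

# g(x1, y1, x2, y2) = x1/y1 - x2/y2 evaluated at (beta2G, beta1G, beta2A, beta1A)
x1, y1, x2, y2 = bG[2], bG[1], bA[2], bA[1]
grad = np.array([1.0 / y1, -x1 / y1**2, -1.0 / y2, x2 / y2**2])

# Independent samples => Sigma is block-diagonal; take the (beta2, beta1) sub-blocks.
idx = [2, 1]
Sigma = np.zeros((4, 4))
Sigma[:2, :2] = covG[np.ix_(idx, idx)]
Sigma[2:, 2:] = covA[np.ix_(idx, idx)]

diff = x1 / y1 - x2 / y2
se = np.sqrt(grad @ Sigma @ grad)   # delta-method standard error
z = diff / se                       # z-statistic for H0: difference <= 0
print(diff, se, z)
```

A one-sided p-value then follows from the standard normal tail, e.g. `scipy.stats.norm.sf(z)`.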

John
  • Thanks John. To make sure that $\beta_{1}\ne 0$, is showing that the coefficient of variation of $\hat{\beta}_{1}$ is $<1/3$ (re. Franz, 2007) enough? – highCuriosity Jan 27 '16 at 18:53
  • You could test the nulls that $\beta_{1A}= 0$ or $\beta_{1G} = 0$ with a $t$-test to give some evidence that $\beta_{1A} \ne 0$ and $\beta_{1G} \ne 0$. Usually, that's in the regression output. In R, it is, anyways. – John Jan 27 '16 at 19:20
  • Thanks John. A couple of clarification questions: (i) $\hat{\beta}_{1}$ and $\hat{\beta}_{2}$ (from both the G and A logistic regressions in the question) are jointly normally distributed, asymptotically, by the Central Limit Theorem, right? (ii) Could another way to solve the problem be to calculate the confidence intervals (CIs) of $\hat{\beta}_{2G}/\hat{\beta}_{1G}$ and $\hat{\beta}_{2A}/\hat{\beta}_{1A}$ by the Delta/Taylor method, and show that the lower bound of the former's CI is greater than the upper bound of the latter's CI? – highCuriosity Jan 28 '16 at 13:15
  • (i) Yes, they are asymptotically normally distributed. (ii) If the $G$ and $A$ regressions use independent data sets, that would work because the estimates in both regressions would be uncorrelated with each other. If the two regressions do not use independent data, then no that won't work. The joint confidence region is different than combining the two confidence intervals. – John Jan 28 '16 at 15:47
  • (ii) Yes, G and A are independent. Thanks so much, John. Appreciate it. One last question: although we have a large dataset (N=7000), we don't know the population standard deviation of $\hat{\beta}_{2}/\hat{\beta}_{1}$; we only know the estimate (by the Delta method). For the calculation of a 95% confidence interval, should I use a t-score or a z-score? – highCuriosity Jan 28 '16 at 17:29
  • The delta method might not be very good for ratios, especially when $\beta_1$ is close to zero (if the distribution of $\hat{\beta}_1$ has a continuous density function positive at zero, then the expectation will not exist ... – kjetil b halvorsen Dec 17 '16 at 17:32