
General question: How do I fit a model to data when the data points have asymmetric error bars? What is the cost function I use to calculate residuals, and from that, how do I calculate confidence intervals/covariances for the fit parameters?

More specific details: I have data that looks something like

$$ Y \sim B(n, A \sin(2\pi f t + \phi) + O)/n $$

Where $B(n, p)$ is the binomial distribution with $n$ trials and success probability $p$. $A, f, \phi$, and $O$ are fixed parameters that I am trying to estimate. For a variety of values of $t$, I observe the outcomes of $n$ trials and compute each $y$ data point as the fraction of successes at that $t$, giving $t$/$y$ data to curve fit. I use the Wilson score interval to calculate asymmetric error bars on the $y$ data points for each $t$. This differs from the naive estimate of the standard error of a binomial proportion when $p$ is close to 0 or 1 and the number of trials is small.
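For reference, the Wilson score interval mentioned above can be computed directly; this is a minimal sketch, where the function name and the choice $z = 1.96$ (roughly 95% coverage) are illustrative assumptions:

```python
import numpy as np

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    k: number of successes, n: number of trials,
    z: normal quantile (1.96 for ~95% coverage).
    Returns (low, high).
    """
    p_hat = k / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half
```

Note the interval is asymmetric about $\hat p = k/n$ when $\hat p$ is far from $1/2$, and it stays inside $[0, 1]$ even for $k = 0$ or $k = n$.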

When error bars are symmetric I curve fit this data by calculating residuals for each point, weighted by that point's error. What do I do in the case of asymmetric error bars? Should I weight by the average of the upper and lower error bars? Or should I weight by the upper error bar when the data point is above the model line and the lower error bar when it is below, even though this would give discontinuous residuals?

What's the statistically sound approach here?

  • it sounds like you should use logistic regression - you can pass successes and failures in many libraries – seanv507 May 16 '23 at 08:12
  • @seanv507 Can you please give more details about how logistic regression will answer my questions? Can you suggest python libraries that can be used for this? Perhaps you can expand your comment into an answer? – Jagerber48 May 16 '23 at 15:21
  • You should be modeling the counts directly rather than going through this intermediate formulation with error bars. That's what logistic regression will do for you. There are approximate ways that work well for large $n$ and consistent numbers of replicate observations, such as applying an arcsine transformation to the proportions. – whuber May 16 '23 at 17:16
  • Logistic regression will handle the asymmetric error bars. However, it uses a fixed set of inputs $g_i$, $\ln\big(p(Y=1)/p(Y=0)\big) = \sum_i \alpha_i g_i(t) + k$, and only adjusts the $\alpha_i$. If you are really after the curve fit you are suggesting, then you might adapt the steps of logistic regression to your curve fit. If instead you can use a fixed set of sine waves as inputs [with e.g. variable selection to drop frequencies], then straight logistic regression would work. Could you back up and explain what the original problem you are trying to solve is, and why it must be a single sine wave? – seanv507 May 16 '23 at 17:35
  • @seanv507 I am trying to estimate the parameter $f$ and maybe also $A, O$ and $\phi$. These parameters drive a binomial process that I can measure in such a way that the success probability is related to these parameters (and the measurement time $t$ which I can control as a fixed parameter) according to the sine function above. My current approach is to estimate the mean and confidence interval for a variety of time $t$ and then perform a weighted non-linear least squares fit to the data to extract the parameters I'm interested in. – Jagerber48 May 16 '23 at 18:15
  • @whuber ok, I'm interested to learn more about how I can "model the counts directly rather than going through this intermediate formulation with error bars". Can you say more about this or expand into an answer? – Jagerber48 May 16 '23 at 18:16
  • I can do much better than that: over a thousand highly upvoted answers on logistic regression questions are available with this site search. Because your model is nonlinear in the explanatory variable, I am reminded of another closely related approach I described long ago at https://stats.stackexchange.com/a/64039/919. But in your case, conditional on $f$ your problem is a standard logistic regression, which you can solve for any $f$ easily and then choose the $f$ with the best solution. – whuber May 16 '23 at 18:18
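As a sketch of the "model the counts directly" idea raised in the comments: rather than fitting to proportions with error bars, one can maximize the binomial log-likelihood of the raw success counts under the sinusoidal model for $p(t)$. Everything below (the simulated "true" parameter values, the starting guess, and the use of `scipy.optimize.minimize`) is an illustrative assumption, not part of the original thread:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy

def neg_log_lik(theta, t, k, n):
    # theta = (A, f, phi, O); the model probability must stay in (0, 1),
    # so clip to avoid log(0) during optimization.
    A, f, phi, O = theta
    p = np.clip(A * np.sin(2 * np.pi * f * t + phi) + O, 1e-9, 1 - 1e-9)
    # Binomial log-likelihood up to a constant: k*log(p) + (n-k)*log(1-p)
    return -np.sum(xlogy(k, p) + xlogy(n - k, 1 - p))

# Simulate illustrative data with hypothetical true parameters.
rng = np.random.default_rng(0)
A_true, f_true, phi_true, O_true = 0.4, 1.3, 0.5, 0.5
t = np.linspace(0, 3, 40)
n = 20  # trials per time point
p_true = A_true * np.sin(2 * np.pi * f_true * t + phi_true) + O_true
k = rng.binomial(n, p_true)  # observed success counts

# Maximize the likelihood (minimize its negative) from a nearby start.
res = minimize(neg_log_lik, x0=[0.3, 1.2, 0.0, 0.5],
               args=(t, k, n), method="Nelder-Mead")
A_hat, f_hat, phi_hat, O_hat = res.x
```

Because the likelihood is multimodal in $f$, in practice one would scan a grid of $f$ values (per whuber's comment) and refine the best one. For confidence intervals, the inverse of the Hessian of the negative log-likelihood at the optimum gives an approximate covariance matrix for the parameters.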

0 Answers