I'm looking for a way to combine noisy, biased measurements to find confidence intervals. As an example, we have two people, Tom and Mary, who each take free throws on separate days, with the following results from the three days:

| Day | Weather | Player | Makes | Attempts |
|-----|---------|--------|-------|----------|
| 1   | Sunny   | Mary   | 89    | 100      |
| 1   | Sunny   | Tom    | 171   | 200      |
| 2   | Windy   | Mary   | 119   | 200      |
| 2   | Windy   | Tom    | 51    | 100      |
| 3   | Rainy   | Mary   | 24    | 50       |
| 3   | Rainy   | Tom    | 24    | 50       |

What kinds of strategies are there for calculating which player is better, and with what confidence?

I thought about doing a Gaussian approximation for each day and then combining the uncertain measurements, but the answers I've seen all assume unbiased measurements, and, as you can see, there is a bias based on the weather.


Edit: I have a strategy, but I am curious whether it's way off base:

  1. For each day, I calculate the percent difference between the two players by:
    • approximating each player's make rate with a Gaussian
    • subtracting the Gaussians:
      • $\mu_{\Delta} = \mu_{mary} - \mu_{tom}$
      • $\sigma^2_{\Delta} = \sigma^2_{mary} + \sigma^2_{tom}$
    • normalizing to a $\Delta\%$ relative to Tom's distribution:
      • $\mu_{\Delta\%} = \mu_{\Delta} / \mu_{tom}$
      • $\sigma^2_{\Delta\%} = \sigma^2_{\Delta} / \mu_{tom}^2$
    • Result:
      • Day 1: $\mu = 0.041$, $\sigma^2 = 0.0022$
      • Day 2: $\mu = 0.167$, $\sigma^2 = 0.0139$
      • Day 3: $\mu = 0.000$, $\sigma^2 = 0.0409$
  2. I use inverse-variance weighting, per the answer at https://stats.stackexchange.com/a/275520/3143, to combine the per-day deltas (a runnable sketch of both steps follows this list):
    • \begin{equation} \hat{x} = \frac{\sum_i x_i/\sigma^2_{x,i}}{\sum_i 1/\sigma^2_{x,i}}, \qquad \frac{1}{\sigma^2_{\hat{x}}} = \sum_i \frac{1}{\sigma^2_{x,i}} \end{equation}
    • Result:
      • $\mu = 0.0557$, $\sigma^2 = 0.0018$
      • 95% confidence interval: -2.83% to 13.97%
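
For concreteness, here is a minimal Python sketch of the two steps above. The binomial-to-Gaussian variance $\sigma^2 = p(1-p)/n$ is my assumption about how the per-day variances were computed, so the output may differ from the numbers above in the last digit:

```python
import math

# (day, mary_makes, mary_attempts, tom_makes, tom_attempts)
data = [
    ("Day 1", 89, 100, 171, 200),
    ("Day 2", 119, 200, 51, 100),
    ("Day 3", 24, 50, 24, 50),
]

mus, variances = [], []
for day, mm, ma, tm, ta in data:
    p_m, p_t = mm / ma, tm / ta
    # Gaussian approximation to each binomial proportion
    var_m = p_m * (1 - p_m) / ma
    var_t = p_t * (1 - p_t) / ta
    # difference of independent Gaussians, normalized by Tom's mean
    mu = (p_m - p_t) / p_t
    var = (var_m + var_t) / p_t ** 2
    mus.append(mu)
    variances.append(var)
    print(f"{day}: mu = {mu:.3f}, var = {var:.4f}")

# step 2: inverse-variance weighting of the per-day deltas
weights = [1 / v for v in variances]
mu_hat = sum(w * m for w, m in zip(weights, mus)) / sum(weights)
var_hat = 1 / sum(weights)
half = 1.96 * math.sqrt(var_hat)
print(f"combined: mu = {mu_hat:.4f}, var = {var_hat:.4f}")
print(f"95% CI: {mu_hat - half:.2%} to {mu_hat + half:.2%}")
```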

Edit 2: As a check, I ran my calculations across the same data with identical days, once grouped and once split (a runnable version of this check follows the results below):

| Day      | Player | Makes | Attempts |
|----------|--------|-------|----------|
| Each day | Mary   | 85    | 100      |
| Each day | Tom    | 75    | 100      |
| Total    | Mary   | 255   | 300      |
| Total    | Tom    | 225   | 300      |

And got the following results:

  • By day, then aggregated:
    • $\mu = 0.133$, $\sigma^2 = 0.00186$
    • 95% confidence interval: 4.89% to 21.77%
  • Total row:
    • $\mu = 0.133$, $\sigma^2 = 0.00185$
    • 95% confidence interval: 4.87% to 21.79%
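
The same check in code form, again under my $p(1-p)/n$ variance assumption:

```python
import math

def pooled_delta(days):
    """Per-day Gaussian delta %, combined by inverse-variance weighting."""
    mus, vars_ = [], []
    for mm, mn, tm, tn in days:   # (mary_makes, mary_n, tom_makes, tom_n)
        p_m, p_t = mm / mn, tm / tn
        mus.append((p_m - p_t) / p_t)
        vars_.append((p_m * (1 - p_m) / mn + p_t * (1 - p_t) / tn) / p_t ** 2)
    w = [1 / v for v in vars_]
    return sum(wi * mi for wi, mi in zip(w, mus)) / sum(w), 1 / sum(w)

split = [(85, 100, 75, 100)] * 3   # three identical days
total = [(255, 300, 225, 300)]     # one aggregated row
for label, days in [("by day", split), ("total", total)]:
    mu, var = pooled_delta(days)
    half = 1.96 * math.sqrt(var)
    print(f"{label}: mu = {mu:.3f}, var = {var:.5f}, "
          f"CI {mu - half:.2%} to {mu + half:.2%}")
```
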
  • Seems to me that your data are richer than needed to produce a unidimensional "better". Better in the rain, better in the wind are also options. If you want to aggregate the data into one dimension then think about weighting the weather results on the basis of how frequently those conditions might be expected. – Michael Lew Nov 08 '22 at 20:42
  • I would note that you do not have any data that helps you determine whether the apparent weather effects are actually just inter-day variation. – Michael Lew Nov 08 '22 at 20:43
  • @MichaelLew I chose weather as an artificial metric for purposes of illustration only. I'm not concerned with who is better on a sunny day, but rather who is better in aggregate. The primary purpose is to work around potential Simpson's paradox issues when there are unequal sample sizes and inter-trial variation. In this example, Tom has made a higher total percentage (246/350 vs 232/350) despite doing no better than Mary on any individual day.

    Does it seem like the method I posted in my question is a potential way of determining confidence intervals around how much better Mary shoots FTs?

    – Jacob Eggers Nov 08 '22 at 21:14
  • If you want to know who is better 'in aggregate' then you need to decide on a definition of 'in aggregate'. That's why I suggested that you consider weighting by the frequency of the weather conditions. There will not be a single obvious or 'best' choice (which would itself need a definition). However, even though the example is just an example, think about which aspects of the system are well sampled by the available data. – Michael Lew Nov 09 '22 at 20:45
  • As to your method, I would not reach for a Gaussian for count data that are far from 50%. Run some simulations and see how your method fares with known systems. – Michael Lew Nov 09 '22 at 20:47
  • I picked a uniform weighting of trials because there are a large number of trials/days, and no differentiating data on trials other than that they occurred on different time frames. – Jacob Eggers Nov 09 '22 at 22:34
  • Also, my understanding is that normal approximations to binomials are fairly good when $np(1-p) > 10$; is that not the case? Do you have any suggestions on known systems that I could simulate/test? – Jacob Eggers Nov 09 '22 at 22:34
  • Normal approximations are probably OK here. I misread the table and thought that you had results far from 50%. Explore your problem by simulations in R. – Michael Lew Nov 10 '22 at 20:34
  • Your rationale for ignoring the information in the weather makes no sense. Particularly so if you have lots of data. – Michael Lew Nov 10 '22 at 20:35
  • Weather was just an example for purposes of illustration, the actual data is unrelated to weather & basketball. – Jacob Eggers Nov 11 '22 at 01:27
  • Yes, weather is just an illustrative example, but whatever replaces weather in a real dataset might be equally important and interesting. (This is my last comment here.) – Michael Lew Nov 11 '22 at 19:56
  • How are these measurements "biased" and "noisy"? It seems you are assuming that the recorded numbers are all precise!? – Christian Hennig Nov 15 '22 at 20:55
  • @ChristianHennig Maybe I'm using terminology incorrectly, but my reasoning is: – Jacob Eggers Nov 15 '22 at 22:45
  • Noisy because how many FTs someone makes is a noisy measurement of their true FT%. – Jacob Eggers Nov 15 '22 at 22:45
  • Biased because the variables of the day the trial (weather, height of rim, or other hidden variables) will affect the percent of FTs that are converted for that particular day. – Jacob Eggers Nov 15 '22 at 22:47
  • Fair enough. I wouldn't use the terms like this, as the measurements in themselves are precise (as far as I can tell). I'd speak of "bias/noise" of the measurement if there were reason to assume that somebody counts the values wrongly (systematically or randomly). But I see where you are coming from. – Christian Hennig Nov 15 '22 at 23:18

2 Answers

The terms you are using are misleading. Your "noise" is called sampling error, and essentially every statistical method addresses sampling error.

There is no "bias", just dependence on multiple independent variables. So you would look at (multiple) logistic regression for your problem. People talk of "controlling" for other variables (see How exactly does one "control for other variables"?).

Think of studies of the effect of smoking on the chance of a heart attack, *controlling* for, e.g., age and gender.

This is just a standard logistic regression problem with weather and player as factors.

So you would look at confidence intervals around the coefficient for player (see, e.g., https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_multivariable/bs704_multivariable8.html).

Those are the basic "mechanics"; as others have said, exactly what you mean by "aggregate" will change the details.
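
To make those mechanics concrete, here is a minimal sketch in Python using statsmodels; the library choice, the column names, and the `player[T.Tom]` coefficient label are my assumptions about the setup, not something specified in the answer:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# free-throw data from the question, one row per player-day
df = pd.DataFrame({
    "player":   ["Mary", "Tom", "Mary", "Tom", "Mary", "Tom"],
    "weather":  ["Sunny", "Sunny", "Windy", "Windy", "Rainy", "Rainy"],
    "makes":    [89, 171, 119, 51, 24, 24],
    "attempts": [100, 200, 200, 100, 50, 50],
})
df["misses"] = df["attempts"] - df["makes"]

# binomial GLM (logistic regression) with player and weather as factors;
# 'makes + misses' on the left-hand side supplies success/failure counts
model = smf.glm("makes + misses ~ player + weather",
                data=df, family=sm.families.Binomial())
result = model.fit()

print(result.summary())
# 95% CI for the player coefficient, on the log-odds scale; it is Tom's
# log-odds relative to Mary's, so a negative interval favors Mary
print(result.conf_int().loc["player[T.Tom]"])
```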

seanv507
  • 6,743

Since I haven't received any answers, I'm posting the strategy I ended up using. I'll hold off on accepting it for a bit in case anyone else has a different strategy or a simulation that shows different results:

  1. For each day, I calculate the percent difference between the two players by:
    • approximating each player's make rate with a Gaussian
    • subtracting the Gaussians:
      • $\mu_{\Delta} = \mu_{mary} - \mu_{tom}$
      • $\sigma^2_{\Delta} = \sigma^2_{mary} + \sigma^2_{tom}$
    • normalizing to a $\Delta\%$ relative to Tom's distribution:
      • $\mu_{\Delta\%} = \mu_{\Delta} / \mu_{tom}$
      • $\sigma^2_{\Delta\%} = \sigma^2_{\Delta} / \mu_{tom}^2$
    • Result:
      • Day 1: $\mu = 0.041$, $\sigma^2 = 0.0022$
      • Day 2: $\mu = 0.167$, $\sigma^2 = 0.0139$
      • Day 3: $\mu = 0.000$, $\sigma^2 = 0.0409$
  2. I use inverse-variance weighting, per the answer at https://stats.stackexchange.com/a/275520/3143, to combine the per-day deltas:
    • \begin{equation} \hat{x} = \frac{\sum_i x_i/\sigma^2_{x,i}}{\sum_i 1/\sigma^2_{x,i}}, \qquad \frac{1}{\sigma^2_{\hat{x}}} = \sum_i \frac{1}{\sigma^2_{x,i}} \end{equation}
    • Result:
      • $\mu = 0.0557$, $\sigma^2 = 0.0018$
      • 95% confidence interval: -2.83% to 13.97%

I ran some simplified simulations for deltas in the range of ±16%, and the results were close to the numbers from the strategy above. The sketch below shows the kind of simulation I mean.
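
A minimal version, with made-up parameters (Tom's baseline skill of 0.70, a Gaussian day effect with SD 0.05, and three days of 100 attempts each are all arbitrary choices for illustration): each day gets a shared random offset, Mary's true rate is a fixed percentage above Tom's, and we count how often the 95% interval covers the true delta:

```python
import math
import random

def delta_interval(days):
    """The strategy above: per-day Gaussian delta %, pooled by
    inverse-variance weighting. days = [(mary_makes, mary_n, tom_makes, tom_n)]."""
    mus, vars_ = [], []
    for mm, mn, tm, tn in days:
        p_m, p_t = mm / mn, tm / tn
        mus.append((p_m - p_t) / p_t)
        vars_.append((p_m * (1 - p_m) / mn + p_t * (1 - p_t) / tn) / p_t ** 2)
    w = [1 / v for v in vars_]
    mu_hat = sum(wi * mi for wi, mi in zip(w, mus)) / sum(w)
    half = 1.96 * math.sqrt(1 / sum(w))
    return mu_hat - half, mu_hat + half

random.seed(0)
true_delta = 0.10      # Mary's true rate is 10% above Tom's, every day
n_sims, covered = 2000, 0
for _ in range(n_sims):
    days = []
    for _ in range(3):
        # shared day effect ("weather") shifts both players' rates
        p_tom = min(max(0.70 + random.gauss(0, 0.05), 0.01), 0.99)
        p_mary = min(p_tom * (1 + true_delta), 0.99)
        n = 100
        mm = sum(random.random() < p_mary for _ in range(n))
        tm = sum(random.random() < p_tom for _ in range(n))
        days.append((mm, n, tm, n))
    lo, hi = delta_interval(days)
    covered += lo <= true_delta <= hi
print(f"95% CI covered the true delta in {covered / n_sims:.1%} of runs")
```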