I would like to infer the contribution of each rower in a crew boat from a number of races: 8 rowers are split repeatedly into two boats of 4 rowers each. A race over a given distance yields an estimate of the power the crew delivered. For example, the 4 rowers Amanda, Cam, Emily and Cait raced together and delivered 711 W during the first race; likewise, the 4 other rowers delivered 668 W in the second race. My goal is to infer each rower's contribution, which is assumed to be constant across races:
Amanda Cam Emily Cait Paula Janeska Charli Diana power
1 1 1 1 1 0 0 0 0 711.0960
2 0 0 0 0 1 1 1 1 667.5720
3 0 1 0 1 1 0 1 0 540.5055
4 1 0 1 0 0 1 0 1 783.7682
5 0 1 1 0 0 1 1 0 657.2489
6 1 0 0 1 1 0 0 1 667.5720
7 1 1 0 0 1 1 0 0 627.5287
8 0 0 1 1 0 0 1 1 590.6250
9 1 1 0 0 0 0 1 1 647.1376
10 0 0 1 1 1 1 0 0 599.5737
11 0 1 1 0 1 0 0 1 734.2822
12 1 0 0 1 0 1 1 0 608.7041
fit <- lm(power ~ 0 + Amanda + Cam + Emily + Cait
          + Paula + Janeska + Charli + Diana)
The basic idea is that the total power in each race is the sum of the individual power contributions and there is no other power source. Multiple linear regression estimates a coefficient for each rower's contribution to the total, and the intercept is fixed at zero because there is no other power source.
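Written out, the model described above looks as follows. The symbols are my own notation, not something `lm` reports: $P_i$ is the measured crew power in race $i$, $x_{ij}$ is 1 if rower $j$ sat in that boat and 0 otherwise, $\beta_j$ is rower $j$'s constant contribution, and $\varepsilon_i$ is the measurement error.

$$
P_i = \sum_{j=1}^{8} x_{ij}\,\beta_j + \varepsilon_i, \qquad i = 1, \dots, 12, \quad x_{ij} \in \{0, 1\}
$$

For race 1 this reads $711.096 = \beta_{\text{Amanda}} + \beta_{\text{Cam}} + \beta_{\text{Emily}} + \beta_{\text{Cait}} + \varepsilon_1$.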
> summary(fit)
Call:
lm(formula = power ~ 0 + Amanda + Cam + Emily + Cait + Paula +
Janeska + Charli + Diana)
Residuals:
1 2 3 4 5 6 7 8 9 10
36.366 36.366 9.169 9.169 9.443 9.443 -43.891 -43.891 -29.612 -29.612
11 12
18.525 18.525
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
Amanda 401.77 27.11 14.819 2.53e-05 ***
Cam -43.29 30.47 -1.421 0.2146
Emily 409.47 27.11 15.103 2.31e-05 ***
Cait -93.22 30.47 -3.059 0.0281 *
Paula 349.58 27.11 12.894 5.00e-05 ***
Janeska -36.64 30.47 -1.202 0.2830
Charli 318.27 27.11 11.739 7.89e-05 ***
Diana NA NA NA NA
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 43.09 on 5 degrees of freedom
Multiple R-squared: 0.9982, Adjusted R-squared: 0.9957
F-statistic: 396.7 on 7 and 5 DF, p-value: 1.483e-06
It almost works but here is where I could use help:
There are not enough races to solve this exactly. This can be understood as a linear algebra problem: the 12 equations here do not determine all 8 parameters (one coefficient is reported as not defined because of singularities). Because races are tiring, we can't simply add races.
Is it possible to better describe what we learn about each rower as races progress?
Rather than asking for absolute power contributions, can we infer relative contributions? For example, rowers who participate in high-powered races are likely to be strong contributors. How can this be better captured?
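To make the linear algebra point concrete, here is a minimal sketch that checks the rank of the race-by-rower matrix. It uses the side assignment listed under "Additional Constraints"; the variable names (`X`, `stroke`, `bow`, `v`) are only illustrative.

```r
# Race-by-rower design matrix, copied from the data in this question
X <- cbind(
  Amanda  = c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  Cam     = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
  Emily   = c(1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0),
  Cait    = c(1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1),
  Paula   = c(0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0),
  Janeska = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  Charli  = c(0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1),
  Diana   = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

# Numerical rank of the 12 x 8 matrix (8 would be needed for a unique solution)
design_rank <- qr(X)$rank

# Side assignment as stated in the question
stroke <- c("Emily", "Amanda", "Paula", "Charli")
bow    <- c("Diana", "Janeska", "Cam", "Cait")

# Number of stroke-side and bow-side rowers in each boat
stroke_count <- rowSums(X[, stroke])
bow_count    <- rowSums(X[, bow])

# Direction that the races cannot see: +1 for stroke side, -1 for bow side
v <- ifelse(colnames(X) %in% stroke, 1, -1)
null_check <- drop(X %*% v)

design_rank
stroke_count
bow_count
null_check
```

Every boat has exactly two stroke-side and two bow-side rowers, so the stroke-side columns and the bow-side columns have the same row sums, and the matrix has rank 7 rather than 8. In other words, adding a constant to every stroke-side rower and subtracting it from every bow-side rower leaves all crew totals unchanged. One consequence, as far as I can tell, is that differences between two rowers on the same side are still identified by these races, while a stroke-versus-bow difference is not.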
This is a new take on a similar question I had asked before. I am still looking for the right statistical framework.
R code:
Amanda = c(1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L)
Cam = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L)
Emily = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L)
Cait = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L)
Paula = c(0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L)
Janeska = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L)
Charli = c(0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L)
Diana = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L)
power = c(711.096048081832,
667.572021484375, 540.50554255546, 783.768218256806, 657.248927084595,
667.572021484375, 627.528708276313, 590.625, 647.137583778637,
599.573712607379, 734.282165754758, 608.704121100815)
df <- data.frame(Amanda,Cam,Emily,Cait,Paula,Janeska,Charli,Diana,power)
fit <- lm(power ~ 0 + Amanda + Cam + Emily + Cait
          + Paula + Janeska + Charli + Diana)
Additional Constraints
This section was added late and covers some additional constraints that could be modelled.
Each rower has only one oar and thus a crew consists of two rowers rowing on the so-called bow side and two on the so-called stroke side. Rowers don't switch sides: a rower always rows on the same side.
Stroke side: Emily, Amanda, Paula, Charli; bow side: Diana, Janeska, Cam, Cait
Pairs of races happen in short succession or side by side: 1/2, 3/4 and so on. This implies all rowers are split between the two boats racing. If we assume that a rower's power output is constant, that would imply that the total power of such a pair is constant as well. As can be seen, this is only approximately the case and is not modelled. A typical reason is that rowers get tired and can't emit as much power in their 3rd race as in the 1st.
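The pair totals mentioned above can be computed directly from the power values. This is only a quick check, assuming races are paired as 1/2, 3/4, and so on; `pair_id`, `pair_total` and `spread` are names chosen for this sketch.

```r
# Crew power per race, copied from the data in this question
power <- c(711.096048081832, 667.572021484375, 540.50554255546,
           783.768218256806, 657.248927084595, 667.572021484375,
           627.528708276313, 590.625, 647.137583778637,
           599.573712607379, 734.282165754758, 608.704121100815)

# Races 1/2 form pair 1, races 3/4 form pair 2, and so on
pair_id <- rep(1:6, each = 2)

# Total power of all 8 rowers within each pair of races
pair_total <- tapply(power, pair_id, sum)

# Gap between the strongest and the weakest pair
spread <- diff(range(pair_total))

round(pair_total, 1)
round(spread, 1)
```

The totals range from roughly 1218 W to 1379 W, which shows how far the data are from the constant-output assumption.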
Because rowers are rowing with one oar, the power difference in a crew between stroke side and bow side can't be too large as the boat would not go straight otherwise. This is currently not modelled.
Traditional Method
The traditional method of ranking rowers is based on the time races take: each rower accumulates the time they spent racing and is later ranked based on the accumulated time. This is equivalent to summing up for each rower the crew power of the boats they raced in and then ranking based on that power. My goal is to improve on this, as this method gives no insight into the uncertainties.
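For comparison, here is a minimal sketch of the power-based version of the traditional ranking, that is, summing for each rower the crew power of the boats they raced in. The names `accumulated` and `ranking` are just for illustration.

```r
# Race-by-rower design matrix and crew power, copied from the data in this question
X <- cbind(
  Amanda  = c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  Cam     = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
  Emily   = c(1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0),
  Cait    = c(1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1),
  Paula   = c(0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0),
  Janeska = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  Charli  = c(0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1),
  Diana   = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)
power <- c(711.096048081832, 667.572021484375, 540.50554255546,
           783.768218256806, 657.248927084595, 667.572021484375,
           627.528708276313, 590.625, 647.137583778637,
           599.573712607379, 734.282165754758, 608.704121100815)

# For each rower, add up the crew power of every boat they sat in: t(X) %*% power
accumulated <- drop(crossprod(X, power))

# Rank rowers from highest to lowest accumulated crew power
ranking <- sort(accumulated, decreasing = TRUE)
round(ranking, 1)
```

Each rower appears in 6 of the 12 races, so the accumulated sums are directly comparable, but they come with no standard errors, which is the limitation described above.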
lm has arbitrarily selected a method to identify the rest. – whuber Sep 09 '22 at 21:26