
Consider the following scenario. I have a data set on several boxing bouts, with information on both boxers and on the three judges in each bout.

I want to conduct the analysis at the judge level, for both boxers: I want to study whether a certain characteristic of the boxer, interacted with a characteristic of the judge, affects the probability that the judge scores that boxer higher than the opponent. So within each bout I would have 6 observations, that is, 2 observations per judge. The two observations from the same judge carry the same information on the two boxers (with the own and opponent roles swapped), but with opposite outcomes.

For instance, within one bout, judge J gives a higher score to boxer A than to his opponent B. For boxer A, the outcome is 1 (he got a higher score than B); this outcome is explained by X, boxer A's own characteristics, and by Y, the opponent's characteristics (i.e. those of boxer B). For boxer B, the outcome is 0 (he got a lower score than A); this outcome is explained by X, boxer B's own characteristics, and by Y, the opponent's characteristics (i.e. those of boxer A).
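
To make the layout concrete, here is a minimal sketch in Python of the six rows one bout would produce. All names (`Row`, `mirror_rows`, `x_own`, `y_opp`) and all numbers are hypothetical and only illustrate the structure described above; the example assumes no judge scores the bout as a draw.

```python
from dataclasses import dataclass


@dataclass
class Row:
    bout: int
    judge: str
    boxer: str
    opponent: str
    x_own: float   # X: the boxer's own characteristic
    y_opp: float   # Y: the opponent's characteristic
    win: int       # 1 if this judge scored the boxer above the opponent


def mirror_rows(bout, judge, traits, scores):
    """Return the two mirrored rows that one judge contributes to one bout."""
    a, b = sorted(traits)
    rows = []
    for me, other in ((a, b), (b, a)):
        rows.append(
            Row(bout, judge, me, other,
                traits[me], traits[other],
                int(scores[me] > scores[other]))
        )
    return rows


# Hypothetical bout: one characteristic per boxer, three judges, no draws.
traits = {"A": 0.8, "B": 0.3}
judge_scores = {
    "J1": {"A": 115, "B": 113},
    "J2": {"A": 113, "B": 115},
    "J3": {"A": 116, "B": 112},
}

rows = []
for judge, scores in judge_scores.items():
    rows.extend(mirror_rows(1, judge, traits, scores))

for r in rows:
    print(r)

# Within each judge the outcomes sum to 1 and the regressors are swapped,
# so the second row adds no information beyond the first.
pairs = [(rows[i], rows[i + 1]) for i in range(0, len(rows), 2)]
outcome_sums = [first.win + second.win for first, second in pairs]
swapped = all(first.x_own == second.y_opp and first.y_opp == second.x_own
              for first, second in pairs)
```

Each judge's second row is fully determined by the first one, which is the "complementary" structure I am worried about.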

I suspect this would cause some problem for my econometric estimates, but I do not know which one. Moreover, I guess I would have a similar problem if I used the judge's raw score as the outcome.

What are the econometric implications of using duplicated information that leads to "complementary" outcomes within each judge?

[I came up with a possible solution based on a transformation of the outcome, basically the percentile rank of the score within the bout, but I suspect it has problems of its own. I have illustrated this solution in another stackexchange post, in terms of division of cakes; however, I guess that outcome might also hide some issue.]

I think similar problems might arise when you analyze a data set with individual-level information, clustered by household, where only one individual within the household can get a certain positive outcome while the other members receive outcome 0, and you want to control for these other members' individual characteristics.

Fuca26
  • I am thinking that a possible econometric implication could be a downward bias in the estimated standard errors, and thus an inflated type I error rate (so, not a bias in the coefficient estimates, but a bias in the standard errors) – Fuca26 May 19 '20 at 20:19
  • I have found this paper https://ojs.ub.uni-konstanz.de/srm/article/view/7149/6478 and this post https://stats.stackexchange.com/questions/19698/if-i-repeat-every-sample-observation-in-a-linear-regression-model-and-rerun-the but both of them are about duplicate observations (i.e. entire rows of a data set are duplicated). They hint at a downward bias in the standard errors (and thus an inflated type I error rate) – Fuca26 May 19 '20 at 21:05
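
The effect of literal row duplication mentioned in these comments can be checked with a small simulation. The sketch below is only an illustration under assumed, made-up data (a simple linear regression with one regressor, conventional i.i.d. standard errors, and a hypothetical helper `ols_slope_se`); it covers exact duplicates, not the mirrored rows with swapped regressors from the question.

```python
import math
import random


def ols_slope_se(x, y):
    """Slope and conventional (i.i.d.) standard error for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


# Simulated data; the sample size and coefficients are arbitrary choices.
random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

b_once, se_once = ols_slope_se(x, y)
# Stack every row twice, as if each observation had been recorded two times.
b_twice, se_twice = ols_slope_se(x + x, y + y)

ratio = se_twice / se_once
print(f"slope once:  {b_once:.4f}, se: {se_once:.4f}")
print(f"slope twice: {b_twice:.4f}, se: {se_twice:.4f}")
print(f"se ratio:    {ratio:.4f}")  # equals sqrt((n - 2) / (2n - 2)), close to 1/sqrt(2)
```

The slope is unchanged while the reported standard error shrinks by a factor close to 1/sqrt(2), because the software counts twice as many observations as there is independent information.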
