I want to build a database of device performance on various tests so that I can compare devices on a single test. Unfortunately, not all reviewers review every device, which leaves holes in my data that prevent me from comparing every device in the dataset on every metric.
But there is enough overlap in measurements of the same devices that I can estimate how the different tests are related. For example, everyone reviews the iPad, but not everyone reviews the mid-sized Huawei device.
What kind of statistical model can I use to help predict the performance on missing data points based on past data? (Bonus points if I can generate a confidence interval for predicted points).
I have used regression between two sources, fitted on the devices they both reviewed, to predict what the results might look like for missing points. I assume there is a way to exploit all the data at once to get better predictions, and to avoid chaining two "hops" of estimation in some cases.
Toy example for battery life:
| Model | Bob's Guide | Footbook Check | The Berge |
|---|---|---|---|
| iPad | 10.95 | 12 | 10 |
| Tab S8 | 12.88 | 13.5 | ? |
| Tab S8 Ultra | ? | 10.85 | ? |
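
For concreteness, here is a minimal sketch of the pairwise regression I described, applied to the toy table. The names `scores`, `fit_pair`, and `predict` are just illustrative, and I am assuming a plain least-squares line between two sources. With only two overlapping devices the fitted line passes exactly through both points, so this only shows the mechanics, not a meaningful estimate.

```python
# Toy battery-life table from above; None marks a missing review.
scores = {
    "iPad":         {"Bob's Guide": 10.95, "Footbook Check": 12.0,  "The Berge": 10.0},
    "Tab S8":       {"Bob's Guide": 12.88, "Footbook Check": 13.5,  "The Berge": None},
    "Tab S8 Ultra": {"Bob's Guide": None,  "Footbook Check": 10.85, "The Berge": None},
}


def fit_pair(scores, src, dst):
    """Least-squares line dst = intercept + slope * src, fitted on the
    devices that both sources reviewed (the overlap)."""
    pairs = [
        (row[src], row[dst])
        for row in scores.values()
        if row[src] is not None and row[dst] is not None
    ]
    if len(pairs) < 2:
        raise ValueError(f"need at least 2 overlapping devices for {src} -> {dst}")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("source scores are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(scores, device, src, dst):
    """Estimate the missing dst score of a device from its src score."""
    intercept, slope = fit_pair(scores, src, dst)
    return intercept + slope * scores[device][src]


# One "hop": estimate Bob's Guide for the Tab S8 Ultra from Footbook Check.
estimate = predict(scores, "Tab S8 Ultra", "Footbook Check", "Bob's Guide")
print(round(estimate, 2))
```

Note that `fit_pair(scores, "Footbook Check", "The Berge")` raises here, because the iPad is the only device both sources reviewed. That is exactly the kind of gap where I would currently have to fall back on a second hop or give up.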