
I want to build a database of devices' performance on various tests so that I can compare devices on any single test. Unfortunately, not every reviewer reviews every device, which leaves holes in my data that prevent me from comparing every device in the dataset on every metric.

But there is enough overlap in which devices get measured that I can estimate how the different tests are related. For example, everyone reviews the iPad, but not everyone reviews the mid-size model of a Huawei device.
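Before choosing a model, a quick way to see how much overlap there actually is would be to count how many devices each pair of reviewers has in common. Below is a minimal NumPy sketch. It assumes the results are stored as a devices × reviewers array with NaN for untested cells; the values are the toy battery-life numbers from the question, and the variable names are illustrative only:

```python
import numpy as np

# Toy battery-life table: rows = devices, columns = reviewers.
# NaN marks a device that a reviewer did not test.
reviewers = ["Bob's Guide", "Footbook Check", "The Berge"]
Y = np.array([
    [10.95, 12.0, 10.0],      # iPad
    [12.88, 13.5, np.nan],    # Tab S8
    [np.nan, 10.85, np.nan],  # Tab S8 Ultra
])

measured = ~np.isnan(Y)
# overlap[j, k] = number of devices tested by both reviewer j and reviewer k
# (the diagonal is the number of devices each reviewer tested).
overlap = measured.T.astype(int) @ measured.astype(int)

for j in range(len(reviewers)):
    for k in range(j + 1, len(reviewers)):
        print(f"{reviewers[j]} & {reviewers[k]}: {overlap[j, k]} shared devices")
```

A pair of reviewers that shares only one or two devices says very little about how their tests relate. Counts like these show which relationships can be estimated at all.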

What kind of statistical model can I use to predict the missing data points from the data I already have? (Bonus points if I can also get a confidence interval for the predicted points.)

I have used regression between two sources, fitted on the devices they both measured, to predict what the missing points might look like. However, I assume there is a way to exploit all the data at once to get better predictions, and to avoid chaining two "hops" of estimation in some cases.
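The two-source approach described above amounts to fitting a straight line from one reviewer's numbers to another's, using only the devices both of them measured. Here is a minimal sketch, assuming NumPy and a linear relationship between the two tests. The function name `pairwise_predict` is hypothetical:

```python
import numpy as np

def pairwise_predict(x_ref, y_target, x_new):
    """Fit y_target = intercept + slope * x_ref on shared devices, then predict at x_new."""
    x_ref = np.asarray(x_ref, dtype=float)
    y_target = np.asarray(y_target, dtype=float)
    # Keep only devices that both sources measured.
    both = ~np.isnan(x_ref) & ~np.isnan(y_target)
    # np.polyfit with deg=1 returns [slope, intercept].
    slope, intercept = np.polyfit(x_ref[both], y_target[both], deg=1)
    return intercept + slope * np.asarray(x_new, dtype=float)

# Toy data, devices in the order iPad, Tab S8, Tab S8 Ultra.
footbook = [12.0, 13.5, 10.85]
bobs = [10.95, 12.88, np.nan]
# Estimate Bob's Guide result for the Tab S8 Ultra from its Footbook Check result.
ultra_on_bobs = pairwise_predict(footbook, bobs, 10.85)
```

With the toy numbers, Footbook Check's 10.85 for the Tab S8 Ultra maps to roughly 9.47 on Bob's Guide. That line is fitted through only two shared devices, though. Predicting a third reviewer from this estimate would stack a second fit on top of the first, which is the "two hops" problem.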

Toy example for battery life:

| Model | Bob's Guide | Footbook Check | The Berge |
|---|---|---|---|
| iPad | 10.95 | 12 | 10 |
| Tab S8 | 12.88 | 13.5 | ? |
| Tab S8 Ultra | ? | 10.85 | ? |
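One way to use every measurement at once, instead of chaining pairwise regressions, is a two-way additive model. Each battery result is treated as a device effect plus a reviewer effect plus noise, fitted by least squares over all observed cells, with classical prediction intervals for the missing ones. This is a swapped-in technique, not something from the question. It assumes reviewer differences are a constant offset and that errors are independent, normal and of equal variance. The function names are hypothetical. A sketch with NumPy and SciPy:

```python
import numpy as np
from scipy import stats

def fit_additive(Y):
    """Least-squares fit of y[i, j] = device[i] + reviewer[j] + noise on observed cells."""
    Y = np.asarray(Y, dtype=float)
    n_dev, n_rev = Y.shape
    rows, cols = np.nonzero(~np.isnan(Y))
    # One indicator column per device and per reviewer. This over-parameterises the
    # model by one (a constant can shift between the groups), so use lstsq: its
    # minimum-norm solution still gives unique predictions as long as all devices
    # and reviewers are linked through shared measurements.
    X = np.zeros((len(rows), n_dev + n_rev))
    X[np.arange(len(rows)), rows] = 1.0
    X[np.arange(len(rows)), n_dev + cols] = 1.0
    y = Y[rows, cols]
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = len(y) - rank                   # residual degrees of freedom (must be > 0)
    sigma2 = resid @ resid / df          # residual variance estimate
    XtX_pinv = np.linalg.pinv(X.T @ X)   # generalized inverse for estimable variances
    return coef, sigma2, df, XtX_pinv, n_dev

def predict_cell(fit, i, j, level=0.95):
    """Point prediction and prediction interval for device i on reviewer j."""
    coef, sigma2, df, XtX_pinv, n_dev = fit
    x0 = np.zeros(len(coef))
    x0[i] = 1.0
    x0[n_dev + j] = 1.0
    pred = x0 @ coef
    # Var(new observation - prediction) = sigma^2 * (1 + x0' (X'X)^+ x0)
    se = np.sqrt(sigma2 * (1.0 + x0 @ XtX_pinv @ x0))
    t = stats.t.ppf(0.5 + level / 2, df)
    return pred, pred - t * se, pred + t * se

# Columns: Bob's Guide, Footbook Check, The Berge.
Y = np.array([
    [10.95, 12.0, 10.0],      # iPad
    [12.88, 13.5, np.nan],    # Tab S8
    [np.nan, 10.85, np.nan],  # Tab S8 Ultra
])
fit = fit_additive(Y)
pred, lo, hi = predict_cell(fit, i=2, j=0)  # Tab S8 Ultra on Bob's Guide
```

On the toy table this predicts about 10.0 for the Tab S8 Ultra on Bob's Guide. However, six observations and five effective parameters leave a single residual degree of freedom, so the intervals are extremely wide. They only become useful with many more overlapping measurements. If a reviewer's test tends to scale battery life by a factor rather than shift it, fitting the same model to `np.log(Y)` turns it into a multiplicative model.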
Mark Omo
    Hi Mark - this is a matrix completion problem. Search "matrix completion" on this R task view for some R packages that may help https://cran.r-project.org/web/views/MissingData.html – Paul Apr 04 '23 at 18:19
  • Paul, "matrix completion" is the keyword I was looking for, thanks! – Mark Omo Apr 04 '23 at 19:55
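Following the matrix-completion suggestion in the comments, one simple member of that family is iterative truncated-SVD imputation. It fills the gaps with reviewer means, takes a low-rank approximation of the whole devices × reviewers matrix, copies its values into the missing cells, and repeats. The minimal NumPy sketch below is not taken from any package in the linked task view. The function name `svd_impute` and its default settings are assumptions:

```python
import numpy as np

def svd_impute(Y, rank=1, n_iter=500, tol=1e-9):
    """Fill NaNs in a devices x reviewers matrix using a rank-`rank` reconstruction."""
    Y = np.asarray(Y, dtype=float)
    observed = ~np.isnan(Y)
    # Initial guess: each missing cell gets its column (reviewer) mean.
    col_means = np.nanmean(Y, axis=0)
    filled = np.where(observed, Y, col_means)
    for _ in range(n_iter):
        # Best rank-`rank` approximation of the current filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # Keep observed values exactly; update only the missing cells.
        updated = np.where(observed, Y, low_rank)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled

# Columns: Bob's Guide, Footbook Check, The Berge.
Y = np.array([
    [10.95, 12.0, 10.0],      # iPad
    [12.88, 13.5, np.nan],    # Tab S8
    [np.nan, 10.85, np.nan],  # Tab S8 Ultra
])
completed = svd_impute(Y, rank=1)
```

With `rank=1` and no centering, the model says each result is roughly a device-specific number times a reviewer-specific factor. That is a plausible starting point for battery tests that differ mainly in intensity. Higher ranks need more overlapping data. This sketch also gives point estimates only, so intervals would need something extra, such as bootstrapping the observed cells.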
