Estimation of missing measurements based on other measurements in a sparse dataset

Question

I want to build a database of device performance on various tests so that I can compare devices on any single test. Unfortunately, not every reviewer reviews every device, which leaves holes in my data and prevents me from comparing every device in the dataset on every metric.

However, enough reviewers measure the same devices that I can estimate how the different tests relate to one another. For example, everyone reviews the iPad, but not everyone reviews the mid-sized Huawei device.

What kind of statistical model can I use to predict the missing data points from the existing data? (Bonus points if I can generate a confidence interval for the predicted points.)

I have used regression between two sources, fitted on their overlap, to predict what the results might look like for missing points. I assume, though, that there is a way to exploit all the data at once to get better predictions and to avoid chaining two "hops" of estimation in some cases.

Toy Example for battery life:

Model	Bob's Guide	Footbook Check	The Berge
iPad	10.95	12	10
Tab S8	12.88	13.5	?
Tab S8 Ultra	?	10.85	?
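
For concreteness, here is a minimal sketch of one plausible reading of the pairwise regression described above, assuming an ordinary least-squares line fitted with NumPy on the devices two sources share. The variable names are illustrative. With only two overlapping devices the line fits them exactly, so this shows the mechanics rather than a trustworthy estimate.

```python
import numpy as np

# Toy battery-life columns from the table above; np.nan marks a missing review.
# Row order: iPad, Tab S8, Tab S8 Ultra.
bobs_guide = np.array([10.95, 12.88, np.nan])
footbook_check = np.array([12.0, 13.5, 10.85])

# Fit only on devices that both sources reviewed.
overlap = ~np.isnan(bobs_guide) & ~np.isnan(footbook_check)
slope, intercept = np.polyfit(footbook_check[overlap], bobs_guide[overlap], deg=1)

# Predict the missing Bob's Guide score for the Tab S8 Ultra.
predicted_ultra = slope * footbook_check[2] + intercept
print(round(float(predicted_ultra), 2))
```

The Berge overlaps with the other sources only on the iPad, so no line can be fitted for it this way, which is the limitation that motivates a model using all the data at once.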

Hi Mark - this is a matrix completion problem. Search "matrix completion" on this R task view for some R packages that may help https://cran.r-project.org/web/views/MissingData.html — Paul, Apr 04 '23 at 18:19
Paul "matrix completion" is the keyword I was looking for, thanks! — Mark Omo, Apr 04 '23 at 19:55
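
To illustrate what matrix completion could look like on the toy table, here is a minimal sketch that assumes a rank-1 model (each score is roughly a device factor times a source factor) fitted by alternating least squares on the observed cells only. The function name `complete_rank1` and the iteration count are illustrative choices, not taken from any package, and the sketch does not produce confidence intervals.

```python
import numpy as np

# Toy battery-life table from the question; np.nan marks a missing review.
# Rows: iPad, Tab S8, Tab S8 Ultra. Columns: Bob's Guide, Footbook Check, The Berge.
X = np.array([
    [10.95, 12.0, 10.0],
    [12.88, 13.5, np.nan],
    [np.nan, 10.85, np.nan],
])

def complete_rank1(X, n_iter=200):
    """Fit X[i, j] ~ d[i] * s[j] on observed cells by alternating least squares."""
    mask = ~np.isnan(X)
    Z = np.where(mask, X, 0.0)   # zeros in missing cells never enter the sums
    d = np.ones(X.shape[0])      # one factor per device
    s = np.ones(X.shape[1])      # one factor per source
    for _ in range(n_iter):
        # Least-squares update of each source factor with device factors fixed.
        s = (Z * d[:, None]).sum(axis=0) / (mask * d[:, None] ** 2).sum(axis=0)
        # Least-squares update of each device factor with source factors fixed.
        d = (Z * s[None, :]).sum(axis=1) / (mask * s[None, :] ** 2).sum(axis=1)
    return np.outer(d, s)

estimate = complete_rank1(X)
# Keep the observed scores and fill only the holes with model estimates.
filled = np.where(np.isnan(X), estimate, X)
print(np.round(filled, 2))
```

Unlike the pairwise regression, every observed cell informs every estimate, so the Tab S8 score for The Berge is filled in without a separate second "hop". This sketch requires every device and every source to have at least one observed score; a real dataset would likely call for a higher rank plus regularization, as offered by the packages in the task view linked above.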
