
I know overfitting means low bias and high variance, while underfitting means high bias and low variance. I want to understand how increasing the rank of a latent factor model affects the bias-variance trade-off in a recommender system.

1 Answer


I assume you're talking about a recommender system in the style of the Netflix problem that works via matrix factorization. In this case, higher-rank models intuitively should have higher variance and lower bias.

  • They have lower bias because they can capture anything a lower-rank model can capture, and more; therefore, they can get closer to the true data-generating process.

  • They intuitively should have higher variance, though this is harder to justify with math. If you're willing to consider the specific case of PCA, then note that the higher-rank model's learned parameters (or rather, a subset of them) can be transferred literally to the lower-rank model; see the sketch after this list. So any low-variance method of fitting the high-rank model also fits the low-rank model, and the variance of the low-rank model's parameters is therefore no higher than that of the high-rank model.
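
To make that nesting concrete, here is a minimal sketch using a plain truncated SVD as a stand-in for the factorization (the matrix size, ranks, and function names are just illustrative assumptions): the first $k$ singular triplets of the higher-rank fit are exactly the rank-$k$ fit, so the low-rank parameters really are a subset of the high-rank ones.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 30))                # toy ratings matrix

# One SVD gives every truncated-rank fit at once.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

def rank_r_fit(r):
    """Best rank-r approximation of Y (Eckart-Young)."""
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

Y_hi = rank_r_fit(5)                         # high-rank model
Y_lo = rank_r_fit(2)                         # low-rank model

# The low-rank fit uses a literal subset of the high-rank fit's parameters:
# the first 2 of its 5 singular triplets.
print(np.allclose(Y_lo, U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]))   # True
```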

To be more formal, I would start from that idea but work out the variance of the predictions rather than the parameters, and I would try to be more general than just PCA. The predictions of the high-rank model can be written as the predictions of the low-rank model plus an extra term $\Delta$:

$$\hat Y_{hi} = \hat Y_{lo} + \Delta \implies Var(\hat Y_{hi}) = Var(\hat Y_{lo}) + 2\,Cov(\hat Y_{lo}, \Delta) + Var(\Delta)$$
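
As a sanity check on that identity (not on the general argument), here is a hypothetical simulation: refit a rank-2 and a rank-6 truncated SVD to many noisy draws of a toy rank-3 matrix, treat each refit as one draw of $\hat Y$, and verify the decomposition entry by entry. All sizes, ranks, and noise levels here are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Y_true = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15))   # rank-3 signal

def rank_r_fit(Y, r):
    """Best rank-r approximation via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

n_draws = 500
Y_lo = np.empty((n_draws,) + Y_true.shape)
Y_hi = np.empty_like(Y_lo)
for b in range(n_draws):
    Y_obs = Y_true + rng.normal(size=Y_true.shape)   # fresh noise each draw
    Y_lo[b] = rank_r_fit(Y_obs, 2)
    Y_hi[b] = rank_r_fit(Y_obs, 6)

Delta = Y_hi - Y_lo
cov = ((Y_lo - Y_lo.mean(0)) * (Delta - Delta.mean(0))).mean(0)  # entrywise Cov(Y_lo, Delta)
lhs = Y_hi.var(axis=0)
rhs = Y_lo.var(axis=0) + 2 * cov + Delta.var(axis=0)
print(np.allclose(lhs, rhs))   # True: the identity holds entry by entry
print((cov < 0).mean())        # fraction of entries where the covariance term is negative
```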

But I don't know how to finish this off, because the covariance term may be negative. In fact, there are probably nasty examples where the high-rank model actually has lower variance than the low-rank one. So I can only offer a rule of thumb, not a theorem: more flexibility "usually" means more variance. You may be able to evaluate this claim on your own datasets using bootstrapping or cross-validation (a rough sketch of the bootstrap route is below), but a full treatment calls for a separate discussion.
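
One possible version of that empirical check (every setting below is an assumption, and the outcome can differ across datasets) is to bootstrap the users, refit the item factors at each rank, and see how much the predictions for the original users move across replicates:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in "dataset": a noisy rank-3 ratings matrix.
Y = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 20)) + 0.5 * rng.normal(size=(60, 20))

def item_factors(Y, r):
    """Top-r right singular vectors (item factors)."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Vt[:r].T                                   # shape (n_items, r)

def bootstrap_prediction_variance(Y, r, n_boot=200):
    """Mean entrywise variance of predictions across bootstrap resamples of users."""
    n_users = Y.shape[0]
    preds = np.empty((n_boot,) + Y.shape)
    for b in range(n_boot):
        idx = rng.integers(0, n_users, size=n_users)  # resample rows (users) with replacement
        V = item_factors(Y[idx], r)
        preds[b] = Y @ V @ V.T                        # project original users onto learned item space
    return preds.var(axis=0).mean()

for r in (1, 2, 3, 5, 10):
    print(r, round(bootstrap_prediction_variance(Y, r), 4))
```

A rank whose predictions barely move across resamples is a low-variance fit; comparing the printed values across the rank grid is one way to see where, on your data, the rule of thumb holds or breaks down.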