I have a dataset with 20 rows and 480 columns. When I run the plsr command with validation="LOO", the output shows that RMSEP (the CV estimate) increases with the number of components and stabilizes after 6 components. My impression is that RMSEP should decrease. Confused! Help!
1 Answer
You are right: RMSECV generally decreases at first and then either levels off, starts to rise, or keeps falling only very slowly. If RMSECV rises steadily from the first component onward, the likely explanations are:
- Your dependent and independent variables are not (sufficiently) linearly related to each other.
- You have too few observations to reveal the relationship between the dependent and independent variables.
- You really need only 1 component to model the data.
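To tell these cases apart, it can help to compare the leave-one-out error at each number of components against an intercept-only baseline (predicting the mean of the training responses). The sketch below is a plain NumPy PLS1 (NIPALS) written for illustration only; it is not the pls package's implementation, and the function names `pls1_predict` and `loo_rmsecv` as well as the simulated 20 x 480 datasets are assumptions made up for this example. One dataset has a single latent factor driving both X and y, the other has a response unrelated to X.

```python
import numpy as np


def pls1_predict(X_train, y_train, X_test, max_comp):
    """Fit PLS1 by NIPALS; return test predictions for 1..max_comp components."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    X = X_train - x_mean          # centre only, no scaling
    y = y_train - y_mean
    W, P, q, preds = [], [], [], []
    for _ in range(max_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)    # unit-length weight vector
        t = X @ w                 # scores
        tt = t @ t
        p = (X.T @ t) / tt        # X loadings
        c = (y @ t) / tt          # y loading
        X = X - np.outer(t, p)    # deflate X
        y = y - c * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
        Wm, Pm = np.column_stack(W), np.column_stack(P)
        # Regression coefficients: B = W (P'W)^{-1} q
        beta = Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q))
        preds.append((X_test - x_mean) @ beta + y_mean)
    return np.array(preds)        # shape (max_comp, n_test)


def loo_rmsecv(X, y, max_comp):
    """Leave-one-out RMSECV; index 0 is the intercept-only baseline."""
    n = len(y)
    errors = np.empty((max_comp + 1, n))
    for i in range(n):
        train = np.arange(n) != i
        errors[0, i] = y[i] - y[train].mean()
        preds = pls1_predict(X[train], y[train], X[i:i + 1], max_comp)
        errors[1:, i] = y[i] - preds[:, 0]
    return np.sqrt((errors ** 2).mean(axis=1))


# Simulated data (an assumption for illustration): 20 rows, 480 columns.
rng = np.random.default_rng(0)
n, p, max_comp = 20, 480, 8

# Case A: one latent factor drives both X and y.
latent = rng.normal(size=n)
loadings = rng.normal(size=p)
X_signal = np.outer(latent, loadings) + 0.1 * rng.normal(size=(n, p))
y_signal = latent + 0.1 * rng.normal(size=n)

# Case B: y has no relationship to X at all.
X_noise = rng.normal(size=(n, p))
y_noise = rng.normal(size=n)

rmse_signal = loo_rmsecv(X_signal, y_signal, max_comp)
rmse_noise = loo_rmsecv(X_noise, y_noise, max_comp)

print("components:", list(range(max_comp + 1)))
print("signal    :", np.round(rmse_signal, 3))
print("no signal :", np.round(rmse_noise, 3))
```

If the curve never drops below its entry at 0 components, the model is not predicting better than the mean, which points to the first two explanations. If it drops at 1 component and only drifts upward afterwards, the third explanation is the more plausible one.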
gunakkoc
Thank you! Looks like I need to investigate at least (1), (2) and (3) above. However, in terms of variance captured, it looks as though I need about 7 components (which capture 99% of the variance), and RMSEP is stabilizing at that point. I know my sample size is very small, but gasoline has an almost similar sample size. I think I shall try non-linear first. – Rk57 Feb 10 '18 at 22:33