I was wondering how one could use PCA in, e.g., a dashboard for non-Subject Matter Experts.

For example, suppose you are quite certain, based on the current data, that 2 PCs are sufficient. This also makes sense for the data-generating process, which is not that high-dimensional. You would visualize the scores and loadings in a biplot, and your Subject Matter Experts (SMEs) can easily identify any unusual behavior with a suitable outlier detection method such as Mahalanobis distances. Now you want to give this dashboard to your SMEs, and every new data point is added to it through a connection to a server.
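
To make the setup concrete, here is a minimal sketch of what I have in mind, assuming NumPy only. The function names (`fit_pca`, `project`, `mahalanobis_sq`, `chi2_cutoff_2dof`) and the synthetic data are hypothetical and only for illustration. It uses the covariance matrix (centering, no scaling); for the correlation matrix one would also divide each column by its standard deviation. The cutoff relies on the assumption that the squared Mahalanobis distance of two normally distributed scores is approximately chi-square with 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - q).

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit PCA on reference data X (n_samples x p) via SVD of the centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T                        # p x k, orthonormal columns
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)  # variance of each score
    return mean, loadings, score_var

def project(x, mean, loadings):
    """Project one or more observations onto the fixed, previously fitted PCs."""
    return (np.atleast_2d(x) - mean) @ loadings

def mahalanobis_sq(scores, score_var):
    """Squared Mahalanobis distance in score space.

    The scores are uncorrelated on the reference data, so their covariance
    matrix is diagonal and the distance reduces to a weighted sum of squares.
    """
    return np.sum(scores ** 2 / score_var, axis=1)

def chi2_cutoff_2dof(quantile=0.99):
    """Chi-square quantile for exactly 2 degrees of freedom (closed form)."""
    return -2.0 * np.log(1.0 - quantile)

# Synthetic reference data (an assumption for the demo): 2 latent factors, 5 variables.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X_ref += 0.1 * rng.normal(size=(200, 5))

mean, loadings, score_var = fit_pca(X_ref, n_components=2)

# A "new" data point arriving from the server: here simply a shifted reference row.
x_new = X_ref[0] + 5.0
d2 = mahalanobis_sq(project(x_new, mean, loadings), score_var)
is_outlier = d2 > chi2_cutoff_2dof(0.99)
```

Here the new point is only projected; the mean, loadings, and score variances stay fixed at their reference values.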

My question: The covariance/correlation matrix will clearly change with every new data point. Would you recalculate the PCs with every new data point, or just project the new point onto the existing PCs, hoping you have enough samples for a good estimate of the covariance/correlation matrix? I think the second option seems "dangerous", as some special causes, resulting in extreme outliers, could really become a problem in the long term... If you think the first option makes sense... Assuming normality, how many samples would one need for a reliable estimate of a p-dimensional covariance/correlation matrix?
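
For reference, the "recalculate with every new data point" option does not have to mean refitting from scratch. A minimal sketch, assuming NumPy and a Welford-style running update of the mean and covariance (the class name `RunningCovariance` and its methods are hypothetical, not from any library):

```python
import numpy as np

class RunningCovariance:
    """Running mean and covariance, updated one observation at a time (Welford-style)."""

    def __init__(self, p):
        self.n = 0
        self.mean = np.zeros(p)
        self.m2 = np.zeros((p, p))  # running sum of outer products of deviations

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean                      # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)  # uses the updated mean

    def covariance(self):
        if self.n < 2:
            raise ValueError("need at least two observations")
        return self.m2 / (self.n - 1)

    def principal_axes(self, n_components=2):
        """Leading eigenvectors and eigenvalues of the current covariance estimate."""
        evals, evecs = np.linalg.eigh(self.covariance())  # ascending order
        order = np.argsort(evals)[::-1][:n_components]
        return evecs[:, order], evals[order]

# Usage on synthetic data (an assumption for the demo).
rng = np.random.default_rng(1)
stream = rng.normal(size=(50, 3))
rc = RunningCovariance(p=3)
for row in stream:
    rc.update(row)
axes, variances = rc.principal_axes(n_components=2)
```

This only shows that the update itself is cheap. Whether points flagged as outliers should be fed into `update` at all is exactly the part I am unsure about.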
