
PCA is known to be quite sensitive to outlier noise (which is why several robust PCA techniques exist). However, I am looking for a concrete example of the sensitivity of PCA to adversarial noise, that is, a synthetic setting in which we can show that an adversary can severely degrade the quality of the eigenvectors obtained. Can anyone provide a simple example of this or, better still, a reference?

I am particularly looking at PCA of graph Laplacians, where a malicious adversary can add a small fraction of nodes and/or edges. Any insights will be most appreciated. Thanks!

Rajhans
  • Can you tell us more in the question about adversarial noise? It seems to be a specific term, but one not often used. Explain what it is. – ttnphns Nov 23 '12 at 07:06
  • It could be related to the term adversary configuration of outliers (the configuration of outliers that causes the largest bias for a given rate of contamination; for many classes of estimators this is well defined). – user603 Nov 23 '12 at 13:12
  • While there are a couple of different notions of an adversary in terms of its "strength", adversarial noise typically means that there is an "adversary" who wants to hurt your process and can perturb your data in a certain way to do so. So in my case, it means that there is an adversarial process that wants to hurt the quality of the eigenvectors produced from my data. This process can generate some ε (bad) fraction of the data while being aware of the remaining 1−ε (good) data points; it is really a worst-case situation. – Rajhans Nov 26 '12 at 01:50
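
To make the model in that last comment concrete, here is a minimal R sketch of such a data-aware adversary (a hypothetical construction, assuming the good data are isotropic Gaussian): the adversary first inspects the clean points, then places its ε fraction of bad points far out along a direction orthogonal to the clean first principal component, which drags the contaminated first component onto the adversary's chosen direction.

library(MASS)                                    # for mvrnorm
set.seed(1)
n    <- 200                                      # total sample size
p    <- 10                                       # dimension
eps  <- 0.1                                      # adversarial fraction
good <- mvrnorm(round((1 - eps) * n), rep(0, p), diag(p))    # the 1 - eps "good" points
pcs  <- prcomp(good)$rotation                    # the adversary gets to see the clean PCA
v1   <- pcs[, 1]                                 # clean first principal component
u    <- pcs[, 2]                                 # a direction orthogonal to v1
bad  <- matrix(rep(50 * u, round(eps * n)), ncol = p, byrow = TRUE)  # eps fraction, far out along u
vhat <- prcomp(rbind(good, bad))$rotation[, 1]   # first PC after contamination
abs(sum(vhat * v1))                              # near 0: almost orthogonal to the clean first PC
abs(sum(vhat * u))                               # near 1: dragged onto the adversary's direction

Because the bad points form a single tight cluster far from the bulk, the direction pointing toward them dominates the sample covariance even at a 10 percent contamination rate.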

1 Answer


Here is one for you: in the simulation below, a 10 percent fraction of outliers influences the PCA so strongly that the first principal component ends up nearly orthogonal to its uncontaminated value. The outliers form a tight cluster far from the bulk of the data, so the direction pointing toward them captures most of the variance and hijacks the first component.

library(MASS)                    # for mvrnorm
n   <- 50                        # total sample size
p   <- 100                       # dimension
eps <- 0.1                       # fraction of adversarial outliers
x0  <- mvrnorm(n - floor(n * eps), rep(0, p), diag(p))      # clean data: N(0, I)
x1  <- mvrnorm(floor(n * eps), rep(100, p), diag(p) / 100)  # outliers: tight cluster far away
O0  <- prcomp(x0)                # PCA on the clean data only
O1  <- prcomp(rbind(x0, x1))     # PCA on the contaminated data
O1$rotation[, 1] %*% O0$rotation[, 1]   # near 0: the two first PCs are almost orthogonal
user603
  • Thanks for the response! I see that adding low-variance noise to a high-variance signal can skew PCA quite a bit. However, my focus is on showing something similar for noise in graph Laplacians -- that case seems a bit more difficult, though I suppose playing with the eigengap can yield something. Thanks again. – Rajhans Nov 27 '12 at 03:36
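
Following up on the graph-Laplacian case in that last comment: below is a minimal base-R sketch of the eigengap mechanism (a hypothetical construction, not taken from any reference). Four 15-node cliques sit on a ring; the A-B and C-D cuts carry 2 bridge edges each and the B-C and D-A cuts carry 3, so the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue) cuts the two 2-edge bridges. The gap between the second and third eigenvalues is small and the two competing eigenvectors are orthogonal, so an adversary who adds just 4 well-placed edges (under 1 percent of the roughly 430 edges) swaps them, leaving a Fiedler vector nearly orthogonal to the original one.

m      <- 15
blocks <- split(1:(4 * m), rep(1:4, each = m))   # four blocks of nodes: A, B, C, D
A <- matrix(0, 4 * m, 4 * m)
for (b in blocks) A[b, b] <- 1                   # each block is a clique
diag(A) <- 0
bridge <- function(A, b1, b2, k) {               # add k edges between blocks b1 and b2
  for (i in 1:k) A[b1[i], b2[i]] <- A[b2[i], b1[i]] <- 1
  A
}
A <- bridge(A, blocks[[1]], blocks[[2]], 2)      # weak cuts: A-B and C-D, 2 edges each
A <- bridge(A, blocks[[3]], blocks[[4]], 2)
A <- bridge(A, blocks[[2]], blocks[[3]], 3)      # stronger cuts: B-C and D-A, 3 edges each
A <- bridge(A, blocks[[4]], blocks[[1]], 3)
fiedler <- function(A) {                         # eigenvector of the 2nd-smallest eigenvalue
  L <- diag(rowSums(A)) - A                      # unnormalised graph Laplacian
  eigen(L, symmetric = TRUE)$vectors[, nrow(A) - 1]
}
f0 <- fiedler(A)                                 # separates {A, D} from {B, C}
A_adv <- bridge(A,     blocks[[1]][3:4], blocks[[2]][3:4], 2)   # adversary adds 2 edges to A-B
A_adv <- bridge(A_adv, blocks[[3]][3:4], blocks[[4]][3:4], 2)   # ... and 2 edges to C-D
f1 <- fiedler(A_adv)                             # now separates {A, B} from {C, D} instead
abs(sum(f0 * f1))                                # much smaller than 1: the new Fiedler vector points in a nearly orthogonal direction

Shrinking the initial difference between the two cut weights makes the flip correspondingly cheaper for the adversary, which is exactly the "playing with the eigengap" idea from the comment.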