I'm looking for a way to create a matrix with a known number of signals and a known level of background error for use in PCA. The following example attempts this by combining signals of known amplitude and then adding error, also of prescribed amplitude. I'm unsure whether these definitions let me say definitively that only X PCs are expected to be significant.
Ideally, I would like to know this precisely, but perhaps that requires the true signals to be completely orthogonal to one another? I'm also unsure whether I have correctly prescribed the signal amplitude that the PCA is meant to identify.
# make data ---------------------------------------------------------------
set.seed(123456)
m <- 200 # row dimension
n <- 50 # column dimension
s <- 15 # number of signals
d <- rev(seq(s)) # amplitude of signals
e <- 8 # amplitude of error
# make row signals
x <- do.call("cbind", lapply(d, function(x){scale(cumsum(rnorm(m)))}))
image(x)
plot(x[,2])
# make column signals
y <- do.call("cbind", lapply(seq(s), function(x){scale(cumsum(rnorm(n)))}))
image(y)
plot(y[,2])
# combine into field
Z <- matrix(0, nrow = m, ncol = n)
for(i in seq(s)){
  tmp <- as.matrix(x[,i]) %*% t(as.matrix(y[,i])) * d[i]
  Z <- Z + tmp
}
# signal variance
sig.var <- apply(scale(Z, center = T, scale = F), 2, function(x){sum((x^2)/(length(x)-1))})
# add error
err <- array(rnorm(length(Z), sd = e), dim = dim(Z))
# error variance
err.var <- apply(scale(err, center = T, scale = F), 2, function(x){sum((x^2)/(length(x)-1))})
# full field
Z <- Z + err
image(Z)
# total variance
tot.var <- apply(scale(Z, center = T, scale = F), 2, function(x){sum((x^2)/(length(x)-1))})
sum(err.var) / sum(tot.var)
# pca ---------------------------------------------------------------------
P <- prcomp(Z)
plot(P$sdev[1:n]^2, log = "y")
sum(P$sdev^2); sum(tot.var)
abline(v = e-1+0.5, lty = 2, col = 2) # expected signal cut-off
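
On the orthogonality point, here is a minimal sketch of one way the signals could be made exactly orthogonal, so that the noise-free PC variances are known in advance. It is not part of the example above: the names `xr`, `yr`, `U`, `V`, `Z0` and `P0` are introduced here for illustration, and orthonormalising the random walks with a QR decomposition is an assumption about what "completely orthogonal" should mean.

```r
set.seed(1)
m <- 200 # row dimension
n <- 50  # column dimension
s <- 15  # number of signals
d <- rev(seq(s)) # prescribed singular values (assumption: used as amplitudes)

# random-walk row signals, centred so each column has zero mean
xr <- sapply(seq(s), function(i) cumsum(rnorm(m)))
xr <- scale(xr, center = TRUE, scale = FALSE)
U <- qr.Q(qr(xr)) # m x s, orthonormal columns spanning the centred signals

# random-walk column signals, orthonormalised the same way
yr <- sapply(seq(s), function(i) cumsum(rnorm(n)))
V <- qr.Q(qr(yr)) # n x s, orthonormal columns

# noise-free field built from exactly s orthogonal signals
Z0 <- U %*% diag(d) %*% t(V)

P0 <- prcomp(Z0)
# leading s PC variances, rescaled by (m - 1), should match d^2
round(P0$sdev[1:s]^2 * (m - 1), 6)
# remaining PC variances should be numerically zero
max(P0$sdev[(s + 1):n])
```

Adding `err` to `Z0` would then perturb these known values. Two caveats: the orthonormalised signals are no longer pure random walks, and the PC variances here are `d^2 / (m - 1)`, so `d` would need rescaling before comparing against an error variance of `e^2`.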
