
I am trying to extract only the important neurons from the latent space of an autoencoder, to be converted later into a pattern for a pattern-recognition model.

PCA loadings help in finding which neurons of the latent space have the highest correlation coefficients with the principal components.

Thus, the transformed output of the PCA is not used; only the eigenvalues and eigenvectors are, to determine which neurons correlate most with the components that have the highest eigenvalues, and to pick only those neurons.

For example, on the CIFAR-10 dataset:

  1. Extract the latent space of the autoencoder.
  2. Run PCA and extract the loadings.
  3. Pick only the neurons with high correlation to the principal components whose cumulative explained variance exceeds 90% (a rough sketch follows this list).
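
Roughly what I have in mind, as a minimal sketch: `Z` is a placeholder for the real latent activations, the 80-dimensional latent size is only an example, and whether to standardize first is exactly my question 1 below.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder for the encoder output on CIFAR-10; in practice Z = encoder.predict(x).
Z = np.random.randn(10_000, 80)

# Standardize each latent neuron (zero mean, unit variance) before PCA.
# (Whether this step is needed is my question 1 below.)
Z_std = StandardScaler().fit_transform(Z)

pca = PCA().fit(Z_std)

# Loadings: eigenvectors scaled by the square roots of the eigenvalues.
# On standardized data these behave like neuron-to-PC correlation coefficients.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # (n_latent, n_PCs)

# Keep only the leading PCs whose cumulative explained variance reaches 90%.
cum_evr = np.cumsum(pca.explained_variance_ratio_)
n_pcs = int(np.searchsorted(cum_evr, 0.90)) + 1
selected_loadings = loadings[:, :n_pcs]
```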

My Questions:

  1. Before doing PCA on the latent space, should I normalize the data or not?
  2. Is this approach of employing PCA wrong to start with?
  3. There are multiple PCA variants, such as SparsePCA, KernelPCA, and RobustPCA. Might one of these variants be more beneficial for this task?

2 Answers

  1. You should normalize the latent space data to have zero mean and unit variance before PCA. This will reduce the impact of latent variables that are large in scale and variance. See this post: PCA on correlation or covariance?

  2. This seems like an interesting approach! The key here is how to translate the PCA importances back to the neurons: since each PC is a linear combination of neurons, how are you going to distribute these importances over the neurons based on the weights in that linear combination? That is where the interesting exploration lies (one possible mapping is sketched after this list).

  3. I would first run through the process in (2), make observations, and then read into the other PCA variants to see whether they align better with what you learned in (2).
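
For instance, here is a minimal sketch of one possible mapping from PC importances back to neurons. The weighting by explained variance ratio is just one choice, not the only one, and `Z` is a placeholder for your latent activations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Z = np.random.randn(10_000, 80)                      # placeholder latent matrix
pca = PCA().fit(StandardScaler().fit_transform(Z))   # PCA on standardized data

# Each PC is a linear combination of neurons. One option: weight each neuron's
# absolute coefficient by the PC's explained variance ratio and sum over PCs.
neuron_importance = np.abs(pca.components_.T) @ pca.explained_variance_ratio_

# Neurons ranked from most to least important.
ranking = np.argsort(neuron_importance)[::-1]
```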


I have found that the PCA loadings are more interpretable if the data are first normalized.

Then I built a voting system: I selected the leading components whose cumulative explained variance ratio (EVR) is at least 90%, took their respective explained variances (EV), and normalized these values to sum to 1.

Because of the normalization, the loadings lie approximately in [-1, 1] and can be interpreted as correlation coefficients. Each neuron that contributes more than 0.20 to one of the selected principal components (in my case, 8 PCs cover more than 90% of the EVR) gets a vote. I took the absolute values of the loadings, because negative correlations carry meaning too.

These votes form a matrix whose columns are the PCs and whose rows are the neurons; the entries are either 0 or 1 (vote or no vote). Multiplying each vote by the normalized EV of its PC and summing over the PCs for each neuron yields a weight per neuron that represents its contribution to the PCA. The maximum weight a neuron can have is 1, and the lowest is 0.
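
Roughly, in code, the scheme above looks like this (a minimal sketch: the latent matrix is a placeholder, and I fit the PCA on standardized activations):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Z = np.random.randn(10_000, 80)                      # placeholder latent activations
pca = PCA().fit(StandardScaler().fit_transform(Z))

# Loadings: eigenvectors scaled by sqrt(eigenvalues), roughly in [-1, 1] here.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # (neurons, PCs)

# Leading PCs whose cumulative EVR reaches 90% (8 PCs in my case).
cum_evr = np.cumsum(pca.explained_variance_ratio_)
n_pcs = int(np.searchsorted(cum_evr, 0.90)) + 1

# Normalize the explained variances of the selected PCs to sum to 1.
ev_weights = pca.explained_variance_[:n_pcs] / pca.explained_variance_[:n_pcs].sum()

# Vote matrix: rows = neurons, columns = PCs, entry 1 if |loading| > 0.20.
votes = (np.abs(loadings[:, :n_pcs]) > 0.20).astype(float)

# Weight each vote by its PC's normalized EV and sum per neuron -> weight in [0, 1].
neuron_weights = votes @ ev_weights
```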

Now it is just a matter of selecting the number of neurons. For me, I did two splits (sketched in code after the example below):

  • Greater than the mean: select all neurons whose weight is greater than the mean of the weights.
  • Top third: sort the weights in descending order and select only the top third.

Example: if my last hidden layer has 80 neurons, top third selects the top 26 neurons. Greater than the mean, on the other hand, may or may not select around 40, depending on the weights: neurons with very high or very low weights can skew the selection. In my case, greater than the mean always roughly halved the number, give or take 2.
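
A rough sketch of the two splits, reusing the `neuron_weights` vector from the voting step (placeholder values here):

```python
import numpy as np

neuron_weights = np.random.rand(80)      # placeholder: one weight per latent neuron

# Greater than the mean: keep every neuron whose weight exceeds the mean weight.
above_mean = np.where(neuron_weights > neuron_weights.mean())[0]

# Top third: sort descending and keep the top third (80 neurons -> 26).
k = len(neuron_weights) // 3
top_third = np.argsort(neuron_weights)[::-1][:k]
```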

Finally, I only used the normal PCA provided by scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

I didn't have time to test other flavors of PCA.

P.S.: I am not aware of any work that utilizes PCA loadings as discussed above, so I don't have a reference. If there is one, kindly share it here.