I have found that the PCA loadings are more interpretable if the values are first normalized.
Then I built a voting system: I selected the leading components whose cumulative explained variance ratio (EVR) reaches >= 90%, took their respective explained variances (EV), and normalized those to sum to 1.
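A minimal sketch of this step with scikit-learn, assuming the last hidden layer's activations are collected in an `activations` matrix of shape (n_samples, n_neurons); the variable names and the StandardScaler step are illustrative, not prescriptive:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

activations = np.random.rand(500, 80)            # placeholder activations of the last hidden layer
X = StandardScaler().fit_transform(activations)  # normalize the values before PCA

pca = PCA().fit(X)

# Smallest set of leading PCs whose explained variance ratios sum to >= 90%
cum_evr = np.cumsum(pca.explained_variance_ratio_)
n_pcs = int(np.searchsorted(cum_evr, 0.90) + 1)  # e.g. 8 PCs in my case

# Normalize the selected components' explained variance (EV) to sum to 1
ev_weights = pca.explained_variance_[:n_pcs] / pca.explained_variance_[:n_pcs].sum()
```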
The loadings are approximately in [-1, 1] because of the normalization and can be interpreted as correlation coefficients; thus each neuron that contributes more than 0.20 to one of the selected principal components (in my case, the top 8 PCs cover more than 90% EVR) gets a vote.
I took the absolute values of the loadings, because negative correlations carry meaning too.
The votes form a matrix where the columns are the PCs and the rows are the neurons; the values are either 1 (vote) or 0 (no vote).
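Continuing the sketch, one way to build the vote matrix, assuming the usual definition of loadings as `components_` scaled by the square root of the explained variance (the convention under which they behave like correlations):

```python
# Loadings of the selected PCs: shape (n_neurons, n_pcs); with standardized
# inputs these behave like correlation coefficients, roughly in [-1, 1]
loadings = pca.components_[:n_pcs].T * np.sqrt(pca.explained_variance_[:n_pcs])

# Absolute values, since negative correlations carry meaning too
abs_loadings = np.abs(loadings)

# Vote matrix: rows = neurons, columns = selected PCs, entries 1 (vote) / 0 (no vote)
votes = (abs_loadings > 0.20).astype(int)
```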
Then, multiplying the votes by the normalized EV and summing over the PCs for each neuron yields a weight for each neuron that represents its contribution to the PCA.
The maximum weight a neuron can have is 1, and the minimum is 0.
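In the sketch above, this weighting step reduces to a single matrix-vector product:

```python
# Weight per neuron: its votes weighted by the normalized EV and summed over PCs.
# Since the normalized EV sums to 1 and votes are 0/1, each weight lies in [0, 1].
neuron_weights = votes @ ev_weights   # shape: (n_neurons,)
```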
Now it is just a matter of selecting the number of neurons.
I did two splits (both sketched in the snippet below):
- Greater than the mean: select all neurons whose weight is greater than the mean of the weights.
- Top third: sort the weights in descending order and select only the top third.
Example: if my last hidden layer has 80 neurons, top third selects the top 26 neurons. Greater than the mean might select around 40, but the exact count depends on the weight distribution: neurons with very high or very low weights can skew the selection. In my case, greater than the mean always selected roughly half the neurons, give or take 2.
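Both splits, continuing the same sketch (array names are illustrative):

```python
n_neurons = neuron_weights.shape[0]

# Greater than the mean
selected_mean = np.where(neuron_weights > neuron_weights.mean())[0]

# Top third (weights sorted in descending order)
k = n_neurons // 3                                   # 80 neurons -> top 26
selected_top_third = np.argsort(neuron_weights)[::-1][:k]
```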
Finally, I only used the normal PCA provided by scikit-learn:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
I didn't have time to test other flavors of PCA.
P.S.: I am not aware of any work that uses PCA loadings in the way I described above, so I don't have a reference. But if there is any, kindly share it here.