I am working through digit recognition on the MNIST data set. One of the example problems uses PCA to reduce the dimensionality of the feature space from 784 features (one per pixel of a 28 × 28 image) down to 18. For illustration purposes, the input examples are then projected onto the principal components.
- What exactly does this projection mean, and how should it be interpreted intuitively?
- What is the distinction between a principal component and the projection of the input sample onto a principal component?
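
To make the two questions concrete, here is a minimal NumPy sketch of what I understand the two objects to be. It is only an illustration under my own assumptions: the data is small random noise standing in for the images, the sizes (200 samples, 50 features) are arbitrary, the variable names are mine, and computing PCA through an SVD of the centered data is one common way to do it, not necessarily what the example problem does.

```python
import numpy as np

# Hypothetical stand-in data: 200 samples with 50 features each
# (random noise, NOT real MNIST images).
rng = np.random.default_rng(0)
n_samples, n_features, n_components = 200, 50, 18
X = rng.normal(size=(n_samples, n_features))

# PCA works on centered data, so subtract the per-feature mean.
mean = X.mean(axis=0)
Xc = X - mean

# SVD of the centered data: the rows of Vt are unit-length,
# mutually orthogonal directions in feature space, ordered by
# how much variance of the data they capture.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# A principal component is a direction in the ORIGINAL feature
# space, so each one has n_features entries.
components = Vt[:n_components]        # shape (18, 50)

# The projection of a sample onto a component is a single number:
# the dot product of the centered sample with that direction.
# Doing this for all samples and all kept components gives the
# reduced representation.
Z = Xc @ components.T                 # shape (200, 18)

# Mapping back gives an approximation of each original sample
# built from only the kept components.
X_approx = Z @ components + mean      # shape (200, 50)

print(components.shape, Z.shape, X_approx.shape)
```

If this sketch is right, a principal component lives in the same space as the inputs (for images, it could be reshaped and viewed as an image), while the projection `Z[i, j]` is just the coordinate of sample `i` along component `j`.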