Principal Component Analysis (PCA) in Python numpy using the Snapshot method

Question

I am trying to implement PCA using numpy to reproduce the results of sklearn's decomposition.PCA.

My input is N flattened images of fixed size M = 128x192 pixels, joined horizontally as columns into a single matrix D of dimensions MxN.

I want to use the snapshot method because other implementations (see here and here) crash my build while computing np.cov: the covariance matrix C = D(D^T) would have size MxM.
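To put rough numbers on the size problem, here is a back-of-the-envelope calculation. The image count N = 100 is a made-up value for illustration (the real N is not stated above), and float64 storage is assumed.

```python
# Memory needed for the two candidate matrices, assuming float64 entries.
M = 128 * 192            # pixels per flattened image
N = 100                  # hypothetical number of images (assumption)
BYTES_PER_FLOAT64 = 8

full_cov_gib = M * M * BYTES_PER_FLOAT64 / 2**30   # C = D(D^T), M x M
snapshot_kib = N * N * BYTES_PER_FLOAT64 / 2**10   # C_acute = (D^T)D, N x N

print(f"M x M covariance matrix: {full_cov_gib:.2f} GiB")
print(f"N x N snapshot matrix:   {snapshot_kib:.1f} KiB")
```

The MxM matrix alone comes to about 4.5 GiB, before np.linalg.eig allocates any workspace, while the NxN matrix is negligible for any realistic number of images.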

The snapshot method first computes C_acute = (D^T)D, then computes the (acute) eigenvectors and eigenvalues of this NxN matrix. If v is an eigenvector of C, the corresponding acute eigenvector is (D^T)v, and the eigenvalue is the same.

To recover an eigenvector v from its acute eigenvector, we compute v = (1/eigenvalue) * (D(v_acute)).

Here is the reference implementation I am using, adapted from this SO post (which is known to work):
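For comparison, here is a minimal sketch of how I understand the snapshot method should look in numpy. It is not taken from the reference text, and the name snapshot_pca is made up. It makes two assumptions: the rows of X are samples (so D = X.T after centering), and since numpy returns unit-length eigenvectors u, the mapped vector D u has length sqrt(eigenvalue), so it divides by the square root rather than by the eigenvalue itself to get unit-length components.

```python
import numpy as np

def snapshot_pca(X, n_components):
    """Sketch of snapshot-method PCA; X has shape (N samples, M features)."""
    Xc = X - X.mean(axis=0)             # center a copy; the caller's array is untouched
    G = Xc @ Xc.T                       # N x N matrix, i.e. (D^T)D with D = Xc.T
    evals, U = np.linalg.eigh(G)        # G is symmetric; eigenvectors are the COLUMNS of U
    order = np.argsort(evals)[::-1][:n_components]
    evals, U = evals[order], U[:, order]
    # Map each N-dim eigenvector u to the M-dim vector D u and scale to unit length.
    # ||D u||^2 = u^T G u = eigenvalue, hence the division by sqrt(eigenvalue).
    V = (Xc.T @ U) / np.sqrt(evals)
    return Xc @ V, V                    # projected data (N x k), components (M x k)

x = np.array([
    [0.387, 123, 789, 256, 4878, 5.42],
    [0.723, 9.78, 1.90, 1234, 12104, 5.25],
    [1, 123, 67.98, 7.91, 12756, 5.52],
    [1.524, 1.34, 23.456, 1.23, 6787, 3.94],
])
scores, components = snapshot_pca(x, n_components=3)
print(scores)
```

The sign of each column is arbitrary, so the result should only be expected to match another PCA implementation up to a sign flip per component. Asking for more than N - 1 components would divide by an eigenvalue that is zero up to rounding, so this sketch is only meaningful for n_components <= N - 1.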

import numpy as np
import sklearn.decomposition

class TemplatePCA:
  def __init__(self, n_components=None):
    self.n_components = n_components

  def fit_transform(self, X):
    X -= np.mean(X, axis = 0)  
    R = np.cov(X, rowvar=False)
    # calculate eigenvectors & eigenvalues of the covariance matrix
    evals, evecs = np.linalg.eig(R)

    # indices that sort the eigenvalues in decreasing order
    idx = np.argsort(evals)[::-1]
    # reorder the eigenvectors (columns) by that index
    evecs = evecs[:,idx]
    # reorder the eigenvalues by the same index
    evals = evals[idx]
    # select the first n eigenvectors (n is the desired dimension
    # of the rescaled data array, or dims_rescaled_data)
    evecs = evecs[:, :self.n_components]
    # carry out the transformation on the data using eigenvectors
    # and return the re-scaled data (with the sign flipped)
    return -1 * np.dot(X, evecs)

Here is the implementation I have so far.

class MyPCA:
  def __init__(self, n_components=None):
    self.n_components = n_components

  def fit_transform(self, X):
    X -= np.mean(X, axis = 0)  

    D = X.T
    M, N = D.shape

    D_T = X # D.T == (X.T).T == X
    C_acute = np.dot(D_T, D)

    eigen_values, eigen_vectors_acute = np.linalg.eig(C_acute)

    eigen_vectors = []
    for i in range(eigen_vectors_acute.shape[0]): # for each eigenvector
      v = np.dot(D, eigen_vectors_acute[i]) / eigen_values[i]
      eigen_vectors.append(v)
    eigen_vectors = np.array(eigen_vectors)

    # sort eigenvalues and eigenvectors in decreasing order
    idx = np.argsort(eigen_values)[::-1]
    eigen_vectors = eigen_vectors[:,idx]
    eigen_values = eigen_values[idx]

    # select the first n_components eigenvectors
    eigen_vectors = eigen_vectors[:, :self.n_components]

    # carry out the transformation on the data using eigenvectors
    # return the re-scaled data (projection)
    return np.dot(C_acute, eigen_vectors)

The reference text I am using notes that:

The eigenvector is now (D^T)v, so to do face detection we first multiply our test image vector by (D^T) before projecting onto the eigenimages.

I am not sure whether this method can recover exactly the same principal components (i.e. eigenvectors). It would seem impossible to get the same eigenvectors back, since the recovered eigen_vectors array has shape only (4, 6) (meaning there are only 4 vectors), whereas the other method gives (6, 6) (there are 6).
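One thing that can be checked directly is how many of the 6 eigenvectors from the covariance route actually carry variance. The sketch below is a sanity check of my own, not from the reference text, and the tolerances (1e-8) are arbitrary choices. It counts the rank of the centered data and the non-negligible covariance eigenvalues for the 4x6 test input.

```python
import numpy as np

x = np.array([
    [0.387, 123, 789, 256, 4878, 5.42],
    [0.723, 9.78, 1.90, 1234, 12104, 5.25],
    [1, 123, 67.98, 7.91, 12756, 5.52],
    [1.524, 1.34, 23.456, 1.23, 6787, 3.94],
])
Xc = x - x.mean(axis=0)                      # centering removes one degree of freedom

# Rank of the centered data, with an explicit tolerance to ignore rounding noise.
rank = np.linalg.matrix_rank(Xc, tol=1e-8)

# Eigenvalues of the 6 x 6 covariance matrix; count those that are not ~0.
cov_evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
n_nonzero = int(np.sum(cov_evals > 1e-8 * cov_evals.max()))

print("rank of centered data:", rank)
print("non-negligible covariance eigenvalues:", n_nonzero)
```

Both counts are bounded by min(N - 1, M), so with 4 samples only the leading 3 directions can be compared between the two methods; the remaining eigenvectors of the 6x6 matrix belong to eigenvalues that are zero up to rounding.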

Running both on an input:

x = np.array([
        [0.387,123, 789,256, 4878, 5.42],
        [0.723,9.78,1.90,1234, 12104,5.25],
        [1,123, 67.98,7.91,12756,5.52],
        [1.524,1.34,23.456,1.23,6787,3.94],
    ])


# These two are the same (up to an overall sign)
print(sklearn.decomposition.PCA(n_components=3).fit_transform(x))

print(TemplatePCA(n_components=3).fit_transform(x))

# This one is different
print(MyPCA(n_components=3).fit_transform(x))

Output:

[[ 4282.20163145   147.84415964  -267.73483211]
 [-3025.62452358   683.58580386    67.76941319]
 [-3599.15380006  -569.33984612  -148.62757658]
 [ 2342.57669218  -262.09011737   348.5929955 ]]

[[-4282.20163145  -147.84415964   267.73483211]
 [ 3025.62452358  -683.58580386   -67.76941319]
 [ 3599.15380006   569.33984612   148.62757658]
 [-2342.57669218   262.09011737  -348.5929955 ]]

[[ 3.35535639e+15, -5.70493660e+17, -8.57482740e+17],
  [-2.45510474e+15,  4.17428591e+17,  6.27417685e+17],
  [-2.82475918e+15,  4.80278997e+17,  7.21885236e+17],
  [ 1.92450753e+15, -3.27213928e+17, -4.91820181e+17]]
