Given data $x_1, \dots, x_n \in \mathbb{R}^d$, I am looking for a nonlinear dimensionality reduction technique $f: \mathbb{R}^d \rightarrow \mathbb{R}^q$ that uses only a limited number of the input variables to represent the data in $\mathbb{R}^q$.

In the linear setting, the natural candidate is sparse PCA, which limits the number of variables used to determine a low-dimensional representation of the data (i.e. $f(x) = Ax$, where the matrix $A \in \mathbb{R}^{q \times d}$ has sparse rows). Arguably, the natural nonlinear generalization of PCA is Kernel PCA. However, sparse Kernel PCA does not limit the number of input dimensions used to construct the representation. Instead, it limits the number of data points used: each component has the form $f(x) = \sum_{i=1}^n \alpha_i k(x_i, x)$, and the sparsity is imposed on the coefficients $\alpha_i$ rather than on the coordinates of $x$.

This leads me to my question: which nonlinear dimensionality reduction methods impose sparsity on the input variables used for the transformation?
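To make the distinction concrete, here is a minimal sketch contrasting the two notions of sparsity with scikit-learn's `SparsePCA` and `KernelPCA` (synthetic data; all sizes and hyperparameters are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import SparsePCA, KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # n = 200 points in d = 10 dimensions

# Sparse PCA: each row of A (components_) is sparse over the d input
# variables, so f(x) = Ax reads only a few coordinates of x.
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)
print("nonzero input variables per component:",
      (spca.components_ != 0).sum(axis=1))

# Kernel PCA: each component is f(x) = sum_i alpha_i k(x_i, x). A sparse
# variant zeroes out entries of alpha (training *points*), not input
# variables; every coordinate of x still enters through k(x_i, x).
kpca = KernelPCA(n_components=2, kernel="rbf")
Z = kpca.fit_transform(X)
print("embedding shape:", Z.shape)
```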
- Could you explain the point of using only a subset of your variables for dimensionality reduction? – whuber Aug 18 '23 at 15:07
- @whuber Ultimately we have to leave it up to the OP to clarify what they want. One possible application is estimating out-of-sample reconstruction error. If you plan on re-using the transformation, then in some applications it may be useful to know how much information will be lost on future data. – Galen Aug 18 '23 at 16:28
- @Galen But wouldn't one normally be able to anticipate which variables will be available in the future? I cannot see how letting some algorithm choose the variables to use could possibly be helpful for applications to future data. – whuber Aug 18 '23 at 17:13
- @whuber To your direct question: yes. My thinking behind my last comment was about using PCA as a lossy data compression algorithm. If you're working with a high-throughput production system, you can save time or space at the cost of losing some accuracy by using compressed data. Knowing how lossy the compression is can inform whether it will be sufficiently useful (see the sketch after this thread). – Galen Aug 18 '23 at 17:28
- "Cauchy Principal Component Analysis", a 2014 paper by Xie and Xing, does a nice job of reviewing various approaches to sparse PCA: https://arxiv.org/abs/1412.6506 – user78229 Aug 18 '23 at 22:04
- Sparse PCA provides better interpretability when the number of dimensions is large. Moreover, if using information from every input axis comes at some cost, then a sparse mapping would be preferred. I am hoping to achieve the same with whatever sparse nonlinear PCA exists. – Claudio Moneo Aug 20 '23 at 08:00
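Regarding the compression angle discussed in the comments above, here is a minimal sketch of estimating out-of-sample reconstruction error, using ordinary PCA as the lossy compressor (the data, split, and component count are illustrative assumptions, not anyone's specific pipeline):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Fit the compressor on past data only.
pca = PCA(n_components=5).fit(X_train)

# Compress and then decompress unseen data.
X_hat = pca.inverse_transform(pca.transform(X_test))

# Mean squared reconstruction error on held-out data estimates how
# lossy the compression will be on future inputs.
mse = np.mean((X_test - X_hat) ** 2)
print(f"held-out reconstruction MSE: {mse:.4f}")
```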