
This is a PCA map of several classes of data that I am trying to classify:

[Figure: PCA map of the classes]

Looking at it, I wanted to confirm: is this data really not suitable for classification (since the classes overlap heavily)? Or is there anything I can do to separate the classes (to minimize the overlap) and make the data more favorable for classification?

PC Information:

[Figures: principal component information]

Edit: Figure of the ICA map (an alternative)

[Figure: ICA map]

Jovan
  • What if you use multiple PCs in your model, not just two? – Dave Jan 30 '23 at 01:49
  • PCA for dimension reduction can harm the predictive power of your model. See: https://stats.stackexchange.com/questions/448200/is-pca-always-recommended/448203#448203 – Sycorax Jan 30 '23 at 02:13
  • @Dave I have 8 PCs in total for this; I mapped the first two. – Jovan Jan 30 '23 at 02:26
  • @Sycorax Thanks for this. So maybe I should try another dimension-reduction tool instead of PCA? – Jovan Jan 30 '23 at 02:31
  • Why use dimension reduction at all? What problem are you trying to solve, and how does dimension reduction help to solve it? – Sycorax Jan 30 '23 at 02:49
  • @Sycorax I am actually trying to visualize the data I am trying to classify. Recently I built models directly (decision trees, bagged trees, random forests), but the predictions tend to overfit. I wonder whether the overlap seen after dimension reduction has anything to do with the overfitting. – Jovan Jan 30 '23 at 03:08

1 Answer


(This answer will set aside the issues related to PCA or other dimension-reduction techniques.)

It is possible that your classes simply are not separable on your features. For instance, suppose category $1$ has a bivariate normal distribution $N\left(\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}1&0\\0&1\end{pmatrix}\right)$ and category $2$ has $N\left(\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}1&0\\0&2\end{pmatrix}\right)$, but the only feature you observe is the first one, in which both categories have a standard normal distribution. Then you cannot distinguish between the classes at all. In more formal language, the posterior probability of either category, given an observation of that first feature, equals the prior probability of that category: $P(\text{category } c\mid\text{feature 1})=P(\text{category } c)$.
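A quick simulation illustrates this (a sketch, assuming scikit-learn is available; the class setup mirrors the toy example above, not any real data): a classifier given only the first feature performs at chance level, while giving it both features recovers the small amount of separability carried by the variance difference in feature 2.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 20000
# Category 1: N(0, I); Category 2: N(0, diag(1, 2)) -- identical in feature 1
X1 = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], n)
X2 = rng.multivariate_normal([0, 0], [[1, 0], [0, 2]], n)
X = np.vstack([X1, X2])
y = np.repeat([0, 1], n)

# Alternate rows into train/test halves (labels follow the same slicing)
Xtr, Xte = X[::2], X[1::2]
ytr, yte = y[::2], y[1::2]

# Feature 1 only: both classes are standard normal there, so posterior = prior
acc_f1 = QuadraticDiscriminantAnalysis().fit(Xtr[:, :1], ytr).score(Xte[:, :1], yte)

# Both features: the variance difference in feature 2 is weakly informative
acc_both = QuadraticDiscriminantAnalysis().fit(Xtr, ytr).score(Xte, yte)
```

With this setup `acc_f1` hovers around 0.5 (chance), while `acc_both` rises toward the Bayes-optimal accuracy for this pair of distributions (roughly 0.58), showing "slight separability" in action.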

In a more realistic scenario, the feature distributions of the two categories are not exactly the same, but they differ only slightly. In that case there is some ability to distinguish between the classes, but only a slight one.

If your classes overlap so much that strong performance is impossible, yet you force a model to separate them by fitting to coincidences in the training data, then that absolutely can result in overfitting and terrible out-of-sample performance. A picture in the Wikipedia article on overfitting illustrates this.

[Figure: overfitting illustration from the Wikipedia article, with red and blue points separated by an overly wiggly decision boundary]

Imagine if the red and blue dots overlapped considerably. Yes, it would be possible to snake a decision boundary around and get most of the points on the correct side (you could even get every point right if you dropped the requirement that the boundary be continuous), but such a boundary would likely have fit the coincidences in the data rather than the real trend.
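This failure mode is easy to reproduce in a small sketch (assuming scikit-learn; the heavily overlapping classes here are simulated, not the asker's data): a fully grown decision tree snakes around essentially every training point of two overlapping Gaussians, giving near-perfect training accuracy but much weaker out-of-sample accuracy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
# Two heavily overlapping classes: unit-variance Gaussians, means one sd apart
Xa = rng.normal([0.0, 0.0], 1.0, size=(n, 2))
Xb = rng.normal([1.0, 0.0], 1.0, size=(n, 2))
X = np.vstack([Xa, Xb])
y = np.repeat([0, 1], n)

# Alternate rows into train/test halves
Xtr, Xte, ytr, yte = X[::2], X[1::2], y[::2], y[1::2]

# An unconstrained tree (no depth limit) can carve out almost every training point
deep = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
train_acc = deep.score(Xtr, ytr)  # near-perfect: the tree memorizes coincidences
test_acc = deep.score(Xte, yte)   # far lower: capped by the genuine class overlap
```

The large train/test gap is the overfitting signature described above; the test accuracy cannot exceed the Bayes rate implied by the overlap, no matter how flexible the model is.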

Dave