(This answer will set aside the issues related to PCA or other dimension-reduction techniques.)
It is possible that your classes simply are not separable on your features. For instance, suppose category $1$ has the bivariate distribution $N\left(\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}1&0\\0&1\end{pmatrix}\right)$ and category $2$ has the distribution $N\left(\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}1&0\\0&2\end{pmatrix}\right)$, but the only feature you observe is the first one, in which both categories have standard normal distributions. Then you cannot distinguish between the classes at all. In more formal language, the posterior probability of either category, given an observation of that first feature, equals the prior probability of that category: $P(\text{category } c \mid \text{feature } 1) = P(\text{category } c)$.
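To make this concrete, here is a minimal simulation sketch, assuming NumPy and scikit-learn and using quadratic discriminant analysis as a stand-in for whatever classifier you prefer. Trained on feature $1$ alone, it performs at chance; given both features, it can exploit the variance difference in feature $2$.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 50_000  # large samples so the accuracy estimates are stable

# Category 1: N((0,0)', I); category 2: same mean, but variance 2 in feature 2
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], size=n),
    rng.multivariate_normal([0, 0], [[1, 0], [0, 2]], size=n),
])
y = np.repeat([0, 1], n)

# Shuffle and hold out half the data for testing
idx = rng.permutation(2 * n)
train, test = idx[:n], idx[n:]

for cols, label in [([0], "feature 1 only"), ([0, 1], "both features")]:
    model = QuadraticDiscriminantAnalysis()
    model.fit(X[train][:, cols], y[train])
    print(label, "test accuracy:", round(model.score(X[test][:, cols], y[test]), 3))
```

On feature $1$ alone, the test accuracy hovers around $0.5$, i.e., chance; with both features it should land near $0.58$, roughly the Bayes rate for this particular variance difference.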
In a more realistic scenario, the feature distributions of the two categories will not be exactly the same, but they might differ only slightly. In such a scenario there is some ability to distinguish between the two classes, but only a slight one.
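To see how slight differences yield only slight separability, consider two univariate class-conditional distributions $N(0, 1)$ and $N(\delta, 1)$ with equal priors: the Bayes-optimal rule classifies by thresholding at $\delta/2$, and its accuracy is $\Phi(\delta/2)$. A short sketch, assuming SciPy, tabulates this ceiling:

```python
from scipy.stats import norm

# Classes: feature ~ N(0, 1) vs. N(delta, 1), with equal priors.
# The Bayes-optimal rule thresholds at delta/2; its accuracy is Phi(delta/2).
for delta in [0.0, 0.1, 0.25, 0.5, 1.0, 2.0]:
    print(f"mean gap {delta:4.2f}: best achievable accuracy {norm.cdf(delta / 2):.3f}")
```

A gap of a tenth of a standard deviation buys an accuracy ceiling of about $52\%$; no model, however flexible, can do better on that feature.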
If your classes have considerable overlap that precludes strong performance, yet you force a model to separate them by fitting to coincidences in the training data, that absolutely can result in overfitting and terrible out-of-sample performance. A picture in the Wikipedia article on overfitting shows this to some extent.

Imagine if the red and blue dots overlapped considerably. Yes, it would be possible to snake a decision boundary around so that most of the points land on the correct side (you could even get every point right if you dropped the requirement that the decision boundary be continuous), but such a boundary would likely have fit coincidences in the data rather than the real trend.
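The same point can be made numerically. A $1$-nearest-neighbor classifier is the extreme case of a boundary that snakes around every training point; on heavily overlapping classes it is perfect in-sample and roughly at chance out-of-sample. A sketch, again assuming NumPy and scikit-learn:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def draw(n=200):
    # Heavily overlapping classes: same mean, only a small variance difference
    X = np.vstack([rng.normal(0, 1.0, (n, 2)), rng.normal(0, 1.2, (n, 2))])
    y = np.repeat([0, 1], n)
    return X, y

X_train, y_train = draw()
X_test, y_test = draw()

# 1-NN puts every training point on its own side of the boundary
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("train accuracy:", knn.score(X_train, y_train))  # 1.0 by construction
print("test accuracy: ", knn.score(X_test, y_test))    # near 0.5: chance level
```

The gap between those two numbers is the overfitting: the model memorized the coincidences, and the coincidences did not replicate.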