
Sometimes we encounter data that is uncorrelated. Specifically, from the correlation matrix we observe that the target variable shows low or no correlation with any of the features.

To provide context, I am working on a predictive maintenance (PdM) project. My question concerns tabular data containing continuous or categorical variables, and it applies to such data in general, not specifically to PdM.

How can machine learning or deep learning be applied to this type of data?

Do we need to perform any specific types of pre-processing? Are there algorithms that are suitable for such problems?

  • What is your belief, i.e., what are you trying to show or understand? What about data format, examples? You need to state what you are trying to do first, then worry about the methods required for answering the question. Why ML methods? These are typically employed for pattern recognition and class discovery (clustering), or class prediction (classification analysis). – wjktrs Mar 14 '24 at 14:31
  • Note that correlation looks at the relation between only two variables at the same time. If you have a target variable and several features, there may still well be a good way to predict the target from all the features taken together, and it is even possible that a standard learning approach will find it (as any standard approach allows for situations in which the correlations don't show what goes on taking all features into account together). So I'd start by trying out your favourite learner regardless. It may just work fine. – Christian Hennig Mar 14 '24 at 22:44
  • "It may just work fine." - Another possibility is that it won't work fine, but actually nothing is possible and nothing will help as your data is just noise. There are also possibilities "in between" and situations in which specific learners will work and others won't, but just from your description it is not clear that anything is particularly more promising here than anything standard. – Christian Hennig Mar 14 '24 at 22:47
  • Is there something about my answer that you find lacking? Perhaps I can address it, but what I wrote seems to address both the original question and the edit. – Dave Mar 15 '24 at 11:39
  • @Dave Your answer is very appropriate and thorough. I edited the question just to provide some context to viewers. – Avijit Jaiswal Mar 15 '24 at 11:51

1 Answer


In some sense, you already have analyzed the data. After all, you determined that there is minimal (Pearson) correlation between the target and the proposed features. That sure seems like analysis to me!

If you still want to make predictions of that outcome of interest, there are a few considerations.

  1. Many features with only small correlations with the outcome might wind up being strong predictors of the outcome. The simulation below demonstrates such a situation, where the regression adjusted $R^2$ is very high, exceeding $0.98$. However, no feature has a correlation magnitude of even $0.1$ with the target.
set.seed(2024)
N <- 10000                        # observations
p <- 500                          # features
X <- matrix(rnorm(N*p), N, p)     # feature matrix
B <- rbeta(p, 1/3, 1/3) - 0.5     # small, mixed-sign coefficients
e <- rnorm(N, 0, 1)               # noise
Ey <- X %*% B                     # conditional mean of y
y <- Ey + e
ALL <- cbind(X, y)
L <- lm(y ~ X)
summary(L)$adj.r.squared          # Adjusted R^2 of 0.9865925
summary(cor(ALL)[1:p, p + 1])     # feature-target correlations never exceed 0.1 in magnitude
  2. Various transformations of the original features might yield much stronger predictors of the outcome. For instance, the correct relationship might be quadratic. If you haven’t included a squared term, you will miss that relationship. To capture nonlinearity, Frank Harrell’s Regression Modeling Strategies textbook advocates for spline basis functions. Interactions between features and transformations of those features are possible, too, even in (generalized) linear models. Various machine learning techniques like neural networks will start to figure out nonlinear patterns and interactions between variables, given enough data, with the caveat that they can overfit to the noise when they have such flexibility (ditto for super-flexible (generalized) linear models, which is why that Harrell book talks about deciding on how much flexibility you can afford, given the data). Neural networks can even be seen as layers of feature engineering followed by a (generalized) linear model on those engineered features.
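As a small illustration of the point about transformations (my own sketch, not part of the original answer): with a symmetric feature, a purely quadratic relationship produces near-zero Pearson correlation, yet adding a squared term to the model recovers the signal almost entirely.

```r
# Sketch: zero linear correlation hiding a strong quadratic relationship.
set.seed(2024)
n <- 10000
x <- rnorm(n)                     # symmetric feature, so E[x^3] = 0
y <- x^2 + rnorm(n, 0, 0.5)       # purely quadratic signal plus noise
cor(x, y)                                    # near zero: linear correlation misses it
summary(lm(y ~ x))$adj.r.squared             # near zero: linear fit finds nothing
summary(lm(y ~ x + I(x^2)))$adj.r.squared    # high: squared term captures the signal
```

Screening features by pairwise Pearson correlation alone would have discarded x here, even though the relationship is nearly deterministic.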

The lack of correlation between features and the target hardly means your prediction problem is hopeless.

Dave