
I have been working with a dataset from an engineering discipline. My aim is predictive: I need to understand the relationships between the parameters, and also predict the variable of interest from its predictor features.

I am aware of machine learning techniques like ANNs, random forests, and other ensemble methods with which I could predict this value.

$$Y = a_1X_1 + a_2X_2 + \ldots$$

However, I started out with ordinary multiple linear regression of the form shown above $(n \gg p)$, and it gave me a reasonably low residual error. I am certain these errors can be reduced further with more sophisticated techniques, because many of the features $(X_1, X_2, \ldots)$ have a nonlinear relationship with the variable to be predicted $(Y)$. Is there any way, using regression, that I can test or see whether any of the features are linearly related to the response?
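
To make the setup concrete, here is a minimal sketch of the kind of baseline fit and linearity check I have in mind (the data is synthetic; `statsmodels` and residual-vs-feature plots are my choice for illustration, not the only option):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Placeholder data standing in for the real table: n >> p.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(size=500)

# Ordinary multiple linear regression, Y = a0 + a1*X1 + a2*X2 + ...
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.summary())  # coefficients, t-tests, R^2, residual std. error

# One simple check for linearity: residuals vs. each feature.
# A visible curve or pattern suggests that feature enters nonlinearly.
for j in range(X.shape[1]):
    plt.scatter(X[:, j], ols.resid, s=5)
    plt.xlabel(f"X{j + 1}")
    plt.ylabel("residual")
    plt.show()
```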

Another thing that bothers me: if I were to change my regression to include nonlinear feature terms, as shown below, would that technically still be linear regression, or would the model start to capture some of the nonlinear effects in the data?

$$Y = a_1X_1 + a_2X_1^2 + a_3X_2 + a_4X_2^2 + \ldots$$
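For concreteness, here is a sketch of what I mean by adding the squared terms (synthetic data; `scikit-learn`'s `PolynomialFeatures` is just one way to build the expansion):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # placeholder features X1, X2
y = 1.5 * X[:, 0] - 0.8 * X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

# Degree-2 expansion adds X1^2, X2^2 and the cross term X1*X2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Still ordinary linear regression: linear in the coefficients a_i,
# nonlinear only in the original inputs.
fit = LinearRegression().fit(X_poly, y)
print(dict(zip(poly.get_feature_names_out(["X1", "X2"]), fit.coef_)))
```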

1 Answer


Since I don't have enough points to comment yet, I'm going to post this as an answer.

Regression with nonlinear functions is performed all the time! In Bishop's pattern recognition book, he discusses them as "basis" functions. This is a common way of talking about them, such as in this tutorial.
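
As a minimal sketch of the idea (Gaussian basis functions are my choice for illustration here; Bishop also covers polynomial and sigmoidal bases):

```python
import numpy as np

# Toy 1-D data with a nonlinear trend.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=200)

# Design matrix of Gaussian basis functions
# phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a bias column.
mus = np.linspace(0, 1, 9)
s = 0.1
Phi = np.exp(-((x[:, None] - mus[None, :]) ** 2) / (2 * s ** 2))
Phi = np.column_stack([np.ones_like(x), Phi])

# The weights still enter linearly, so fitting them is
# ordinary linear least squares in the basis-expanded space.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)
```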

Also, from a more machine-learning, engineering-for-results perspective (as opposed to a theoretically justified one), basis functions are just feature engineering on your data. Feature engineering is a major cornerstone of machine learning; in fact, intelligent features will often take the cake in competitions, rather than better algorithms.
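
To make that concrete, feature engineering in this sense just means building new columns from domain knowledge before fitting; the column names below are hypothetical, not taken from your data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "load": np.abs(rng.normal(10, 3, 100)),  # hypothetical engineering
    "area": np.abs(rng.normal(5, 1, 100)),   # columns, for illustration
})

# Engineered columns: the downstream model can stay linear in its
# coefficients while these features encode nonlinear structure.
df["stress"] = df["load"] / df["area"]        # domain-motivated ratio
df["log_load"] = np.log(df["load"])           # compress a skewed scale
df["load_x_area"] = df["load"] * df["area"]   # interaction term
```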