
I have been working with a dataset from an engineering discipline. My aim is predictive: I need to understand the relationships between the parameters, and also predict the variable of interest from its predictor features.

I am aware of machine learning techniques like ANNs, random forests, and other ensemble methods with which I could predict this value.

$$Y = a_1X_1 + a_2X_2 + \ldots$$

However, I started out with ordinary multiple linear regression of the form shown above $(n \gg p)$, and it gave me a reasonably low residual error. I am certain these errors can be reduced further with more sophisticated techniques, because many of the features $(X_1, X_2, \ldots)$ have a nonlinear relationship with the variable to be predicted $(Y)$. Is there any way, using regression, that I can test or see whether any of the features are linearly related to the response?
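
To make the setup concrete, here is a minimal sketch of the kind of baseline fit and linearity check I have in mind (the data is synthetic; `statsmodels` and residual-vs-feature plots are my choice for illustration, not the only option):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Placeholder data standing in for the real table: n >> p.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(size=500)

# Ordinary multiple linear regression, Y = a0 + a1*X1 + a2*X2 + ...
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.summary())  # coefficients, t-tests, R^2, residual std. error

# One simple check for linearity: residuals vs. each feature.
# A visible curve or pattern suggests that feature enters nonlinearly.
for j in range(X.shape[1]):
    plt.scatter(X[:, j], ols.resid, s=5)
    plt.xlabel(f"X{j + 1}")
    plt.ylabel("residual")
    plt.show()
```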

Another thing that bothers me: if I were to change my regression to include nonlinear feature terms, as shown below, would that technically still be linear regression, or would the model start to capture some of the nonlinear effects in the data?

$$Y = a_1X_1 + a_2X_1^2 + a_3X_2 + a_4X_2^2 + \ldots$$
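For concreteness, here is a sketch of what I mean by adding the squared terms (synthetic data; `scikit-learn`'s `PolynomialFeatures` is just one way to build the expansion):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # placeholder features X1, X2
y = 1.5 * X[:, 0] - 0.8 * X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

# Degree-2 expansion adds X1^2, X2^2 and the cross term X1*X2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Still ordinary linear regression: linear in the coefficients a_i,
# nonlinear only in the original inputs.
fit = LinearRegression().fit(X_poly, y)
print(dict(zip(poly.get_feature_names_out(["X1", "X2"]), fit.coef_)))
```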

1 Answer


Since I don't have enough points to comment yet, I'm going to post this as an answer.

Regression with nonlinear functions is performed all the time! In Bishop's pattern recognition book, he discusses them as "basis" functions. This is a common way of talking about them, such as in this tutorial.
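
As a minimal sketch of the idea (Gaussian basis functions are my choice for illustration here; Bishop also covers polynomial and sigmoidal bases):

```python
import numpy as np

# Toy 1-D data with a nonlinear trend.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=200)

# Design matrix of Gaussian basis functions
# phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a bias column.
mus = np.linspace(0, 1, 9)
s = 0.1
Phi = np.exp(-((x[:, None] - mus[None, :]) ** 2) / (2 * s ** 2))
Phi = np.column_stack([np.ones_like(x), Phi])

# The weights still enter linearly, so fitting them is
# ordinary linear least squares in the basis-expanded space.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)
```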

Also, from a more machine-learning, engineering-for-results perspective (as opposed to a theoretically justified one), basis functions are just feature engineering on your data. Feature engineering is a major cornerstone of machine learning; in fact, intelligent features will often take the cake in competitions, rather than better algorithms.
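
To make that concrete, feature engineering in this sense just means building new columns from domain knowledge before fitting; the column names below are hypothetical, not taken from your data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "load": np.abs(rng.normal(10, 3, 100)),  # hypothetical engineering
    "area": np.abs(rng.normal(5, 1, 100)),   # columns, for illustration
})

# Engineered columns: the downstream model can stay linear in its
# coefficients while these features encode nonlinear structure.
df["stress"] = df["load"] / df["area"]        # domain-motivated ratio
df["log_load"] = np.log(df["load"])           # compress a skewed scale
df["load_x_area"] = df["load"] * df["area"]   # interaction term
```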