What is the influence of transforming explanatory variables in regression?

Question

Let's say I have a participant who performs a test while I measure EMG (multiple trials/epochs). The participant comes in for multiple sessions, and the EMG is measured every time. I want to find out whether features extracted from the EMG are informative for predicting the test result.

For every signal, I therefore extract multiple features, which are then used in a multiple linear regression to predict the test score. I want to compare the weights assigned to each feature to find out which features have the most influence on the prediction.
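Raw regression weights depend on each feature's units, so they are usually only compared after putting the features on a common scale. A minimal sketch of one common option, standardized (z-scored) coefficients, follows. The function name `standardized_coefficients` and the synthetic data are hypothetical illustrations, not part of the original setup.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Hypothetical helper: OLS slopes after z-scoring features and target.

    X: (n_samples, n_features) feature matrix, y: (n_samples,) test scores.
    Each slope is then in 'SDs of y per SD of feature' units, which is one
    common (but not the only) way to make weights comparable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column needed: centered data give an intercept of zero.
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef

# Synthetic example: three features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) * [1.0, 10.0, 0.1]
y = 2.0 * X[:, 0] + 0.2 * X[:, 1] + 20.0 * X[:, 2] + rng.normal(size=1000)
coef = standardized_coefficients(X, y)
print(coef)
```

The raw weights here are 2, 0.2 and 20, yet each feature shifts the score by the same amount per standard deviation, so the standardized coefficients come out roughly equal.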

The problem lies in the outlier rejection. If a feature is Normally distributed, I remove the data points that lie more than 3 standard deviations from the mean. If the feature is not Normally distributed, I first apply a transformation (Yeo-Johnson) to obtain an approximately Normal distribution and then use the same outlier removal technique. Can the weights still be compared?
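The per-feature procedure described above can be sketched as follows, assuming that "Normally distributed" is decided with a Shapiro-Wilk test at level `alpha` (the question does not say which check is used). The function `reject_outliers` and its parameters are hypothetical; `scipy.stats.shapiro` and `scipy.stats.yeojohnson` are real SciPy functions.

```python
import numpy as np
from scipy import stats

def reject_outliers(x, alpha=0.05, k=3.0):
    """Hypothetical sketch of the procedure for a single feature.

    Assumption: normality is judged with a Shapiro-Wilk test at `alpha`.
    Returns the surviving values (on the transformed scale if Yeo-Johnson
    was applied), the keep-mask, and whether the transform was used.
    """
    x = np.asarray(x, dtype=float)
    _, p_value = stats.shapiro(x)
    transformed = p_value < alpha
    if transformed:
        # With lmbda=None, yeojohnson returns (transformed_data, fitted_lambda).
        x, _lmbda = stats.yeojohnson(x)
    z = (x - x.mean()) / x.std(ddof=1)
    keep = np.abs(z) <= k  # drop points more than k SDs from the mean
    return x[keep], keep, transformed

rng = np.random.default_rng(1)
skewed = rng.lognormal(size=500)
kept, keep_mask, was_transformed = reject_outliers(skewed)
```

Note that for a skewed feature the values returned here are on the Yeo-Johnson scale, not the original one, which is exactly what makes the later weights hard to compare.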

For example, feature C in EMG signal 1 does not need the transformation (because it is already Normally distributed), and neither does feature C in EMG signal 2. But for feature C in EMG signal 3, the transformation is applied.

Then what is the influence on the regression weights for that feature? Can I still compare the weights for a feature if it was transformed in some signals but not in others?
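One way to see the effect is a small simulation, assuming a score that depends linearly on the raw feature. Fitting the same score on the raw feature and on its Yeo-Johnson transform gives weights with different units and, even after standardizing, different values, because the transform is nonlinear. The data and the helpers `slope` and `std_slope` are hypothetical illustrations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # skewed feature, e.g. "feature C"
y = 1.5 * x + rng.normal(scale=1.0, size=1000)     # score depends linearly on raw x

# Yeo-Johnson with a lambda fitted to this particular sample.
x_yj, lam = stats.yeojohnson(x)

def slope(u, v):
    """OLS slope of v on u (intercept included by polyfit)."""
    return np.polyfit(u, v, 1)[0]

def std_slope(u, v):
    """Slope after z-scoring both variables (equals Pearson r for one predictor)."""
    return slope((u - u.mean()) / u.std(), (v - v.mean()) / v.std())

print("raw slope:", slope(x, y), "transformed slope:", slope(x_yj, y))
print("standardized raw:", std_slope(x, y),
      "standardized transformed:", std_slope(x_yj, y))
```

Because the fitted lambda also differs from one signal to the next, even two transformed versions of feature C are not on the same scale.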

Or: could one approach be to use the transformation only to identify the outliers, but not keep the transformed data?
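That alternative can be sketched as: flag outliers on the Yeo-Johnson scale, then apply the resulting mask to the original values so the regression stays in the feature's own units. The function name `outlier_mask_via_yeojohnson` and the synthetic data are hypothetical; this is an illustration, not an endorsement of the 3-SD rule.

```python
import numpy as np
from scipy import stats

def outlier_mask_via_yeojohnson(x, k=3.0):
    """Hypothetical helper: boolean mask of points to keep.

    Outliers are flagged on the Yeo-Johnson scale (|z| > k), but the mask
    is meant to be applied to the ORIGINAL values.
    """
    x = np.asarray(x, dtype=float)
    x_yj, _lmbda = stats.yeojohnson(x)
    z = (x_yj - x_yj.mean()) / x_yj.std(ddof=1)
    return np.abs(z) <= k

rng = np.random.default_rng(7)
feature = rng.lognormal(size=300)
keep = outlier_mask_via_yeojohnson(feature)
feature_clean = feature[keep]  # still in the original units
```

The weights then keep the original units, but the regression still sees the untransformed (possibly skewed) feature; whether that is appropriate is a separate modelling question.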

Welcome to CV. It sounds like you are trying to solve several problems at once: outlier detection; dealing with the outliers; re-expressing the variables; fitting a regression model; and interpreting the coefficients. It would be (far) better to follow a principled approach to these problems. See https://stats.stackexchange.com/a/7933/919 for an overview; https://stats.stackexchange.com/a/3530/919 about re-expressions; https://stats.stackexchange.com/a/35717/919 for some practical advice; and https://stats.stackexchange.com/search?q=model+selection+score%3A50 about model selection. — whuber, Jun 07 '23 at 15:23
