Let's say I have a participant who performs a test and I am measuring EMG (multiple trials/epochs). This participant has to come in for multiple sessions and every time the EMG is measured. Now I want to find out if features that are found in EMG are informative for predicting the result of the test.
For every signal, I am therefore extracting multiple features which are then used for multiple linear regression to predict the score of the test. I want to compare the weights that are assigned to every feature to find out what features have the most influence on the prediction.
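A minimal sketch of this setup, to make the comparison of weights concrete. It assumes the features are z-scored before fitting (the description above does not state this), so that each weight is expressed per standard deviation of its feature. The function name `standardized_weights` and the simulated data are purely illustrative.

```python
import numpy as np

def standardized_weights(X, y):
    """Fit ordinary least squares on z-scored features and return the weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score each feature column (assumption: weights are compared on this scale)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # prepend an intercept column and solve the least-squares problem
    A = np.column_stack([np.ones(len(Xz)), Xz])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]  # drop the intercept, keep one weight per feature

# Simulated example: three features on very different raw scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
y = 2.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(scale=0.1, size=200)
weights = standardized_weights(X, y)
```

Without some common scaling like this, the raw weights would mostly reflect the units of each feature rather than its influence on the prediction.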
Now the problem is that for outlier rejection, I remove the data points that lie further than 3 standard deviations away from the mean, if the feature is Normally distributed. If the feature is not Normally distributed, I first use a transformation (Yeo-Johnson) to obtain a Normal distribution, and then use the same outlier removal technique. Can the weights then still be compared?
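The outlier rejection described above could look roughly like the sketch below. The Shapiro-Wilk test and the 0.05 threshold are assumptions, since no particular normality check is named above; `reject_outliers` is an illustrative name. Note that when the transformation is applied, the returned values are on the transformed scale.

```python
import numpy as np
from scipy import stats

def reject_outliers(x, alpha=0.05, n_sd=3.0):
    """Return (kept_values, was_transformed) for one feature vector."""
    x = np.asarray(x, dtype=float)
    # Assumed normality check: Shapiro-Wilk at level alpha
    _, p = stats.shapiro(x)
    transformed = bool(p < alpha)
    if transformed:
        # Yeo-Johnson with the lambda estimated by maximum likelihood
        x, _lmbda = stats.yeojohnson(x)
    # Keep points within n_sd standard deviations of the mean
    z = (x - x.mean()) / x.std(ddof=1)
    return x[np.abs(z) <= n_sd], transformed

rng = np.random.default_rng(0)
skewed = rng.exponential(size=300)  # clearly non-normal feature
kept, was_transformed = reject_outliers(skewed)
```

Because each signal gets its own estimated lambda (or none at all), the same feature can end up on a different scale from one signal to the next.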
For example, for feature C in EMG signal 1, the transformation does not have to be applied (because the feature is already Normally distributed). For feature C in EMG signal 2, it also does not have to be applied. But for feature C in EMG signal 3, the transformation is applied.
Then what is the influence on the regression weights for that feature? Am I still able to compare the weights for a feature if it was transformed in some signals, but not in others?
Or: would it be a valid approach to use the transformation only to identify the outliers, and then discard the transformed data and continue with the original values?
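A rough sketch of that last idea: the transformation is used only to build an outlier mask, and the regression then receives the untransformed values. As before, the Shapiro-Wilk check and the names `flag_outliers`, `X_raw`, and `drop` are illustrative assumptions, and dropping a whole trial when any one feature is flagged is just one possible choice.

```python
import numpy as np
from scipy import stats

def flag_outliers(x, alpha=0.05, n_sd=3.0):
    """Boolean mask of outliers, detected on a (possibly transformed) copy of x."""
    work = np.asarray(x, dtype=float).copy()
    # Assumed normality check: Shapiro-Wilk at level alpha
    _, p = stats.shapiro(work)
    if p < alpha:
        work, _lmbda = stats.yeojohnson(work)  # used for detection only
    z = (work - work.mean()) / work.std(ddof=1)
    return np.abs(z) > n_sd

# Simulated trials x features matrix and test scores
rng = np.random.default_rng(1)
X_raw = np.column_stack([rng.normal(size=200), rng.exponential(size=200)])
y = rng.normal(size=200)

# Drop a trial if any of its features is flagged (one possible choice)
drop = np.zeros(len(X_raw), dtype=bool)
for j in range(X_raw.shape[1]):
    drop |= flag_outliers(X_raw[:, j])

X_clean, y_clean = X_raw[~drop], y[~drop]  # values stay on the original scale
```

With this variant every signal keeps the same scale for a given feature, so the transformation affects only which trials are kept, not the values that enter the regression.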