
I am trying to build a linear regression model.

I have some high-cardinality categorical features to which I want to apply target encoding. However, the distribution of my (real-valued) target variable is highly right-skewed, so I plan to apply a transform to remove the skew.

Which of the following approaches is sensible?

  1. I should transform the target variable first, and then target-encode the categorical features using the transformed target.

  2. I should target-encode the categorical features using the original target, and only then apply the skew-removing transform to the target variable.
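To make the two orderings concrete, here is a minimal sketch in Python with NumPy. It assumes plain (unsmoothed) mean target encoding, a log transform as the skew-removing step, and hypothetical synthetic data; the helper names `target_encode` and `r_squared` are illustrative and do not come from any library. For brevity it encodes on the full sample; in practice the encoding is usually fitted on training folds only, to avoid leaking the target into the feature.

```python
import numpy as np

def target_encode(categories, y):
    """Plain (unsmoothed) mean target encoding: replace each category by the mean of y within it."""
    _, inverse = np.unique(categories, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=y)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]

def r_squared(x, y):
    """In-sample R^2 of a one-feature OLS fit y ~ a + b*x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

# Hypothetical synthetic data: 200 categories and a log-normal (right-skewed) target.
rng = np.random.default_rng(0)
n, n_cats = 5000, 200
categories = rng.integers(0, n_cats, size=n)
cat_effect = rng.normal(0.0, 1.0, size=n_cats)
y = np.exp(1.0 + cat_effect[categories] + rng.normal(0.0, 0.5, size=n))

# Assumed skew-removing transform; the question leaves the choice open.
y_log = np.log(y)

# Approach 1: transform first, then encode using the transformed target.
enc_1 = target_encode(categories, y_log)

# Approach 2: encode using the original target, then transform the target.
enc_2 = target_encode(categories, y)

# In both cases the regression model predicts the transformed target.
print("approach 1 R^2:", r_squared(enc_1, y_log))
print("approach 2 R^2:", r_squared(enc_2, y_log))
```

The two encodings are not interchangeable: approach 1 gives each category the mean of log(y), i.e. the log of that category's geometric mean, while approach 2 gives the arithmetic mean of y on the original scale, which is never smaller than the geometric mean. Under approach 2 the encoded feature and the transformed target therefore end up on different scales.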

Thanks in advance.

  • What do you mean by transforming a skewed categorical distribution? – Dave Mar 24 '21 at 17:14
  • My target variable is not categorical. It is real valued. – Sandeep Maurya Mar 24 '21 at 17:17
  • Depending on what you're doing, the transformation might not be so important; we like normal residuals, not a normal pooled distribution of the response variable. However, how does the category to which an observation belongs depend on the transformation? – Dave Mar 24 '21 at 17:20
  • I want to train a linear regression model on this. As I learned, linear models tend to perform better if the input features and the target variable have Gaussian-like distributions. – Sandeep Maurya Mar 24 '21 at 17:27
  • @SandeepMaurya You're likely looking at the histogram of the outcome, which is the marginal distribution of the outcome. The assumption of normality is about the conditional distribution. See my answer here and the referenced answer therein. – Demetri Pananos Mar 25 '21 at 02:32
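A small simulation illustrates the distinction drawn in the comments between the marginal distribution of the outcome (its histogram) and the conditional distribution (the residuals). This is a minimal sketch on hypothetical synthetic data: a right-skewed predictor plus Gaussian noise yields a clearly skewed marginal distribution of y, while the residuals of an ordinary least squares fit remain roughly symmetric. The helper `sample_skewness` is an illustrative moment-based estimator, not a library function.

```python
import numpy as np

def sample_skewness(a):
    """Moment-based sample skewness: mean((a - mean)^3) / std^3."""
    d = a - a.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# Hypothetical data: a right-skewed predictor and Gaussian noise.
rng = np.random.default_rng(0)
n = 5000
x = rng.exponential(scale=1.0, size=n)
y = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares fit of y on x (with intercept).
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

print("skewness of y (marginal):", sample_skewness(y))
print("skewness of residuals:   ", sample_skewness(resid))
```

Here y is strongly right-skewed only because x is, and the linear model's normality assumption holds exactly by construction; a skewed histogram of the target does not by itself show that a transform is needed.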
