I am running a fixed-effects regression on panel data, which I chose after conducting a Hausman test.
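As background, a fixed-effects (within) regression subtracts each firm's own average from the dependent and independent variables and then runs OLS on the demeaned data. This removes any time-invariant firm characteristic, even one correlated with the regressors. Below is a minimal NumPy sketch under stated assumptions. The function name `within_estimator` and the simulated panel (500 firms over 12 years) are hypothetical. The firm-clustered standard errors use the plain sandwich formula without the finite-sample corrections Stata applies, so they will differ slightly from Stata's output.

```python
import numpy as np

def within_estimator(y, X, groups):
    """Fixed-effects (within) estimator with firm-clustered standard errors.

    Demeans y and every column of X within each group (firm), runs OLS on the
    demeaned data, and returns (beta, cluster_robust_se). The sandwich variance
    omits the small-sample corrections that Stata applies.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in labels:
        m = groups == g
        y_dm[m] -= y[m].mean()
        X_dm[m] -= X[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    resid = y_dm - X_dm @ beta
    bread = np.linalg.inv(X_dm.T @ X_dm)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in labels:
        m = groups == g
        score = X_dm[m].T @ resid[m]  # sum of x_it * e_it within firm g
        meat += np.outer(score, score)
    se = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se

# Hypothetical panel: 500 firms x 12 years = 6000 rows, firm effect correlated with x.
rng = np.random.default_rng(0)
n_firms, n_years = 500, 12
firm = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(size=n_firms)[firm]          # unobserved firm effect
X = np.column_stack([alpha + rng.normal(size=firm.size),
                     rng.normal(size=firm.size)])
y = alpha + X @ np.array([1.5, -0.5]) + rng.normal(size=firm.size)

beta_hat, se_hat = within_estimator(y, X, firm)
print(beta_hat, se_hat)
```

Because the firm effect is correlated with the first regressor, pooled OLS would be biased here. The within transformation removes that effect and recovers slopes close to 1.5 and -0.5.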
I obtained significant results in line with what is expected for my model. My dataset has around 6000 observations and is not normally distributed. If I log-transform the data, only some variables become normally distributed. I get significant results with both the transformed and the original data.
A professor who is helping me with the work told me that normality is not an issue in my case because of the large number of observations. However, I am concerned because I have to justify this in my report. He suggested I look up the Gauss-Markov theorem, which essentially says: "Without the assumption of normality you can also prove efficiency in the class of linear, unbiased estimators via the Gauss-Markov theorem.
If the errors are normally distributed, you can also establish that the least-squares estimators coincide with the maximum likelihood estimators." But my errors are not normally distributed either, and my professor said that this is still not an issue given the large dataset. He is well respected and certainly knows what he is talking about. However, the professor who will grade my paper will expect some justification for why I proceeded with the original data rather than the log-transformed data.
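The large-sample argument can be checked with a small simulation. For OLS to be BLUE, the Gauss-Markov theorem requires errors with zero conditional mean, constant variance, and no correlation. It does not require normality. Normality is what makes t-tests exact in small samples, and in large samples the central limit theorem makes the coefficient estimates approximately normal anyway. The sketch below assumes a hypothetical simple regression y = 2 + 0.5x + e with strongly skewed errors (centered exponential, skewness about 2) and 6000 observations, mirroring the sample size in the question. It is not based on the asker's data, and the `skewness` helper is illustrative.

```python
import numpy as np

def skewness(a):
    """Sample skewness: mean of cubed standardized values."""
    z = (a - a.mean()) / a.std()
    return float((z ** 3).mean())

rng = np.random.default_rng(42)
n, reps, true_slope = 6000, 1000, 0.5

x = rng.normal(size=n)              # regressor held fixed across replications
x_c = x - x.mean()
sxx = x_c @ x_c

# Centered exponential errors: mean 0, variance 1, skewness 2 (clearly non-normal).
errors = rng.exponential(scale=1.0, size=(reps, n)) - 1.0

# OLS slope for y = 2 + 0.5 x + e, computed for every replication at once:
# b = sum(x_c * y) / sxx = 0.5 + sum(x_c * e) / sxx, because x_c sums to zero.
slopes = true_slope + errors @ x_c / sxx

print("error skewness:", round(skewness(errors[:100].ravel()), 2))
print("slope mean:", round(slopes.mean(), 4))
print("slope skewness:", round(skewness(slopes), 2))
```

The errors are heavily skewed, but the 1000 slope estimates are centered on the true value and are close to symmetric. This is the sense in which non-normal errors matter little at this sample size.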
Any guidance would be greatly appreciated. I am close to my deadline, so redoing everything with log-transformed data just for the sake of it is daunting. I would like to avoid that if there is another way to justify my choice, for example by citing papers.
I am also unsure how to address the first two assumptions in the list below (linearity and normality), or whether it is better not to address them at all. Assumptions:
- Linearity
- Normality
- Multicollinearity: addressed with a correlation matrix
- Autocorrelation and heteroskedasticity: addressed in Stata with the vce(cluster) option
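A pairwise correlation matrix can miss multicollinearity that involves three or more regressors at once, so variance inflation factors (VIFs) are a common complement. Stata reports them with `estat vif` after `regress`. The sketch below uses hypothetical data in which x3 is almost exactly x1 + x2. No pairwise correlation exceeds about 0.71, yet the VIF of x3 is far above the common rule-of-thumb threshold of 10. The function name `vif_from_corr` is illustrative. It relies on the standard identity that the VIFs are the diagonal of the inverse correlation matrix.

```python
import numpy as np

def vif_from_corr(X):
    """Variance inflation factors as the diagonal of the inverse correlation matrix.

    Equivalent to 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns (with an intercept).
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

# Hypothetical regressors: x3 is nearly the sum of x1 and x2; x4 is unrelated.
rng = np.random.default_rng(1)
n = 6000
x1, x2, x4 = rng.normal(size=(3, n))
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

corr = np.corrcoef(X, rowvar=False)
max_pairwise = np.abs(corr - np.eye(4)).max()   # largest off-diagonal correlation
vif = vif_from_corr(X)

print("largest pairwise |r|:", round(max_pairwise, 2))
print("VIFs:", np.round(vif, 1))
```

With fixed effects, the coefficients are identified from within-firm variation. Running the same check on firm-demeaned regressors is therefore arguably more informative than running it on the raw levels.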
The data concern the effect of S&P 500 firms' ESG scores on firm performance (4 independent variables and 3 dependent variables, each with around 6000 observations, across 4 regressions).