I am using Stata 13 to estimate a simple regression. Given the rather positive skew of a few of my covariates, I decided to ln-transform those variables. However, the covariates contain a substantial number of zeroes, so ln-transforming them produces many missings.
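For reference, Stata's ln() returns missing for zero or negative arguments, which is where the missings come from:
display ln(0)     // returns missing (.)
display ln(-1)    // also missing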
I came across several ways of handling the issue:
- Replacing the zeroes with a small value (e.g. 0.00000001)
- Not treating the issue
- Dummying the zeroes in a new variable
I do not like options 1 and 2. Replacing feels somewhat arbitrary and just wrong, and not treating the issue means losing information. I therefore prefer option 3, but it does not seem to work for me so far. Here is an example of what I do.
clear
clear matrix
set more off
sysuse nlsw88
hist tenure                                       // tenure is right-skewed, with a spike at zero
gen ln_tenure = ln(tenure)                        // missing wherever tenure==0 (or tenure is missing)
gen null_tenure = 1 if ln_tenure==. & tenure!=.   // dummy flagging the zeroes
reg wage grade tenure south                       // uses the tenure==0 observations
reg wage grade ln_tenure south                    // drops them, since ln_tenure is missing there
reg wage grade ln_tenure null_tenure south
The nlsw88 example dataset provides 51 observations with tenure=0. Regressing wage on grade and tenure is hence based on 51 more observations than regressing wage on grade and ln_tenure.
So as not to lose the information at tenure=0, I created the dummy null_tenure=1 for all observations with tenure=0. Obviously, null_tenure gets omitted when I introduce it into the regression.
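A quick way to check that the observations lost to the transformation are exactly the tenure==0 cases (run after the code above):
count if tenure==0                                // the 51 zero-tenure cases
count if missing(ln_tenure) & !missing(tenure)    // the cases dropped by the ln() transform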
I have two questions:
- Does this way of handling missings created by ln-transforming data make sense?
- If so, how can I prevent the dummy from being omitted?
/R
clear
sysuse nlsw88
gen ln_tenure=ln(tenure)
gen null_tenure = 0
replace null_tenure=1 if tenure==0
reg wage grade ln_tenure null_tenure
– Rachel May 26 '15 at 10:03

Replace the zeros in tenure with 1s before transformation by ln(). Ask on Statalist and I or others will expand. – Nick Cox May 26 '15 at 10:08
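A minimal sketch of what that suggestion might look like, combined with the zero dummy from the question (the helper variable tenure1 and the exact setup are my own illustration, not taken from the comment):
clear
sysuse nlsw88
gen byte null_tenure = (tenure==0) if !missing(tenure)   // flag the original zeroes
gen tenure1 = tenure
replace tenure1 = 1 if tenure==0                         // zeroes become 1, so ln() gives 0 instead of missing
gen ln_tenure = ln(tenure1)
reg wage grade ln_tenure null_tenure south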