I am using Stata 13 to estimate a simple regression. Given the rather positive skew of a few of my covariates, I decided to ln-transform those variables. However, the covariates contain a substantial number of zeroes, so ln-transforming them produces many missings.
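For reference, Stata's ln() returns missing for zero or negative arguments, which is where the missings come from:
display ln(0)     // returns missing (.)
display ln(-1)    // also missing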
I came across several ways of handling the issue:
- Replacing the zeroes with a small value (e.g. 0.00000001)
- Not treating the issue
- Dummying the zeroes in a new variable
I do not like options 1 and 2. Replacing feels somewhat arbitrary and just wrong, and not treating the issue means losing information. I therefore prefer option 3, but it does not seem to work for me so far. Here is an example of what I do.
clear
clear matrix
set more off
sysuse nlsw88
hist tenure                                       // tenure is right-skewed, with a spike at zero
gen ln_tenure = ln(tenure)                        // missing wherever tenure==0 (or tenure is missing)
gen null_tenure = 1 if ln_tenure==. & tenure!=.   // dummy flagging the zeroes
reg wage grade tenure south                       // uses the tenure==0 observations
reg wage grade ln_tenure south                    // drops them, since ln_tenure is missing there
reg wage grade ln_tenure null_tenure south
The nlsw88 example dataset provides 51 observations with tenure=0. Regressing wage on grade and tenure is hence based on 51 more observations than regressing wage on grade and ln_tenure.
So as not to lose the information at tenure=0, I created the dummy null_tenure=1 for all observations with tenure=0. Obviously, null_tenure gets omitted when I introduce it into the regression.
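A quick way to check that the observations lost to the transformation are exactly the tenure==0 cases (run after the code above):
count if tenure==0                                // the 51 zero-tenure cases
count if missing(ln_tenure) & !missing(tenure)    // the cases dropped by the ln() transform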
I have two questions:
- Does this way of handling missings created by ln-transforming data make sense?
- If so, how can I prevent the dummy from being omitted?
/R
clear
sysuse nlsw88
gen ln_tenure=ln(tenure)
gen null_tenure = 0
replace null_tenure=1 if tenure==0
reg wage grade ln_tenure null_tenure
– Rachel May 26 '15 at 10:03

Replace the zeros in tenure with 1s before transformation by ln(). Ask on Statalist and I or others will expand. – Nick Cox May 26 '15 at 10:08
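A minimal sketch of what that suggestion might look like, combined with the zero dummy from the question (the helper variable tenure1 and the exact setup are my own illustration, not taken from the comment):
clear
sysuse nlsw88
gen byte null_tenure = (tenure==0) if !missing(tenure)   // flag the original zeroes
gen tenure1 = tenure
replace tenure1 = 1 if tenure==0                         // zeroes become 1, so ln() gives 0 instead of missing
gen ln_tenure = ln(tenure1)
reg wage grade ln_tenure null_tenure south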