
First, I want to mention that I am fully aware of this question here as well as its answers. Still, things don't work out as intended (using R).

I have a lot of hourly rainfall data that include zeros (so those are natural zeros, no errors, no missing data, no over-sensitivity whatsoever). The distribution is therefore heavily right-skewed, with a large share of the observations sitting exactly at zero. For some tests I need, however, +- a normal distribution. I tried log10(rain+1) and log1p(rain) without success. Based on the previous question mentioned above, I then tried a Box-Cox transformation. Several R packages offer one, but none works the way I need:

  • BoxCox from forecast library: really strange histogram with high negative values
  • BoxCox from geoR library: locks the lower bound to 0, but still far from normality
  • BoxCox from EnvStats library: Error: All non-missing, finite values of 'x' must be positive

I have some reproducible code here with all my attempts:

rain <- c(0.5,0.0,2.9,3.7,2.8,0.9,0.0,3.4,0.0,1.7,0.0,9.9,0.0,0.7,0.1,0.0,0.0,0.7,0.0,0.9,16.4,0.2,0.8,0.0,1.8,0.1,11.0,9.9,3.9,0.6,0.0,8.9,4.8,0.0,0.0,1.8,0.8,3.4,0.0,0.0,0.3,9.1,6.6,0.3,0.0,11.7,0.0,0.2,1.1,1.7,0.0,1.0,0.0,0.5,0.0,3.6,3.4,1.3,0.5,2.1,1.8,12.1,0.0,0.0,2.3,2.5,0.2,0.0,0.0,0.0,3.2,0.1,1.4,1.8,9.0,3.1,4.8,0.0,1.3,0.0,8.7,1.7,0.0,2.3,0.0,0.0,0.0,0.0,1.0,4.6,1.9,0.0)
hist(rain)
qqnorm(rain)
qqline(rain)

#### Log transform
rain.log10 <- log10(rain+1)
hist(rain.log10)

rain.log1p <- log1p(rain)
hist(rain.log1p)

#### BoxCox from forecast lib

library(forecast)
lda <- BoxCox.lambda(rain, method="guerrero")

trans.rain <- BoxCox(rain,lda)
hist(trans.rain)

#### BoxCox from geoR lib

library(geoR)
ml <- boxcoxfit(rain, lambda2=TRUE)
ml$lambda
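# Note: geoR documents dboxcox() as the density of the Box-Cox transformed normal
# distribution, so this call returns density values (all >= 0), not transformed rainfall.
trans2.rain <- dboxcox(rain, lambda=ml$lambda[1], lambda2=ml$lambda[2])
#trans2.rain <- dboxcox(rain, lambda=ml$lambda[1], lambda2=NULL)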
hist(trans2.rain)
qqnorm(trans2.rain)
qqline(trans2.rain)

#### BoxCox from EnvStats lib

library(EnvStats)
boxcox(rain)
# Error: All non-missing, finite values of 'x' must be positive
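
For reference, the two-parameter Box-Cox transformation that the attempts above are aiming for can also be written by hand in base R. The following is only a minimal sketch: the helper name box_cox2 and the values lambda = 0.25 and shift = 1 are illustrative assumptions, not fitted estimates, and rain_demo is just the first ten values of rain. It also shows why no choice of parameters can spread out the zeros: the transformation is monotone, so every zero maps to the same value.

```r
# Two-parameter Box-Cox transformation, written by hand (base R only).
# box_cox2 is a hypothetical helper; lambda and shift are illustrative assumptions.
box_cox2 <- function(x, lambda, shift = 0) {
  y <- x + shift
  if (any(y <= 0)) stop("x + shift must be strictly positive")
  # lambda close to 0 is the log-transform limit of the Box-Cox family
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# First ten values of the rain vector above
rain_demo <- c(0.5, 0.0, 2.9, 3.7, 2.8, 0.9, 0.0, 3.4, 0.0, 1.7)
trans_demo <- box_cox2(rain_demo, lambda = 0.25, shift = 1)

# A monotone transformation preserves ties: the share of observations
# at the minimum is the same before and after transforming.
share_zero_before <- mean(rain_demo == 0)
share_min_after   <- mean(trans_demo == min(trans_demo))
print(c(before = share_zero_before, after = share_min_after))
```

Whatever lambda and shift are used, the spike at the lower end of the histogram stays the same size; only its position and the spacing of the positive values change.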
GeoEki
  • For what "tests [do you] need... +- a normal distribution"? Your data have "natural zeros, no errors, no missing data, no over-sensitivity whatsoever", so you should use tests & models that are appropriate for those data, not try a transformation so that you can shoehorn something inappropriate. – gung - Reinstate Monica Feb 28 '16 at 17:41
  • Ordinary kriging (if the empirical distribution of the data is skewed then the kriging estimators are highly sensitive to a few large data values); but mainly for regression kriging. – GeoEki Feb 28 '16 at 19:14
  • Trying to transform to normality any sample that's heavily concentrated on a single value is basically hopeless. What exactly are you trying to figure out? There is probably a reasonable nonparametric procedure that would help. – dsaxton Feb 28 '16 at 19:57
  • There aren't any nonparametric versions of regression kriging (aka universal kriging or kriging with drift), but there are versions that are appropriate for such data (such as spatial GLMs). But please notice at the outset that the assumptions for this form of kriging apply to the residuals, not to the data themselves. – whuber Feb 28 '16 at 20:03
  • Yeah I'm aware that those assumptions are only relevant for the residuals. I was just curious if I could also improve my variogram (which I'm not very satisfied with) with transformed data. But the answer from @dsaxton might be true, that heavy concentration on a single value is probably hopeless for such a transformation. – GeoEki Mar 01 '16 at 11:15

0 Answers