I have a dataset that I want to fit according to
$$\log(y) = a + b_1\log(x_1) + b_2\log(x_2) +\cdots + b_k\log(x_k).$$
My statistical package has options to do a linear regression and lognormal. I am not sure which one I should choose.
Probably your best bet is just to form two new variables:
ly = log(y)
lx = log(x)
Then you can use those with a regular linear regression.
N = 30;                          # sample size
x = matrix(runif(N*5), ncol=5);  # 5 predictor columns of uniform draws
y = runif(N);                    # response
X = cbind(y, x);                 # bind response and predictors together
lX = apply(X, 2, log);           # log-transform every column at once
(Note that this would scale up to any number of columns.)
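The same transform-then-fit idea can be sketched in Python (NumPy only; the data here are simulated purely for illustration, and the variable names mirror the R snippet above):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30
# Strictly positive predictors, so log() is defined everywhere
x = rng.uniform(0.1, 1.0, size=(N, 5))
b_true = np.array([0.5, -0.2, 0.3, 0.1, 0.7])
a_true = 1.0
# Simulate y from the multiplicative model, with lognormal noise
y = np.exp(a_true + np.log(x) @ b_true) * rng.lognormal(0.0, 0.05, N)

# Form the new variables ly = log(y), lX = [1, log(x)], then run
# an ordinary least-squares linear regression
ly = np.log(y)
lX = np.column_stack([np.ones(N), np.log(x)])
coef, *_ = np.linalg.lstsq(lX, ly, rcond=None)
a_hat, b_hat = coef[0], coef[1:]
```

As with the R version, this scales to any number of predictor columns.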
– gung - Reinstate Monica Mar 29 '13 at 21:06

Your original model will be non-linear:
$y = cx^b $ $(1)$
If you take the natural log on both sides:
$\ln(y) = \ln(c) + b\ln(x)$ $(2)$
So, in your model: $\ln(c)=a$
You can run equation (1) with lognormal [actually, it should be log linear] regression [no transformation of variables needed], or you can run equation (2) with linear regression. To implement the latter, you need to log-transform the $x$ and $y$ variables as mentioned by @gung, i.e. $ly=\ln(y)$ and $lx=\ln(x)$, where $lx$ and $ly$ are the new variables created from $x$ and $y$.
Note that you can't run the log-normal or log-linear model if your $x$ or $y$ has zero or negative values.
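To make the back-transform concrete, here is a small Python sketch (NumPy only, simulated data) that fits equation (2) by linear regression and then recovers $c = e^a$:

```python
import numpy as np

rng = np.random.default_rng(1)
c_true, b_true = 2.0, 1.5
x = rng.uniform(0.5, 2.0, 100)          # positive, as required for logs
y = c_true * x**b_true * rng.lognormal(0.0, 0.02, 100)

# Equation (2): ln(y) = ln(c) + b*ln(x) -- linear in (lx, ly)
lx, ly = np.log(x), np.log(y)
A = np.column_stack([np.ones_like(lx), lx])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, ly, rcond=None)
c_hat = np.exp(a_hat)                   # undo the intercept: ln(c) = a
```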
Regarding the lognormal model: I just know that there exists the term lognormal distribution. If $\log x$ is normally distributed, then $x$ is lognormally distributed.
A logit model is for the case where your outcome takes a value of 1 or 0; I assume that your outcome is a continuous variable.
– Metrics Mar 29 '13 at 21:12

Transformation of coordinates
Initial form (linear): ln(y) = a + b*ln(x)
Exponentiating both sides turns this into a power law: y = exp(a)*x^b = A*x^b
So don't choose lognormal, choose to do a linear fit on transformed coordinates.
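The algebra above can be checked numerically in a few lines of Python (NumPy; the constants are arbitrary):

```python
import numpy as np

a, b = 0.7, 2.0
x = np.linspace(0.5, 3.0, 7)

# Linear form in log coordinates
ln_y = a + b * np.log(x)

# Equivalent power form after exponentiating both sides
y_power = np.exp(a) * x**b

# exp(ln_y) and y_power trace the same curve
```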
Some general "good practices"
There is a lot of "real world" data that in theory fits a simple analytic form like this, but in practice it almost never does exactly. Things are almost always more complex, and the high-value problems are always the more complex ones.
EDIT: whuber is right. I am expressing this in engineering terms. Implicit in my notation is that all the expressions are y_approximation, where:
y_true = y_approximation + error
Statisticians consider it rigorous to append the error term explicitly; the symbol they usually use for it is epsilon.
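In that statistical notation, the log-linear model here would carry an explicit error term, for example:

$$\ln(y) = a + b\ln(x) + \varepsilon,$$

so exponentiating gives $y = e^{a}x^{b}e^{\varepsilon}$, i.e. the error enters multiplicatively on the original scale.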