Modeling the shape of the power function

Question

In cognitive psychology, it has been argued that the learning curve of a skill follows the power function, in which the practice of the skill yields progressively smaller decrements in error. I want to test this hypothesis with the data I have, and to do this I need to fit a power law function to the data.

I heard that the power function can be fit in R by lm(log(y) ~ log(x)). An answer to this post, however, suggests using glm(y ~ log(x), family = gaussian(link = log))~~, and indeed the resulting fit prefers the glm approach~~ (EDIT: See comments. This is not true.).

# Define a function to compute R-squared in linear space 
> rsq <- function(data, fit) {
+ ssres <- sum((data - fit)^2)
+ sstot <- sum((data - mean(data))^2)
+ rsq <- 1 - (ssres / sstot)
+ return(rsq)
}

# generate data and fit a power function with lm() and glm()
> set.seed(10)
> lc <- (1:10)^(-1/2) * exp(rnorm(10, sd=.25)) # Edited
> model.lm <- lm(log(lc) ~ log(1:10))
> model.glm <- glm(lc ~ log(1:10), family = gaussian(link = log))
> fit.lm <- exp(fitted(model.lm))
> fit.glm <- fitted(model.glm)

# draw a graph with fitted lines
> plot(1:10, lc, , xlab = "Practice", ylab = "Error rate", las = 1)
> lines(1:10, fit.lm, col = "red")
> lines(1:10, fit.glm, col = "blue")
> legend("topright", c("original data", "lm(log(y) ~ log(x))", 
+ "glm(y ~ log(x), family = gaussian(link = log))"), 
+ pch = c(1, NA, NA), lty = c(NA, 1, 1), col = c("black", "red", "blue"))

enter image description here

# compute R-squared values for both models
# (EDIT: See comments. These are not good measures to use.)
> rsq(lc, fit.lm)
[1] 0.9194631
> rsq(lc, fit.glm)
[1] 0.9205448

From the $R^2$ values, it seems that the glm model has a better fit. When the procedure was repeated 100 times, it was always the case that the glm model has a higher $R^2$ value than the lm model (EDIT: See comments. $R^2$ is not a good measure for this purpose).

At this point I have two questions.

What is the difference between the lm model and the glm model above? I thought a generalized linear model with the log link function is practically the same as regressing log(y) in general linear models.
I'm trying to compare the fit of the power function against other models (e.g., linear models as in lm(y ~ x)). To do this, I calculate the fitted values in linear space and compare the $R^2$ values computed in linear space, just like I did to compare lm and glm models above. Is this a proper way to compare this type of models?

My final question concerns the estimation of the four-parameter power law function given in this paper (p.186; I modified the notation a little).

$$ E(Y_N) = A + B(N + E)^{-β} $$ where

$E(Y_N)$ = expected value of $Y$ on practice trial $N$
$A$ = expected value of $Y$ after learning is completed (i.e., asymptote)
$E$ = the amount of prior learning before $N = 0$

I have $N$ and $Y$ (1:10 and lc respectively in the code above), and want to estimate $A$, $B$, $E$, and $-β$ from the data. In the example above, I assumed $A$ = $E$ = 0 mostly because I wasn't sure how to estimate four parameters concurrently. While this isn't a huge problem, I would like to estimate $A$ and $E$ from the data as well if possible. So my final question:

Is there a way to estimate the four parameters in R? If not, is there a way to estimate three parameters, excluding $A$ from the above (i.e., assuming $A$ = 0)?

The paper I linked above says the authors used the simplex algorithm to estimate the parameters. But I am familiar with neither the algorithm nor the simplex function in R.

I'm aware of Clauset et al's work and the poweRlaw package in R. I believe both of them are dedicated to modeling the distribution of one categorical variable, but neither models the curve of the power law.

Any help is appreciated.

I stopped reading where the code began, because it does not simulate a power law relationship: you use an exponential function. That's why the lm fit is so bad. Try lc <- (1:10)^(1/2) instead. Both fits will be perfect because no error is introduced. Which one is better in general depends on the error structure: multiplicative (which is more likely) would be best analyzed with lm and additive with glm. For instance, re-run your code several times with lc <- (1:10)^(1/2) * exp(rnorm(10, sd=.25)). — whuber, Apr 10 '14 at 21:42
@whuber Oops, I was somehow convinced that's the shape the power function should model. Thank you for pointing it out. I will fix it. Thank you also for pointing me to the difference in the error structure between lm and glm. I will look into the issue. The simulation showed glm always yields higher $R^2$ than lm. Is it mathematically so? — Akira Murakami, Apr 10 '14 at 22:42
One consequence of the multiplicative error structure is that variance will be proportional to mean-squared, while the additive error would have variance proportional to a constant. Even ignoring other effects, this clearly weights the data differently. — Glen_b, Apr 10 '14 at 23:17
@Glen_b Thank you for the conceptual explanation. Having read this thread on R-help, I started to have a grasp of the issue. So $R^2$ does not seem to be the best measure to use to select between glm and lm. I wonder, then, whether it is appropriate to use $R^2$ values to compare power-law models (be it lm or glm) and linear models (my second question). — Akira Murakami, Apr 11 '14 at 09:20
I don't even think $R^2$ is generally the right criterion for comparing two linear models fitted with lm. — Glen_b, Apr 11 '14 at 09:34
@Glen_b If two models have the same dependent variable, I'd use likelihood ratio tests or information-theoretic approaches like AIC. But I'm not sure if there is a proper way to compare linear and power-law models... Anyway, thanks for the help. — Akira Murakami, Apr 11 '14 at 09:56
If you're comparing lm with lm on the same response variable, I think AIC should work. — Glen_b, Apr 11 '14 at 10:28
@Glen_b AIC does not work when one model has $y$ as the dependent variable and the other has $log(y)$, does it (as claimed here)? I believe that's what I need when comparing a power-law model against a linear model. — Akira Murakami, Apr 11 '14 at 10:44
You still contend the glm results are "preferred." That is not at all clear. In particular, glm will attempt to follow any high outliers more than lm will. I will therefore emphasize a previous recommendation: select the model based on expectations (and evidence) about the error structure. Look in the direction of goodness of fit tests (which include graphical diagnostics) rather than some direct comparison based on a single number like AIC. Re the final question, search our site. — whuber, Apr 11 '14 at 13:52
mrkm-a You are correct, you can't compare models with different y's that way. I misunderstood your question in comments; I thought you were now asking a slightly different question than that. Sorry. — Glen_b, Apr 11 '14 at 14:56
@whuber Oh no, I'm not claiming glm is overall better than lm: Whether it is was precisely my interest, and I learned the answer depends on assumed error structure. I would now like to know how to compare power-law models with other models (e.g., linear). The problem is that I want to test whether the development is better modeled by a power function in over 1,000 learners and see how many of them follow the power law. Given the size, I prefer an automated (or at least semi-automated) way of model comparison. By "search our site", do you mean CrossValidated? — Akira Murakami, Apr 11 '14 at 15:35
What's confusing is that three-quarters of your question still focuses on an earlier misunderstanding has been cleared up in these comments: $R^2$ is irrelevant (and possibly misleading) for the comparison; the only reason glm looks better is that $R^2$ measures errors according to its assumptions; it does not measure errors according to the linear regression of logarithms. As far as the site search goes, if you follow the link in my previous comment you will see which site you are directed to :-). — whuber, Apr 11 '14 at 15:43
@whuber Sorry. I have just modified the question. I'm not quite sure what you mean by "$R^2$ measures errors according to its assumptions". $R^2$ is derived solely on the basis of the difference between observed and fitted values and the total variance, isn't it? — Akira Murakami, Apr 11 '14 at 16:25
The additive difference is not relevant when you are regressing logarithms of the responses. What you are doing is like teaching somebody Olympic gymnastics and then rating them on how well they score football goals: although it can be done, it is neither useful nor meaningful. — whuber, Apr 11 '14 at 16:48
@whuber Hmm, is it the issue of minimizing least squares in log-log space in building a model but comparing $R^2$ values in linear space? I don't quite understand what problems it causes and why. I may need more mathematical background of the technique to further the understanding. — Akira Murakami, Apr 11 '14 at 17:24
See http://arxiv.org/abs/0706.1062 for proper treatment of estimation of power laws from empirical data! — kjetil b halvorsen, Jun 30 '14 at 19:28

score -1 · Answer 1 · answered Jul 05 '17 at 22:57

You can to use AIC/BIC values to compare the model fits to each other instead of R-squared values. The smaller AIC value is the better fit. It can sometimes be helpful to plot the AIC values as the difference from the bet fit model because the units of AIC are massive and arbitrary (I think), so it is the relative difference in AIC that you want to show; however it is helpful to have a table of AIC values as supplementary information.

I cant offer any advice on how to get AIC values in R as that is what I was looking for when I came across your post.

What I wish to compare is a power-law model and a linear model, and I believe it is not appropriate to use information-theoretic measures like AIC or BIC when I want to compare statistical models with different dependent variables, as in log(y) vs y above. — Akira Murakami, Jul 07 '17 at 15:18

Modeling the shape of the power function

1 Answers1