
I am trying to run an nls() fit to interpolate the height growth of a tree given a series of samples, where each sample has a value for age and height. I am using the Richards (1959) method:

$$ \textrm{Height}=x\left(1-e^{-y\cdot\textrm{Age}}\right)^z $$

My age and height values:

> age
[1]  0  3  6 17 29 47 57 63 72
> height
[1]   37   70  199  464  736 1008 1280 1462 1643
> 

The script I am using to calculate the parameters x, y, and z is:

e<-2.71828
growCurve<-nls(
  formula=height~x*(1-e^(-1*y*age))^z,
  start=c(x=0,y=0,z=0)
)
b <- environment(growCurve[["m"]][["incr"]])[["internalPars"]][1]
y <- environment(growCurve[["m"]][["incr"]])[["internalPars"]][2]
z <- environment(growCurve[["m"]][["incr"]])[["internalPars"]][3]

b; y; z

When I run this, however, I get the error

singular gradient matrix at initial parameter estimates

I tried using another fix from this post, but I still get identical results, so I am not really sure how to proceed.

  • $(0,0,0)$ is a particularly bad starting value for this model: it sits in one corner of the space of possible values. Choose something better. My answer in the thread you reference provides some ways to come up with better starting values. – whuber Dec 14 '22 at 20:00
  • @whuber I did try that solution you suggested, however when I take those new output values using that solution and put them back into the nls() I still get the exact same error. – hawkingRadiation_ Dec 14 '22 at 20:03
  • Plotting these data reveals that your model cannot be identified: it isn't distinguishable from a model of the form Height = b * Age. You could fit it with lm(height ~ age - 1). To check whether a nonlinear model might be an improvement, compare it to lm(height ~ age + I(age^2)); neither the intercept nor the quadratic term improves things (see the sketch after these comments). – whuber Dec 14 '22 at 20:05
  • @whuber I suppose then I am open to suggestions on how else to implement this interpolation given the data that I have. The contemporary literature on the methods for this is rather scant. – hawkingRadiation_ Dec 14 '22 at 20:10
  • BTW, when I use reasonable starting values -- try, say, $(x=2000, y=1/20, z=1)$ -- I get different errors showing the algorithm is having trouble converging. That's one indication of the lack of identifiability in this model for these data. Also, my sense is although you might hope to estimate $yz$ when the tree approaches its limiting height, unless you have hundreds or thousands of data points, you probably cannot separately find $y$ and $z.$ – whuber Dec 14 '22 at 20:12
  • @whuber That's interesting to hear. The methods I am following here are from Karlsson (2000), where they use an identical number of samples per tree, although they run the method on a collection of trees with similar ages at a specific height. Still, they could not have had that many more data points to go off of, I would imagine. So I am curious what their methods may have been computationally. – hawkingRadiation_ Dec 14 '22 at 20:25
  • Provided the tree approximates its final height, you can estimate some values for $y$ and $z.$ The thing to look for is the standard error of the estimate: are those values statistically significant or not? The correct test for $z,$ btw, is to compare it to $1$ (where it has no effect), not $0$ (which is what most software will do by default; see the sketch after these comments). The link to your reference only vaguely refers to "anamorphic" and "polymorphic" functions but doesn't indicate what they are. – whuber Dec 14 '22 at 20:58
  • Can you add a full citation for the paper you refer to? Richards (1959) is very unspecific ... – kjetil b halvorsen Dec 14 '22 at 21:03
  • @kjetilbhalvorsen A Flexible Growth Function for Empirical Use, Richards (1959). Journal of Experimental Botany, Volume 10, Issue 2, June 1959, Pages 290–301, https://doi.org/10.1093/jxb/10.2.290 – hawkingRadiation_ Dec 14 '22 at 21:29
  • @whuber vagueness is definitely the issue I am battling. It is difficult to find any substantial literature that has useful methods on this topic. – hawkingRadiation_ Dec 14 '22 at 21:33
  • The Richards paper has a great deal of guidance concerning the meanings of the parameters. However, as you can see from the examples given there, your data -- which essentially grow linearly -- simply cannot determine what the time scale is, the maximum growth, or even the shape of the curve. – whuber Dec 14 '22 at 21:37
  • A technical issue, but potentially an important one, concerns the errors of measurement. Tree height measurements likely have standard deviations proportional to the heights, indicating a direct least-squares fit of the data is not optimal. Either transform the model to a homoscedastic response or consider weighted least squares solutions (nls supports that). – whuber Dec 14 '22 at 21:39
  • There are many good comments and answers to this question... As a practical matter, for data that look like this, there is probably no need to try to fit the exponential model. Even if in theory the tree height will follow this kind of curve, for these data the growth is still linear in height vs. age. Perhaps this species of tree doesn't begin to decrease its annual height increase until it's more than 80 years old? Or perhaps the site conditions are pushing the trees to continue to increase height linearly even at 80 years? – Sal Mangiafico Dec 15 '22 at 14:41
  • @Sal Or perhaps the age is in days and the heights are in millimeters :-). – whuber Dec 16 '22 at 14:42
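
To make the model comparison suggested in the comments concrete, here is a short sketch of my own (the variable names are illustrative, not from the thread) that fits the straight line through the origin and the quadratic alternative whuber mentions:

age    <- c(0, 3, 6, 17, 29, 47, 57, 63, 72)
height <- c(37, 70, 199, 464, 736, 1008, 1280, 1462, 1643)
#
# Straight line through the origin versus a quadratic with an intercept.
#
fit.line <- lm(height ~ age - 1)
fit.quad <- lm(height ~ age + I(age^2))
summary(fit.quad)          # per the comment, neither the intercept nor age^2 improves things
anova(fit.line, fit.quad)  # F test of the extra terms (the line is nested in the quadratic)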

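And a sketch of the test of $z$ against $1$ mentioned above (my own code; obj stands for a hypothetical nls fit in which $z$ is estimated directly on its original scale):

s   <- summary(obj)$coefficients                        # rows are parameters, including z
t1  <- (s["z", "Estimate"] - 1) / s["z", "Std. Error"]  # compare z to 1, where it has no effect
rdf <- length(resid(obj)) - length(coef(obj))           # residual degrees of freedom
2 * pt(-abs(t1), df = rdf)                              # two-sided p-value
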
3 Answers


There are some things we can do. These are generally useful to know for any kind of optimization problem and are not limited to the nls function in R.

The first is to parameterize the model to make it everywhere differentiable in its coefficients and automatically constrain the parameters to reasonable ranges. These steps make it harder for the algorithm to have numerical problems or wander into forbidden territory. One way (of many possibilities) is to write $x$ as an exponential (making it positive), $y$ as a square (ditto), and $z$ -- which in the Richards paper is close to $1$ -- in the form $1 + \sin(t)$ for some angle $t.$
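
For example, guesses stated on the original $(x, y, z)$ scale can be translated into this transformed scale before being handed to nls. The helper below is my own sketch (the name to_transformed and the example guesses are illustrative, not part of the answer):

#
# Translate original-scale guesses (x > 0, y > 0, 0 <= z <= 2) into the
# transformed parameterization exp(x') * (1 - exp(-y'^2 * age))^(1 + sin(z')).
#
to_transformed <- function(x, y, z) {
  stopifnot(x > 0, y > 0, z >= 0, z <= 2)
  c(x = log(x), y = sqrt(y), z = asin(z - 1))
}
to_transformed(8000, 1/300, 1)  # roughly c(x = 9, y = 0.058, z = 0), the starting values used below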

The second is to find a good set of starting values. I discovered one set by experimenting with a fit by eye, shown as the dotted curve in the plot below.

The third is to prevent nls from exiting with an error, so that even if it fails to converge, you can still inspect its best solution.

The fourth is to use suitable weights. I speculate that higher heights ought to be fit with larger error and so use weights proportional to height. This makes little difference here, but can sometimes be helpful and more statistically appropriate.

This procedure finds the solution plotted in red. You can see it follows the data closely and is slightly better than the initial eyeball fit.

[Plot of height against age with the eyeball fit (dotted gray) and the nls fit (red).]

The fitted values (in terms of the original parameters) are $x = 134571,$ $y = 0.000118,$ and $z = 0.926.$

The problem is evident: these data trend almost linearly, which is not enough to determine all three parameters in the model. Given that the model forces the curve through the origin, a linear trend uses just one parameter, making the other two parameters superfluous.

To verify this, analyze the correlation matrix of the estimates. In this case the eigenvalues are approximately $2.96,$ $0.039,$ and $10^{-8}.$ The last two are essentially zero compared to the first. The corresponding eigenvectors correspond roughly to (a) a contrast between $x$ and the other parameters (increases in $x$ are almost perfectly offset by decreases in $y$ and $z$), (b) the $z$ parameter, and (c) the $x$ and $y$ parameters, in that order. This tells us the model is confident about a particular combination of $x,$ $y,$ and $z$ (which determines the slope for small ages) but can glean essentially no specific information about $z$ or $y$ from these data, apart from a tiny hint of curvature suggested by the fact that the second eigenvalue is not truly zero.
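
To see this trade-off concretely, here is a small illustration of my own (the first triple is the fit reported above; the second is constructed ad hoc): two quite different parameter sets give curves that agree to better than about one percent over the observed ages.

#
# Tripling y and compensating x so that x * y^z -- which controls the slope at
# small ages -- stays fixed gives a curve nearly identical to the fitted one.
#
f0 <- function(age, x, y, z) x * (1 - exp(-y * age))^z
ages <- 0:72
h1 <- f0(ages, x = 134571, y = 0.000118, z = 0.926)  # the fitted values reported above
x2 <- 134571 * (0.000118 / 0.000354)^0.926           # compensate x for tripling y
h2 <- f0(ages, x = x2, y = 0.000354, z = 0.926)
max(abs(h1 - h2) / max(h1))                          # under about 0.01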

Thus, a straight-line least squares fit ought to work about as well. Using fit <- lm(height ~ 0 + age) I found a slope of $22.89.$ Plotting this line (through the origin) gives a curve barely distinguishable from the red curve above, so I haven't shown it.

Here is the R code showing the details.

#
# The model.
#
f <- function(age, beta) {
  x <- beta["x"]
  y <- beta["y"]
  z <- beta["z"]
  exp(x) * (1 - exp(-y^2 * age))^(1 + sin(z))
}
#
# The data.
#
age <- scan(text = "0  3  6 17 29 47 57 63 72")
height <- scan(text = "37   70  199  464  736 1008 1280 1462 1643")
plot(age, height)
#
# Draw an eyeball fit.
# These parameters correspond to an exponential curve (z == 0) that over the
# range of ages in the data hardly departs from linearity (age * y is always
# tiny).  Only `x` has to be estimated; it determines the slope, which is
# easy to gauge from the plots.
#
s <-  c(x = 9, y = sqrt(1/300), z = 0)
curve(f(x, s), add = TRUE, lwd = 2, col = gray(0.3), lty = 3)
#
# Polish this fit.
# The `control` object shows how to specify some of the most useful aspects
# of the search.
#
obj <- nls(height ~ exp(x) * (1 - exp(-y^2 * age))^(1 + sin(z)), 
    start = s, weights = height, 
    control = list(minFactor = 2^(-16), maxiter = 1e4, warnOnly = TRUE))
#
# Plot the fit.
#
b <- coefficients(obj)
curve(f(x, b), add = TRUE, lwd = 2, col = hsv(0.02, 1, 0.85))
#
# Report the estimates in terms of the original parameterization.
#
c(x = exp(b["x"]), y = b["y"]^2, z = 1 + sin(b["z"]))
#
# Analyze the correlations among the estimates.
#
V <- vcov(obj)
v <- sqrt(diag(V))
V <- t(V/v)/v
eigen(V) # Alternatively, `eigen(cov2cor(vcov(obj)))`
#
# Look at a simpler alternative model.
#
fit <- lm(height ~ 0 + age)
abline(fit, col = "Blue")
whuber
  • If the parameters have some biological/chemical/physical interpretation, they might have priors, possibly even strong priors, and a Bayesian non-linear model might also be an option. – Roland Dec 15 '22 at 07:01
  • Two minor comments: i) In the code, you define b (the coefficients) after converting them to the original parametrization, which leads to an error because b has not yet been defined. ii) An easier (but probably less instructive) way to convert V to a correlation matrix would be the built-in function cov2cor. – COOLSerdash Dec 15 '22 at 07:03
  • @COOLSerdash I apologize for the error -- I reordered the code before posting it and neglected to rerun the entire block. I'll fix it. Thanks for the cov2cor tip. – whuber Dec 15 '22 at 14:33

The problem seems to be in the convergence of the z parameter, so fix z at each of a sequence of values, optimize over the other parameters for each, and then take the result with the lowest residual sum of squares. We use the plinear algorithm of nls, which does not require starting values for parameters that enter linearly -- x enters linearly, so its starting value can be omitted. Since we are fixing z and x enters linearly, we only have to specify a starting value for y. To use plinear, the right-hand side should be a column that multiplies x, but x itself is not specified (in a similar way to lm).

age <- c(0, 3, 6, 17, 29, 47, 57, 63, 72)
height <- c(37, 70, 199, 464, 736, 1008, 1280, 1462, 1643)

fo <- "height ~ cbind(x = (1 - exp(-y * age))^z)"

zseq <- seq(0, 2, 0.01)
sse <- sapply(zseq, function(z) {
  res <- try(nls(fo, start = list(y = 1), algorithm = "plinear"))
  if (inherits(res, "try-error")) NA else deviance(res)
})

z <- zseq[which.min(sse)]
z  # z value which minimizes the SSE

[1] 1

fm <- nls(fo, start = list(y = 1), algorithm = "plinear")
fm

giving

Nonlinear regression model
  model: height ~ cbind(x = (1 - exp(-y * age))^z)
   data: parent.frame()
        y    .lin.x 
3.384e-03 7.468e+03 
 residual sum-of-squares: 18004

Number of iterations to convergence: 9 
Achieved convergence tolerance: 1.103e-06

Plotting the solution

plot(height ~ age)
lines(fitted(fm) ~ age)

[Plot of the data with the fitted curve.]

Note that a simple linear regression actually gives a better fit (lower residual sum of squares) and has fewer parameters (2 vs 3).

fm.lin <- lm(height ~ age)
deviance(fm.lin)
## [1] 13546.89

deviance(fm)

[1] 18003.86

The following is just to show that the deviance function in R is equivalent to taking the residual sum of squares for both nls and lm.

sum(resid(fm.lin)^2)
## [1] 13546.89

sum(resid(fm)^2)

[1] 18003.86

  • The simple linear regression is not nested within the original model (which has no additive intercept term), so it might be misleading to compare the residual sums of squares. I'm not sure these two deviances are even measuring the same thing--the reported reduction looks too great. – whuber Dec 15 '22 at 14:53
  • It doesn't matter whether they are nested or not unless you are doing likelihood ratio tests. They are both measuring the sum of squares of height - fitted so they are certainly measuring the same thing. If you are unsure about the R deviance function you can get the exact same results using sum(resid(fm)^2) and sum(resid(fm.lin)^2) – G. Grothendieck Dec 15 '22 at 15:12
  • I did that before posting my comment and got different results, with much closer deviances. The reason why the comparison might be misleading is that the two models have different meanings: being able to reduce the deviance by introducing a behavior (the intercept) that was not reflected in the original model might be scientifically inappropriate. In my experiments yesterday I could not even establish that the intercept was statistically significant. – whuber Dec 15 '22 at 15:22
  • You must have made some computer error. I have added the results to the end of the answer and you can see they are identical. There is not much data here so I would not rely on statistical significance. This is more exploratory. – G. Grothendieck Dec 15 '22 at 15:28
  • I made no error. My solution has a smaller deviance than yours and I used my solution. – whuber Dec 15 '22 at 15:36
  • And the linear model deviance is lower than either of the nonlinear models. – G. Grothendieck Dec 15 '22 at 15:38

Minor point: Don't try to recreate the exponential function in your code

The excellent answer by whuber shows how you can use the data to choose better starting values for your iteration and how you can constrain the search space appropriately. This is something you should usually do when fitting nonlinear models using an iterative method. One other minor thing I notice in your code is that you define your own constant e and then take powers of it in later steps to get your own exponential function. That is unnecessary, and it introduces additional rounding error. Instead of trying to create the exponential function yourself, use the exp function built into the program. (You will see that whuber has done this in his own code.)
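
As a quick numerical check of this point (my own snippet, not part of the answer):

e <- 2.71828     # the truncated constant from the question
exp(1) - e       # about 1.8e-06
e^10 / exp(10)   # the small relative error (under one part per million per unit of exponent) compounds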

Ben
  • This is a good point, but I don't see how using a specified constant for e introduces any rounding error to be concerned about in the modeling. The reason is that I would expect e^y to be implemented as $\exp(y\log(e)),$ thereby requiring an additional multiplication, but there's really no harm done except perhaps in the 52nd binary digit of the mantissa. Now, if by "rounding error" you mean in the reporting of the estimates, that's spot on due to the error in equating $\exp(1)$ with $2.71828,$ which is off by almost one part per million. – whuber Dec 14 '22 at 23:01
  • I agree with your description of the implementation and effect. Thus, why add a rounding error of one-part-in-a-million in $\log(e)$? Why not use exp instead and not have that rounding error? Also, if someone sees the coding method in OP and thinks this is how to do things, maybe they will think it's similarly okay to use e <- 2.718 (error now one-part-in-ten-thousand) or e <- 2.72 (error now six-parts-in-ten-thousand). Better to nip this method in the bud, no? – Ben Dec 14 '22 at 23:17
  • I agree -- it's a useful coding comment. – whuber Dec 15 '22 at 14:47