There are some things we can do. These are generally useful to know for any kind of optimization problem and are not limited to the nls function in R.
The first is to parameterize the model to make it everywhere differentiable in its coefficients and automatically constrain the parameters to reasonable ranges. These steps make it harder for the algorithm to have numerical problems or wander into forbidden territory. One way (of many possibilities) is to write $x$ as an exponential (making it positive), $y$ as a square (ditto), and $z$ -- which in the Richards paper is close to $1$ -- in the form $1 + \sin(t)$ for some angle $t.$
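Concretely, the model appears in the code below in the reparameterized form

$$x\left(1 - e^{-y\,\mathrm{age}}\right)^{z} \;\longrightarrow\; e^{x}\left(1 - e^{-y^{2}\,\mathrm{age}}\right)^{1 + \sin(z)},$$

so the coefficients actually searched over are unconstrained, while the original parameters automatically stay within their valid ranges.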
The second is to find a good set of starting values. I discovered one set by experimenting with a fit by eye, shown as the dotted curve in the plot below.
The third is to prevent nls from exiting with an error (the warnOnly = TRUE control setting below), so that even if it fails to converge you can inspect the best solution it found.
The fourth is to use suitable weights. I speculate that greater heights ought to be fitted with greater error and so use weights proportional to height. This makes little difference here, but it can sometimes be helpful and more statistically appropriate.
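(For reference, nls with weights $w_i$ minimizes the weighted sum of squares

$$\sum_i w_i \left(h_i - f(\mathrm{age}_i;\,\beta)\right)^2,$$

so taking $w_i = h_i$ determines how much each observation's residual counts in the fit.)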
This procedure finds the solution plotted in red. You can see it follows the data closely and is slightly better than the initial eyeball fit.

The fitted values (in terms of the original parameters) are $x = 134571,$ $y = 0.000118,$ and $z = 0.926.$
The problem is evident: these data trend almost linearly, which is not enough to determine all three parameters in the model. Given that the model forces the curve through the origin, a linear trend uses just one parameter, making the other two parameters superfluous.
To verify this, analyze the correlation matrix of the estimates. In this case its eigenvalues are approximately $2.96,$ $0.039,$ and $10^{-8}.$ The last two are essentially zero compared to the first. The corresponding eigenvectors represent, roughly, (a) a contrast between $x$ and the other parameters (increases in $x$ are almost perfectly offset by decreases in $y$ and $z$), (b) the $z$ parameter, and (c) the $x$ and $y$ parameters, in that order. This tells us the model is confident about a particular combination of $x,$ $y,$ and $z$ (which determines the slope for small ages) but can glean essentially no specific information about $z$ or $y$ from these data, apart from a tiny hint of curvature suggested by the fact that the second eigenvalue is not exactly zero.
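A small calculation with the fitted values above shows why. Over the observed range $y \cdot \mathrm{age}$ never exceeds about $0.0085,$ so

$$x\left(1 - e^{-y\,\mathrm{age}}\right)^{z} \approx x\,(y\,\mathrm{age})^{z} = x\,y^{z}\,\mathrm{age}^{z},$$

which for $z$ near $1$ is almost a straight line through the origin: the data determine the product $x\,y^{z}$ well, but $y$ and $z$ separately hardly at all.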
Thus, a linear least squares fit ought to work about as well. Using `fit <- lm(height ~ 0 + age)` I found a slope of $22.89.$ Plotting this line (through the origin) gives a curve barely distinguishable from the red curve above, so I haven't shown it.
Here is the R code showing the details.
#
# The model.
#
f <- function(age, beta) {
  x <- beta["x"]
  y <- beta["y"]
  z <- beta["z"]
  exp(x) * (1 - exp(-y^2 * age))^(1 + sin(z))
}
#
# The data.
#
age <- scan(text = "0 3 6 17 29 47 57 63 72")
height <- scan(text = "37 70 199 464 736 1008 1280 1462 1643")
plot(age, height)
#
# Draw an eyeball fit.
# These parameters correspond to an exponential curve (z == 0) that over the
# range of ages in the data hardly departs from linearity (age * y is always
# tiny). Only `x` has to be estimated; it determines the slope, which is
# easy to gauge from the plot.
#
s <- c(x = 9, y = sqrt(1/300), z = 0)
curve(f(x, s), add = TRUE, lwd = 2, col = gray(0.3), lty = 3)
#
# Polish this fit.
# The `control` object shows how to specify some of the most useful aspects
# of the search.
#
obj <- nls(height ~ exp(x) * (1 - exp(-y^2 * age))^(1 + sin(z)),
           start = s, weights = height,
           control = list(minFactor = 2^(-16), maxiter = 1e4, warnOnly = TRUE))
#
# Plot the fit.
#
b <- coefficients(obj)
curve(f(x, b), add = TRUE, lwd = 2, col = hsv(0.02, 1, 0.85))
#
# Report the estimates in terms of the original parameterization.
#
c(x = exp(b[["x"]]), y = b[["y"]]^2, z = 1 + sin(b[["z"]]))
#
# Analyze the correlations among the estimates.
#
V <- vcov(obj)
v <- sqrt(diag(V))
V <- t(V/v)/v
eigen(V) # Alternatively, `eigen(cov2cor(vcov(obj)))`
#
# Look at a simpler alternative model.
#
fit <- lm(height ~ 0 + age)
abline(fit, col = "Blue")
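A follow-up check raised in the comments: to see whether any nonlinearity is detectable at all, allow an intercept and a quadratic term. Neither improves on the line through the origin.
#
# Compare richer linear alternatives: neither the intercept nor the
# quadratic term improves things.
#
summary(lm(height ~ age))
summary(lm(height ~ age + I(age^2)))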