2

I am trying to fit an exponential model with nls() separately for each of 520 users, in order to obtain the fitted coefficients for every user. The following is a small sample of my data.

dput(head(Mfrq.df.2))
structure(list(User.ID = c("37593", "38643", "49433", "60403", 
"70923", "85363"), V1 = c(9L, 3L, 4L, 80L, 19L, 0L), V2 = c(10L, 
0L, 29L, 113L, 21L, 1L), V3 = c(5L, 2L, 17L, 77L, 7L, 2L), V4 = c(2L, 
2L, 16L, 47L, 4L, 3L), V5 = c(2L, 10L, 16L, 40L, 1L, 8L), V6 = c(4L, 
0L, 9L, 22L, 1L, 7L), V7 = c(6L, 8L, 9L, 8L, 0L, 6L), V8 = c(2L, 
17L, 16L, 24L, 2L, 1L), V9 = c(3L, 20L, 7L, 30L, 0L, 4L), V10 = c(2L, 
11L, 5L, 11L, 2L, 3L)), row.names = c(NA, 6L), class = "data.frame")

I found two ways of doing this. However, both fail with a "singular gradient" error.

# Way 1
library(dplyr)   # group_by(), mutate(), %>%
library(tidyr)   # pivot_longer()
x <- 1:10
Mfrq.df.2_long <- pivot_longer(Mfrq.df.2, matches("V\\d{1,2}"), names_to = NULL, values_to = "Value")

Mfrq.df.2_long %>% group_by(User.ID) %>% mutate(fit = nls(Value ~ A * exp(-k * x), start = c(A = 2, k = 0.01)) %>% list())

# Way 2
L1 <- list()
for (i in unique(Mfrq.df.2$User.ID)) {L1[[as.character(i)]] <- seq(1, 10)}
length(L1) #520 users
dput(head(L1))

list(`37593` = 1:10, `38643` = 1:10, `49433` = 1:10, `60403` = 1:10, 
    `70923` = 1:10, `85363` = 1:10)
# Way 2, continued
L2=list.ids.RecSOC.2
length(L2) #520 users
dput(head(L2))

list(`37593` = c(9L, 10L, 5L, 2L, 2L, 4L, 6L, 2L, 3L, 2L), `38643` = c(3L, 
0L, 2L, 2L, 10L, 0L, 8L, 17L, 20L, 11L), `49433` = c(4L, 29L, 
17L, 16L, 16L, 9L, 9L, 16L, 7L, 5L), `60403` = c(80L, 113L, 77L, 
47L, 40L, 22L, 8L, 24L, 30L, 11L), `70923` = c(19L, 21L, 7L, 
4L, 1L, 1L, 0L, 2L, 0L, 2L), `85363` = c(0L, 1L, 2L, 3L, 8L, 
7L, 6L, 1L, 4L, 3L))
# Way 2, continued
control=nls.control(maxiter=1000)
res <- mapply(function(x,y){
  nls(y~A*(exp(-k*x)),
      start=list(A=100, k=0.01), control=control,
      trace= TRUE, data=data.frame(x, y))},L1,L2, SIMPLIFY=FALSE)

To the best of my understanding, the error has something to do with the starting values. I find it hard to choose starting values that work for all 520 users, especially since not all of them follow the assumed curve. I still need all 520 pairs of coefficients (A and k) for my further analyses.

Any recommendations? Thanks

MK25
  • 2
    Please explain what your code is attempting to do before (or typically here, instead of) presenting any code. Trying to separate implementation errors from errors of understanding is very difficult -- unless everything works, we're left trying to guess what you were actually hoping to do from code that doesn't do it. – Glen_b Oct 30 '22 at 22:09

2 Answers

3

I cannot comment much on the code, which I find difficult to read. But it seems that you are fitting an exponential function y ~ A*(exp(-k*x)).

  • For this model you can find starting values by first fitting the linearized model:

    ln(y) ~ a + bx
    

    Then use the fitted coefficients to set the starting values for your parameters: $k = -b$ and $A = \exp(a)$.

  • Alternatively, you can fit the same model as a generalized linear model with a log link, so you do not really need nls at all:

    y = exp(ln(A) - k*x)
    

    For an example, see: What is the objective function to optimize in glm with gaussian and poisson family?
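The first suggestion can be sketched in R as follows. This is a minimal sketch rather than a definitive implementation: it uses the counts of user 60403 from the question's sample data, and the variable names (lin, start, fit) are only illustrative. It assumes all y values are positive, since log(0) is -Inf; users with zero counts would need separate handling, for example by computing the starting values from the positive observations only.

```r
# Counts for user 60403, copied from the question's sample data
x <- 1:10
y <- c(80, 113, 77, 47, 40, 22, 8, 24, 30, 11)

# Step 1: linearized fit, log(y) = a + b * x (requires y > 0)
lin <- lm(log(y) ~ x)
a <- unname(coef(lin)[1])
b <- unname(coef(lin)[2])

# Step 2: back-transform to starting values for A * exp(-k * x)
start <- list(A = exp(a), k = -b)

# Step 3: refine on the original scale with nls()
fit <- nls(y ~ A * exp(-k * x), start = start)
coef(fit)
```

Because the starting values come from each user's own data, the same three steps can be repeated per user instead of searching for one pair of values that suits everyone.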

  • A follow-up question: to fit the linearized model, should I just replace the nls() part of my code with ln(y) ~ a + bx? Could you please provide your recommended code for both? – MK25 Oct 30 '22 at 22:46
  • @MK25 an example is here: https://stats.stackexchange.com/questions/454981/ and there are many others on the internet. – Sextus Empiricus Oct 31 '22 at 06:20
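For the generalized linear model route in the answer, a minimal sketch might look like the following. It assumes a Gaussian family with a log link, which minimizes the same sum of squares as nls() for y = exp(ln(A) - k*x); the names gfit, A_hat and k_hat are illustrative. The data are again user 60403 from the question. Note that with zero counts in y, glm() cannot derive its default starting values for this family and link, so explicit values would have to be supplied through its start argument.

```r
# Counts for user 60403, copied from the question's sample data
x <- 1:10
y <- c(80, 113, 77, 47, 40, 22, 8, 24, 30, 11)

# Gaussian errors with a log link: E[y] = exp(b0 + b1 * x)
gfit <- glm(y ~ x, family = gaussian(link = "log"))

# Map the GLM coefficients back to the exponential parameters
A_hat <- exp(unname(coef(gfit)[1]))   # A = exp(b0)
k_hat <- -unname(coef(gfit)[2])       # k = -b1
c(A = A_hat, k = k_hat)
```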
0

This may not be the most elegant solution, but my general method for determining starting values when using a function like nls() is to put the x and y values in a spreadsheet program like Excel and plot them. I then add two columns holding the x and y values of the fitted function, and add a curve for the fitted function to the plot. The coefficients of the fitted function live in their own cells, and the fitted y values are linked to those cells, so that when I change a coefficient, the curve changes with it. In this way, you can make sure that the initial values you choose produce a function that fits the data reasonably well. Hopefully this makes sense.
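The same check can be done without leaving R. The sketch below is only an illustration of the idea, not the answerer's own workflow: it plots the counts of user 70923 from the question and overlays the curve implied by candidate starting values. A0 and k0 are hypothetical guesses to be adjusted by eye.

```r
# Counts for user 70923, copied from the question's sample data
x <- 1:10
y <- c(19, 21, 7, 4, 1, 1, 0, 2, 0, 2)

# Candidate starting values (hypothetical guesses, adjust and re-run)
A0 <- 30
k0 <- 0.4

# Data points with the candidate curve drawn on top
plot(x, y, pch = 16, xlab = "x", ylab = "Value")
curve(A0 * exp(-k0 * x), from = 1, to = 10, add = TRUE, col = "red")

# Residual sum of squares of the candidate curve, as a numeric check
rss <- sum((y - A0 * exp(-k0 * x))^2)
rss
```

Lower values of rss indicate candidates closer to the data, which gives a quick way to compare guesses before handing them to nls().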

Sal Mangiafico
  • 1
    It makes sense. My only concern is that I would have to do this for all 520 users. Please correct me if I am wrong. – MK25 Oct 30 '22 at 23:12
  • @MK25, so you need to fit 500 models? In that case, either hope that the starting values that work for one user will work for everyone, or choose a model that can be linearized, as Sextus suggests. – Sal Mangiafico Oct 30 '22 at 23:19
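One way to combine the two suggestions for many users is sketched below, under stated assumptions. The helper fit_one is hypothetical, and the list counts holds only the six sample users from the question. Each user gets their own starting values from a linearized fit on the positive counts, and tryCatch() turns a failed nls() fit into NA values so that one failure does not stop the whole run.

```r
x <- 1:10
# The six sample users from the question, in the same layout as L2
counts <- list(
  `37593` = c(9, 10, 5, 2, 2, 4, 6, 2, 3, 2),
  `38643` = c(3, 0, 2, 2, 10, 0, 8, 17, 20, 11),
  `49433` = c(4, 29, 17, 16, 16, 9, 9, 16, 7, 5),
  `60403` = c(80, 113, 77, 47, 40, 22, 8, 24, 30, 11),
  `70923` = c(19, 21, 7, 4, 1, 1, 0, 2, 0, 2),
  `85363` = c(0, 1, 2, 3, 8, 7, 6, 1, 4, 3)
)

# Hypothetical helper: returns c(A, k), or NA values when the fit fails
fit_one <- function(y, x) {
  failed <- c(A = NA_real_, k = NA_real_)
  pos <- y > 0                       # log() needs positive values
  if (sum(pos) < 2) return(failed)
  lin <- lm(log(y[pos]) ~ x[pos])    # linearized fit for starting values
  start <- list(A = exp(unname(coef(lin)[1])), k = -unname(coef(lin)[2]))
  tryCatch(
    coef(nls(y ~ A * exp(-k * x), start = start)),
    error = function(e) failed       # e.g. singular gradient
  )
}

# One row per user, columns A and k
res <- t(sapply(counts, fit_one, x = x))
res
```

Rows containing NA mark users for whom the exponential model could not be fitted, which may itself be informative given that not all users follow the assumed curve.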