5

Physical and biological processes often exhibit (exponential) decay on multiple timescales. A standard approach to modelling such a decay is to fit a sum of of exponentials $$ y(t) = \sum_k a_k \mathrm{e}^{-t/\tau_k} $$ where $\tau_k > 0$ is the $k$th characteristic timescale. Unfortunately, parameter estimation for this model is notoriously ill-conditioned when the characteristic timescales are unknown. If one is explicitly interested in these timescale parameters then the estimation difficulties are naturally unavoidable. However, if one simply wants a parametric curve which resembles well the empirical decay curve in the observed region, whilst tending to zero asymptotically, then fitting a sum of exponentials seems unnecessarily troublesome.

Suppose we relax the assumption of strictly exponential decay and instead assume only that

  1. $y(t) > 0$,
  2. $y'(t) < 0$,
  3. $y''(t) > 0$.

Is there a such family of curves which yields a better-conditioned (ideally orthogonal) parameter estimation problem than a sum of exponentials?

Edit: As requested in the comments, here is an example dataset.

y <- c(0.884055209313849, 0.803893983397689, 0.746397132725163, 0.705777060403705, 0.661407995137169, 0.623483588432092, 0.594882295689734, 0.580521453477321, 0.547799857276996, 0.539458241092239, 0.528979849768679, 0.508823655437141, 0.48875512562363, 0.475737660387551, 0.453989602466627, 0.453607155116856, 0.449699469387466, 0.430540770961381, 0.412202491804542, 0.419165082551951, 0.421113556516559, 0.398730401071103, 0.383534941709539, 0.41715157898334, 0.391276942058177, 0.373333743559256, 0.363797164291244, 0.371068290220633, 0.35363773187149, 0.34623321877465, 0.343873146975883, 0.344943217991846, 0.308395826422007, 0.340956700470444, 0.335410736459545, 0.318216669004373, 0.295125468244325, 0.300611724590081, 0.31725877961607, 0.306508299912537, 0.271323331043375, 0.280502445536032, 0.280827802071935, 0.271333873640008, 0.266933780694651, 0.268445439712102, 0.278541051077256, 0.26523910578266, 0.248729693401411, 0.254131544486039, 0.262245962214525, 0.26253719264958, 0.251761317152903, 0.245506599261308, 0.24366897722567, 0.23575216060148, 0.245532374691214, 0.249893026353401, 0.235110805525794, 0.235423945450893, 0.245767596093225, 0.241825321273371, 0.217470051497189, 0.229911041413764, 0.223960393239219, 0.202973515263714, 0.213084763047804, 0.233613157145369, 0.197615145867789, 0.196579426794745, 0.224726046610225, 0.218789241363574, 0.195947751219023, 0.202236685856023, 0.205679319690375, 0.194047325153948, 0.18680318400662, 0.191940475226121, 0.174568427766484, 0.205176020344368, 0.195087839892282, 0.200841363532111, 0.187013251178195, 0.205334174211714, 0.201165384816937, 0.178234442349502, 0.190482218491778, 0.199502959362586, 0.190662149528195, 0.183426946535524, 0.175182229708743, 0.186226262354694, 0.176864247898865, 0.167166380736567, 0.178755342191183, 0.18304083166045, 0.187981160568871, 0.180141043883917, 0.18547885616634, 0.179747807156895, 0.168423260536129, 0.162118692825004, 0.174788613344953, 0.172213971380834, 0.165022198566688, 0.18223228609134, 0.153622884106958, 0.150179710485986, 0.158928542459721, 0.15138508444697, 0.155135383810794, 0.168199849617203, 0.159054269326527, 0.153400466933264, 0.162750985987796, 0.157968723780781, 0.140144273187311, 0.14706658022586, 0.144054328987362, 0.137411787353518, 0.155493934447654, 0.1331965011179, 0.157137003243437, 0.140453165528883, 0.135619047347813, 0.152034087715058, 0.152421263833777, 0.14502200457531, 0.147184370242521, 0.131045308545383, 0.140656762485822, 0.140129955362258, 0.134451870933126, 0.157809692099575, 0.147509469160597, 0.137515576031743, 0.136658138665687, 0.145432694960525, 0.13764716382126, 0.139155482637229, 0.12416393096613, 0.129782255617861, 0.122481982883383, 0.137107898256597, 0.126771311533978, 0.127216799993841, 0.129934310612188, 0.130719366599547, 0.116466195528625, 0.10056145969211)

  • i am unaware about the exact problems you are dealing with. but regularisation is a standard way of dealing with ill conditioning, A Lasso model would implicitly do variable selection (on the timescales) – seanv507 Nov 06 '19 at 12:01
  • Are you suggesting penalizing the absolute value of the timescale parameters? – Estacionario Nov 06 '19 at 12:18
  • 1
    yes penalising the absolute value of the $a_k$ (probably normalising the $\exp(-t / \tau_k)$... this would be a "no brainer" ML solution.... I don't have an insight on how useful it would be for your actual problem. ridge regression would also work, and is probably easier implemented (if you cannot just pick a R library - eg glmnet). – seanv507 Nov 06 '19 at 15:21
  • 1
    I am curious to try fitting the sigmoidal family of equations, would you please post an example data set (or a link to one)? – James Phillips Nov 06 '19 at 17:43
  • 1
    @JamesPhillips I have added an example dataset to the question. – Estacionario Nov 07 '19 at 12:50
  • @seanv507 Does it make sense to penalize the $a_k$ if they are constrained to be positive? Unlike in standard lasso/ridge there would no longer be symmetry about zero. – Estacionario Nov 07 '19 at 12:55
  • The example dataset appears to have y values only, would you please post the associated values of t? – James Phillips Nov 07 '19 at 14:18
  • 1
    The sampling is uniform in time so t <- 1:150 would be fine. – Estacionario Nov 07 '19 at 14:38
  • I believe it makes sense. glmnet (lower.limits=0) and penalized (positive=TRUE) R packages support it – seanv507 Nov 07 '19 at 19:15

1 Answers1

3

Thank you for posting the data. I found that several different sigmoidal equations each gave a good fit to the data. As one example here is the Weibull sigmoidal equation:

y = a - b * exp(-c * pow(x, d))

with the fitted parameters

a = 4.7926564984010420E-02
b = -1.1697977753141104E+00
c = 3.3040509493629389E-01
d = 4.1812936950380108E-01

yielding RMSE = 0.009899 and R-squared = 0.9954. Below is a fitted equation plot, a plot of regression error versus "y", and a histogram of the regression errors indicating the degree of error distribution normality.

plot

errors

enter image description here

  • 1
    could I ask how you fitted the curve? – seanv507 Nov 07 '19 at 19:17
  • 1
    @seanv507 I used the "function finder" from my online open source zunzun.com curve fitting web site to perform an equation search on the Sigmoidal family of equations. The site uses the Differential Evolution genetic algorithm to determine initial parameter estimates for non-linear equations, which is what makes the equation search possible. Of the many equations returned, I selected the Weibull sigmoid as being a well-known, named equation giving a good fit to the data. Basically I pasted the data into the function finder, chose the "Sigmoidal" family of equations, and hit the Submit button. – James Phillips Nov 07 '19 at 19:33
  • Very nice. You seem to be getting a similar goodness of fit to a sum of three exponentials, but with only two non-linear parameters instead of three. – Estacionario Nov 07 '19 at 19:56