
How can I do a fit for negative $y$-data, which has exponential phenomena?

Such as:

    coefs = np.polyfit(xs, np.log(rs + abs(np.min(rs)) + 1), 1)
    fit = np.exp(coefs[1]) * np.exp(coefs[0] * xs)

But if I replace $\log(y)$ with $\log(y+a)$ and then do the fit, how can I get back to an expression in terms of the original, untransformed $y$?

What confuses me is whether I can know how polyfit (or other fitting routines) treats the input. Should I do the adjusting somewhere else?

Something else I could do?

The pic describes everything prior to the model building. So I only have some $x$s and negative $y$s and want to infer whether it has "exponentiality" to it or parts of it.

https://blogs.sas.com/content/iml/2011/04/27/log-transformations-how-to-handle-negative-data-values.html

It suggests doing a transformation of the form $\log(Y+1-\min(Y))$. But again I wonder whether, after doing the fit, the "back-transformation" would be algebraically exact. That is, whether $y=\exp(y_{fit})-1+\min(y)$ holds exactly or only $y \approx \exp(y_{fit})-1+\min(y)$.
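For concreteness, here is a minimal sketch of the full round trip on made-up data (the series `ys` below is hypothetical, standing in for my `rs`). The shift-and-log transform itself is inverted exactly by exp-and-unshift; only the fit is approximate:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0, 5, 50)
# hypothetical negative data with an exponential trend
ys = 2.0 * np.exp(0.8 * xs) - 300 + rng.normal(0, 5, xs.size)

shift = 1 - np.min(ys)                 # makes min(ys + shift) == 1
coefs = np.polyfit(xs, np.log(ys + shift), 1)

log_fit = np.polyval(coefs, xs)        # fitted values on the log scale
fit = np.exp(log_fit) - shift          # algebraic inverse of the transform
```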

The data has no specific meaning (it's random generated), but the order has meaning.


How about taking $\log(|Y|)$ to mirror/flip the shape onto the positive side, doing the fit, and then mirroring it back?

But I wonder whether flipping the pattern would alter the fit. That is, should the data be "moved" (shifted) rather than flipped? Flipping turns larger values into smaller ones. Or this might depend on the interpretation of "larger": what if it means "larger negative"?
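A small sketch with made-up decaying data suggests the two options do differ: flipping reverses the monotonicity, so the fitted rate changes sign, while shifting preserves the ordering of the points:

```python
import numpy as np

xs = np.linspace(0, 5, 50)
ys = -100 * np.exp(-0.5 * xs)          # negative, increasing toward zero

# Flip: |y| is positive but *decreasing*, so the fitted rate is negative
flip_coefs = np.polyfit(xs, np.log(np.abs(ys)), 1)

# Shift: preserves the ordering of the data, fitted rate is positive
shift = 1 - np.min(ys)                 # = 101 for this series
shift_coefs = np.polyfit(xs, np.log(ys + shift), 1)

# flip_coefs[0] is about -0.5 (decay); shift_coefs[0] is positive (growth)
```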

Reproducible code:

https://pastebin.com/ncpzrN4M (with transforms)

mavavilj
  • Why are you doing this? Usually when I see people adding $a$ to $y$ and taking $\log(\cdot)$, it's because they are trying to avoid $\log(0)$ or negative values of the response variable. If this is the case, there are better methods/models that are usually more appropriate for this situation. – StatsStudent Dec 31 '18 at 21:35
  • @StatsStudent Because I have $y$ values, which are negative, but which have exponential phenomena, which I want to analyze by fitting. But the technique I read uses polyfit and $\log$. – mavavilj Dec 31 '18 at 21:37
  • Perhaps I could use $y=A \exp(-Bx)$ in which case: $\log(y)=\log(A)-Bx$ and I think that polyfit would return the coefs as opposite sign and then I can just flip their sign? So I would build my fit like fit = np.exp(-coefs[1])*np.exp(-coefs[0]*xs). – mavavilj Dec 31 '18 at 21:45
  • Can you also tell us a little bit about your data? Can you explain a bit more about what you mean when you say your y-data has exponential phenomena? I think you might be needing a generalized linear effects model, but we need some more information. – StatsStudent Dec 31 '18 at 21:45
  • +1 to @StatsStudent's comment. The picture you just posted looks like it would be adequately fitted by a quadratic (or possibly cubic) curve: do you have reasons to believe that your data represent an "exponential phenomenon"? – Ben Bolker Dec 31 '18 at 21:52
  • @BenBolker I'm asked to infer exponentiality. – mavavilj Dec 31 '18 at 21:54
  • By whom? Can you give us more context? It's not that this is impossible, but the increase in the y-value beyond x=5 isn't characteristic of an exponential curve (and it doesn't look like just noise ...) – Ben Bolker Dec 31 '18 at 21:55
  • for what it's worth, subtracting the minimum (as in your edit) seems reasonable. This is more or less equivalent to what @Ben suggests in their answer. – Ben Bolker Dec 31 '18 at 22:42
  • @BenBolker It might be worth fitting an offset parameter rather than subtracting the minimum. – James Phillips Dec 31 '18 at 23:39
  • @BenBolker my point is that the minimum might not be the optimum value to subtract. – James Phillips Dec 31 '18 at 23:42
  • @JamesPhillips It guarantees that the values will become non-negative. $+1$ is for avoiding $0$. – mavavilj Dec 31 '18 at 23:42
  • I see. I was reading that as "offset" in the technical sense. Yes, I agree. @mavavilj: +1 (rather than some other positive value) only makes sense if the data are counts or something for which "1" has a special meaning. At this point deciding what's "best" depends on context we don't have yet. – Ben Bolker Dec 31 '18 at 23:46
  • @BenBolker Why does that matter, given that the back-transformation would bring the original range back? – mavavilj Dec 31 '18 at 23:48
  • Because an exponential decay is always declining to zero asymptotically. The fit will depend on the offset you use. Try it and see! – Ben Bolker Dec 31 '18 at 23:49
  • @BenBolker I don't understand. The fit cannot be done for the negative $y$. So intuitively I'd think that, in order to retain the relativity of the data, one would ideally "mirror" or "flip" it to the positive axis? Would it be possible to adjust every point individually? Basically, e.g., take $|y_i|$ instead of the minimum. – mavavilj Jan 01 '19 at 00:06
  • You can't add less than (-min(y)), but you could add more. I'm going to stop answering now, sorry, because judging what the 'best' approach is depends on much more context that we don't have (the goal of the analysis, why you need to fit an exponential, your level of computational and statistical sophistication and that of your audience, etc. ...) – Ben Bolker Jan 01 '19 at 00:09
  • Since you are using Python, post a link to the data and I can easily make - and post code for - a non-linear fitter so that you don't need to take any logs. I would use my zunzun.com "function finder" for equation search, with scipy's differential_evolution genetic algorithm providing initial parameter estimates for the non-linear solver code that I post. – James Phillips Jan 01 '19 at 01:01
  • @JamesPhillips But I was interested in doing this using very basic least-squares. Otherwise I could use some other methods. Any input as to whether mirroring/flipping or scaling the data to positive leads to more accurate fit (w.r.t. the shape/curvature or sign of the functions used in fitting)? – mavavilj Jan 01 '19 at 09:50
  • Without data for analysis I cannot directly answer this question. – James Phillips Jan 01 '19 at 12:54
  • @JamesPhillips See added code. – mavavilj Jan 01 '19 at 13:17
  • Why would you want to use "very basic least-squares" when you are already telling us the data is potentially exponential? Again, we need more context. – StatsStudent Jan 02 '19 at 01:03
  • @StatsStudent Non-linear fit solved through least squares? – mavavilj Jan 02 '19 at 01:16

2 Answers

1

Since your original shift was to deal with negative data, there is really no reason to try to back-transform this to an unshifted logarithmic scale. Logarithms of negative values are complex numbers, so even if you were to succeed in back-transformation, this would yield complex logarithmic values.

Perhaps a useful output would be to find an expression for the untransformed values. For a model with no regressors (which yours seems to be), this can be done via the fact that:

$$\log (Y_i + a) = f(\theta) + \varepsilon_i \quad \quad \quad \implies \quad \quad \quad Y_i = \exp( f(\theta) + \varepsilon_i) - a.$$

Like I said, if you take the logarithm of these values then some of them will be complex numbers, which is probably not particularly helpful. Presumably it will be more useful to have estimates of $y_i$ directly.
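In code, this is a short sketch (the helper name is mine, and I'm assuming, as in the question, a degree-1 polynomial for $f(\theta)$ and the shift $a = 1 - \min(y)$):

```python
import numpy as np

def fit_shifted_exp(xs, ys, deg=1):
    """Fit a polynomial to log(y + a), then express the fitted
    values back on the original scale: y_hat = exp(f_hat) - a."""
    a = 1 - np.min(ys)                      # ensures min(ys + a) == 1
    coefs = np.polyfit(xs, np.log(ys + a), deg)
    f_hat = np.polyval(coefs, xs)           # fit on the log scale
    y_hat = np.exp(f_hat) - a               # back on the original scale
    return coefs, y_hat
```

The squared errors on the original scale are then simply `(y_hat - ys)**2`.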

Ben
  • Perhaps I could shift the $y$s up by some constant, do the fit there, and then move both $y$ and $y_{fit}$ back down such that $y$ is the same as the original, and then I would move $y_{fit}$ s.t. the distance $dist(y, y_{fit})$ is retained? But would this call for the fit to be "linear"? That is, that it does not vary if one scales the data by a scalar. – mavavilj Dec 31 '18 at 21:52
  • That is what this is already doing. We shift $y$ up by the constant $a$, fit the model, and then express the model back in terms of the original $y$. – Ben Dec 31 '18 at 22:00
  • What's $f(\theta)$? You mean that if I take polyfit on $\log(Y_i+a)$, then I would do the $\implies$ that you say in order to find an expression that relates it to the original $Y_i$? So then I could compute error by e.g. $\bigg(\bigg( \exp(f(\theta)+\epsilon_i)-a \bigg) - Y_i \bigg)^2$? – mavavilj Dec 31 '18 at 22:02
  • BTW, how can I "know" that the model doing the fitting does not violate some of the properties of $\exp$, $\log$, ... So that the algebra that's done here actually holds after the algorithm? That is, that $y_{fit}$ hasn't been attained through some other algebraic operations than the same as used in the transformation. – mavavilj Dec 31 '18 at 22:16
  • I think you need to take a step back and first describe what it is that you are trying to model and why before you start deciding on models to use. I don't think any of us has a good enough understanding of the context or why you are building a model all together. Once we know some more details, we can provide some additional -- and most importantly -- useful guidance. – StatsStudent Dec 31 '18 at 22:28
0

Thank you for posting the link to your code. I ran your code several times, fitting the equation "y = a * exp(b*x) + Offset", and would like to point out something I observed. Each run yields different data, as it should, but the data sets sometimes have a "U" shape near the minimum. Here are two example images, one where the data has this shape and one where it does not. In such cases this type of equation, or a similar one linearized by taking the log of both sides, will not follow the expected shape. If you make multiple runs of the code you posted, you should see this as well.

OK fit

Not OK fit
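For reference, a sketch of fitting the offset form directly with scipy.optimize.curve_fit, on hypothetical data standing in for the random series in the question (the starting values in `p0` are rough guesses; a global search such as scipy's differential_evolution can supply better ones):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # y = a * exp(b*x) + Offset, fitted without any log transform
    return a * np.exp(b * x) + c

rng = np.random.default_rng(1)
xs = np.linspace(0, 5, 50)
# hypothetical negative y-data with an exponential trend
ys = 2.0 * np.exp(0.8 * xs) - 300 + rng.normal(0, 5, xs.size)

# p0 is a rough starting point for the non-linear solver
params, _ = curve_fit(model, xs, ys, p0=(1.0, 0.5, ys.min()))
a, b, c = params
```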

  • I did the polyfit with the code linked in the question, but this time used the transform and back-transform. I wonder why I'm getting a lousy fit that gets more accurate as $x$ increases. Something to do with polyfit? See question. – mavavilj Jan 01 '19 at 15:42
  • What did you use for fitting? – mavavilj Jan 01 '19 at 15:47
  • Or perhaps my lousy fit suggests that that the start points don't have consistent exponential pattern, whereas the latter ones have? – mavavilj Jan 01 '19 at 15:56
  • Per your question, I used my online Python open source curve fitting web site zunzun.com, specifically the fitter for this equation at http://zunzun.com/Equation/2/Exponential/Exponential%20With%20Offset/ – James Phillips Jan 01 '19 at 17:04
  • Per my latest code, I altered it by doing coefs = np.polyfit(xs, np.log(transform(np.array(rs))), 1) and this seems to align well with the data. I wonder if this is how it's supposed to be done. I thought that since I do the transformation $\log(1+y-\min(y))$, I wouldn't need to retake the $\log$, but perhaps I ought to? Similarly, I could remove the exponentiation after taking the solution. Perhaps that's where my problem was: I did the inverse transformation AND exponentiation, i.e., applied $\exp$ twice. – mavavilj Jan 01 '19 at 17:49
  • I never bother with forcing logs for the sole purpose of using linear regression because I fit directly with non-linear regression and have no need for it - as you see from the link I posted. You might get good feedback by posting that as a separate question. – James Phillips Jan 01 '19 at 18:24