My situation, slightly simplified to reach more of a "minimal problem":

  • Data given as a Pandas DataFrame
  • Independent variables: t1, t2
  • Dependent variables: z1, z2
  • Fitting function: f(t1, t2, p, q) with p, q the parameters to determine.
  • The data series:
    • z1 depends on the parameters p and q, i.e. z1 = f(t1, t2, p, q)
    • z2 depends on the parameters p and r, i.e. z2 = f(t1, t2, p, r)
    • Note: p is common for z1 and z2, while q and r are most certainly different.
  • I have 204 data points for z2, while z1 is better covered (322 data points).

A toy model of my data (with somewhat realistic proportions of missing data) might start as follows:

t1    t2        z1      z2
1     0         1.20    30.0
1     1         0.64    
1     2         0.34    8.60
1     3         0.18    
2     0         1.60    40.0
2     1         0.86    21.4
2     2         0.46    11.5
2     3         0.25    6.13
3     0         2.00    
3     1         1.07    
3     2         0.57    
3     3         0.31    
4     0         2.40    60.0
...

The toy model used to create the data is f(t1, t2, p, q) = 1/q * (t1+1)*exp(-t2/p) with p=0.8 and q=1.25 to obtain z1; p=0.8 and q=0.05 to obtain z2.

As mentioned, z2 is missing certain measurements, which seems to complicate things...

Attempt 1: I could filter out all rows where I don't have both z1 and z2, but I'd lose quite a bit of information (in the example, everything where t1 = 3). After that, I could work with an objective function such as the following pseudocode:

import numpy as np
from lmfit import Model

def f(t1, t2, p, q):
    return 1/q * (t1+1)*np.exp(-t2/p)

def zero_function(z1, z2, t1, t2, p, q, r):
    # Combined distance of both series from the model; ideally zero everywhere
    err_1 = z1-f(t1, t2, p, q)
    err_2 = z2-f(t1, t2, p, r)
    return np.sqrt(err_1**2 + err_2**2)
    
z_model = Model(zero_function, 
                independent_vars=['z1', 'z2', 't1', 't2'],
                param_names=['p', 'q', 'r'],
                nan_policy='omit')
z_model.set_param_hint('p', value=1)
z_model.set_param_hint('q', value=1)
z_model.set_param_hint('r', value=1)
params = z_model.make_params()
# df here holds only the rows where both z1 and z2 are present
zero_array = np.zeros(len(df))
result = z_model.fit(zero_array, params, z1=df['z1'], z2=df['z2'], t1=df['t1'], t2=df['t2'])

(As an aside, I did not see any difference in behavior when changing nan_policy to 'raise'.)

If I don't equalize the length of z2 to that of t1 and t2, I get an error along the lines of "operands could not be broadcast together with shapes (204,) (322,)".
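To illustrate the shape mismatch, here is a small sketch using the first eight rows of the toy table above. It assumes the missing z2 entries are stored as NaN; the variable names (z2_only, has_z2, and so on) are made up for this example. Dropping NaNs from z2 alone shortens only that array, whereas one boolean mask applied to every column keeps the arrays aligned:

```python
import numpy as np
import pandas as pd

# First eight rows of the toy table; missing z2 values stored as NaN (assumption).
df = pd.DataFrame({
    "t1": [1, 1, 1, 1, 2, 2, 2, 2],
    "t2": [0, 1, 2, 3, 0, 1, 2, 3],
    "z1": [1.20, 0.64, 0.34, 0.18, 1.60, 0.86, 0.46, 0.25],
    "z2": [30.0, np.nan, 8.60, np.nan, 40.0, 21.4, 11.5, 6.13],
})

# Dropping NaNs from z2 on its own changes its length relative to t1 and t2.
z2_only = df["z2"].dropna().to_numpy()
t1_all = df["t1"].to_numpy()

broadcast_failed = False
try:
    z2_only - t1_all  # arrays of length 6 and 8 cannot be combined elementwise
except ValueError as exc:
    broadcast_failed = True
    print(exc)

# Applying the same mask to every column keeps the arrays aligned.
has_z2 = df["z2"].notna()
t1_valid = df.loc[has_z2, "t1"].to_numpy()
t2_valid = df.loc[has_z2, "t2"].to_numpy()
z2_valid = df.loc[has_z2, "z2"].to_numpy()
```

The mask only needs to be applied to the z2 part of the problem; the z1 rows can stay complete.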

Attempt 2: I tried to follow [https://pollackscience.github.io/multidim-fits.html], but that causes the same error as in attempt 1, again, because z2 is missing entries.

Attempt 3: Or I could create two models, one for z1 and one for z2, but then I don't know how to enforce that the p parameter is the same in both fits.

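import numpy as np
from lmfit import Model

def f(t1, t2, p, q):
    return 1/q * (t1+1)*np.exp(-t2/p)

# Two independent fits of the same model: nothing ties p together here
z_model = Model(f, independent_vars=['t1', 't2'], nan_policy='omit')
fit_z1 = z_model.fit(df['z1'], t1=df['t1'], t2=df['t2'], p=1, q=1)
fit_z2 = z_model.fit(df['z2'], t1=df['t1'], t2=df['t2'], p=1, q=1)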

Doing this on the live data, the parameter p does not end up being the same, which is a problem for me.

Attempt 4: Or I could give up on the Model() class and go for minimize(), e.g. as done here [https://stackoverflow.com/questions/20339234/python-and-lmfit-how-to-fit-multiple-datasets-with-shared-parameters], but do I have to? With Model(), at least, I think I understand how to use multiple independent variables, which I haven't quite figured out for minimize() yet... The examples I've found with multiple independent variables all use Model(), so it seems I'd be trading problems with multiple dependent variables for problems with multiple independent variables.

There's got to be something obvious that I'm missing...?

Oh, did I say "slightly simplified"? Yeah, there are more than two parameters, and the function I'm using for the model is only defined implicitly, so in reality I use all of t1, t2, z1, z2 as independent variables and a new variable z0 as the dependent variable, which is constant 0. Then I can fit z1 - g(t1, t2, z1, p, q) to the zero function. To align it with my initial toy model, that would be something like z1 - (t1+1)/q * np.exp((z1-t2)/p). At least that part seems to work well. Where I'm stuck is getting z1 and z2 to fit to almost-the-same curve simultaneously.
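A short sketch of how the implicit form could be written as a stacked residual, using three rows from the toy table. The function names implicit_residual and stacked_residual are hypothetical, missing values are assumed to be NaN, and the parameter values are just the ones quoted for the explicit toy model, so the residuals are not expected to be zero here:

```python
import numpy as np


def implicit_residual(t1, t2, z, p, q):
    """Residual of the implicit toy relation z = (t1+1)/q * exp((z - t2)/p)."""
    return z - (t1 + 1) / q * np.exp((z - t2) / p)


def stacked_residual(t1, t2, z1, z2, p, q, r):
    """Residuals of both series with a shared p, skipping missing measurements."""
    res_1 = implicit_residual(t1, t2, z1, p, q)
    res_2 = implicit_residual(t1, t2, z2, p, r)
    return np.concatenate([res_1[~np.isnan(z1)], res_2[~np.isnan(z2)]])


# Three rows from the toy table; the second row has no z2 measurement.
t1 = np.array([1.0, 1.0, 2.0])
t2 = np.array([0.0, 1.0, 0.0])
z1 = np.array([1.20, 0.64, 1.60])
z2 = np.array([30.0, np.nan, 40.0])

res = stacked_residual(t1, t2, z1, z2, p=0.8, q=1.25, r=0.05)
print(res.shape)
```

The stacked array has one entry per available measurement, so it plays the role of the constant-zero target without requiring z1 and z2 to have the same length.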

Persilja