Say I have some data obtained from real-world measurements; it is therefore noisy, but otherwise shows multiple distinct "ranges" of operation. Each range could be seen as being best modeled by a specific function, so I also call them "functional ranges", for lack of a better term (apologies if I'm using the wrong terminology throughout this question; feel free to correct me).
I have attempted to model this kind of data with sim_signal_y, which demonstrates three ranges of operation, in the code below:
import math
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal

def get_constructed_function():
    flat_level = 0.07
    r1_len = 100
    region_1_x = np.linspace(0, r1_len-1, r1_len) # 100 steps for range [0, 99], default endpoint=True
    region_1_y = np.repeat(flat_level, len(region_1_x))
    r2_len = 200
    r2_fact = 0.039
    r2_offsx = math.log(flat_level)/r2_fact # math.log defaults to natural log
    region_2_x = np.linspace(r1_len, r1_len+r2_len-1, r2_len)
    region_2_y = np.exp(r2_fact*(region_2_x+r2_offsx-r1_len))
    r2_ep = (region_2_x[-1], region_2_y[-1])   # end point of region 2
    r2_prep = (region_2_x[-2], region_2_y[-2]) # point just before the end point
    tanr2end = (r2_ep[1]-r2_prep[1])/(r2_ep[0]-r2_prep[0]) # approximate slope at the end of region 2
    r12_len = r1_len + r2_len
    r3_len = 700
    r3_fact = 0.067
    r3_xtan = ( tanr2end )/(-2*r3_fact*r3_fact)
    r3_y_xtan_nc = -(r3_fact*r3_xtan)**2
    r3_offsx = -r3_fact*( r2_ep[0] + abs(r3_xtan) )
    r3_yconst = r2_ep[1] - r3_y_xtan_nc
    region_3_x = np.linspace(r12_len, r12_len+r3_len-1, r3_len)
    region_3_y = r3_yconst-np.square(r3_fact*region_3_x+r3_offsx)
    all_x = np.concatenate( (region_1_x, region_2_x, region_3_x) )
    all_y = np.concatenate( (region_1_y, region_2_y, region_3_y) )
    return (all_x, all_y)
def noisify(in_y):
    # partly via https://stackoverflow.com/q/70216468
    num_elems = len(in_y)
    noise_orig_ampl = 1.0
    noise_y_first = np.random.default_rng().uniform(-noise_orig_ampl, noise_orig_ampl, num_elems)
    # low-pass the uniform white noise, so it looks more like measurement noise
    sos = signal.butter(N=2, Wn=1000, btype='lowpass', fs=4000, output='sos')
    noise_y_filt = signal.sosfilt(sos, noise_y_first)
    noise_y_filt_scaled = ( noise_y_filt / np.max(noise_y_filt) )*0.05
    ret_y = np.add(in_y, noise_y_filt_scaled)
    return ret_y
def get_sim_signal():
    func_orig_x, func_orig_y = get_constructed_function()
    ax.plot(func_orig_x, func_orig_y, ls='dashed', color='black') # comment this line out, to hide the "original" constructed function
    ret_y = noisify(func_orig_y)
    return func_orig_x, ret_y
matplotlib.rc('axes.formatter', useoffset=False) # nowork for 1e125
fig, ax = plt.subplots()
arr_x, sim_signal_y = get_sim_signal()
ax.plot(arr_x, sim_signal_y, color='teal')

# attempt to fit sim_signal
order = 30
z = np.polyfit(arr_x, sim_signal_y, deg=order)
p = np.poly1d(z) # class object for dealing with polynomials
ax.plot(arr_x, p(arr_x), color='darksalmon')

ax.get_yaxis().get_major_formatter().set_useOffset(False) # nowork for 1e125
ax.grid(True)
fig.canvas.draw_idle()
plt.show()
Basically, sim_signal_y in the example above can be seen to demonstrate these three ranges of operation, given that they are used a priori in its construction:
- Constant - for low x, the output values cluster around a constant value (taken to be 0.07 as an example in the code)
  - The noise is low enough that it is perceptible that this region tends to be constant
- Exponential - a region of fast exponential growth, followed by
- Quadratic - a region of slower, quadratic growth
So, to remove the influence of noise, the code above also shows that I've tried to model this data with polyfit. Overall (seen over the whole range), for order=30, the modelling looks quite good:
... however, the problem is that for the low values in the constant range, the model is way off, as it oscillates (green/teal is the signal; pink/orange is the model):
... and also, the oscillations are quite visible at the end of the final region:
I am aware that it is likely impossible to model this kind of data in its entirety; however, I would like to find an approach that would allow me to specify different requirements per region - such as the following:
- Model the constant region with (as close as possible to) a "flat" line, at a value as close as possible to the "real" one (which in this example is a priori 0.07)
- Model the final part of the data (say, from 950-1000 on the abscissa in this example) as closely as possible (i.e. the model line should pass through as many data points in this range as possible, and definitely land on the end point)
- Model at least the start of the exponential growth region with a decent (though not necessarily best) approximation
- Larger errors in the other remaining regions are tolerable
Is there a modelling approach like this, that I could use in Python (matplotlib/numpy/scipy/etc)?
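One partial idea in this direction, for reference: both np.polyfit and np.polynomial.Polynomial.fit accept a w= argument of per-sample weights, which can at least emphasize the regions listed above. Below is a minimal self-contained sketch of that idea - note that it uses a simplified piecewise toy signal standing in for sim_signal_y, arbitrarily chosen weight values, and Polynomial.fit rather than np.polyfit (the former rescales x internally, which keeps a deg=30 fit better conditioned) - but I don't know whether this is the right way to express such requirements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for sim_signal_y: flat -> exponential -> quadratic, roughly continuous
x = np.arange(1000, dtype=float)
y = np.piecewise(
    x,
    [x < 100, (x >= 100) & (x < 300), x >= 300],
    [0.07,
     lambda t: 0.07 * np.exp(0.039 * (t - 100)),
     lambda t: 0.07 * np.exp(0.039 * 200) + 6.7 * (t - 300) + 0.005 * (t - 300) ** 2],
)
y_noisy = y + rng.normal(scale=0.05, size=x.size)

# Per-sample weights: strongly favour the constant region and the last 50 samples
w = np.ones_like(x)
w[x < 100] = 100.0   # push the fit towards "flat" at low x
w[x >= 950] = 100.0  # push the fit through the final samples

p_w = np.polynomial.Polynomial.fit(x, y_noisy, deg=30, w=w)  # weighted fit
p_u = np.polynomial.Polynomial.fit(x, y_noisy, deg=30)       # unweighted, for comparison

flat = x < 100
print("flat-region max error, weighted:  ", np.max(np.abs(p_w(x[flat]) - 0.07)))
print("flat-region max error, unweighted:", np.max(np.abs(p_u(x[flat]) - 0.07)))
print("end-point error, weighted:", abs(p_w(x[-1]) - y_noisy[-1]))
```

This reduces the oscillation in the weighted regions at the cost of larger errors elsewhere, but it doesn't let me state "flat" or "passes through the end point" as hard constraints, which is why I'm asking whether a more principled approach exists.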