Good fitting but high variances ($SEIR $ type model)

Question

I am working on a susceptible ($S = S_1$ or $S=S_2$), exposed ($E$), infected ($I$), hospitalised ($H$), deceased ($D$) and recovered ($R$) model. Below is my model,

\begin{equation} \begin{split} \frac{dS_H}{dt} &= \sigma\Pi - \left( \lambda_1 + \lambda_2 + m_1 + \mu \right) S_H \\ \frac{dS_L}{dt} &= \left( 1-\sigma \right) \Pi - \left( \lambda_1 \tau_1 + \tau_2 \lambda_2 + m_2 + \mu \right) S_L \\ \frac{dV_1}{dt} &= m_1 S_H - \left( \epsilon \lambda_1 + \mu \right) V_1 \\ \frac{dV_2}{dt} &= m_2 S_L - \left( \epsilon \lambda_1 \tau_1 + \mu \right) V_2 \\ \frac{dV_3}{dt} &= \lambda_2 S_H - \left( \epsilon \lambda_1 + \mu \right) V_3 \\ \frac{dV_4}{dt} &= \tau_2 \lambda_2 S_L - \left( \epsilon \lambda_1 \tau_1 + \mu \right) V_4 \\ \frac{dE}{dt} &= \lambda_1 \left( S_H + \tau_1 S_L + \epsilon V_1 + \epsilon \tau_1 V_2 + \epsilon V_3 + \epsilon \tau_1 V_4 \right) - \left( \alpha + \mu \right) E \\ \frac{dI}{dt} &= \alpha E - \left( h \rho + \left( 1-h \right) \gamma + \mu \right) I \\ \frac{dH}{dt} &= h \rho I - \left(\eta + \mu \right) H \\ \frac{dD}{dt} &= f_1 \left(1-h \right)\gamma I - b D \\ % \frac{dB}{dt} &= \eta f_2 H + b D \\ \frac{dR}{dt} &= \left( 1-h \right) \left( 1-f_1 \right) \gamma I + \eta\left( 1-f_2 \right) H - \mu R \end{split} \end{equation}

where

\begin{eqnarray*} \lambda_1 &=& \frac{1}{N} \left( \beta_1 + ( \beta_0 - \beta_1)e^{-q_1 \frac{V_1+V_2}{N} - q_2 \frac{V_3+V_4}{N} } \right) \left(I + \delta D \right),\\ \lambda_2 &=& \frac{1}{N} \left( r_{l} I + r_d D \right). \end{eqnarray*} Further $S_H(0) = {S_H}_{,0},$ $S_L(0) = S_{L,0},$ $V_1(0) = V_{1,0},$ $V_2(0) = V_{2,0},$ $V_3(0) = V_{3,0},$ $V_4(0) = V_{4,0},$ $E(0) = E_0,$ $I(0) = I_0,$ $H(0) = H_0,$ $D(0) = D_0$ and $R(0) = R_0$.

I fit the cumulative cases variable and cumulative vaccinations of my model. $$ \int_{0}^t E(s) ds $$ and $$ \int_0^t \left( m_1 S_H(s) + m_2 S_L(s) + \lambda_2 S_H(s) + \lambda_2 \tau_2 S_L(s) \right) ds .$$

to the cumulative cases data (denoted by CumCasesData on my code) and the cumulative vaccinations (denoted by CumVaccData on the code), respectively.

The code:

%reset -f
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
from scipy import integrate, optimize
import pandas as pd
from sklearn.metrics import r2_score
Total population, N, and birth-rate Pi.
Total_pop  = 11701878;
mu = (1/(60365)); 
Pi = Total_popmu;
p   =  0.70
Calculating f1
Total_cases = 2976
Total_deaths = 1990
Total_hospitalised_persons = 681
Total_nonhospitalised_persons = Total_cases - Total_hospitalised_persons
CFR_hospitalised_persons = (0.351+0.497+0.335+0.513)/4
Total_deaths_among_hospitalised_persons =  Total_hospitalised_persons*CFR_hospitalised_persons
Total_deaths_among_nonhospitalised_persons = Total_deaths - Total_deaths_among_hospitalised_persons
CFR_nonhospitalised_persons = Total_deaths_among_nonhospitalised_persons/Total_nonhospitalised_persons
Days = np.array([  1,   8,  16,  22,  29,  36,  43,  50,  59,  64,  72,  78,  85,
         92,  99, 107, 114, 121, 128, 134,143,150,155,163,171,177,183,190,197,204]);
CumCasesData = np.array([  43,   57,  102,  111,  122,  132,  142,  150,  162,  181,  216,
        238,  268,  300,  333,  373,  421,  453,  500,  539, 585, 608, 625, 658, 699, 734, 785, 816, 840, 872]);
CumDeaths =  np.array([34,41,59,75,82,91,97,100,106,115,139,155,174,186,209,217,241,268,289,315,356,368,377,402,433,461,484,513,537,548])
CumVaccData = [0,0,0,4130,  6134,  8229,  9572, 11417, 13550, 15285, 17976, 20789,
       24142, 26687, 28727, 32625, 35958, 39845, 44447, 48119, 53610,
       54153, 56509, 60460, 64403, 69231, 73309, 77680, 80989, 83755]
Data = np.concatenate([CumCasesData,CumVaccData])
alpha = 1/10; gamma = 1/5.6; f1 = CFR_nonhospitalised_persons;  epsilon = 0.025;  b = 0.58;
f2 = (0.351+0.497+0.335+0.513)/4; 
rho = 1/5.5
h = Total_hospitalised_persons/Total_cases
sigma = 1797039/Total_pop #
a = 36397/11701878
c = 1 - a
g1 = 0
g2 = 0
def deriv(y,Days,r_l, r_d, beta0, beta1, eta, tau1, tau2, m1, m2, delta, q1, q2):
    S_H, S_L, V, dVhighGTdt, dVhighRingdt, dVhighdt, dVlowGTdt, dVlowRingdt, dVlowdt, V1, V2, V3, V4, V5, V6, E,  CumCases, I, H, CumDeaths, D, B, R, N  = y;
    lambda1 =  (beta1+(beta0-beta1) * np.exp( -((q1(V1+V2))/N)-((q2(V3+V4))/N) )) * ((I + deltaD)/N) 
    lambda2 = (r_lI + r_dD)  (1/N)
    dS_Hdt = sigmaPi - ( lambda1S_H  + lambda2S_H   + m1  S_H   + g1S_H)  - muS_H;
    dS_Ldt =  (1-sigma)Pi - ( lambda1 tau1 * S_L + lambda2 * tau2 * S_L + m2 * S_L + g2S_L  ) - muS_L
    dVdt =  m1S_H + m2S_L + lambda2S_H + lambda2tau2S_L + g1S_H + g2S_L;
    dVhighGTdt  = g1S_H;
    dVhighRingdt = m1S_H + lambda2S_H ; 
    dVhighdt = m1S_H + lambda2S_H + g1S_H;
    dVlowGTdt =  g2 S_L; 
    dVlowRingdt =   m2S_L + lambda2 tau2 S_L + g2S_L; 
    dVlowdt = m2S_L + lambda2 tau2 S_L + g2S_L;
    dV1dt =  m1S_H - epsilon lambda1 * V1 - muV1;
    dV2dt =  m2S_L - epsilon * lambda1 * tau1 * V2 - muV2;
    dV3dt = lambda2S_H - epsilon *  lambda1 * V3  - muV3;
    dV4dt =  tau2  lambda2 S_L - epsilon  lambda1 * tau1 * V4 - muV4; 
    dV5dt = g1S_H - epsilon * lambda1 *  V5  - muV5;
    dV6dt = g2S_L - epsilon * lambda1 * tau1*  V6  - muV6;
    dEdt =   lambda1(S_H+ tau1* S_L + epsilonV1 + epsilontau1V2 + epsilonV3 + epsilontau1V4 + epsilonV5 +  epsilon  tau1* V6 ) - alphaE - muE;
    dCumCasesdt = alphaE;

    dIdt =  alphaE - hrhoI - (1-h)gammaI - muI;
    dHdt = hrhoI - etaH - muH;
    dCumDeathsdt = f1(1-h)gammaI + etaf2H;
    dDdt = f1(1-h)gammaI  - bD;
    dBdt = etaf2H + bD
    dRdt = (1-h)(1-f1)gammaI + eta (1-f2)H - muR;
    dNdt = Pi- muN + (mu-b)D - etaf2*H
    return dS_Hdt, dS_Ldt, dVdt, dVhighGTdt, dVhighRingdt, dVhighdt, dVlowGTdt, dVlowRingdt, dVlowdt, dV1dt, dV2dt, dV3dt, dV4dt, dV5dt, dV6dt, dEdt, dCumCasesdt, dIdt, dHdt,  dCumDeathsdt, dDdt, dBdt, dRdt, dNdt
#Initial values
SH0, SL0, CumVacc0, CumVacchighGT0, CumVacchighRing0, CumVacchigh0, CumVacclowGT0, CumVacclowRing0, CumVacclow0, V10, V20, V30, V40, V50, V60, E0, CumCases0, I0, H0, CumDeaths0, D0, B0, R0, N0  = 1797039, Total_pop-1797039, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 43, 9, 0, 34, 0, 34, 0, Total_pop
# Initial conditions vector

y0 = np.array([SH0, SL0, CumVacc0, CumVacchighGT0, CumVacchighRing0, CumVacchigh0, CumVacclowGT0, CumVacclowRing0, CumVacclow0, V10, V20, V30, V40, V50, V60, E0, CumCases0, I0, H0, CumDeaths0, D0, B0, R0, N0]);
def fit_odeint(Days, r_l, r_d, beta0, beta1, eta, tau1, tau2, m1, m2, delta, q1, q2):
    ret = odeint(deriv, y0, Days,  args=(r_l, r_d, beta0, beta1, eta, tau1, tau2, m1, m2, delta, q1, q2))
    SH, SL, CumVacc, CumVacchighGT, CumVacchighRing, CumVacchigh, CumVacclowGT, CumVacclowRing, CumVacclow, V1, V2, V3, V4, V5, V6, E, CumCases, I, H, CumDeaths, D, B, R, N = ret.T
    Fitted1 = CumVacc
    Output = np.concatenate([CumCases,CumVacc])
    return Output
lower_bounds = [10, 26, 1.6, 0.9, 0.06, 0.02, 0.8, 0.7526008999628935e-07, 5.163295828430289e-06, 0.7, 1150, 500]
upper_bounds = [13, 31, 1.8, 1, 0.069, 0.03, 0.9, 0.9026008999628935e-7,6.463295828430289e-06, 0.9, 1200, 2000]
my_bounds = [lower_bounds,upper_bounds]
init_vals = np.array([11.92001541438225, 29.966685050861358, 1.70242164e+00, 9.26076365e-01,
       6.81429695e-02, 2.44124561e-02, 8.80134762e-01, 9.02595576e-08,
       5.86358993e-06, 8.01571823e-01, 1.19400783e+03, 1.57352147e+03])
popt, pcov = optimize.curve_fit(fit_odeint, Days,Data, init_vals,  bounds = my_bounds)
r_l, r_d, beta0, beta1, eta, tau1, tau2, m1, m2, delta, q1, q2 = popt
ret = odeint(deriv, y0, Days,  args=(r_l, r_d, beta0, beta1, eta, tau1, tau2, m1, m2, delta, q1, q2))
SH, SL, CumVacc, CumVacchighGT, CumVacchighRing, CumVacchigh, CumVacclowGT, CumVacclowRing, CumVacclow, V1, V2, V3, V4, V5, V6, E, CumCases, I, H, CumDeaths, D, B, R, N = ret.T
fig, (ax1,ax2) = plt.subplots(1,2, sharex=True, figsize=(15,5))
ax1.scatter(Days,CumCasesData,s=30, facecolors='none', edgecolors='k', label='Data')
ax1.plot(Days,CumCases, label='Model output')
ax1.set_xlabel('Time (days)')
ax1.set_ylabel('Cumulative EVD cases')
ax1.grid(False)
ax1.legend()
ax2.scatter(Days,CumVaccData,s=30, facecolors='none', edgecolors='b', label='Data')
ax2.plot(Days,CumVacc, label='Model output')
ax2.set_xlabel('Time (days)')
ax2.set_ylabel('Cumulative ring vaccinations')
ax2.grid(False)
fig.savefig('Fig.jpg')

The fitting looks perfect, the image is attached.

The problem is that the standard deviation extracted from the covariance matrix (pcov - on my code) is very large. Below are the quantities of the standard deviation of my estimates.

[296.1213779707799,1497.1467843074781,728.3991923017029,413.7925525621536,155533.19079498723, 84.0797571357458, 0.06605222504609057,
0.004324347985989968,  0.00072971899400099, 546.2799958303086
73085.07023277077, 19970.935418618858]

Now, these large standard deviation results in a very large confidence interval for each of the estimated parameter which is not what I wished for especially with this good looking fitting.

How come that I have this perfect-looking fitting but very large confidence intervals?

Also how to get small or reasonable confidence intervals?

Sextus Empiricus · Accepted Answer · 2022-12-10T21:52:34.557

Here is an early example of fitting an SIR model to a few data points. The variation in the parameters is large because the data caries little information about the parameters.

In your case you have a bit more data points, but with the many parameters in your model there is a very large range of settings that work for your data. The range of the data is only very small.

See also this question (Coronavirus growth rate and its possibly spurious resemblance to vapor pressure model) where a lot of models, even nonsense models can fit the points. What you have is an approximately exponential growth and any function that has some way to model exponential growth with a slight change of the growth rate, is able to reasonably fit the data. In the image below you see two models fitting the data well (green and black curve) and several other work as well too.

Another example is from this question Why would you perform transformations over polynomial regression? The data is a little cloud of points that can fit many functions because the range is only small

Another aspect is that your parameters can be correlated. The standard deviations that you have are for the marginal distributions and it might be that with some reparameterisation, a combination of parameters, you get a small standard deviation.

In the first image (from this question: Fitting SIR model with 2019-nCoV data doesn't conververge), the parameters $\beta$ and $\gamma$ had large deviations, but the parameter $c = \beta-\gamma$ (the growth rate) did not have a large standard deviation.

Another example from this question Why and how does adding an interaction term affects the confidence interval of a main effect? shows the distribution of the slope and intercept parameters. The naive way would be to conclude that both parameters have large standard deviations and there is no significant effect. But, this ignores the correlation. With a reparameterisation the intercept term is clearly significant.

Good fitting but high variances ($SEIR $ type model)

Total population, N, and birth-rate Pi.

Calculating f1

1 Answers1