
I have the equation $y(x) = \int_a^b g(u) f(u - x)\, du$, where I observe values of $y(x)$ and know the functional form of the density $f(u-x)$. Also, $x$, $a$, and $b$ are known, $[a,b]$ is a compact interval, and all functions are smooth and well-behaved. The function $g(\cdot)$, however, is unknown, and I would like to estimate it given my knowledge of the other primitives.

To this end, I thought that the best option might be to use a restricted spline (similar to what is shown here) to approximate $$g(u) = \alpha_0 + \alpha_1 u + \sum_{i=1}^{K-2} \beta_i B_i(u),$$ where $K$ is the number of knots of the spline and $B_i(u)$ are the basis functions as defined in the link above. To estimate $\{\hat{\beta}_i\}_{i=1}^{K-2}$ and $\{\hat{\alpha}_0,\hat{\alpha}_1\}$, I then:

  1. Take the summation out of the integral and compute all terms in $\int_a^b g(u) f(u - x) du$, i.e., $A_0(x) = \int_a^b f(u - x) du$, $A_1(x) = \int_a^b u f(u - x) du$, $\Gamma_1(x) = \int_a^b B_1(u) f(u - x) du$, ...

  2. Run the regression $y = \alpha_0 A_0 + \alpha_1 A_1 + \beta_1 \Gamma_1 + ... + \beta_{K-2} \Gamma_{K-2} + \epsilon$, where $\epsilon$ is an error term.
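For concreteness, the two steps above can be sketched in Python as follows. This is only an illustration under assumptions that are not in my actual setup: the $B_i$ are taken to be the usual truncated-power restricted cubic spline terms, $f$ is a Gaussian density with standard deviation 0.5, the knots and the data are synthetic, and the integrals are approximated with the trapezoid rule. The names `rcs_basis`, `normal_pdf`, and `integrated_design` are made up for this sketch.

```python
import numpy as np

def rcs_basis(u, knots):
    """Truncated-power restricted cubic spline terms B_1..B_{K-2} (assumed form)."""
    u = np.asarray(u, dtype=float)
    t = np.asarray(knots, dtype=float)
    pos3 = lambda z: np.clip(z, 0.0, None) ** 3  # (z)_+^3
    d = t[-1] - t[-2]
    cols = [
        pos3(u - tj)
        - pos3(u - t[-2]) * (t[-1] - tj) / d
        + pos3(u - t[-1]) * (t[-2] - tj) / d
        for tj in t[:-2]
    ]
    return np.column_stack(cols)

def normal_pdf(z, sd=0.5):
    """Assumed density f; replace with the known functional form."""
    return np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def integrated_design(x, knots, a, b, n_grid=2001):
    """Step 1: columns A_0(x), A_1(x), Gamma_1(x), ..., Gamma_{K-2}(x)."""
    u = np.linspace(a, b, n_grid)
    w = np.full(n_grid, (b - a) / (n_grid - 1))  # trapezoid weights
    w[[0, -1]] *= 0.5
    G = np.column_stack([np.ones_like(u), u, rcs_basis(u, knots)])
    F = normal_pdf(u[None, :] - x[:, None])      # F[i, k] = f(u_k - x_i)
    return (F * w) @ G

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
a, b = 0.0, 1.0
knots = np.array([0.05, 0.275, 0.5, 0.725, 0.95])  # K = 5
x = rng.uniform(a, b, size=400)
X = integrated_design(x, knots, a, b)
true_coef = np.array([1.0, -0.5, 2.0, -3.0, 1.5])
y = X @ true_coef + rng.normal(scale=1e-3, size=x.size)

# Step 2: least-squares regression of y on the integrated terms
coef_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef_hat

# Pairwise correlations among Gamma_1..Gamma_{K-2}
gamma_corr = np.corrcoef(X[:, 2:], rowvar=False)
print(np.round(gamma_corr, 3))
```

The matrix `gamma_corr` is the quantity I am looking at when I say the integrated bases are correlated.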

However, when I do so, I find that the integrated bases, $\Gamma_1, \dots, \Gamma_{K-2}$, are highly correlated with each other (pairwise correlations above 70%), which could bias the estimation of $\{\hat{\beta}_i\}_{i=1}^{K-2}$. Some correlation obviously arises from the integration step; however, it seems very high.

Would you know a way to reduce this correlation with different bases? Or would you suggest a different approach?

Andrew
  • 2
    You could use a basis of B-splines rather than truncated power splines. – Yves Feb 01 '23 at 17:07
  • 3
    Just because the bases are correlated doesn't mean there will be any bias in the estimation of the spline as a whole, or indeed any of the individual terms. Bias is caused by model misspecification or correlation of the residual with the features, not by correlation between the features. – jbowman Feb 01 '23 at 17:07
  • @yves would you have a reference for B-splines? I like the restricted splines because they use few coefficients (K-2); I will need to use IVs in the estimation and I might not have enough of them for many knots. – Andrew Feb 01 '23 at 17:08
  • @jbowman I am afraid of multicollinearity here (the actual regression model is more complicated). – Andrew Feb 01 '23 at 17:10
  • 2
    You could always generate an orthogonal basis from the splines. I doubt it would do anything for you, because multicollinearity among the spline components is likely irrelevant, no matter what kind of spline you are using. If you're testing a spline for significance, all that matters is (in some sense) its "group collinearity" with all the other regressors. This would be measured as the potential inflation in the p-value for the associated "chunk" F-test. – whuber Feb 01 '23 at 17:13
  • 1
    @Andrew There are so many possible references on B-splines... You should have a look at the books you have access to. You should also think about the boundary conditions you want, although the convolution will obscure this question. – Yves Feb 01 '23 at 18:36
  • @Yves - My understanding is that B-splines require the estimation of many parameters, while the restricted splines seem to work OK already with 4 or 5 knots. Is there an alternative to B-splines? – Andrew Feb 02 '23 at 10:29
  • 1
    From this answer it seems that restricted splines are B-splines completed with functions that are linear. You could get an equivalent B-spline basis by repeating some knots in a basis of B-splines; see my comment to this question. – Yves Feb 02 '23 at 13:33
  • @whuber, could you please explain to me why multicollinearity among the integrated bases wouldn't affect the final regression estimates? I'm sorry, but that's not clear to me. Thank you. – Andrew Feb 02 '23 at 15:33
  • 2
    The spline is a curve constructed as a linear combination of basic pieces. It doesn't matter what those pieces might be or what their coefficients are, provided you get the same curve. Thus, examining and testing spline coefficients is ordinarily meaningless. – whuber Feb 02 '23 at 19:28
  • 1
    Thank you @whuber, it's the first time I have used splines, but I see what you mean. My problem is that in step 2 (the regression) there are also other covariates, which could be correlated with the integrated bases through $x$. All these variables are endogenous, and I'll need IVs. I believe that if the bases are multicollinear and also correlated with the other covariates, this could create problems. Perhaps creating an orthogonal basis as you suggested might help. Thank you – Andrew Feb 02 '23 at 22:20
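
The point raised in the comments, that an orthogonalized basis spans the same space and therefore produces the same fitted curve, can be checked numerically. The sketch below is a toy illustration, not the actual model: the columns of `Gamma` are simple powers of $x$ standing in for highly correlated integrated bases, the response is synthetic, and there are no extra covariates or instruments. It uses a reduced QR decomposition as one possible way to orthogonalize.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0.0, 1.0, n))

# Stand-ins for highly correlated integrated bases (assumption for this sketch)
Gamma = np.column_stack([x**2, x**3, x**4])
Z = np.column_stack([np.ones(n), x, Gamma])       # intercept, linear term, bases
y = np.sin(3.0 * x) + rng.normal(scale=0.05, size=n)

# Correlations among the raw basis columns
raw_corr = np.corrcoef(Gamma, rowvar=False)

# Reduced QR: the columns of Q are orthonormal and span the same space as Z
Q, R = np.linalg.qr(Z)

# Fitted values from the raw, correlated design
fit_raw = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
# Fitted values from the orthogonalized design (projection onto the same space)
fit_orth = Q @ (Q.T @ y)

max_diff = np.max(np.abs(fit_raw - fit_orth))
print("smallest raw correlation:", raw_corr.min())
print("largest difference in fitted values:", max_diff)
```

Because the first column of `Z` is constant, the remaining columns of `Q` have zero mean and are mutually uncorrelated, yet the fitted values agree with those from the raw design up to rounding error. Only the individual coefficients change, not the estimated curve.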

0 Answers