13

My understanding is that fractional polynomials and restricted cubic splines serve similar purposes. However, cubic splines are much more widely used outside statistics, and I have a better (mathematical) intuition for them. And indeed, there are currently 14 questions here tagged [fractional-polynomial] and 600+ tagged [splines].

Are there any circumstances in which one would prefer fractional polynomials to restricted cubic splines?

Mohan
  • 865

2 Answers

20

Fractional polynomials and cubic splines each have the defects of their virtues.

The point of cubic splines is to be local and flexible and smooth and able to approximate any smooth curve.

The point of fractional polynomials is to not be able to approximate arbitrary smooth curves easily, on the belief that arbitrary smooth curves are not a statistically relevant class of functions. They are not local and are much less flexible, again on the assumption that too much flexibility and locality are bad.

In settings where it's sensible to believe in fractional polynomials, they will be superior because they don't have unnecessary flexibility and they are sensitive to data across the whole range of $x$ in fitting $f(x_0)$. In settings where it's not sensible to believe in them, they will be inferior because they don't have necessary flexibility and they are sensitive to data across the whole range of $x$ in estimating $f(x_0)$.

You're unlikely to get agreement about which settings are which, but as an example:

  • modelling the relationship between wind and air pollution, I might be happy to use a fractional polynomial (perhaps after log transformation), because there could be fairly simple physical relationships to mixing volume of the atmosphere
  • modelling the relationship between time of year and air pollution, I would want some generic smoother such as a regression spline, because I don't think there's a simple relationship of any sort.
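
To make the "not local, much less flexible" point concrete, here is a minimal sketch of a first-degree fractional polynomial (FP1) fit. It assumes the conventional power set from Royston & Altman (1994), with power 0 read as log(x); `FP_POWERS`, `fp_term` and `fit_fp1` are hypothetical helper names for illustration, not functions from any package. The whole curve is pinned down by one power and two coefficients estimated from all of the data, so every observation influences the fit at every $x_0$.

```python
import numpy as np

# Conventional FP power set (Royston & Altman, 1994); power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Return x**p, using the FP convention that p == 0 means log(x)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("fractional polynomials require x > 0")
    return np.log(x) if p == 0 else x ** p


def fit_fp1(x, y, powers=FP_POWERS):
    """Grid-search the power set; keep the least-squares fit with smallest RSS."""
    best = None
    for p in powers:
        # Design matrix: intercept plus a single transformed covariate.
        X = np.column_stack([np.ones(len(x)), fp_term(x, p)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[2]:
            best = (p, beta, rss)
    return best  # (power, coefficients, residual sum of squares)


# Illustrative, noise-free data generated from a square-root curve.
x = np.linspace(0.5, 10.0, 50)
y = 2.0 + 3.0 * np.sqrt(x)
p, beta, rss = fit_fp1(x, y)
print(p, beta)  # the generating power 0.5 is in the set, so it is selected
```

If the true curve is not close to any member of this small family (a seasonal pattern, say), no choice of power will rescue the fit, which is the case for a generic smoother in the second bullet.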
Thomas Lumley
  • 38,062
  • 2
    I really appreciate this answer, especially the points you draw out at the end, as well as the notion that neither is likely to win a best-approach-always award. – Alexis Nov 23 '23 at 06:47
  • 7
    +1. Interestingly what I mentioned as a shortcoming ("...have distortions to their overall shape by values at the tails...") you mention as a feature. I suppose the "it's not a bug, it's a feature" holds in Statistics too. – usεr11852 Nov 23 '23 at 10:47
  • May I use both fp() and a natural spline transformation in one multivariate model? – Mikołaj Mar 07 '24 at 00:42
16

I can think of some not-too-compelling circumstances where fractional polynomials (FPs) would be preferable to restricted cubic splines (RCSs):

  1. Direct interpretation of the functional form is more important. A spline model with multiple RCSs is harder to interpret than FPs. Especially in some physical modelling applications, FPs might be more palatable, providing a "simpler" way of describing the model fit.
  2. The number of observations is limited. FPs (usually) have fewer parameters to estimate than RCSs. Especially if one is concerned with the degrees of freedom used, employing FPs might yield significant savings.
  3. We do not want to guarantee smoothness. Splines are guaranteed to be smooth; FPs are not.
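
To put rough numbers on point 2, here is a sketch that builds a restricted cubic spline basis and counts its columns. It assumes the truncated-power form of the basis described in Harrell's *Regression Modeling Strategies*, without the scaling that some implementations apply; `rcs_basis` is a hypothetical helper written for this illustration, not a library function. With $k$ knots the basis has $k-1$ columns, whereas an FP1 or FP2 term contributes one or two regression coefficients (the selected powers are often counted as additional degrees of freedom).

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (no intercept), linear beyond the end knots.

    Returns an array with len(knots) - 1 columns: x itself plus
    len(knots) - 2 nonlinear terms.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = t.size
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube(u):
        # Truncated cubic: (u)_+^3
        return np.maximum(u, 0.0) ** 3

    d = t[-1] - t[-2]
    cols = [x]
    for j in range(k - 2):
        # The two correction terms cancel the x^3 and x^2 parts above the last knot.
        cols.append(
            cube(x - t[j])
            - cube(x - t[-2]) * (t[-1] - t[j]) / d
            + cube(x - t[-1]) * (t[-2] - t[j]) / d
        )
    return np.column_stack(cols)


x = np.linspace(0.0, 12.0, 121)
knots = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # illustrative knot locations
B = rcs_basis(x, knots)
n_rcs_coefs = B.shape[1]  # k - 1 coefficients for k knots
n_fp2_coefs = 2           # an FP2 term has two regression coefficients
print(n_rcs_coefs, n_fp2_coefs)
```

How much this difference matters depends on how one counts the FP power search and on how many knots the spline really needs; three or four knots is a common choice in small samples.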

A final thing to note is that FPs are much younger than RCSs. By the time FPs came about, RCSs were quite mature and had been offering competitive solutions to very similar problems for decades. Royston & Altman's seminal FP paper, "Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling" (1994), was published almost two decades after Wahba et al.'s work on "smoothing by splines". Similarly, specialist books like "The Theory of Splines and Their Applications" (1967) by Ahlberg et al. had been around since the late '60s. Simply put: FPs were a bit late to the regression party!

None of the above points is particularly... riveting, so FPs haven't become a big draw against spline models. Ultimately, the choice between the two will depend on the specific modelling task.

Richard Hardy
  • 67,272
usεr11852
  • 44,125
  • 1
    (FPs have some more apparent shortcomings than RCSs too, for example, they might have distortions to their overall shape by values at the tails of the covariate distribution; splines, due to their piecewise nature are more robust in that sense. Your question asked about advantages though so I don't include these points in my answer.) – usεr11852 Nov 23 '23 at 03:54
  • 3
    +1 I think the first point, interpretability, is a smidge questionable… or at least narrow. While there is a straightforward mathematical interpretation of exponents, I find intuitive interpretation of fractional polynomials to be lacking (e.g., "Oh! That curve is clearly the sum of a variable raised to the 5/7ths and the same variable raised to the 3 halves!" isn't an actual thing said by research-tribe humans). In contrast, I find splines easy to communicate: the curve goes like this, until here when it inflects like that, until here when it inflects again, etc. – Alexis Nov 23 '23 at 06:43
  • 3
    Caveat to my previous comment: mathematical theories (e.g. in physics, ecology, etc.) may posit specific and theoretically meaningful fractional polynomial functional forms: that's not what my point is about. My point above is more about the semi-automated fractional polynomial fitting algorithms presented by Royston & Co., which are likely to produce exponents not particularly theorized by researchers. – Alexis Nov 23 '23 at 06:50
  • The FP paper you cite says: "we feel that many users do not require such sophistication but do need models which are reasonably flexible, easy to understand, parsimonious and, perhaps above all, are simple and quick to fit using standard multiple-regression software". I don't think the 'above all' point has aged well! – Mohan Nov 23 '23 at 09:05
  • 1
    @Alexis: I agree, it is narrow. As I said both at the beginning and the end of my answer, I don't think any of these points are particularly strong. I think in certain applications of physical systems (especially when "time" is at play) they might make some "more" sense. (Thank you for your comment, I will mildly amend that point.) – usεr11852 Nov 23 '23 at 10:31
  • 1
    @Mohan: Both were medical statisticians, not forecasters. :) – usεr11852 Nov 23 '23 at 10:33
  • 6
    FPs have two major, related disadvantages. They don't work when $x \leq 0$, and the fitted function is sensitive to the origin of $x$, i.e. if you add a constant to $x$, the shape of the fit can change. – Frank Harrell Nov 23 '23 at 11:17
  • May I use both fp() and a natural spline transformation in one multivariate model? – Mikołaj Mar 07 '24 at 00:43
  • @Mikołaj: In principle, yes, but I would suggest you make a separate question to get proper answers. – usεr11852 Mar 07 '24 at 01:28
  • @usεr11852 - one continuous variable has negative values; thus, I thought about transforming that variable using ns() while, for the others, trying to find the best fractional polynomials – Mikołaj Mar 07 '24 at 09:03
  • Please create a separate question on this. This isn't something to answer in comments (not least because it won't be searchable by other users). – usεr11852 Mar 07 '24 at 12:05
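
The origin-sensitivity point in Frank Harrell's comment can be checked numerically. The sketch below is illustrative only: it fixes the FP power at 0 (a log term) rather than re-selecting it, uses a linear regression spline as a simple stand-in for a restricted cubic spline (the same argument applies to any spline basis whose knots move with $x$), and all function names and data are made up for the example. Shifting $x$ by a constant changes the FP fitted values, while the spline fit is unchanged up to rounding error.

```python
import numpy as np


def ols_fitted(X, y):
    """Fitted values from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def fp1_log_design(x):
    """Intercept plus log(x): an FP1 term with the power fixed at 0."""
    return np.column_stack([np.ones_like(x), np.log(x)])


def linear_spline_design(x, knots):
    """Intercept, x, and one hinge term (x - knot)_+ per knot."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)


rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 60)
y = np.sqrt(x) + rng.normal(scale=0.05, size=x.size)  # made-up data

shift = 5.0                   # move the origin of x
knots = np.array([4.0, 7.0])  # knots are shifted together with x

# Largest change in fitted values caused by the shift, for each model.
fp_change = np.max(np.abs(
    ols_fitted(fp1_log_design(x), y)
    - ols_fitted(fp1_log_design(x + shift), y)
))
spline_change = np.max(np.abs(
    ols_fitted(linear_spline_design(x, knots), y)
    - ols_fitted(linear_spline_design(x + shift, knots + shift), y)
))
print(fp_change, spline_change)
```

The spline is invariant because the shifted basis spans the same column space as the original one; log(x) and log(x + 5) do not.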