
The solution to the minimization problem $$RSS(f, \lambda) = \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int (f''(t))^2 \, dt$$ is written in matrix form as $f = N\theta$, with components $f(x_i) = \sum_{j = 1}^n N_j(x_i) \theta_j$, where the $N_j(x)$ are an $n$-dimensional set of basis functions representing a natural cubic spline. Together with a choice of $\lambda$ we have a smoothing spline with the estimate $$\hat f = S_\lambda y$$ where $$S_\lambda = N(N^tN + \lambda \Sigma_N)^{-1}N^t,$$ with $(\Sigma_N)_{ij} = \int N''_i(t)N''_j(t) \, dt$ and $N_{ij} = N_j(x_i)$.


Why is $\mathrm{Rank}(S_\lambda) = n$, or equivalently $N$ invertible?
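
For concreteness, here is a small numerical sketch (illustrative only: the helper `natural_spline_basis`, the random data, and the crude finite-difference quadrature for $\Sigma_N$ are ad hoc assumptions, not ESL's construction) that builds $N$ from the basis of ESL eq. (5.4)–(5.5) with knots at the data points, forms $S_\lambda$, and reports its rank:

```python
# A numerical sanity check (illustrative only; names and data are ad hoc):
# build N from the ESL (5.4)-(5.5) natural-spline basis with knots at the data,
# approximate Sigma_N on a fine grid, form S_lambda and report its rank.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=8))      # distinct data points = knots
n = len(x)

def natural_spline_basis(t, knots):
    """Columns N_1(t), ..., N_n(t) of the ESL (5.4)-(5.5) basis."""
    t = np.asarray(t, dtype=float)
    m = len(knots)
    def d(k):                                   # d_k(t) of ESL (5.5), 1-based k
        return (np.maximum(t - knots[k - 1], 0.0) ** 3
                - np.maximum(t - knots[-1], 0.0) ** 3) / (knots[-1] - knots[k - 1])
    cols = [np.ones_like(t), t] + [d(k) - d(m - 1) for k in range(1, m - 1)]
    return np.column_stack(cols)

N = natural_spline_basis(x, x)                  # N_{ij} = N_j(x_i)

# Sigma_N via a crude quadrature: second derivatives by finite differences.
grid = np.linspace(x[0], x[-1], 4001)
B = natural_spline_basis(grid, x)
d2 = np.gradient(np.gradient(B, grid, axis=0), grid, axis=0)
Sigma_N = d2.T @ d2 * (grid[1] - grid[0])

lam = 0.1
S = N @ np.linalg.solve(N.T @ N + lam * Sigma_N, N.T)
print(np.linalg.matrix_rank(N), np.linalg.matrix_rank(S))   # expect 8 and 8
```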

Lejoon
  • My answer to a slightly different question at https://stats.stackexchange.com/a/180360/919 gives one explanation. Because $S_\lambda$ is recognizable as a "Ridge regression hat matrix," many other posts are also directly relevant. Note that $\lambda \gt 0$ is necessary to conclude $S_\lambda$ is of full rank. – whuber Jan 04 '23 at 15:25
  • Your notation is very confusing: "$N$" is used to denote the number of observations and a matrix simultaneously. Could you please clarify? – Zhanxiong Jan 04 '23 at 17:05
  • @whuber thank you for the comment. I couldn't really figure out how to use your answer to deduce that the rank of this matrix is full. I know that $N^tN + \lambda \Sigma_N$ is invertible and hence full rank, but not why the kernel is zero after applying first $N^t$, then the inverse, and lastly $N$.

    To your note: if $\lambda \to 0$ then $S_\lambda = I_n$ which has rank $n$ so full rank?

    – Lejoon Jan 04 '23 at 20:27
  • @Zhanxiong thanks for the comment, I did some clarification of the question. – Lejoon Jan 04 '23 at 20:27
  • My explanation applies directly, with no additional reasoning required: $S_\lambda$ is the product of three matrices and all have rank $n:$ $N$ and $N^\prime$ by stipulation and $(N^\prime N + \lambda \Sigma_N)^{-1}$ due to its explicit expression as an inverse. – whuber Jan 04 '23 at 21:00
  • Your notation is still not that clean. Here is my guess: you have an $n \times p$ matrix $N$ with $\operatorname{rank}(N) = p$, $\lambda > 0$, and $\Sigma$ is a $p \times p$ positive definite matrix. And your goal is to prove $S := N(N'N + \lambda\Sigma)^{-1}N'$ is of rank $p$. Can you confirm? If $n > p$, it's impossible for $S$ to have rank $n$ (i.e., to be invertible), because the rank is at most $p$. – Zhanxiong Jan 04 '23 at 21:09
  • @whuber how do we know $N$ is invertible? – Lejoon Jan 04 '23 at 21:14
  • @Zhanxiong everything correct except that $N$ is $n \times n$. – Lejoon Jan 04 '23 at 21:17
  • @Lejoon OK. But if so, why are you still asking the question "how do we know $N$ is invertible?" You are given that $N$ is invertible (if it is of full column rank). – Zhanxiong Jan 04 '23 at 21:17
  • @Zhanxiong How do we know the basis functions can't be linearly dependent? – Lejoon Jan 04 '23 at 21:30
  • @Lejoon Can you provide the reference where your question comes from? I really want to get all the notations and background straight (sorry! I really had some difficulty in understanding your post...) – Zhanxiong Jan 04 '23 at 22:32
  • @Zhanxiong $N$ is not likely invertible, because it usually isn't square. But nobody claims $N$ is invertible. The assumption in the question is that it is of full rank. Lejoon: by definition a basis is linearly independent and by definition an "$n$-dimensional set" spans a space of $n$ dimensions. This would be a good time to consult a linear algebra textbook. – whuber Jan 05 '23 at 00:34
  • @whuber That was what I thought. I said "$N$ is invertible" because I asked OP to confirm if "$N \in \mathbb{R}^{n \times p}$ is of full rank" is correct. He said yes "except $N$ is $n \times n$" (which probably turned out to be a false claim as you pointed out). If the problem is just to prove $\operatorname{rank}(S_\lambda) = p$, it is quite straightforward. – Zhanxiong Jan 05 '23 at 00:43
  • @whuber no need to be condescending with "consult a linear algebra textbook" to read up on what a basis is.

    Why do the "basis functions" of the natural cubic splines constitute a basis? According to the authors (Hastie et al.), "basis functions" are just constituents of a model $f(X) = \sum_m^M \beta_m h_m(X)$; there is a priori no requirement for them to actually be a basis for a vector space. You could, for example, a priori have a repeated $h_m$ with different factors. This setting is of course more specific, as it pertains to a specific minimisation problem.

    – Lejoon Jan 05 '23 at 11:54
  • @Zhanxiong This is from Elements of Statistical Learning by Hastie et al on page 151 and forward. It is available freely here: https://hastie.su.domains/ElemStatLearn/. – Lejoon Jan 05 '23 at 11:54
  • No condescension is intended or implied: your question and comments indicate you really need to consult a good account of the terminology, definitions, and basic concepts of linear algebra, because they answer everything. Your subsequent comments continue to demonstrate that. – whuber Jan 05 '23 at 15:08
  • @Lejoon After checking ESL and doing some calculations, I agree that proving $N$ is invertible is non-trivial. $N$ is similar to a Vandermonde matrix but considerably more complicated. My thought is to substitute (5.4) into $N$ and to show $\det(N) \neq 0$. But this, of course, demands tons of calculation. – Zhanxiong Jan 05 '23 at 18:35
  • See @whuber? It's not just me here; Zhanxiong agrees. – Lejoon Jan 05 '23 at 20:32
  • I still disagree, because your characterization of $N$ as an "$n$-dimensional set of basis functions" settles the matter by its very definition. I suspect @Zhanxiong has a different interpretation of what you mean, suggesting some clarification might be useful here. – whuber Jan 05 '23 at 21:20
  • @whuber As you may know, many statistics texts lack sufficient mathematical rigor -- the ESL authors just claimed, without giving a proof, that "$N_1, \ldots, N_n$" are basis functions. To be rigorous, this needs to be proved, as Lejoon is seeking. – Zhanxiong Jan 05 '23 at 21:24
  • @Zhanxiong My understanding, when somebody calls something a set of basis functions, is that they intend the usual definition: namely, that the set is linearly independent and spans the stipulated vector space. If something needs proving, then it would concern the construction of $N,$ not its characterization. Now this very well might be an issue with ESL (which, because it was written for people already familiar with a great deal of statistics and mathematics, has many lacunae), but it's not in evidence (yet) in this question. – whuber Jan 05 '23 at 21:31
  • @whuber Moreover, even if we accepted that $N_1, \ldots, N_n$ are basis functions, it is still different from claiming $N = (N_{j}(x_i)) \in \mathbb{R}^{n \times n}$ is invertible (probably the latter is true too, but extra work needs to be done). – Zhanxiong Jan 05 '23 at 21:31
  • @Zhanxiong That latter claim is elementary and immediate from basic concepts of linear algebra. Indeed, many textbooks will explicitly state a series of propositions relating equivalent characterizations of invertible matrices, their corresponding linear transformations, and their determinants. But -- I need to re-emphasize this, apparently -- nobody is making claims about "invertibility" of the generally non-square matrix $N$. The formula in this question involves the inverse of a regularized version of the cross-product of $N.$ Any matrix expressed as an inverse is obviously invertible. – whuber Jan 05 '23 at 21:33
  • @whuber For better discussion, I hope you can take a look at the ESL and post your complete answer (if you think it is not that hard). For your convenience, I will disclose a partial answer that I haven't finished. – Zhanxiong Jan 05 '23 at 21:39
  • @whuber For your last comment, consider the vector space $F_n[x]$, the space of polynomials with degree less than $n$. It is way easier to prove $\{1, x, \ldots, x^{n - 1}\}$ is a basis of $F_n[x]$ (it is just a direct check of the definition) than to prove that the matrix $(x_i^{j - 1})$ is invertible for distinct $x_1, \ldots, x_n$ (that needs a Vandermonde determinant computation). And the implication from the former to the latter is not that obvious, at least to me. – Zhanxiong Jan 05 '23 at 21:46
  • @whuber $N$ is indeed a square matrix, and it is claimed invertible (because the author said $S_\lambda$ is full rank) -- that's why I suggested you check the book or my answer below (at the very beginning I also thought $N$ was rectangular, but that is not the case). – Zhanxiong Jan 05 '23 at 21:50
  • @whuber "basis functions" is not a linear algebra term to start with. It has no meaning, a priori, in linear algebra. A "basis" in statistics doesn't need to be a set of basis elements of a vector space. This isn't the only place in the statistical literature that uses that description loosely, and not only in statistics either: the term "basis" is used widely in non-algebraic settings in mathematics. Outside of linear algebra, a basis often means a subset that generates a larger set, without the linear independence requirement, as there usually is no equivalent. – Lejoon Jan 05 '23 at 21:52
  • Also, I guess I don't understand why you say that $N$ is not a square matrix, @whuber, and hence question the equivalence of the statements; care to elaborate? – Lejoon Jan 05 '23 at 21:52
  • @whuber I would actually mean that the functions do not form a vector space basis in a sense that is helpful to this problem. The functions $N_i(x)$ are not even, a priori, polynomials, only piecewise polynomials. So what is the vector space where they form a basis that is applicable to this problem of showing that $N$ is full rank/invertible, according to you?

    Only when you evaluate them at specific points do they turn into a polynomial basis over a certain sub-vector space of $\mathbb{R}[x_1, \ldots, x_n]$.

    – Lejoon Jan 05 '23 at 22:03
  • @Lejoon I think calling the $N_j(x)$ "basis functions" is still fine, as long as they have been proved to be linearly independent (i.e., they form a subset of a basis of the spline space). The real problem is that the authors didn't clearly define what the "spline space" is, let alone provide a basis in the linear algebra sense (I am not even sure what the dimensionality of the vector space is -- I did some googling, but no satisfactory answer popped up). – Zhanxiong Jan 05 '23 at 22:10
  • @Zhanxiong Spline algorithms are constructed to guarantee linear independence. The "spline" ultimately is the collection of column vectors of $N,$ generating a subspace. There's really nothing to prove. – whuber Jan 06 '23 at 01:05
  • @whuber Again, that the basis functions are linearly independent doesn't easily imply that $N$ is invertible. I see your point that you insist "there is nothing to prove," but if something is really true, it should be possible to prove it (that is, it won't be unprovable). There used to be exactly the same question on our site; unfortunately, the "proof" there had some mistake. – Zhanxiong Jan 06 '23 at 01:12
  • @Zhanxiong I'm sorry, we don't seem to be communicating. The rank of $N$ can be literally anything, even zero. The only thing that needs to be invertible is the internal expression in parentheses -- I think I've said this three times already -- and, because it is expressly given as an inverse, it's invertible. That guarantees the rank of $S_\lambda$ is the rank of $N,$ as the post I originally linked to explains. So, understanding "n-dimensional" and "basis" in the most elementary, conventional way, there's nothing to show. – whuber Jan 06 '23 at 01:17
  • @whuber No, it can't be anything, because $N$ is a given matrix (i.e., every entry of it has a determined value), so its rank is a unique value. I believe both the OP and I understood that the middle expression is invertible. The real difficulty is to prove $N$ is invertible. If you have checked ESL, the authors clearly state that the rank of $S_\lambda$ is $n$, not some other value. – Zhanxiong Jan 06 '23 at 01:23
  • @Zhanxiong This is where our interpretations differ. I read ESL as stating the rank of $N$ is $n$ and you, apparently, do not. That's OK; but then you will need to decide exactly how $N$ might fail to be of full rank and that's going to require some assumptions that I think aren't in ESL. Thus, I think you are over-analyzing this. – whuber Jan 06 '23 at 01:27
  • @whuber Can you enter the chat room to avoid extended discussions? – Zhanxiong Jan 06 '23 at 01:34

2 Answers


A spline is a method of fitting a function to a set of points $(x_1,y_1),$ $(x_2,y_2),$ $\ldots, (x_n,y_n).$ The answer to the question is: because natural cubic splines exist.

The full explanation appears in the last two sentences of this post. The intervening material consists of standard definitions, constructions, and observations in linear algebra to establish notation and fix the concepts. It is included because (a) the comment thread to the question indicates there is a risk of misinterpreting some of the terminology and (b) taking this abstract approach will demonstrate we need to know only that splines exist, but we don't need any details of their construction. As such, the results are both easy to obtain and they apply to any kind of spline (in the extremely general sense defined below).

The vector space interpretation is also helpful for bringing to the fore two fundamental operations associated with splines: (i) using splines to approximate arbitrary functions and (ii) passing splines through finite sets of points. I will address these in turn, but wish to emphasize there's little to demonstrate and nothing to calculate.

Linear algebra

To set the stage, we will be working with functions $f:X\to \mathcal R$ defined on the set $X$ of all possible values of the data $x_i.$ The codomain is a field $\mathcal R$ (usually the real or complex numbers). The set of these functions, $\mathcal R^X,$ is naturally a vector space under pointwise addition of functions and scalar multiplication. That is, given any $x\in X,$ for all $f,g\in \mathcal R^X$ and $\alpha,\beta\in\mathcal R,$

$$(\alpha f+\beta g)(x) = \alpha f(x)+\beta g(x).$$

A spline is, first of all, a vector subspace $\mathbb G \subset \mathcal R^X:$ we intend to use the functions in $\mathbb G$ to approximate arbitrary functions in $\mathcal R^X.$

Any finite collection of distinct "knots" $\Xi=(\xi_1, \xi_2, \ldots, \xi_p)\in X^p$ determines a linear transformation $\iota_\Xi: \mathcal R^X \to \mathcal R^\Xi$ via restriction to the domain $\Xi,$

$$\iota_\Xi(f)(\xi_j) = f_{|\Xi}(\xi_j) = f(\xi_j)$$

for $j= 1, 2, \ldots, p.$ The codomain $\mathcal R^\Xi$ is canonically isomorphic to the standard $p$-dimensional vector space $\mathcal R^p:$ simply write the function $\iota_\Xi(f)$ as the vector $\mathbf y = (f(\xi_1),f(\xi_2),\ldots,f(\xi_p)).$ I will therefore treat these as the same vector space without further comment.

The image $\iota_\Xi(\mathbb G) \subset \mathcal R^\Xi$ is the set of values that spline functions can take on $\Xi.$ The assertion "splines exist" means that whenever $\Xi$ is a sequence of knots, $\iota_{\Xi}$ is a surjection: to every possible set of values $\mathbf y = (y_1,y_2,\ldots, y_p),$ there is a spline function $f\in\mathbb G$ attaining those values at the knots.

The foregoing characterizes a spline as a function space $\mathbb G$ with an existence property. I am not aware of any such characterization in the literature, but believe it applies to every type of spline that has been invented.

Existence is usually proven by exhibiting a right inverse for arbitrary $\iota_\Xi.$ That is, given $\mathbf y,$ a function $g\in\mathbb G$ is constructed for which $\iota_\Xi(g) = \mathbf y.$ This chapter of ESL includes several such constructions for natural cubic splines defined on real numbers $X=\mathbb R,$ so I take the existence as not in question here.
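
As an aside (an illustration added here, not part of the original argument; it assumes `scipy` is acceptable), the existence property is easy to see numerically: `scipy.interpolate.CubicSpline` with `bc_type='natural'` plays the role of a right inverse for one particular choice of knots and values.

```python
# A minimal illustration of the existence property (ad hoc data; assumes scipy):
# for arbitrary values y at distinct knots, a natural cubic spline attaining
# those values exists, i.e. the restriction map iota_Xi is surjective.
import numpy as np
from scipy.interpolate import CubicSpline

knots = np.array([0.0, 0.7, 1.3, 2.0, 3.1])
y = np.array([1.0, -2.0, 0.5, 4.0, 0.0])        # any target values whatsoever

g = CubicSpline(knots, y, bc_type='natural')    # a right inverse in action
print(np.allclose(g(knots), y))                 # True: iota_Xi(g) == y
```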

Let $\iota_\Xi^*$ be such a right inverse: that is, $\iota_\Xi^*$ is a vector space isomorphism between $\mathcal R^p$ and a subspace of $\mathbb G$ where, for all $\mathbf y \in \mathcal R^p,$

$$\iota_\Xi(\iota_\Xi^*(\mathbf y)) = \mathbf y.$$

Because $\iota_\Xi^*$ is an isomorphism onto its image, any basis $\mathcal E$ of $\mathcal R^p$ (that is, a linearly independent spanning subset) corresponds to a basis $\iota_\Xi^*(\mathcal E)$ of the image. The latter is what ESL means by a basis of spline functions. As we have seen, these really are functions and it really is a basis of a special finite-dimensional subspace of functions.

The circumstances of the question

That's it for the linear algebra preliminaries. Next, let's address the unstated assumptions of the question. These are

  1. $N$ is the matrix determined by a spline with "knots at the data values."

  2. The $x_i$ are distinct.

These indicate the matrix $N$ in the question is a special one where $p=n$ (assumption (1)) and the data values $x_i$ are the knots $\xi_i$ (which must be distinct, assumption (2)).

Pick any basis $\mathcal E = (\mathbf e_1, \ldots, \mathbf e_n).$ Each $\mathbf e_j = (e_{ij}, i = 1, \ldots, n)$ is an $n$-vector. The matrix $N$ in the question is constructed from the $n$ knots $\Xi \in X^n$ and the basis $\mathcal E \in \left(\mathcal R^n\right)^n$ by applying each of the corresponding basis spline functions to the data:

$$N = (n_{ij}), \qquad n_{ij} = \iota_\Xi^*(\mathbf e_j)(x_i) = \iota_\Xi^*(\mathbf e_j)(\xi_i) = e_{ij}.\tag{*}$$

(In the notation of the question, the basis functions are $N_j = \iota_\Xi^*(\mathbf e_j)$ and $n_{ij} = N_j(x_i).$)

Solution

The identity $(*)$ shows $N$ is a square $n\times n$ matrix equal to the matrix of $\mathcal E.$ Because $\mathcal E$ is linearly independent, this matrix has rank $n$ and so does its transpose. Equivalently, $N$ is invertible.

The question concerns the rank of a matrix of the form

$$S_\lambda = N A_\lambda^{-1} N^\prime, \qquad A_\lambda = N^\prime N + \lambda \Sigma_N.$$

The interior matrix in this product is invertible (its inverse is $A_\lambda$). Because natural cubic splines exist, $N$ is invertible, whence $S_\lambda$ (which represents a composition of three invertible linear transformations) also is invertible and therefore must have the same rank $n$ as its components, QED.
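
As a quick numerical sanity check (an illustration added here, using random stand-in matrices rather than an actual spline basis), the composition argument can be seen directly:

```python
# The product of three invertible n x n matrices is invertible, hence of rank n.
import numpy as np

rng = np.random.default_rng(1)
n = 6
N = rng.normal(size=(n, n))       # invertible with probability 1; stands in for the spline matrix
A = N.T @ N + 0.5 * np.eye(n)     # symmetric positive definite, like N'N + lambda * Sigma_N
S = N @ np.linalg.inv(A) @ N.T    # composition of three invertible maps
print(np.linalg.matrix_rank(S))   # expect n = 6
```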

whuber
  • Thank you @whuber. Just checking if I've understood correctly, should it be: "The codomain $\mathcal R^\Xi$ is canonically isomorphic to the standard $\mathcal R^n$." or $\mathcal R^p$? – Lejoon Jan 09 '23 at 08:45
  • Thank you for the close reading -- you are correct. I'll fix that typo right away. – whuber Jan 09 '23 at 14:26

This is a partial answer to be completed; I posted it just for clarification and to draw more attention.

I have consulted the ESL section. Let me first clarify the notation. The problem is 1D function estimation, so the training data are $(x_i, y_i), i = 1, \ldots, n$ (as I pointed out in the comments, using $N$ to denote the number of observations is an unfortunate source of confusion; even in ESL the spline matrix is boldfaced. So here I will still use $n$ to denote the number of observations). The spline matrix $N$ is defined as \begin{align} N = \begin{bmatrix} N_1(x_1) & N_2(x_1) & \cdots & N_n(x_1) \\ N_1(x_2) & N_2(x_2) & \cdots & N_n(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ N_1(x_n) & N_2(x_n) & \cdots & N_n(x_n) \end{bmatrix}. \tag{1} \end{align} Evidently, if $x_i = x_j$ for some $i \neq j$, then $N$ is not invertible (the $i$-th and $j$-th rows are identical). So in order that $\operatorname{rank}(S_\lambda) = n$ (a claim made by ESL, p. 153), $x_1, \ldots, x_n$ must be distinct (without loss of generality, assume $x_1 < x_2 < \cdots < x_n$).

According to the authors:

$N_j(x)$ are an $n$-dimensional set of basis functions for representing this family of natural splines...

By calling $\{N_1(x), \ldots, N_n(x)\}$ "basis functions" (to be fair, doing so without first proving that they really form a basis of a vector space, which was not clearly stated in advance, is loose), they are necessarily linearly independent in the vector space $V$ of spline functions (the "spline space"; more precisely, given knots $x_1 < x_2 < \cdots < x_n$ and considering only cubic splines, I think $V$ is the real vector space spanned by $\{1, x, x^2, x^3, (x - x_1)_+^3, \ldots, (x - x_n)_+^3\}$). As you note, this statement of course requires a mathematical proof, which the text omits.

Recall $N_j(x)$ is defined as (Equation (5.4) in ESL): \begin{align} N_1(x) = 1, N_2(x) = x, N_{k + 2}(x) = d_k(x) - d_{n - 1}(x), k = 1, \ldots, n - 2, \tag{2} \end{align}
where \begin{align} d_k(x) = \frac{(x - x_k)_+^3 - (x - x_n)_+^3}{x_n - x_k}. \tag{3} \end{align}

Plug $(2)$ and $(3)$ into $(1)$, then $\det(N)$ (denoted by $D_n$) is \begin{align} & D_n = \begin{vmatrix} 1 & x_1 & 0 & 0 &\cdots & 0 \\ 1 & x_2 & (x_2 - x_1)^3/(x_n - x_1) & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n - 1} & (x_{n - 1} - x_1)^3/(x_n - x_1) & (x_{n - 1} - x_2)^3/(x_n - x_2) & \cdots & (x_{n - 1} - x_{n - 2})^3/(x_n - x_{n - 2}) \\ 1 & x_n & (x_n - x_1)^2 - (x_n - x_{n - 1})^2 & (x_n - x_2)^2 - (x_n - x_{n - 1})^2 & \cdots & (x_n - x_{n - 2})^2 - (x_n - x_{n - 1})^2 \end{vmatrix} \\ & ???? \neq 0. \end{align}

When $n = 1, 2, 3$, \begin{align} & D_1 = 1 \neq 0, \\ & D_2 = x_2 - x_1 \neq 0, \\ & D_3 = 2(x_2 - x_1)^2(x_3 - x_2) \neq 0, \end{align} all confirming $N$ is invertible. However, for $n \geq 4$, it is more difficult to reach a conclusion. I have calculated \begin{align} D_4 &= (x_4 - x_1)^{-1}(x_2 - x_1)(x_3 - x_2)^2(x_2 + x_3 - 2x_1)(2x_4 - x_2 - x_3) \\ &+ (x_4 - x_2)^{-1}(x_2 - x_1)(x_3 - x_2)^3[(x_1 - x_2)^2 + (x_3 - x_4)^2 - (x_4 - x_1)^2], \end{align} but it is not immediately clear whether $D_4 \neq 0$.
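
Here is a sketch (illustrative only: the helper `spline_det` and the parametrization of the knots as $x_1$ plus positive gaps $h_k$ are ad hoc choices) that builds $N$ from $(2)$–$(3)$ with sympy and computes $D_n$ symbolically for small $n$; running it with $n = 4$ is one way to probe whether $D_4$ can vanish.

```python
# A symbolic sketch: build N from (2)-(3) and compute D_n = det(N).
# Knots are written as x1 plus positive gaps h_k so the truncated powers
# (u)_+^3 resolve unambiguously under the ordering x_1 < x_2 < ... < x_n.
import sympy as sp

def spline_det(n):
    x1 = sp.Symbol('x1', real=True)
    h = sp.symbols(f'h1:{n}', positive=True)             # gaps h_k = x_{k+1} - x_k
    x = [x1 + sum(h[:k]) for k in range(n)]              # x_1 < x_2 < ... < x_n
    t = sp.Symbol('t', real=True)
    def pos3(u):                                         # (u)_+^3
        return sp.Piecewise((u**3, u >= 0), (0, True))
    def d(k):                                            # d_k(t) of (3), 1-based k
        return (pos3(t - x[k - 1]) - pos3(t - x[n - 1])) / (x[n - 1] - x[k - 1])
    basis = [sp.Integer(1), t] + [d(k) - d(n - 1) for k in range(1, n - 1)]
    N = sp.Matrix(n, n, lambda i, j: basis[j].subs(t, x[i]))
    return sp.factor(sp.simplify(N.det()))

# Expect 2*h1**2*h2, i.e. 2*(x2 - x1)**2*(x3 - x2), matching D_3 above;
# spline_det(4) expresses D_4 in terms of the gaps h1, h2, h3.
print(spline_det(3))
```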

Zhanxiong