To formalize the discussion, let me first fix the notation and state the problem.
Here we study the classical linear model $y = X\beta + \varepsilon$, where $y \in \mathbb{R}^n$ is the vector of response variables, $X \in \mathbb{R}^{n \times (p + 1)}$ is the design matrix whose first column consists of all ones (so the model contains an intercept), $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^\top \in \mathbb{R}^{p + 1}$ is the vector of parameters (with intercept $\beta_0$), and $\varepsilon \sim N(0, I_{(n)})$ (without loss of generality, I assume $\sigma^2 = 1$; the $F$-statistic below is invariant to the value of $\sigma^2$, as it cancels between numerator and denominator).
The problem is to find the distribution of the $F$-statistic
\begin{align*}
F = \frac{(TSS - RSS)/p}{RSS/(n - p - 1)} \tag{1}\label{1}
\end{align*}
under the alternative hypothesis $H_a$ for the testing problem:
\begin{align*}
H_0: \beta_1 = \cdots = \beta_p = 0 \text{ vs. }
H_a: \text{not all of } \beta_1, \ldots, \beta_p \text{ are zero}.
\end{align*}
If you understand how to derive the distribution of $F$ under $H_0$, deriving its distribution under $H_a$ is not that different. So let's go over this process, with the help of some linear algebra. To this end, let me introduce two more standard notations: the hat matrix $H = X(X^\top X)^{-1}X^\top$ (we assume $\operatorname{rank}(X) = p + 1$), and the vector of $n$ ones $e = (1, \ldots, 1)^\top$. Since $e$ is the first column of $X$, it lies in the column space of $X$, whence $He = e$ -- a fact used repeatedly below.
With the above setup, we can express $RSS$ and $TSS$ in $\eqref{1}$ in terms of quadratic forms as follows:
\begin{align*}
& TSS = y^\top(I - n^{-1}ee^\top)y, \tag{2.1}\label{2.1} \\
& RSS = y^\top(I - H)y. \tag{2.2}\label{2.2}
\end{align*}
Hence
\begin{align*}
TSS - RSS = y^\top(H - n^{-1}ee^\top)y. \tag{3}\label{3}
\end{align*}
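As a sanity check, these identities are easy to verify numerically. Here is a minimal sketch in Python with NumPy (the sizes $n$ and $p$, the random design, and the coefficients are arbitrary illustrative choices, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])  # first column is e
beta = rng.standard_normal(p + 1)
y = X @ beta + rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix X(X'X)^{-1}X'
y_hat = H @ y                                    # fitted values

TSS = np.sum((y - y.mean()) ** 2)                # direct sums of squares
RSS = np.sum((y - y_hat) ** 2)

assert np.isclose(TSS, y @ (np.eye(n) - np.ones((n, n)) / n) @ y)   # (2.1)
assert np.isclose(RSS, y @ (np.eye(n) - H) @ y)                     # (2.2)
assert np.isclose(TSS - RSS, y @ (H - np.ones((n, n)) / n) @ y)     # (3)
```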
The distribution of $RSS$ is always $\chi_{n - p - 1}^2$, regardless of the specification of $\beta$. This is because $(I - H)X = X - HX = 0$, whence $(I - H)y = (I - H)(X\beta + \varepsilon) = (I - H)\varepsilon$, which then implies that
\begin{align*}
RSS = \varepsilon^\top(I - H)\varepsilon \sim \chi_{n - p - 1}^2. \tag{4}\label{4}
\end{align*}
The last assertion is of course a consequence of the fact that the matrix $I - H$ is symmetric and idempotent with rank $n - p - 1$, and that $\varepsilon \sim N(0, I_{(n)})$. For details, see the general result at the end of this answer.
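If you would like to see $\eqref{4}$ in action, here is a short Monte Carlo sketch (again, $n$, $p$ and the fixed design are illustrative choices; $\beta$ is drawn arbitrarily, precisely because $\eqref{4}$ holds for every $\beta$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, n_sim = 30, 3, 20_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
H = X @ np.linalg.solve(X.T @ X, X.T)
beta = rng.standard_normal(p + 1)                # arbitrary: (4) holds for any beta

y = X @ beta + rng.standard_normal((n_sim, n))   # one row per simulated sample
RSS = np.einsum('ij,jk,ik->i', y, np.eye(n) - H, y)   # y'(I - H)y for each row
print(stats.kstest(RSS, stats.chi2(n - p - 1).cdf))   # large p-value expected
```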
Moving on to the distribution of $TSS - RSS$ in $\eqref{3}$: it is slightly more involved, because in general it does not reduce to a quadratic form in $\varepsilon$ alone. Under $H_0$, however, it does -- because $H_0$ entails $X\beta = \beta_0e$, whence
\begin{align*}
& (H - n^{-1}ee^\top)y \\
=& (H - n^{-1}ee^\top)(X\beta + \varepsilon) \\
=& \beta_0He - \beta_0n^{-1}ee^\top e + (H - n^{-1}ee^\top)\varepsilon \\
=& \beta_0e - \beta_0e + (H - n^{-1}ee^\top)\varepsilon \\
=& (H - n^{-1}ee^\top)\varepsilon.\tag{5}\label{5}
\end{align*}
(In the third equality we used $He = e$ and $e^\top e = n$.) It then follows, since the matrix $H - n^{-1}ee^\top$ is symmetric and idempotent with rank $p$, that
\begin{align*}
TSS - RSS = \varepsilon^\top(H - n^{-1}ee^\top)\varepsilon \sim \chi_p^2. \tag{6}\label{6}
\end{align*}
Combining $\eqref{4}$ and $\eqref{6}$ immediately gives $F \sim F_{p, n - p - 1}$ under $H_0$. (I will leave the check of independence between $RSS$ and $TSS - RSS$ to you; thanks to the normality assumption, it amounts to verifying that $\operatorname{Cov}((I - H)\varepsilon, (H - n^{-1}ee^\top)\varepsilon) = (I - H)(H - n^{-1}ee^\top) = 0$, which follows from $H^2 = H$ and $He = e$.)
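Here is a quick Monte Carlo confirmation of this null distribution (a sketch with arbitrary illustrative $n$, $p$, design and intercept; the Kolmogorov-Smirnov test compares the simulated statistics with $F_{p, n - p - 1}$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, n_sim = 40, 4, 20_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
H = X @ np.linalg.solve(X.T @ X, X.T)
beta0 = 2.0                                      # any intercept; all slopes zero

y = beta0 + rng.standard_normal((n_sim, n))      # X beta = beta_0 e under H_0
RSS = np.einsum('ij,jk,ik->i', y, np.eye(n) - H, y)
TSS = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
F = ((TSS - RSS) / p) / (RSS / (n - p - 1))

print(stats.kstest(F, stats.f(p, n - p - 1).cdf))    # should not reject
```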
After reviewing the classical proof above, it is clear that the only place where the argument changes when passing from $H_0$ to $H_a$ is the algebra in $\eqref{5}$, which needs to be updated to
\begin{align*}
(H - n^{-1}ee^\top)y = P(\mu + \varepsilon), \tag{7}\label{7}
\end{align*}
where $P := H - n^{-1}ee^\top$ and $\mu := X\beta$. Since $\mu + \varepsilon \sim N(\mu, I_{(n)})$, it can be shown (see the short proof at the end of this answer) that
\begin{align*}
y^\top(H - n^{-1}ee^\top)y = (\mu + \varepsilon)^\top P (\mu + \varepsilon) \sim \chi^2_{p; \mu^\top P\mu} =
\chi^2_{p; \|(I - n^{-1}ee^\top)X\beta\|^2} \tag{8}\label{8}
\end{align*}
Now $\eqref{4}$ and $\eqref{8}$ (note that the independence argument above did not rely on $H_0$, so it still applies), together with the definition of the noncentral $F$-distribution, imply that under $H_a$,
\begin{align*}
F = \frac{(TSS - RSS)/p}{RSS/(n - p - 1)} \sim F_{p, n - p - 1; \|(I - n^{-1}ee^\top)X\beta\|^2}.
\end{align*}
That is, $F$ follows a noncentral $F$-distribution with degrees of freedom $p$ and $n - p - 1$ and noncentrality parameter $\|(I - n^{-1}ee^\top)X\beta\|^2 = \beta^\top X^\top(I - n^{-1}ee^\top)X\beta$ (the two expressions agree because $I - n^{-1}ee^\top$ is symmetric and idempotent).
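The analogous Monte Carlo sketch under $H_a$ (same illustrative setup, now with some nonzero slopes) confirms this; note that scipy.stats.ncf parameterizes the noncentrality exactly as in $\eqref{8}$, without an extra factor of $1/2$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, n_sim = 40, 4, 20_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
H = X @ np.linalg.solve(X.T @ X, X.T)
beta = np.array([1.0, 0.3, -0.2, 0.0, 0.1])      # not all slopes are zero
mu = X @ beta
nc = np.sum((mu - mu.mean()) ** 2)               # ||(I - ee'/n) X beta||^2

y = mu + rng.standard_normal((n_sim, n))
RSS = np.einsum('ij,jk,ik->i', y, np.eye(n) - H, y)
TSS = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
F = ((TSS - RSS) / p) / (RSS / (n - p - 1))

print(stats.kstest(F, stats.ncf(p, n - p - 1, nc).cdf))   # should not reject
```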
Many conclusions in the above answer are based on the following classical result:
Suppose $\xi \sim N(\mu, I_{(n)})$ and $A$ is a symmetric, idempotent matrix with $\operatorname{rank}(A) = r$; then the quadratic form $\xi^\top A \xi$ has a noncentral $\chi^2$ distribution with $r$ degrees of freedom and noncentrality parameter $\mu^\top A \mu$.
Here is a short proof based on the canonical form of $A$:
Since $A$ is symmetric and idempotent with $\operatorname{rank}(A) = r$, there exists an orthogonal matrix $O$ of order $n$ such that $A = O^\top\operatorname{diag}(I_{(r)}, 0)O$. Write $\eta = (\eta_1, \ldots, \eta_n)^\top := O\xi$; then by the linear transformation property of the multivariate normal distribution, $\eta \sim N(O\mu, I_{(n)})$. It then follows from the definition of the noncentral chi-squared distribution that
\begin{align*}
\xi^\top A \xi = \xi^\top O^\top \operatorname{diag}(I_{(r)}, 0) O\xi = \eta^\top \operatorname{diag}(I_{(r)}, 0) \eta =
\eta_1^2 + \cdots + \eta_r^2 \sim \chi^2_{r; \mu^\top A \mu},
\end{align*}
where we used that $\eta_i \sim N(e_i^\top O\mu, 1)$ independently for $i = 1, \ldots, r$ (here $e_i$ denotes the $n$-vector whose $i$-th component is $1$ and all other components are $0$), whence the noncentrality parameter is
\begin{align*}
\sum_{i = 1}^r(e_i^\top O\mu)^2 = \sum_{i = 1}^r \mu^\top O^\top e_ie_i^\top O\mu = \mu^\top O^\top\operatorname{diag}(I_{(r)}, 0)O\mu = \mu^\top A \mu.
\end{align*}
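Finally, the lemma itself can also be checked numerically. Below is a minimal sketch taking $A = I - n^{-1}ee^\top$, the centering matrix (symmetric, idempotent, of rank $n - 1$), with an arbitrary illustrative $\mu$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, n_sim = 10, 20_000
A = np.eye(n) - np.ones((n, n)) / n              # symmetric idempotent, rank n - 1
mu = rng.standard_normal(n)
r = int(round(np.trace(A)))                      # rank equals trace for idempotent A
nc = mu @ A @ mu                                 # noncentrality parameter mu' A mu

xi = mu + rng.standard_normal((n_sim, n))        # xi ~ N(mu, I), one row per draw
q = np.einsum('ij,jk,ik->i', xi, A, xi)          # quadratic form xi' A xi
print(stats.kstest(q, stats.ncx2(r, nc).cdf))    # compare with noncentral chi^2
```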