Thank you so much for all the input and for pointing out my mistakes. I made some corrections and tried to add more details, but I ended up answering my own question based on other people's posts. Please let me know if there are any remaining mistakes.
Why is a t-distribution used for hypothesis testing of a linear regression coefficient?
Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom
I was trying to derive the variance of the OLS estimator, ${Var(\hat{\beta}) = \sigma^2(X^TX)^{-1}}$, but I was not sure what this ${\sigma^2}$ is or how to estimate it.
Given the model ${y = X\beta + u}$, we minimize the residual sum of squares ${\Sigma(y_i - x_i^T\beta)^2}$. Setting the gradient with respect to ${\beta}$ to zero gives the normal equations ${X^TX\hat{\beta} = X^Ty}$, so the unique solution for ${\hat{\beta}}$ is:
\begin{equation} {\hat{\beta} = (X^TX)^{-1}X^Ty} \end{equation}
Further, we have ${\hat{y} = X\hat{\beta} = X(X^TX)^{-1}X^Ty}$
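As a quick numerical sanity check (a minimal sketch with simulated data; all variable names here are mine), the closed-form ${\hat{\beta} = (X^TX)^{-1}X^Ty}$ agrees with a standard least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                        # n observations, p coefficients (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)   # u ~ N(0, 1)

# Closed-form OLS: beta_hat = (X^T X)^{-1} X^T y (solve, don't invert)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer from numpy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```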
I was reading about the derivation of the variance of ${\hat{\beta}}$:
\begin{equation} {y = X \beta + u} \end{equation}
\begin{equation} {\hat{\beta} = (X^TX)^{-1}X^T(X \beta + u) = \beta + (X^TX)^{-1}X^Tu} \end{equation}
\begin{equation} Var(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T] \end{equation}
\begin{equation} Var(\hat{\beta}) = (X^TX)^{-1}X^TE[uu^T]X(X^TX)^{-1} \end{equation}
Assuming the errors are homoskedastic and uncorrelated, ${E[uu^T] = \sigma^2 I}$, so \begin{equation} Var(\hat{\beta}) = \sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1} = {\sigma}^2 (X^TX)^{-1} \end{equation}
Thus, if ${u}$ is Gaussian, \begin{equation} \frac{\hat{\beta_i} - \beta_i}{\sigma \sqrt{[(X^TX)^{-1}]_{ii}}} \sim N(0,1) \end{equation}
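To convince myself of ${Var(\hat{\beta}) = \sigma^2(X^TX)^{-1}}$, here is a small Monte Carlo sketch (simulated data, ${X}$ held fixed across replications, names are mine): the empirical covariance of ${\hat{\beta}}$ over many error draws should be close to the theoretical matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])

# Refit beta_hat on many fresh error draws, keeping X fixed
draws = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(scale=sigma, size=n)))
    for _ in range(20_000)
])

print(np.cov(draws.T))                      # empirical Var(beta_hat)
print(sigma**2 * np.linalg.inv(X.T @ X))    # theoretical sigma^2 (X^T X)^{-1}
```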
In practice ${\sigma^2}$ is unknown; ${s^2 = \frac{RSS}{n-p}}$ is an unbiased estimator for it, with ${p}$ the number of estimated coefficients (this is where I had confused ${s}$ and ${\sigma}$), where
\begin{equation} RSS = \sum(y_i - \hat{y_i})^2 = (n-p)s^2 \end{equation}
Scaled by ${\sigma^2}$, the RSS follows a chi-squared distribution with ${n-p}$ degrees of freedom:
\begin{equation} \frac{(n-p)s^2}{\sigma^2} \sim \chi_{n-p}^2 \end{equation}
\begin{equation} \frac{s}{\sigma} \sim \sqrt{\frac{\chi_{n-p}^2}{(n-p)}} \end{equation}
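The chi-squared claim can also be checked by simulation (again only a sketch under the same assumptions, Gaussian errors and fixed ${X}$): the simulated values of ${(n-p)s^2/\sigma^2}$ should be indistinguishable from ${\chi_{n-p}^2}$ draws.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix, y_hat = H y

# Simulate (n - p) s^2 / sigma^2 = RSS / sigma^2 many times
stat = []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    rss = np.sum((y - H @ y) ** 2)          # RSS = (n - p) s^2
    stat.append(rss / sigma**2)

# Compare against chi^2 with n - p degrees of freedom
print(stats.kstest(stat, stats.chi2(df=n - p).cdf))  # large p-value expected
```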
A t-distribution with ${n-p}$ degrees of freedom is formed by dividing a ${N(0,1)}$ variable by an independent ${\sqrt{\chi_{n-p}^2/(n-p)}}$ variable. Dividing the standardized statistic above by ${s/\sigma}$ cancels the unknown ${\sigma}$:
\begin{equation} \frac{\hat{\beta_i} - \beta_i}{s \sqrt{[(X^TX)^{-1}]_{ii}}} \sim t_{n-p} \end{equation}
This is why a t-distribution is used for hypothesis testing a regression coefficient.
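Finally, the hand-built t statistic can be compared with what a standard library reports (a sketch assuming statsmodels is available; the simulated data and names are mine):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - p)   # s^2 = RSS / (n - p)
se = np.sqrt(s2 * np.diag(XtX_inv))              # s * sqrt([(X^T X)^{-1}]_ii)
t_by_hand = beta_hat / se                        # t statistics for H0: beta_i = 0

fit = sm.OLS(y, X).fit()
print(np.allclose(t_by_hand, fit.tvalues))       # True: matches the library
```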