Help me understand this relatively simple (I think) concept:
The covariance of the intercept ($\beta_0$) and the slope ($\beta_1$) in simple linear regression.
Furthermore, what range of values can this covariance take on?
To answer your question as asked: if you adopt the frequentist view of statistics, then $\beta_0$ and $\beta_1$ are not random variables and thus have no covariance. They are fixed (and unobserved) values that describe the true relationship between your $Y$ and $X$ variables. The covariance between them is undefined in the same sense that the covariance between $4.5$ and $\pi$ is undefined; they're not random variables, they're just numbers. If you adopt the Bayesian view, and thus treat $\beta_0$ and $\beta_1$ as random variables themselves, then they have a joint prior (and posterior) distribution and I imagine you could model them as covarying under it. Maybe someone can elaborate on this in the comments, as I'm not really sure of the details.
However, I suspect you're asking something different; namely, what is the covariance between the estimates of these coefficients (sometimes called $\hat{\beta_0}$ and $\hat{\beta_1}$, sometimes called $b_0$ and $b_1$). This is answered very well by the top answer here. If you look at the off-diagonal elements of the variance-covariance matrix of the estimates (equation 6.78a in the textbook the OP posted), you will see
$$\operatorname{Cov}(\hat{\beta_0},\hat{\beta_1}) = \frac{-\bar{X}\sigma^2}{\sum{(X_i-\bar{X})^2}} = -\bar{X}\operatorname{Var}(\hat{\beta_1})$$
where $\sigma^2$ is the variance of the error terms.
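To make the formula concrete, here is a minimal simulation sketch in Python (numpy only; the $X$ values, true coefficients, and $\sigma$ are made-up numbers, not anything from the question) that refits the regression on many simulated datasets and compares the empirical covariance of $\hat{\beta_0}$ and $\hat{\beta_1}$ with the closed-form expression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed design and true parameters (arbitrary choices)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
beta0_true, beta1_true, sigma = 2.0, 0.5, 1.0

n_sims = 20_000
b0 = np.empty(n_sims)
b1 = np.empty(n_sims)
Sxx = np.sum((X - X.mean()) ** 2)

for i in range(n_sims):
    y = beta0_true + beta1_true * X + rng.normal(0.0, sigma, size=X.size)
    # OLS estimates for simple linear regression
    b1[i] = np.sum((X - X.mean()) * (y - y.mean())) / Sxx
    b0[i] = y.mean() - b1[i] * X.mean()

emp_cov = np.cov(b0, b1)[0, 1]            # empirical covariance across simulations
theory_cov = -X.mean() * sigma**2 / Sxx   # -Xbar * sigma^2 / sum((X_i - Xbar)^2)

print(emp_cov, theory_cov)  # the two numbers should be close
```

With 20,000 replications the two numbers should agree to a couple of decimal places.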
To answer your question about what range of values it can take on, let's look at the equation. It shows that as the spread of the $X$ values increases, the magnitude of the covariance decreases (i.e., as the denominator grows, the expression shrinks in magnitude). As the error variance $\sigma^2$ increases, so does the magnitude of the covariance. Finally, the sign of the covariance is opposite to the sign of $\bar{X}$.
So it can actually take on any value in $(-\infty, 0)$ if $\bar{X}>0$, any value in $(0, \infty)$ if $\bar{X}<0$, and it is exactly $0$ if $\bar{X}=0$ (for example, if you center your $X$ values before fitting). The magnitude of the value it takes on depends on the spread of your $X$ values and on the variance of your error terms.
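As a quick illustration of the sign behaviour (arbitrary numbers again), shifting the same $X$ values changes $\bar{X}$ and with it the sign of the theoretical covariance, while centering $X$ makes it exactly zero:

```python
import numpy as np

def theoretical_cov(X, sigma=1.0):
    """Cov(b0_hat, b1_hat) = -Xbar * sigma^2 / sum((X_i - Xbar)^2) for a fixed design X."""
    return -X.mean() * sigma**2 / np.sum((X - X.mean()) ** 2)

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(theoretical_cov(X))             # Xbar > 0  -> negative covariance
print(theoretical_cov(X - 10.0))      # Xbar < 0  -> positive covariance
print(theoretical_cov(X - X.mean()))  # Xbar = 0  -> exactly zero
```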
edit: I added $-\bar{X}\operatorname{Var}(\hat{\beta_1})$ to the equation to further help with intuition. The variance of our slope estimate, $\operatorname{Var}(\hat{\beta_1})$, is a measure of how precise that estimate is; in a perfect world, we want this variance to be small so that our estimate is very precise. In light of this, I don't really think the covariance between the intercept and slope estimates is a very useful or enlightening concept on its own. As far as I can tell, it is just the negative of the product of two easier-to-interpret quantities: $\bar{X}$ and $\operatorname{Var}(\hat{\beta_1})$.
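For what it's worth, here is a short sketch of where that product form comes from, under the usual assumptions (fixed $X$, uncorrelated errors with constant variance): since $\hat{\beta_0} = \bar{Y} - \hat{\beta_1}\bar{X}$ and $\operatorname{Cov}(\bar{Y}, \hat{\beta_1}) = 0$ (the slope estimator's weights $(X_i-\bar{X})/\sum{(X_j-\bar{X})^2}$ sum to zero),

$$\operatorname{Cov}(\hat{\beta_0},\hat{\beta_1}) = \operatorname{Cov}(\bar{Y} - \hat{\beta_1}\bar{X},\, \hat{\beta_1}) = \operatorname{Cov}(\bar{Y},\hat{\beta_1}) - \bar{X}\operatorname{Var}(\hat{\beta_1}) = -\bar{X}\operatorname{Var}(\hat{\beta_1}).$$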