Context: This question is about concepts discussed in Chapter 7: "Model Assessment and Selection" in The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2008) by T. Hastie, R. Tibshirani and J. Friedman.
ESL discusses whether model selection according to AIC is the same as model selection according to predictive performance on new data. The relationship holds exactly for additive models with squared error loss; it holds only approximately for other models, including logistic regression.
Eq. (7.29) leads to Eq. (7.30) only in the case of squared error loss and Gaussian likelihood with known variance. The $\hat{\sigma}^2_\epsilon$ term is the error variance of the Gaussian model and has no counterpart in logistic regression.
For an additive error model $Y = f(X) + \epsilon$ with $d$ parameters ($d$ inputs and/or basis functions) fit under squared error loss (basically, linear regression), the $C_p$ statistic is given by:
$$
C_p = \overline{\text{err}} + 2\frac{d}{N}\sigma^2_{\epsilon}
$$
where $\overline{\text{err}}$ is the training error averaged over $N$ training examples.
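To make the formula concrete, here is a minimal NumPy sketch (my own illustration, not from the book) that computes $C_p$ for a single least-squares fit; the simulated data-generating setup and all variable names are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated additive-error data y = f(x) + eps; the setup is illustrative only
N, d = 100, 5
X = rng.normal(size=(N, d))
beta = rng.normal(size=d)
sigma_eps = 1.0  # true noise sd, known here by construction
y = X @ beta + rng.normal(scale=sigma_eps, size=N)

# Least-squares fit and the average training error err-bar
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
err_bar = np.mean((y - X @ beta_hat) ** 2)

# C_p = err-bar + 2 * (d / N) * sigma^2_eps
C_p = err_bar + 2 * (d / N) * sigma_eps**2
print(f"err-bar = {err_bar:.3f}, C_p = {C_p:.3f}")
```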
Now say we are considering a set of Gaussian models, all of the form $Y = f_\alpha(X) + \epsilon$, where $\alpha$ is a tuning/hyper-parameter and the errors are iid Normal with mean 0 and variance $\sigma^2_\epsilon$. Each model has effective number of parameters $d(\alpha)$. So a model with splines has a larger $d(\alpha)$ than a model that is linear in all variables, even if both models take exactly the same inputs.
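For instance (continuing the sketch, and using a polynomial basis as a simple stand-in for splines), two design matrices built from exactly the same input can have different $d$:

```python
# Two models on exactly the same input x: a straight-line fit has d = 2
# parameters (intercept + slope); a cubic basis expansion of x has d = 4.
x = rng.normal(size=N)
X_linear = np.column_stack([np.ones(N), x])              # d = 2
X_cubic = np.column_stack([np.ones(N), x, x**2, x**3])   # d = 4
```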
We estimate the error variance $\sigma^2_\epsilon$ from the "largest" (with most parameters) model in the set. Due to the bias-variance tradeoff we expect the largest model to have the smallest bias. From this point on we treat $\hat{\sigma}^2_{\epsilon}$ as known: even though we obtain the estimate from one specific model, we assume that all models in the set have the same error variance $\hat{\sigma}^2_{\epsilon}$.
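Continuing the sketch, one standard way to estimate the error variance from the largest model is $\hat{\sigma}^2_\epsilon = \mathrm{RSS}/(N - d_{\max})$; the $N - d_{\max}$ normalization is my choice of the usual unbiased estimator, not something ESL prescribes here:

```python
# Estimate sigma^2_eps from the largest model in the set: residual sum of
# squares over (N - d_max); unbiased when the largest model has negligible bias.
d_max = X.shape[1]
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_full = np.sum((y - X @ beta_full) ** 2)
sigma2_hat = rss_full / (N - d_max)  # treated as known for every model below
```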
Now that we are working with a set of models indexed by $\alpha$, we modify the formula above accordingly:
$$
\operatorname{AIC}(\alpha) = \overline{\text{err}}(\alpha) + 2\frac{d(\alpha)}{N}\hat{\sigma}^2_{\epsilon}
$$
We should probably call this $C_p(\alpha)$, but, to make difficult concepts even harder, ESL refers to $C_p$ and AIC "collectively" as AIC. So keep in mind that the formula for $\operatorname{AIC}(\alpha)$ is derived from the $C_p$ statistic.
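Continuing the sketch, we can trace $\operatorname{AIC}(\alpha)$ over a nested family in which model $\alpha$ simply uses the first $\alpha$ columns of `X`, so that $d(\alpha) = \alpha$ (the nesting is an assumption of the example):

```python
# AIC(alpha) over a nested family: model alpha uses the first alpha columns
# of X, so d(alpha) = alpha; sigma2_hat comes from the largest model above.
def aic(alpha):
    Xa = X[:, :alpha]
    b, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return np.mean((y - Xa @ b) ** 2) + 2 * (alpha / N) * sigma2_hat

alphas = range(1, X.shape[1] + 1)
best_alpha = min(alphas, key=aic)
print({a: round(aic(a), 3) for a in alphas}, "-> best alpha:", best_alpha)
```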
The math simplifies nicely when the likelihood is Gaussian with known variance and the loss function is squared error.
$$
\begin{aligned}
\frac{\operatorname{AIC}(\alpha)}{\hat{\sigma}^2_\epsilon}
&= \frac{\overline{\text{err}}(\alpha)}{\hat{\sigma}^2_\epsilon} + 2\frac{d(\alpha)}{N} \\
&= \frac{2}{N}\sum_{i=1}^N\frac{\left(y_i-\hat{f}_\alpha(x_i)\right)^2}{2\hat{\sigma}^2_\epsilon} + 2\frac{d(\alpha)}{N} \\
&= \operatorname{const} -\frac{2}{N}\operatorname{loglik}(\alpha) + 2\frac{d(\alpha)}{N}
\end{aligned}
$$
where the constant depends on $\hat{\sigma}^2_\epsilon$ but not on $\alpha$. That's Eq. (7.29) in the last line.
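To make the constant explicit, write out the Gaussian log-likelihood with known variance $\hat{\sigma}^2_\epsilon$:

$$
\operatorname{loglik}(\alpha) = -\frac{N}{2}\log\left(2\pi\hat{\sigma}^2_\epsilon\right) - \sum_{i=1}^N\frac{\left(y_i-\hat{f}_\alpha(x_i)\right)^2}{2\hat{\sigma}^2_\epsilon},
$$

so that

$$
\frac{2}{N}\sum_{i=1}^N\frac{\left(y_i-\hat{f}_\alpha(x_i)\right)^2}{2\hat{\sigma}^2_\epsilon} = -\frac{2}{N}\operatorname{loglik}(\alpha) - \log\left(2\pi\hat{\sigma}^2_\epsilon\right),
$$

i.e. $\operatorname{const} = -\log\left(2\pi\hat{\sigma}^2_\epsilon\right)$.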
We've shown that model selection with $\operatorname{AIC}(\alpha)$ in Eq. (7.30) is equivalent to model selection with $\operatorname{AIC}$ in Eq. (7.29) in the case of Gaussian likelihood with known variance and squared error loss: dividing by $\hat{\sigma}^2_\epsilon > 0$ and adding a constant do not change which $\alpha$ minimizes the criterion.
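As a numerical sanity check (same illustrative sketch as above), the likelihood-based form of Eq. (7.29) selects the same $\alpha$ as $\operatorname{AIC}(\alpha)$:

```python
import math

# Eq. (7.29)-style criterion: -2/N * loglik + 2 * d/N, with the Gaussian
# log-likelihood evaluated at the fixed variance sigma2_hat.
def aic_loglik(alpha):
    Xa = X[:, :alpha]
    b, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    rss = np.sum((y - Xa @ b) ** 2)
    loglik = -N / 2 * math.log(2 * math.pi * sigma2_hat) - rss / (2 * sigma2_hat)
    return -2 / N * loglik + 2 * alpha / N

# Same argmin: the two criteria differ by a positive scale and a constant.
assert min(alphas, key=aic_loglik) == min(alphas, key=aic)
```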
Otherwise the relationship between AIC and expected error is approximate, not exact.
Chapter 7 mentions this several times. For example, Figure 7.4 is about fitting logistic regression with log-likelihood (i.e., entropy) loss and with 0-1 loss. The caption explains:
> Although the AIC formula does not strictly apply here, it does a reasonable job.

(Figure 7.4 caption, The Elements of Statistical Learning)
In the left panel (entropy loss), AIC agrees well with a test-sample estimate of model error for all models but the extremely over-parametrized one; the link between AIC and model performance falls apart in that extreme case. In the right panel (0-1 loss), AIC is not as good an estimate of model error (the green line doesn't track the blue line closely), but you would still select a good model by minimizing AIC (the one with 16 basis functions); the best model according to the test-sample error is the one with 32 basis functions.