
I am currently self-studying the normal linear model ($X \sim N(A\beta, \sigma^2 I)$), and have learned about estimating $\beta$ and $\sigma^2$, testing certain types of hypotheses (although not too complicated ones), creating confidence intervals, and making predictions (my study has been mathematically rigorous, so even though most of these things are explained in basic courses, my understanding is fairly deep).

My question is: what comes "next" within this framework? Should I move on to other parts of statistics, focus on models with different distributions, or is there still more depth in the normal setup? I'm having a hard time seeing what more I can get out of it beyond the things I already mentioned.

Markelo
  • Welcome to CV. In one sense, statistics is like karate training: you're always reviewing the basics, and that is the foundation for higher levels of achievement. One piece that may expand your horizons is this Edge article about Richard Nisbett's crusade against multiple regression: https://www.edge.org/conversation/richard_nisbett-the-crusade-against-multiple-regression-analysis – user78229 Apr 12 '16 at 09:46
  • You could focus on generalized linear models now. – adaien Apr 12 '16 at 10:13

1 Answer


This is by no means authoritative, but hopefully it gives you somewhere to start.

You've studied the model $Y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal N(0, \sigma^2 I)$.
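To make the baseline concrete, here's a minimal numpy sketch (the simulated data and tooling are my illustration, not part of the original post) of the closed-form least squares fit and the usual unbiased estimate of $\sigma^2$:

```python
# Minimal sketch of the normal linear model baseline: beta_hat = (X'X)^{-1} X'y
# and the unbiased variance estimate sigma2_hat = ||y - X beta_hat||^2 / (n - p).
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design with intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # numerically stable least squares
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)              # unbiased estimate of sigma^2
print(beta_hat, sigma2_hat)
```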

There are many extensions of this model. Some are listed below, with rough code sketches for each following the list:

  1. Explore different assumptions on $\varepsilon$. Maybe you only assume $E(\varepsilon) = 0$ and $Var(\varepsilon) = D$. What happens if $D \neq \sigma^2 I_n$? If $D$ is diagonal you get weighted least squares; if $D$ is not even diagonal, you can use generalized least squares. Also, what happens if $\varepsilon$ has a specified but non-normal distribution (the Laplace, for example)? This turns out to be closely related to considering loss functions other than squared loss.
  2. Consider other distributions for $Y_i$. This can lead to generalized linear models (GLMs).
  3. Explore penalized regression. This is a big area; I'd start with the lasso and ridge regression.
  4. Ridge regression and the lasso involve tuning parameters, which leads you to cross-validation. Or maybe you'd prefer to use information criteria like AIC and BIC. There's a lot to study here, such as where exactly these criteria come from and their asymptotic relationships with different kinds of cross-validation. Significance testing also gets tricky; maybe you'll need to use the bootstrap more often.
  5. Ridge regression and the lasso also have Bayesian interpretations. This can lead you to go full Bayesian and investigate the effects of various priors and multilevel models. That might require MCMC, which is a big subject in its own right.
  6. Very closely related to the previous point is the delightful world of mixed-effects models, which are also closely related to generalized least squares.
  7. Maybe you've got continuous predictors and want to use polynomials in them, but you don't like global polynomials. You can consider basis-expansion methods like splines. I'd suggest taking a look at The Elements of Statistical Learning (available free online).
  8. From here you could study kernel methods in general; this includes methods like the support vector machine.
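For item 1, here's a hedged numpy sketch of generalized least squares with a known covariance $D$; the AR(1)-style correlation is just an illustrative assumption:

```python
# GLS with known, non-diagonal D: beta_hat = (X' D^{-1} X)^{-1} X' D^{-1} y.
import numpy as np

rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.5])

rho = 0.6                                                   # AR(1)-style correlation
D = 0.8 * rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), D)

D_inv = np.linalg.inv(D)
beta_gls = np.linalg.solve(X.T @ D_inv @ X, X.T @ D_inv @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_gls, beta_ols)  # both unbiased; GLS is more efficient when D is right
```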
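For item 2, a sketch of a Poisson GLM with a log link, $Y_i \mid x_i \sim \text{Poisson}(\exp(x_i^\top \beta))$; the use of statsmodels is my tooling assumption:

```python
# Poisson GLM: E(Y_i) = exp(x_i' beta), fit by maximum likelihood.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
X = sm.add_constant(rng.normal(size=n))   # intercept plus one predictor
mu = np.exp(X @ np.array([0.2, 0.8]))     # log link
y = rng.poisson(mu)

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)   # should be near (0.2, 0.8)
```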
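For item 3, a sketch of ridge and the lasso (scikit-learn is my assumption): ridge adds $\lambda \lVert \beta \rVert_2^2$ to the squared-error loss and shrinks coefficients, while the lasso adds $\lambda \lVert \beta \rVert_1$ and can set them exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
n, p = 80, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0] + [0.0] * (p - 2))   # sparse truth
y = X @ beta_true + rng.normal(size=n)

print(Ridge(alpha=1.0).fit(X, y).coef_)   # all coefficients shrunk toward zero
print(Lasso(alpha=0.5).fit(X, y).coef_)   # many coefficients exactly zero
```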
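For item 4, a sketch of picking the lasso's tuning parameter by K-fold cross-validation (again assuming scikit-learn, whose LassoCV does the grid search internally):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 80, 10
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

fit = LassoCV(cv=5).fit(X, y)   # 5-fold CV over an automatic grid of penalties
print(fit.alpha_, fit.coef_)
```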
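For item 5, a sketch of the Bayesian reading of ridge: with prior $\beta \sim \mathcal N(0, \tau^2 I)$ and known $\sigma^2$ (an illustrative assumption), the posterior mean of $\beta$ equals the ridge estimate with $\lambda = \sigma^2/\tau^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 4
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
sigma2, tau2 = 1.0, 0.5
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = sigma2 / tau2   # noise variance relative to prior variance
post_mean = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(post_mean)      # identical to ridge regression with penalty lam
```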
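For item 6, a sketch of a random-intercept model $y_{ij} = x_{ij}^\top \beta + b_i + \varepsilon_{ij}$ with $b_i \sim \mathcal N(0, \tau^2)$; statsmodels' MixedLM is my tooling assumption:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
groups = np.repeat(np.arange(10), 20)        # 10 groups, 20 observations each
b = rng.normal(scale=1.0, size=10)[groups]   # group-level random intercepts
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + b + rng.normal(scale=0.5, size=200)

fit = sm.MixedLM(y, sm.add_constant(x), groups=groups).fit()
print(fit.params)   # fixed effects near (1, 2) plus the group variance
```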
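For item 7, a sketch of a spline basis expansion: replace $x$ with B-spline basis columns and then fit by ordinary least squares (SplineTransformer is a scikit-learn >= 1.0 assumption, not something the answer names):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=200)

# Nonlinear in x, but still linear in the spline basis columns.
model = make_pipeline(SplineTransformer(n_knots=8, degree=3), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[0.0], [1.5]])))
```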
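For item 8, a sketch of kernel regression via a support vector machine with an RBF kernel (scikit-learn's SVR; the tuning values here are arbitrary):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(8)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=200)

fit = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(x, y)  # kernelized fit
print(fit.predict(np.array([[0.0], [1.5]])))
```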

Now you've got simple, interpretable tools like iid normal-errors linear regression, flexible nonparametric methods like the SVM, and many things in between.

jld