I'm trying to work out the order of operations when building a GLM.
I have 9 candidate predictors for the model, but I may find that some are irrelevant to the dependent variable, in which case I'll drop them.
I was thinking I'd start by using all 9 variables to pick a distribution and link function for the GLM. Exploring the dependent variable suggests that an inverse Gaussian or Gamma distribution would probably fit best, and I plan to test the following link functions: identity, log, inverse and (for the inverse Gaussian only) $\frac{1}{\mu^2}$.
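For reference, the candidate links written as $\eta = g(\mu)$, where $\eta = \mathbf{x}^\top\boldsymbol{\beta}$ is the linear predictor. The inverse link is (up to sign) the canonical link of the Gamma, and $1/\mu^2$ is the canonical link of the inverse Gaussian; 3 links for the Gamma plus 4 for the inverse Gaussian gives the 7 candidate models.

$$
\begin{aligned}
\text{identity:}\quad & g(\mu) = \mu \\
\text{log:}\quad & g(\mu) = \log \mu \\
\text{inverse:}\quad & g(\mu) = 1/\mu \\
\text{inverse square (inverse Gaussian only):}\quad & g(\mu) = 1/\mu^{2}
\end{aligned}
$$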
So I plan to compare the 7 resulting models (4 inverse Gaussian, 3 Gamma), each using all 9 independent variables, to see which distribution and link function performs best. I'd compare them with AIC or BIC, since both trade goodness of fit against the number of parameters and can be used to compare models that aren't nested.
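To make the comparison concrete, here is a minimal sketch in Python (not how R's `glm()` computes it internally) of the quantities AIC and BIC are built from: $\mathrm{AIC} = 2k - 2\log L$ and $\mathrm{BIC} = k\log n - 2\log L$. It assumes fitted means `mu` and a dispersion `phi` are already available from some fitting routine, and counts the dispersion as one of the $k$ parameters. Both densities are for the same untransformed response $y$, which is what makes AIC/BIC comparable across the two families. The family/link labels in `CANDIDATES` follow R's naming; the function names are illustrative.

```python
import math

# The 7 candidate (family, link) pairs, using R's family/link names.
CANDIDATES = [("Gamma", link) for link in ("identity", "log", "inverse")] + [
    ("inverse.gaussian", link) for link in ("identity", "log", "inverse", "1/mu^2")
]


def gamma_loglik(y, mu, phi):
    """Gamma log-likelihood with mean mu and dispersion phi (shape 1/phi, scale mu*phi)."""
    shape = 1.0 / phi
    total = 0.0
    for yi, mi in zip(y, mu):
        scale = mi * phi
        total += (-math.lgamma(shape) - shape * math.log(scale)
                  + (shape - 1.0) * math.log(yi) - yi / scale)
    return total


def inverse_gaussian_loglik(y, mu, phi):
    """Inverse Gaussian log-likelihood with mean mu and dispersion phi (lambda = 1/phi)."""
    lam = 1.0 / phi
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (0.5 * math.log(lam / (2.0 * math.pi * yi ** 3))
                  - lam * (yi - mi) ** 2 / (2.0 * mi ** 2 * yi))
    return total


def aic(loglik, k):
    """k = number of regression coefficients + 1 for the dispersion; lower is better."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Like AIC, but the per-parameter penalty is log(n) instead of 2."""
    return k * math.log(n) - 2 * loglik
```

Since BIC's penalty of $\log n$ per parameter exceeds AIC's 2 once $n \geq 8$, BIC will tend to prefer smaller models in the later subset-selection step.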
Next, I'd look for the best subset of the 9 independent variables for predicting the dependent variable. There are many ways to do this, but I plan to use the bestglm() function from the "bestglm" R package, which with my data will use complete enumeration (Morgan and Tatar, 1972). Again, I'd use AIC or BIC to compare the candidate models and choose the best one.
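With 9 candidate variables, complete enumeration means fitting every possible subset, including the intercept-only model, for the chosen family and link:

$$
\sum_{k=0}^{9} \binom{9}{k} = 2^{9} = 512 \text{ models,}
$$

which is still a manageable number of GLM fits.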
I was wondering whether this is the best order in which to do things, using AIC or BIC at each step:
1. Pick the distribution and link function using all 9 candidate independent variables.
2. Perform subset selection on the independent variables, fitting GLMs with the distribution and link function picked in Step 1.
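The two steps can be sketched in Python as follows. This mirrors the logic of the procedure, not bestglm's implementation. `fit` is a hypothetical callback standing in for actually fitting the GLM (in R, something like `glm()` followed by `logLik()`), assumed to return the log-likelihood and the number of estimated parameters including the dispersion; `two_stage_selection` and `information_criterion` are illustrative names.

```python
import math
from itertools import combinations


def information_criterion(loglik, k, n, criterion="BIC"):
    """AIC = 2k - 2 log L; BIC = k log(n) - 2 log L (lower is better)."""
    if criterion == "AIC":
        return 2 * k - 2 * loglik
    if criterion == "BIC":
        return k * math.log(n) - 2 * loglik
    raise ValueError("criterion must be 'AIC' or 'BIC'")


def two_stage_selection(candidates, variables, fit, n, criterion="BIC"):
    """Step 1: pick (family, link) with all variables; Step 2: best subset for that pair.

    `fit(family, link, subset)` is a hypothetical callback returning
    (log-likelihood, number of estimated parameters incl. dispersion).
    """
    def score(family, link, subset):
        loglik, k = fit(family, link, subset)
        return information_criterion(loglik, k, n, criterion)

    # Step 1: compare every family/link pair on the full set of predictors.
    full = tuple(variables)
    family, link = min(candidates, key=lambda c: score(c[0], c[1], full))

    # Step 2: complete enumeration of all 2^p subsets (including the empty one).
    all_subsets = (
        subset
        for size in range(len(variables) + 1)
        for subset in combinations(variables, size)
    )
    best_subset = min(all_subsets, key=lambda s: score(family, link, s))
    return family, link, best_subset
```

Because Step 2 only searches subsets under the family/link fixed in Step 1, this two-stage search evaluates 7 + 512 models rather than all 7 × 512 family/link/subset combinations, so it is cheaper but is not guaranteed to find the same model as a joint search.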
Morgan and Tatar (1972), DOI: 10.1080/00401706.1972.10488918