I've never even heard the term "Goodness of Fit" in my class. I looked through the textbook and can't find it there either.
What exactly is "goodness of fit"? Is there a formula to calculate it?
Goodness of fit is, roughly, the ability of a model to generate high-quality predictions.
To calculate goodness of fit, we call the predicted values the expected values $E$ and the outcomes the observed values $O$. There are a number of tests based on comparing the observed values to the expected values. Their statistics generally take the form $\sum (O-E)^2/\mathrm{Var}$, where the variance $\mathrm{Var}$ depends on the type of outcome being modeled. This statistic is asymptotically distributed as a $\chi^2$ random variable with $q-p$ degrees of freedom. Here $q$ is the number of parameters needed to reproduce the "observed" quantities exactly (the saturated model), and $p$ is the number of parameters in the candidate model. If the test gives a $p$-value less than $\alpha$, we conclude that the model fits poorly: the observed values are too inconsistent with the expected values.
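As a minimal sketch of that recipe, the Python below computes $\sum (O-E)^2/\mathrm{Var}$ and its upper-tail $p$-value against $\chi^2_{q-p}$ using only the standard library. The names `chi2_sf` and `goodness_of_fit` are hypothetical helpers, not from any package; the sketch assumes integer degrees of freedom and takes $q$ to be the number of observed quantities (one free parameter each in the saturated model). The example treats four hypothetical counts as Poisson, so each variance equals its expected count, and fits a single common rate ($p = 1$).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square with a positive integer number of df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from df = 1 (erfc) or df = 2 (exp) and stepping df up by 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # tail for df = 1
    else:
        a, q = 1.0, math.exp(-y)  # tail for df = 2
    while 2 * a < df:
        # move from df = 2a to df = 2a + 2
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def goodness_of_fit(observed, expected, variances, n_params, alpha=0.05):
    """Sum of (O - E)^2 / Var, referred to chi-square with q - p df.

    Assumes q = len(observed), i.e. the saturated model has one free
    parameter per observed quantity; n_params is p.
    """
    stat = sum((o - e) ** 2 / v for o, e, v in zip(observed, expected, variances))
    df = len(observed) - n_params
    p_value = chi2_sf(stat, df)
    return stat, df, p_value, p_value < alpha


# Hypothetical Poisson counts: variance = expected, one fitted rate (p = 1).
observed = [12, 8, 15, 5]
rate = sum(observed) / len(observed)  # MLE of the common rate, 10.0
expected = [rate] * len(observed)
stat, df, p_value, poor_fit = goodness_of_fit(observed, expected, expected, n_params=1)
print(f"stat={stat:.2f}, df={df}, p={p_value:.3f}, poor fit: {poor_fit}")
```

In practice `scipy.stats.chi2.sf` would replace the hand-rolled tail function; it is written out here only to keep the sketch dependency-free.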
The most commonly used goodness-of-fit test is the Pearson chi-square test for categorical data, whose statistic is
$$ T = \sum_{i,j} \left( O_{ij} - E_{ij} \right) ^2 / E_{ij} $$
for a contingency table whose rows and columns are indexed by $i$ and $j$. The Wikipedia article has a good discussion of these models. However, a Poisson model has a different goodness-of-fit metric, such as the ratio of total observed to total expected counts, $\sum_i O_i / \sum_i E_i$, which Rockhill used in 2001 to show the calibration of a cancer incidence prediction model.
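For the most common case, a test of independence in an $r \times c$ table, the expected counts are $E_{ij} = (\text{row}_i \text{ total})(\text{column}_j \text{ total})/N$ and the degrees of freedom are $(r-1)(c-1)$. That matches $q - p$ with $q = rc - 1$ free cell probabilities and $p = (r-1) + (c-1)$ for the independence model. The sketch below illustrates the statistic under that assumption; `pearson_independence` and the table are hypothetical, and `chi2_sf` repeats a stdlib tail-probability helper so the block runs on its own.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square with a positive integer number of df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    a, q = (0.5, math.erfc(math.sqrt(y))) if df % 2 else (1.0, math.exp(-y))
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))  # df -> df + 2
        a += 1.0
    return q


def pearson_independence(table):
    """Pearson's T = sum_ij (O_ij - E_ij)^2 / E_ij for an r x c table of counts.

    E_ij is the expected count under independence of rows and columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    t = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            t += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return t, df, chi2_sf(t, df)


# Hypothetical 2 x 2 table: every E_ij is 15, so T = 4 * 25 / 15.
t, df, p = pearson_independence([[10, 20], [20, 10]])
print(f"T={t:.3f}, df={df}, p={p:.4f}")
```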
In an ANOVA with discrete fixed effects, there is a finite number $k$ of possible expected values, one for each observed combination of covariates, so a test statistic of the form $\sum_{i=1}^k (\bar{Y}_i - \hat{Y}_i)^2/(\sigma^2/n_i)$ has an asymptotic $\chi^2_{k-p}$ distribution (please correct me if this is wrong). Here $\bar{Y}_i$ is the sample mean for the $i$-th combination of covariates, $\hat{Y}_i$ is the predicted mean, and $\sigma^2/n_i$ is the expected variance of the $i$-th sample mean when homoscedasticity holds.
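The cell-means statistic can be sketched the same way, assuming $\sigma^2$ is known. The example uses the simplest candidate model, a single common mean ($p = 1$), so each $\hat{Y}_i$ is the weighted grand mean and the reference distribution is $\chi^2_{k-1}$. `cell_means_gof` and the data are hypothetical, and `chi2_sf` is the same stdlib helper as before.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square with a positive integer number of df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    a, q = (0.5, math.erfc(math.sqrt(y))) if df % 2 else (1.0, math.exp(-y))
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))  # df -> df + 2
        a += 1.0
    return q


def cell_means_gof(cell_means, predicted_means, counts, sigma2, n_params):
    """sum_i (Ybar_i - Yhat_i)^2 / (sigma2 / n_i), referred to chi-square(k - p).

    Assumes sigma2, the common within-cell variance, is known.
    """
    stat = sum((ybar - yhat) ** 2 / (sigma2 / n)
               for ybar, yhat, n in zip(cell_means, predicted_means, counts))
    df = len(cell_means) - n_params
    return stat, df, chi2_sf(stat, df)


# Hypothetical data: k = 3 cells, candidate model = one common mean (p = 1).
cell_means = [5.0, 6.0, 7.0]
counts = [10, 10, 10]
grand_mean = sum(m * n for m, n in zip(cell_means, counts)) / sum(counts)
stat, df, p = cell_means_gof(cell_means, [grand_mean] * 3, counts, sigma2=4.0, n_params=1)
print(f"stat={stat:.2f}, df={df}, p={p:.4f}")
```

If $\sigma^2$ is instead estimated from the within-cell residuals, the usual replacement is an F reference distribution (the lack-of-fit F test) rather than $\chi^2$.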
This method has been highly criticized.
You need to calculate the coefficient of determination ($R^2$), which is the most common goodness-of-fit index in multiple regression; multiplied by 100, it gives the percentage of the variation in the dependent variable explained by the 4 predictors in your model. I assume you have a textbook to consult; if not, a simple Google search will turn up many excellent sources.
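A minimal sketch of $R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}$, assuming the fitted values $\hat{y}$ already come from whatever least-squares fit (with an intercept) you ran; `r_squared` and the numbers are hypothetical. The "percent of variation explained" reading relies on that least-squares-with-intercept setting.

```python
def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y has no variation")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and fitted values from some regression.
y = [1.0, 2.0, 3.0]
y_hat = [1.5, 2.0, 2.5]
r2 = r_squared(y, y_hat)
print(f"R^2 = {r2:.2f} ({100 * r2:.0f}% of variation explained)")
# prints: R^2 = 0.75 (75% of variation explained)
```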
If you want to open a new question, I can hold forth further!
– Patrick Malone Apr 13 '18 at 15:25