
I am trying to assess the goodness of fit of a surface I've developed (i.e., the model predicts a variable $y$ from two variables $x_1$ and $x_2$). My model has 5 estimated parameters and is likely not a linear model.

I need a way to quantify the goodness of fit, and the sources I've found suggest the standard error, $S$:

The standard error is a function of the adjusted $R^2$:

$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$$

However, none of the sources I've found so far are clear on (or I lack the background knowledge to understand) what the parameter $p$ (or $k$) represents:

For example, Wikipedia says $p$ is "the number of explanatory variables" BUT the footnote says "Assuming $p+1$ parameters are estimated", which is not true in my case.

I would like to use a metric that people generally understand, so $R^2$ and $S$ seem reasonable to use, but I would like to do my calculation correctly (even if my large sample size means $p$ does not significantly impact $R^2_{\text{adj}}$).
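
For concreteness, here is a minimal sketch of how I intend to compute these quantities in MATLAB, assuming `y` holds my observations, `yhat` my model's predictions, and `k` the total number of estimated parameters, so that $n - k$ plays the role of $n - p - 1$ above (all variable names are placeholders):

    % Minimal sketch of the goodness-of-fit calculation.
    % Assumptions: y = observations, yhat = model predictions,
    % k = total number of estimated parameters (so n - k = n - p - 1 above).
    n      = numel(y);
    SSE    = sum((y - yhat).^2);            % residual sum of squares
    SST    = sum((y - mean(y)).^2);         % total sum of squares
    R2     = 1 - SSE/SST;                   % ordinary R^2
    R2_adj = 1 - (1 - R2)*(n - 1)/(n - k);  % adjusted R^2
    S      = sqrt(SSE/(n - k));             % standard error of the regression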

  • Please confirm the model you are fitting is for a continuous response and uses least squares. – AdamO Jan 18 '24 at 14:33
  • $p+1$ vs $p$ usually boils down to analysts forgetting that the intercept is a parameter in the model. The degrees of freedom for the residual error is $n-1$ in the unconditional model (no parameters except the intercept, trivially), whereas if there are, say, $p$ parameters derived from the variables, the degrees of freedom is $n-p-1$. – AdamO Jan 18 '24 at 14:34
  • Lastly, non-linearity is ameliorated by deriving additional factors. For instance, if you have $x$ you might add $x^2$ as a factor - so 1 variable but 2 terms (which adds 2 to p on account of this $x$) - it doesn't increase the complexity of the inputs to the model, but does increase the quality of fit at the cost of additional complexity to the model process. By adding too many breakpoints and polynomial terms, you can overfit the model with only the $p$ as a cost term to enforce parsimony. – AdamO Jan 18 '24 at 14:38
  • Literally every account of multiple regression will explain what "$p$" means, if only in a simplified sense. (A rigorous account will refer to the rank of the model matrix.) It's difficult to interpret what you mean by "not true in my case," given you have stated 5 parameters are estimated. If you are counting correctly (an intercept counts but the error variance does not), then you can equate "5" with "p+1." As far as "linear model" goes, that term might not mean what you think. See https://stats.stackexchange.com/questions/148638. – whuber Jan 18 '24 at 14:38
  • @AdamO My model is continuous, and (to my knowledge) I'm using least squares to do the fit (I'm using the nlinfit function in MATLAB). I don't have access to any of the optimisation or statistics toolboxes. My model is based on a physical understanding of the process, so it's not a simple polynomial. Assuming my params are a, b, c, etc. and my vars are $x_1$, $x_2$, my model is $y = a\left(1 - (x_1/b)^c (x_2/d)^e\right)/\left(1 + f\,(x_1/b)^c\right)$ (see the sketch after this thread). – Erik Jan 18 '24 at 15:08
  • @HarveyMotulsky That is in fact my question: I initially thought I should use $p = 5$ (for my example), but the wording used in e.g. Wikipedia confused me. – Erik Jan 18 '24 at 17:45
  • If I understand correctly, here is a short version of the question. The example has two independent variables and fits five parameters. When computing the adjusted $R^2$, is $p$ (or $k$) equal to 2 or 5? I believe the correction is based on the number of parameters (5 for the example), not the number of independent variables. – Harvey Motulsky Jan 18 '24 at 18:31
  • @Harvey That is correct -- it's how the Matlab fitting functions work. They (of course) employ Maximum Likelihood estimates assuming an iid Gaussian response. As I recall, they will (when requested) return a "degrees of freedom" value corresponding to $p+1.$ It gets interesting when fitting with constraints, because the df is reduced by $1$ for every applicable constraint -- sort of. The results are a little inconsistent in my experience. – whuber Jan 19 '24 at 14:31
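
For reference, a minimal sketch of the fit described in the comments, assuming nlinfit is available (`x1`, `x2`, `y`, and the starting values `beta0` are placeholders):

    % Sketch of the fit from the comments:
    % y = a(1 - (x1/b)^c (x2/d)^e) / (1 + f (x1/b)^c), with beta = [a b c d e f].
    model = @(beta, X) beta(1) * (1 - (X(:,1)/beta(2)).^beta(3) .* (X(:,2)/beta(4)).^beta(5)) ...
            ./ (1 + beta(6) * (X(:,1)/beta(2)).^beta(3));
    X     = [x1(:) x2(:)];
    beta0 = ones(1, 6);                        % placeholder starting values
    [beta, resid] = nlinfit(X, y(:), model, beta0);  % least-squares fit
    n   = numel(y);
    k   = numel(beta);                         % total number of estimated parameters
    SSE = sum(resid.^2);
    S   = sqrt(SSE/(n - k));                   % standard error of the regression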

0 Answers