
Is there a goodness-of-fit metric similar to R^2 that can be used to evaluate a weighted regression? In my case, the data points are cities, and I weight the regression by city population. My concern is that the usual R^2 does not take the weights (populations) into account, so it treats a residual for New York City the same as a residual for Cincinnati.
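
For reference, one common way to define a weighted R^2 replaces each sum of squares with its weighted counterpart. This is a sketch of that definition, not something taken from a particular package; the symbols $w_i$ (city population), $y_i$ (observed rate), and $\hat{y}_i$ (fitted rate) are notation assumed here:

$$
R^2_w = 1 - \frac{\sum_i w_i \,(y_i - \hat{y}_i)^2}{\sum_i w_i \,(y_i - \bar{y}_w)^2},
\qquad
\bar{y}_w = \frac{\sum_i w_i \, y_i}{\sum_i w_i}
$$

With all $w_i$ equal this reduces to the ordinary R^2, and with population weights a residual for a large city counts for more than the same residual for a small one.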

  • You may want to consider transforming your data pre-regression, normalizing whatever data you have by a proportional population metric. It is difficult to determine without more information about your data set. – ERT Aug 02 '18 at 15:15
  • The outcome metric is a quantity per 1000 residents and the independent variables are characteristics of each city such as median rent, unemployment rate, etc. I would like to keep the regression in these terms while weighting on city population and I am looking for a metric similar to R^2 that will account for the weighting in the data. Does that help? – Jesse Cambon Aug 02 '18 at 15:29
  • There are different ways of looking at this, and one of them is: you have created a mathematical model, and now want to run data through that model. To create the model you weighted data. If you run data through the model for a city that was not part of the regression, will you have the weights for that city? If the answer to that question is no, consider unweighted fit statistics to give you a clearer understanding of how the model will function on new city data where you do not have weights. – James Phillips Aug 02 '18 at 15:33
  • Hi James, since the weight is the population of the city, I think I would always have that information for any city I put into the model. – Jesse Cambon Aug 02 '18 at 15:42
  • Why not use population as an additional covariate? – ERT Aug 02 '18 at 16:06
  • If you supply weights in standard regression software, the R^2 value would ideally take the weights into account. The weights will alter the regression prediction and as long as your model is a linear regression, R^2 remains the squared correlation between your predictions and the response value. – Heteroskedastic Jim Aug 02 '18 at 16:47
  • I am using the glmnet R package so that I can apply the LASSO method. At least to my knowledge this package does not supply R^2 values so I have been calculating R^2 on the back end. As for adding population as a covariate, that is a possible solution that I did consider. However, at least until this point I have opted to keep the outcome variable on a per capita basis to make the analysis more straightforward. I am looking at the number of immigrants coming to these cities so the per capita rate is ultimately more interesting and informative than the total. – Jesse Cambon Aug 02 '18 at 23:34
  • I think the question completely changes with the fact that you're using lasso. I think this is more consequential for computing goodness of fit than the fact that your observations are weighted. This other question may interest you: https://stats.stackexchange.com/questions/7225/estimating-r-squared-and-statistical-significance-from-penalized-regression-mode – Heteroskedastic Jim Aug 03 '18 at 12:36
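
The comments mention computing R^2 "on the back end" from a model's predictions. A minimal sketch of how a weighted version could be computed by hand is shown below, assuming the usual definition (one minus weighted residual sum of squares over weighted total sum of squares around the weighted mean). The function name `weighted_r2` and the toy numbers are hypothetical and not taken from the thread or from glmnet; the example is in plain Python so it runs without extra packages.

```python
def weighted_r2(y, y_hat, w):
    """Weighted R^2: 1 - (weighted SSE) / (weighted SST around the weighted mean)."""
    total_w = sum(w)
    # Weighted mean of the observed outcome.
    y_bar_w = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted sum of squared residuals.
    sse = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    # Weighted total sum of squares.
    sst = sum(wi * (yi - y_bar_w) ** 2 for wi, yi in zip(w, y))
    return 1.0 - sse / sst


# Hypothetical toy data: three "cities", with the only miss on the smallest one.
y = [10.0, 20.0, 30.0]               # observed rate per 1000 residents
y_hat = [10.0, 20.0, 24.0]           # model predictions
pop = [8_000_000, 300_000, 100_000]  # illustrative population weights

r2_unweighted = weighted_r2(y, y_hat, [1, 1, 1])  # equal weights = ordinary R^2
r2_weighted = weighted_r2(y, y_hat, pop)

print(f"unweighted R^2: {r2_unweighted:.3f}")
print(f"weighted R^2:   {r2_weighted:.3f}")
```

Because the only residual here falls on the smallest city, the population-weighted value comes out higher than the unweighted one. As the last comment points out, with a penalized fit such as the lasso the interpretation of any R^2-style number needs extra care, so this is best read as a descriptive summary rather than a formal test statistic.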

0 Answers