
I've got a rather hard question concerning my regression.

My data has the following properties.

  1. The dependent variable is overdispersed count data and consists of repeated measurements within multiple groups.
  2. There are two independent variables, both categorical (5-point Likert scale).

My goal was to quantify how well my independent variable predicts my dependent variable.

Based on these properties, I must run a generalized linear mixed model with a negative binomial distribution, a log link function, and a random intercept for groups. My first idea was to use SPSS to calculate how much of the variance in the dependent variable is explained by the independent variable. So far, I could only find a suitable analysis in R using the lme4 package, but since I do not work with R, this is not an option.
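For concreteness, the model described above could be written as follows. This is only a sketch: the symbols $y_{ij}$ (count for measurement $i$ in group $j$), $\mathbf{x}_{ij}$ (dummy-coded Likert predictors), $u_j$ (random intercept), and $\theta$ (dispersion) are notation assumed here, and the variance function shown is the common NB2 parameterization, which may differ from what a given software package uses.

$$
\begin{aligned}
y_{ij} \mid u_j &\sim \mathrm{NegBin}(\mu_{ij}, \theta), \qquad \mathrm{Var}(y_{ij} \mid u_j) = \mu_{ij} + \frac{\mu_{ij}^2}{\theta},\\
\log \mu_{ij} &= \beta_0 + \mathbf{x}_{ij}^\top \boldsymbol{\beta} + u_j, \qquad u_j \sim N(0, \sigma_u^2).
\end{aligned}
$$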

Because the properties of my data mean the explained variance cannot be calculated in SPSS, I am now looking for an approach that comes as close as possible to such an analysis. Pseudo-R-squareds aren't an option.

A likelihood ratio test is possible in SPSS; however, for this I must run a generalized linear model, which means I would ignore the fact that my data is clustered.
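For reference, the arithmetic behind a likelihood ratio test can be sketched in a few lines of Python. The log-likelihood values below are hypothetical placeholders, not results from any real fit, and the degrees of freedom assume a single five-level Likert predictor entered as categorical (4 extra parameters). The helper names are illustrative, and the closed-form tail probability used here is only valid for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_null, loglik_full, df):
    """Return the LRT statistic 2 * (ll_full - ll_null) and its p-value."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods (assumed values, for illustration only):
loglik_null = -512.3   # model without the Likert predictor
loglik_full = -498.7   # model with the Likert predictor
df = 4                 # assumed: 5 categories -> 4 dummy-coded parameters

stat, p_value = likelihood_ratio_test(loglik_null, loglik_full, df)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.3g}")
```

The test only says whether the predictor improves the fit; it does not by itself quantify how much variance is explained, and both log-likelihoods would need to come from models fitted to the same data with the same random-effects structure.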

Does anyone have thoughts on this?

Kind Regards,

E

  • $R^2$ representing the “proportion of variance explained” only applies to special scenarios, and yours does not appear to be one of those. Then pseudo $R^2$ values can represent dramatically different notions than proportion of variance explained (e.g., McFadden’s $R^2$ for a logistic regression has nothing to do with squared residuals). // I’m not so convinced you need a random effect in your model. Why do you believe you do? – Dave Feb 22 '23 at 11:49
  • Thank you for answering! Good point! Based on previous research, the slopes are expected to be the same across groups. This would mean that a fixed effect is the preferred choice. Based on the properties of my data, is it not possible to run an analysis that would give me the explained variance? – Eli Jong Feb 22 '23 at 12:16
  • Would Efron's pseudo R-squared be a solution, for instance? – Eli Jong Feb 22 '23 at 12:25
  • I only know Efron’s pseudo $R^2$ from logistic regression, where it is $1-\frac{SSRes}{SSTotal}$ (so the usual $R^2$). What would your definition be for negative binomial regression? – Dave Feb 22 '23 at 12:30
  • Sorry, I meant to type ordinal logistic regression as an option. So the dependent variable would be the Likert scale data and the independent variable would be the count data. – Eli Jong Feb 22 '23 at 13:15
  • You already have that the usual $R^2$ only describes the "proportion of variance explained" in special scenarios that exclude what you're doing. Consequently, what do you really want to know about your data and modeling? If you literally want the proportion of variance explained, then it could be argued (I think rather convincingly) that $\frac{SSReg}{SSTotal} = \frac{\sum(\hat y_i -\bar y)^2}{\sum(y_i - \bar y)^2}$ is the proportion of variance explained by your model (it comes from the linked $SSTotal$ decomposition). But that need not have much relationship to the model. – Dave Feb 22 '23 at 16:10
  • My aim was as follows: the dependent variable and the independent variable both measure exposure. The dependent variable uses objective measurements; this is the overdispersed count data. The independent variable also measures exposure, but using a Likert scale (“Exposed every time I am there”, “Exposed usually”, “Exposed sometimes”, “Usually not exposed”, “Never exposed when I am there”). If the independent variable explained the majority of the variance of the dependent variable, then exposure could be measured using only the independent variable in future projects. – Eli Jong Feb 22 '23 at 16:18
  • Please edit your original post to mention your goal of quantifying how well one variable predicts the other. Variance could be a reasonable measure of that but might not be. – Dave Feb 22 '23 at 16:27
  • I have edited the original post. – Eli Jong Feb 22 '23 at 16:55
  • It still seems that you are mixing up the goal and the method. You ask how to quantify how well one variable predicts the other (reasonable, since a strong correspondence could let you use a variable that is easier to measure with minimal loss of information). Then you say you are running a negative binomial GLM with a log-link and want to quantify how much of the variance such a model explains. This assumes, however, that a negative binomial model and reduction in variance are appropriate to address your question. You might have better luck asking how to measure the effectiveness of a proxy. – Dave Feb 22 '23 at 17:11
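The sums-of-squares quantities discussed in the comments can be computed directly from observed values and model predictions. The sketch below is illustrative only: the data are made up, the function names are hypothetical, and the predictions stand in for fitted means from whatever model is used. It shows that the residual-based ratio $1-\frac{SSRes}{SSTotal}$ and the regression-based ratio $\frac{SSReg}{SSTotal}$ need not agree outside ordinary least squares, because $SSTotal = SSReg + SSRes$ is not guaranteed to hold there.

```python
def sums_of_squares(y, y_hat):
    """Return (SSTotal, SSRes, SSReg) for observations y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_reg = sum((fi - y_bar) ** 2 for fi in y_hat)
    return ss_total, ss_res, ss_reg


def efron_r2(y, y_hat):
    """Residual-based ratio: 1 - SSRes / SSTotal."""
    ss_total, ss_res, _ = sums_of_squares(y, y_hat)
    return 1.0 - ss_res / ss_total


def explained_share(y, y_hat):
    """Regression-based ratio: SSReg / SSTotal."""
    ss_total, _, ss_reg = sums_of_squares(y, y_hat)
    return ss_reg / ss_total


# Made-up counts and made-up predicted means (assumptions for illustration):
y = [0, 2, 1, 7, 15, 3]
y_hat = [1.0, 1.5, 2.0, 6.0, 11.0, 4.0]

ss_total, ss_res, ss_reg = sums_of_squares(y, y_hat)
print(f"SSTotal = {ss_total:.3f}, SSRes = {ss_res:.3f}, SSReg = {ss_reg:.3f}")
print(f"1 - SSRes/SSTotal = {efron_r2(y, y_hat):.3f}")
print(f"SSReg/SSTotal     = {explained_share(y, y_hat):.3f}")
```

When the two ratios differ noticeably, as they do for these made-up numbers, neither can be read as "the" proportion of variance explained without first deciding which notion matches the actual question.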
