I am very new to the world of statistics and I need some help. I am currently running an experiment and I have to analyze the data, but I honestly do not know anything about statistics (I have tried to attend some courses at the university, but they only cover descriptive statistics; I've learned a bit from some books, but a true beginner's course is often difficult to find). However, from what I have learnt, I understand that I need to run a mixed-effects logistic regression model. I have a binary dependent variable (yes/no) and several predictors:

  • Group age (with three levels: 20, 30, 40)
  • Preference (with two levels: a, b)
  • Season (with two levels: summer, winter)
  • Weather (with two levels: sunny, rainy)

As random effects I have both the participant ID and the item number (there are 25 observations for each participant, and all the items are the same).

How can I choose the correct model? In my prediction, the dependent variable is influenced by group age, season and weather, as well as the interaction between season and weather. Instead, I do not think that the variable preference would affect the result, but I would like to control for this.

Is it possible to use the package glmulti in R to find the correct model? I have read that it is necessary to compare indices such as BIC and AIC. But in general, what BIC value indicates that a model is acceptable? For example, is a BIC of 2700 too high? And what about if different models share the same BIC/AIC? After having run the model, what should a complete statistical analysis contain?

I am really sorry for the stupid questions, but I think that a bullet-point list of the steps to follow would help me a lot, since I do not have a "methodology" (or I do not know where I can learn it).

Katherine
  • Hi @Katherine, welcome to CV! "How can I choose the correct model?" - the first question is always, what are you trying to obtain from this analysis? Do you have a formal hypothesis you want to test? Are you trying to understand your data set? ... - the ultimate aim(s) of your experiment will help you, and us, figure out the best way to model the data. – Alex J Mar 08 '24 at 01:52
  • Re AIC/BIC - try searching the site for how to use them. https://stats.stackexchange.com/a/84077/369002 is a good answer describing how to use AIC – Alex J Mar 08 '24 at 01:55
  • "Is it possible to use the package glmulti" - I don't know, I'm not familiar with this package. Some packages I know can do GLMMs are lme4, glmmTMB, mgcv – Alex J Mar 08 '24 at 01:59
  • I agree with @AlexJ, lme4 is the most user-friendly package for mixed effects models. What is your reasoning for including random effects in the model? Usually this is because of hierarchical or longitudinal data. – Jack Mar 09 '24 at 07:19
  • Thank you! The reason for including random effects is because there are 40 obs for each participant, and participants show the same items. – Katherine Mar 09 '24 at 09:08
  • However, I'm so confused. Do I have to do something like normalizing my data? And what is a Tukey post hoc analysis? Can you suggest a guide or something similar for learning everything from the basics? – Katherine Mar 09 '24 at 09:08
  • When you say you have "25 observations for each participant, and all the items are the same", do you mean you have 25 (yes/no) item responses for each individual in your sample? In that case, you might want to look at item response theory models. – Durden Mar 09 '24 at 23:08
  • It's impossible to get the correct model with the wrong data. Age and season must be continuous variables in order to get an interpretable result that is free of residual confounding. – Frank Harrell Mar 10 '24 at 12:26

1 Answer

Clearly here it would be useful to get a lot of knowledge under your belt before running wild with mixed modeling. What may be beneficial for now is to go through the lectures here on mixed modeling and this article as well (I haven't found a lot of great books on mixed modeling other than Gelman's original book...but that was written in 2007 and is in great need of an update). My personal take is that you need a strong background in regression before running mixed models, so I recommend Regression and Other Stories if you need a book that explains things in plain English with lots of practical examples in R.

I believe it was noted in the comments that you should clarify your hypothetical model that you are trying to describe before running anything. You list your variables as some important predictors of your outcome. Why? You list that there is a binary dependent variable of "yes"/"no" questions but have not explained what this tests either. It would be important to know what exactly you are testing to get an exact answer in terms of modeling this.

You note the following:

How can I choose the correct model? In my prediction, the dependent variable is influenced by group age, season and weather, as well as the interaction between season and weather. Instead, I do not think that the variable preference would affect the result, but I would like to control for this.

What are your models to begin with? Model comparison is only necessary if you have competing models that you believe may differ in explanatory power. For example, it may be the case that only one of your predictors has a substantial influence while the others are minor. Then you could make the case for a more parsimonious model being better, and could test that with the usual tools. This part is fairly simple in my opinion. Simply fit the models you want, then compute the AIC/BIC of each to see if there are major differences (with lower AIC/BIC scores being better). One can run a likelihood ratio test (LRT) on nested models as well, but I personally do not find that option as attractive because it replaces one's thinking about the models with binary decision making based on null hypotheses.
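As a rough illustration of the "fit the models you want, then compare" step, here is a minimal sketch in R with lme4. Everything about the data is an assumption made for the example: the data frame `d`, the column names (`response`, `age_group`, `preference`, `season`, `weather`, `id`, `item`), the simulated effect sizes, and the choice to let season and weather vary by item. Only the model structure follows the description in the question, so treat this as a template to adapt rather than the analysis itself.

```r
library(lme4)

set.seed(1)

# Hypothetical design: 30 participants crossed with 25 items.
n_id   <- 30
n_item <- 25
d <- expand.grid(id = factor(seq_len(n_id)), item = factor(seq_len(n_item)))

# Assumed for illustration: age group and preference vary by participant,
# season and weather vary by item.
age_by_id       <- rep(c("20", "30", "40"), length.out = n_id)
pref_by_id      <- rep(c("a", "b"), length.out = n_id)
season_by_item  <- rep(c("summer", "winter"), length.out = n_item)
weather_by_item <- rep(c("sunny", "sunny", "rainy", "rainy"), length.out = n_item)

d$age_group  <- factor(age_by_id[as.integer(d$id)])
d$preference <- factor(pref_by_id[as.integer(d$id)])
d$season     <- factor(season_by_item[as.integer(d$item)])
d$weather    <- factor(weather_by_item[as.integer(d$item)])

# Simulate a yes/no (1/0) response with random intercepts for id and item.
u_id   <- rnorm(n_id, sd = 0.8)
u_item <- rnorm(n_item, sd = 0.5)
eta <- -0.5 +
  0.6 * (d$season == "winter") +
  0.4 * (d$weather == "rainy") +
  0.5 * (d$season == "winter" & d$weather == "rainy") +
  u_id[as.integer(d$id)] + u_item[as.integer(d$item)]
d$response <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Model matching the stated prediction: age group, season, weather,
# the season-by-weather interaction, and preference as a control.
m_full <- glmer(
  response ~ age_group + preference + season * weather + (1 | id) + (1 | item),
  data = d, family = binomial
)

# A competing, more parsimonious model without the interaction.
m_no_int <- update(m_full, . ~ . - season:weather)

# Lower AIC/BIC is better; only the comparison between models matters.
AIC(m_full, m_no_int)
BIC(m_full, m_no_int)

# Likelihood ratio test for the two nested models.
anova(m_no_int, m_full)
```

The two candidate models here are chosen because they correspond to a substantive question (is the interaction needed?), not because a search procedure proposed them.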

For example, is a BIC of 2700 too high? And what about if different models share the same BIC/AIC?

There is no "ideal" number for AIC/BIC. They are relative measures, only used to see whether one model has a lower AIC/BIC than another model fit to the same data.
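To make the "relative" point concrete, here is a small sketch in base R. The three AIC values are made up for illustration (2700 is borrowed from the question), and the model names are hypothetical. It computes the difference from the best model and the corresponding Akaike weights, which is one common way to summarize such comparisons.

```r
# Hypothetical AIC values for three models fit to the same data.
aic <- c(model_a = 2700.0, model_b = 2703.5, model_c = 2712.1)

# Differences from the lowest AIC: only these carry information,
# not the raw magnitude of any single value.
delta <- aic - min(aic)

# Akaike weights: relative support for each model within this set.
weights <- exp(-delta / 2) / sum(exp(-delta / 2))

print(round(delta, 1))
print(round(weights, 3))
```

Adding the same constant to all three values would leave `delta` and `weights` unchanged, which is why a single value such as 2700 cannot be "too high" on its own. A difference of zero between two models simply means the criterion does not distinguish them, so the choice falls back on subject-matter reasoning.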

Additionally:

Is it possible to use the package glmulti in RStudio to find the correct model?

I generally fit mixed models with lme4 and mgcv, but I would caution against automatic model selection tools. Some methods are better and some are worse (with stepwise likely being at the bottom of the list), but I again believe that model comparison and selection should be based on a combination of statistics and logic. As the famous adage goes, "all models are wrong", so one can only select a small world to explain a large and complex world (see the figure below, which illustrates this, from Box's article linked in this answer). If there are some small worlds which are better at explaining phenomena than others, then one can tentatively select them with the implicit understanding that they can always be wrong. I prefer not to automate that thinking in any way, but others may disagree.

[Figure from Box's article; image not available]

After having run the model, what should a complete statistical analysis contain?

I advise checking out this article, which provides a best practices guide for mixed modeling.