Update: Added more meaningful example data
Setting:
In my study, each of three randomly chosen readers (A, B, C) applies three different qualitative scores (score1, score2, score3, each on an ordinal scale) to the same set of 95 cases. score1, for example, is the reader's rating of the severity of a case (1 = not severe, 10 = extremely severe).
Example: Reader A rates case 1 with score1=1, score2=7 and score3=8, Reader B rates case 1 with score1=.., score2=.. and score3=.., and so on.
For some of the cases (cases 92-95 in the example data), the readers apply the scores multiple times (i.e., at different time points, without a special event between the time points).
Example data:
library("lme4")
library("tidyverse")
# example data
set.seed(1)
df <- data.frame(
  reader = rep(c("A", "B", "C"), each = 100),
  case   = rep(c(1:91, 92, 92, 93, 93, 94, 94, 95, 95, 95), 3),
  class  = sample(0:1, 300, replace = TRUE, prob = c(2/3, 1/3))
)
# approx. 66% are class 0 and 33% are class 1
set.seed(1)
df <- df %>%
  rowwise() %>%
  mutate(
    score1 = case_when(class == 1 ~ sample(5:10, 1),
                       TRUE ~ sample(1:6, 1)),
    score2 = case_when(class == 1 ~ sample(1:7, 1),
                       TRUE ~ sample(4:10, 1)),
    score3 = sample(1:10, 1)
  )
df <- data.frame(df)
str(df)
#> 'data.frame': 300 obs. of 6 variables:
#> $ reader: chr "A" "A" "A" "A" ...
#> $ case : num 1 2 3 4 5 6 7 8 9 10 ...
#> $ class : int 0 0 0 1 0 1 1 0 0 0 ...
#> $ score1: int 1 1 6 10 3 9 9 1 5 4 ...
#> $ score2: int 7 9 4 7 4 1 4 8 6 9 ...
#> $ score3: int 8 2 1 3 2 7 10 4 1 7 ...
Created on 2022-03-12 by the reprex package (v2.0.1)
Aim:
Now I would like to investigate the association between score1-3 (independent variables) and a class (dependent variable, 1 or 0). The class can change for cases that are scored repeatedly (i.e., cases 92-95).
I could do this with a simple logistic regression in R:
glm(class ~ score1 + score2 + score3,
family="binomial", data = df)
However, since the data are grouped/nested, I think the p-values I get for the independent variables are too low.
Question:
Which analysis is most appropriate to account for the grouping/nesting in my data?
My solutions:
Averaging
Average each of score1-3 across the readers and across the repeated measurements of a case, then perform a simple logistic regression as mentioned above.
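To make the averaging idea concrete, here is a minimal sketch in base R. It uses a small hypothetical data frame (`toy`, not my study data) and assumes that class is constant within a case; if class changes between repeated measurements, grouping by case and class as done here keeps one row per case/class combination instead of one row per case.

```r
# Hypothetical toy data: 3 readers x 20 cases, class fixed per case
set.seed(42)
toy <- expand.grid(reader = c("A", "B", "C"), case = 1:20,
                   stringsAsFactors = FALSE)
toy$class  <- rep(rep(c(0, 1), times = 10), each = 3)
toy$score1 <- sample(1:10, nrow(toy), replace = TRUE)
toy$score2 <- sample(1:10, nrow(toy), replace = TRUE)
toy$score3 <- sample(1:10, nrow(toy), replace = TRUE)

# Mean of each score over all ratings that share the same case and class
avg <- aggregate(cbind(score1, score2, score3) ~ case + class,
                 data = toy, FUN = mean)

# Simple logistic regression on the averaged scores
fit_avg <- glm(class ~ score1 + score2 + score3,
               family = binomial, data = avg)
summary(fit_avg)
```

Note that averaging reduces the data to one observation per case, so the between-reader variability is no longer visible to the model.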
Use a mixed-effects model
I found some advice for nested data: Mixed Effects Model with Nesting and What is the difference between fixed effect, random effect and mixed effect models?
However, since I am new to mixed-effects models, I am not sure which variables should be treated as fixed and which as random:
Only reader as random effect
mod1 <- glmer(class ~ score1 + score2 + score3 +
(1|reader), family="binomial", data = df)
#> boundary (singular) fit: see help('isSingular')
Reader and case as random effect
mod2 <- glmer(class ~ score1 + score2 + score3 +
(1|reader/case), family="binomial",
data = df)
#> boundary (singular) fit: see help('isSingular')
I think I get the warning boundary (singular) fit: see help('isSingular') because the effects are very small in the test data.
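One way to check this guess would be to look at the estimated random-effect variance directly. The following is only a sketch on simulated stand-in data (`sim` and `mod` are hypothetical names, and class is generated independently of reader on purpose); it uses `isSingular()` and `VarCorr()` from lme4.

```r
library(lme4)

# Simulated stand-in data (hypothetical): class does not depend on reader
set.seed(1)
sim <- data.frame(
  reader = rep(c("A", "B", "C"), each = 100),
  class  = rbinom(300, 1, 1/3),
  score1 = sample(1:10, 300, replace = TRUE)
)

# Random intercept for reader only, as in mod1 above (one score for brevity)
mod <- glmer(class ~ score1 + (1 | reader),
             family = binomial, data = sim)

# TRUE when the fit lies on the boundary of the parameter space,
# e.g. a random-effect variance estimated as (near) zero
isSingular(mod)

# Estimated standard deviation of the reader intercept
VarCorr(mod)
```

If the reported standard deviation for reader is zero or very close to it, that would be consistent with the warning. With only three readers, the variance between readers is also hard to estimate in the first place.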
class is a binary variable that has been applied to each case in terms of a test. It has been applied before readers A-C rated the cases and it is not linked to score1-3 – ava Mar 15 '22 at 11:17