I have a dataset, test_data, that measures mortality in response to pesticide dosage. I fit a probit model that evaluates the efficacy of a single pesticide, for example to determine the median lethal dose of pesticide A at 3 days. In the model, x = log10(dose) and y = mortality (0-100%, stored as a proportion), weighted by the total number of individuals tested.
My goal is to assess the reliability of the model and compare it with the reliability of other models, e.g. the median lethal dose at 3 days for pesticide A vs. pesticide B. I am able to calculate the goodness of fit, but would like an additional measure. I think pseudo-R^2 might be a good option.
Here is a reproducible example:
> dput(test_data)
structure(list(trt = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), dose = c(5L, 50L, 500L, 5000L, 50000L, 500000L,
5L, 50L, 500L, 5000L, 50000L, 500000L), proportion_dead = c(0,
0.016666667, 0.25, 0.583333333, 0.916666667, 1, 0, 0.041666667,
0.05, 0.416666667, 0.833333333, 1), total = c(120L, 120L, 120L,
120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L)), class = "data.frame", row.names = c(NA,
-12L))
Below I fit a separate model for pesticide A and for pesticide B.
To calculate McFadden's pseudo-R^2, I compute 1 - residual deviance / null deviance from each fitted model (m1 or m2). I believe that is correct for an in-sample pseudo-R^2? But my question is: can I use an in-sample pseudo-R^2? I think I could, because the model is based on data that were collected in the past.
m1 <- glm(proportion_dead ~ log10(dose), weights = total, data = test_data[test_data$trt == 'A', ], family = binomial(link = 'probit'))
m2 <- glm(proportion_dead ~ log10(dose), weights = total, data = test_data[test_data$trt == 'B', ], family = binomial(link = 'probit'))
pr21 <- 1 - m1$deviance / m1$null.deviance
pr22 <- 1 - m2$deviance / m2$null.deviance
Output:
> pr21
[1] 0.9932642
> pr22
[1] 0.9715011
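For comparison, here is a minimal sketch of the same calculation next to the log-likelihood form of McFadden's pseudo-R^2, 1 - logLik(model) / logLik(null model). I am not sure the two are interchangeable here: with grouped binomial data the saturated model's log-likelihood is not zero, so the deviance ratio and the log-likelihood ratio need not agree. The names dat_a, fit, null_fit, r2_dev and r2_ll are my own and only for illustration; the data are the pesticide A rows copied from test_data.

```r
# Pesticide A rows from test_data, retyped so the block runs on its own.
dat_a <- data.frame(
  dose = c(5, 50, 500, 5000, 50000, 500000),
  proportion_dead = c(0, 0.016666667, 0.25, 0.583333333, 0.916666667, 1),
  total = rep(120, 6)
)

# Same probit model as m1 above.
fit <- glm(proportion_dead ~ log10(dose), weights = total,
           data = dat_a, family = binomial(link = "probit"))

# Intercept-only model fitted to the same data (the "null" model).
null_fit <- glm(proportion_dead ~ 1, weights = total,
                data = dat_a, family = binomial(link = "probit"))

# Deviance-based ratio, as in pr21.
r2_dev <- 1 - deviance(fit) / fit$null.deviance

# Log-likelihood-based ratio (the usual definition of McFadden's pseudo-R^2).
r2_ll <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))

print(c(deviance_based = r2_dev, loglik_based = r2_ll))
```

If I have this right, r2_dev should reproduce pr21, while r2_ll comes out lower because logLik() for grouped binomial data includes the binomial coefficient terms. Both are computed in-sample, on the same data used to fit the model.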
I am learning statistics, so any suggestions would be great. Thanks!