I'm having trouble finding a reliable p-value for my beta regression in R. I thought it would be as easy as reading it from the summary table, but as it turns out it is not. This website https://rcompanion.org/handbook/J_02.html tells me that the appropriate test to find the p-value for a betareg model is lrtest from the lmtest package. According to other forums and websites, p_value (parameters package), joint_tests (emmeans package), and nagelkerke (rcompanion package) could also be used.
Let me first introduce my model. I am investigating the influence of the percentage of overhead cover on the proportion of birds foraging. For this question I have made up three data sets: 1) realistic data, 2) random data, and 3) extreme data. I made these data sets to sanity-check the p-values (I assume that, if any, only the realistic data should give a significant p-value).
My R script and its output look like this.
library(betareg)    # betareg()
library(lmtest)     # lrtest()
library(rcompanion) # nagelkerke()
library(emmeans)    # joint_tests()
df <- data.frame(ProportionBirdsScavenging = c(0.666666666666667, 0.40343347639485, 0.7, 0, 0, 0.0454545454545455, 0.25, 0.707070707070707, 0.629213483146067, 0.882352941176471, 0.942857142857143, 0.451612903225806, 0.0350877192982456, 0.5, 0.484375, 0, 0.0208333333333333, 0.240740740740741, 0.804568527918782, 0.666666666666667, 1, 1, 1, 0.552238805970149, 0.2, 0, 0, 0, 0, 0, 0.12972972972973, 0.0894117647058824, 0.576158940397351, 0, 0),
pointWeight = c(3,233,10,89,4,22,44,99,89,17,35,341,57,36,128,39,144,54,394,12,46,229,55,67,5,28,2,160,124,294,555,425,302,116,48),
OverheadCover = c(0.7, 0.671, 0.6795, 0.79, 0.62, 0.62, 0.6413, 0.089, 0.4603, 0.04, 0.0418, 0.46, 0.5995, 0.532, 0.65, 0.6545, 0.74, 0.74, 0.02, 0.02, 0, 0, 0, 0.45, 0.8975, 0.92, 0.898, 0.89, 0.86, 0.69, 0.755, 0.775, 0.585, 0.585, 0.55),
Random_OverheadCover = c(0.3021, 0.6397, 0.7437, 0.9408, 0.9532, 0.4218, 0.8518, 0.2117, 0.5945, 0.531, 0.7656, 0.8952, 0.3127, 0.5513, 0.5619, 0.6332, 0.0051, 0.0388, 0.1877, 0.964, 0.2192, 0.4307, 0.5684, 0.062, 0.071, 0.0733, 0.6637, 0.5746, 0.4933, 0.4182, 0.8617, 0.7269, 0.7009, 0.655, 0.7696),
Extreme_OverheadCover = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1))
# REALISTIC DATA
df_realistic <- df
df_realistic$ProportionBirdsScavenging <- (((df_realistic$ProportionBirdsScavenging*(length(df_realistic$ProportionBirdsScavenging)-1))+0.5)/length(df_realistic$ProportionBirdsScavenging)) # Transform the data so all data is (0,1).
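For context, the squeeze applied above is the transformation recommended by Smithson & Verkuilen (2006) for responses that contain exact 0s and 1s, which betareg cannot handle. A minimal sketch as a reusable function (the name squeeze01 is my own, not from any package):

```r
# Smithson & Verkuilen (2006): squeeze y from [0, 1] into the open interval (0, 1).
# y: numeric vector of proportions; n: sample size (defaults to length(y)).
squeeze01 <- function(y, n = length(y)) {
  (y * (n - 1) + 0.5) / n
}

squeeze01(0, 35)  # 0.01428571 -- exact zeros are pulled just above 0
squeeze01(1, 35)  # 0.98571429 -- exact ones are pulled just below 1
```

This is equivalent to the inline expression used for df_realistic above.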
mybetareg_realistic <- betareg(ProportionBirdsScavenging ~ OverheadCover, data = df_realistic, weights = pointWeight, link = "logit")
summary(mybetareg_realistic)
# Call:
# betareg(formula = ProportionBirdsScavenging ~ OverheadCover, data = df_realistic, weights = pointWeight,
# link = "logit")
#
# Standardized weighted residuals 2:
# Min 1Q Median 3Q Max
# -27.4104 -6.9053 1.5775 5.3971 27.4439
#
# Coefficients (mean model with logit link):
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 1.69853 0.02563 66.26 <2e-16 ***
# OverheadCover -4.28684 0.04365 -98.22 <2e-16 ***
#
# Phi coefficients (precision model with identity link):
# Estimate Std. Error z value Pr(>|z|)
# (phi) 8.7909 0.1861 47.23 <2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Type of estimator: ML (maximum likelihood)
# Log-likelihood: 3294 on 3 Df
# Pseudo R-squared: 0.6607
# Number of iterations: 12 (BFGS) + 1 (Fisher scoring)
parameters::p_value(mybetareg_realistic)
# Parameter p
# 1 (Intercept) 0
# 2 OverheadCover 0
# 3 (phi) 0
nagelkerke(mybetareg_realistic)
# $Models
#
# Model: "betareg, ProportionBirdsScavenging ~ OverheadCover, df_realistic, pointWeight, logit"
# Null: "betareg, ProportionBirdsScavenging ~ 1, df_realistic, pointWeight, logit"
#
# $Pseudo.R.squared.for.model.vs.null
# Pseudo.R.squared
# McFadden -2.69615e+00
# Cox and Snell (ML) 1.00000e+00
# Nagelkerke (Cragg and Uhler) -1.53209e-24
#
# $Likelihood.ratio.test
# Df.diff LogLik.diff Chisq p.value
# -1 -2587.3 5174.6 0
#
# $Number.of.observations
#
# Model: 35
# Null: 35
#
# $Messages
# [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
#
# $Warnings
# [1] "None"
#
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
lrtest(mybetareg_realistic)
# Likelihood ratio test
#
# Model 1: ProportionBirdsScavenging ~ OverheadCover
# Model 2: ProportionBirdsScavenging ~ 1
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 3 3546.9
# 2 2 959.6 -1 5174.6 < 2.2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
joint_tests(mybetareg_realistic)
# The output is just 3 blank lines.
# RANDOM DATA
df_random <- df
df_random$ProportionBirdsScavenging <- (((df_random$ProportionBirdsScavenging*(length(df_random$ProportionBirdsScavenging)-1))+0.5)/length(df_random$ProportionBirdsScavenging))
mybetareg_random <- betareg(ProportionBirdsScavenging ~ Random_OverheadCover, data = df_random, weights = pointWeight, link = "logit")
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
summary(mybetareg_random)
# Call:
# betareg(formula = ProportionBirdsScavenging ~ Random_OverheadCover, data = df_random, weights = pointWeight,
# link = "logit")
#
# Standardized weighted residuals 2:
# Min 1Q Median 3Q Max
# -24.6579 -7.4615 0.8881 5.0596 31.2095
#
# Coefficients (mean model with logit link):
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.07481 0.04363 -1.715 0.0864 .
# Random_OverheadCover -0.73184 0.06930 -10.561 <2e-16 ***
#
# Phi coefficients (precision model with identity link):
# Estimate Std. Error z value Pr(>|z|)
# (phi) 1.37018 0.02466 55.57 <2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Type of estimator: ML (maximum likelihood)
# Log-likelihood: 1015 on 3 Df
# Pseudo R-squared: 0.01213
# Number of iterations: 8 (BFGS) + 2 (Fisher scoring)
parameters::p_value(mybetareg_random)
# Parameter p
# 1 (Intercept) 8.638446e-02
# 2 Random_OverheadCover 4.509897e-26
# 3 (phi) 0.000000e+00
nagelkerke(mybetareg_random)
# $Models
#
# Model: "betareg, ProportionBirdsScavenging ~ Random_OverheadCover, df_random, pointWeight, logit"
# Null: "betareg, ProportionBirdsScavenging ~ 1, df_random, pointWeight, logit"
#
# $Pseudo.R.squared.for.model.vs.null
# Pseudo.R.squared
# McFadden -5.73902e-02
# Cox and Snell (ML) 9.57020e-01
# Nagelkerke (Cragg and Uhler) -1.46624e-24
#
# $Likelihood.ratio.test
# Df.diff LogLik.diff Chisq p.value
# -1 -55.073 110.15 9.1055e-26
#
# $Number.of.observations
#
# Model: 35
# Null: 35
#
# $Messages
# [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
#
# $Warnings
# [1] "None"
#
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
lrtest(mybetareg_random)
# Likelihood ratio test
#
# Model 1: ProportionBirdsScavenging ~ Random_OverheadCover
# Model 2: ProportionBirdsScavenging ~ 1
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 3 1014.69
# 2 2 959.62 -1 110.15 < 2.2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
joint_tests(mybetareg_random)
# Again, just 3 blank lines as output.
# EXTREME DATA
df_extreme <- df
df_extreme$ProportionBirdsScavenging <- (((df_extreme$ProportionBirdsScavenging*(length(df_extreme$ProportionBirdsScavenging)-1))+0.5)/length(df_extreme$ProportionBirdsScavenging))
mybetareg_extreme <- betareg(ProportionBirdsScavenging ~ Extreme_OverheadCover, data = df_extreme, weights = pointWeight, link = "logit")
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
summary(mybetareg_extreme)
# Call:
# betareg(formula = ProportionBirdsScavenging ~ Extreme_OverheadCover, data = df_extreme, weights = pointWeight,
# link = "logit")
#
# Standardized weighted residuals 2:
# Min 1Q Median 3Q Max
# -21.1256 -8.0601 0.2244 5.1278 31.6603
#
# Coefficients (mean model with logit link):
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.43280 0.02689 -16.09 < 2e-16 ***
# Extreme_OverheadCover -0.11404 0.03667 -3.11 0.00187 **
#
# Phi coefficients (precision model with identity link):
# Estimate Std. Error z value Pr(>|z|)
# (phi) 1.33763 0.02393 55.91 <2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Type of estimator: ML (maximum likelihood)
# Log-likelihood: 964.5 on 3 Df
# Pseudo R-squared: 0.01569
# Number of iterations: 8 (BFGS) + 2 (Fisher scoring)
parameters::p_value(mybetareg_extreme)
# Parameter p
# 1 (Intercept) 2.868274e-58
# 2 Extreme_OverheadCover 1.871710e-03
# 3 (phi) 0.000000e+00
nagelkerke(mybetareg_extreme)
# $Models
#
# Model: "betareg, ProportionBirdsScavenging ~ Extreme_OverheadCover, df_extreme, pointWeight, logit"
# Null: "betareg, ProportionBirdsScavenging ~ 1, df_extreme, pointWeight, logit"
#
# $Pseudo.R.squared.for.model.vs.null
# Pseudo.R.squared
# McFadden -5.03454e-03
# Cox and Snell (ML) 2.41241e-01
# Nagelkerke (Cragg and Uhler) -3.69602e-25
#
# $Likelihood.ratio.test
# Df.diff LogLik.diff Chisq p.value
# -1 -4.8312 9.6625 0.0018807
#
# $Number.of.observations
#
# Model: 35
# Null: 35
#
# $Messages
# [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
#
# $Warnings
# [1] "None"
#
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
lrtest(mybetareg_extreme)
# Likelihood ratio test
#
# Model 1: ProportionBirdsScavenging ~ Extreme_OverheadCover
# Model 2: ProportionBirdsScavenging ~ 1
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 3 964.45
# 2 2 959.62 -1 9.6625 0.001881 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Warning message:
# In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
# no valid starting value for precision parameter found, using 1 instead
joint_tests(mybetareg_extreme)
# model term df1 df2 F.ratio p.value
# Extreme_OverheadCover 1 Inf 9.668 0.0019
As far as I can tell from these outcomes, the p-values are as follows.
Realistic data
  summary      < 2e-16
  p_value      0
  nagelkerke   0
  lrtest       < 2.2e-16
  joint_tests  no output

Random data
  summary      < 2e-16
  p_value      4.509897e-26
  nagelkerke   9.1055e-26
  lrtest       < 2.2e-16
  joint_tests  no output

Extreme data
  summary      0.00187
  p_value      1.871710e-03
  nagelkerke   0.0018807
  lrtest       0.001881
  joint_tests  0.0019
According to these outcomes, all p-values are significant (< .05). I find this quite curious, given that I deliberately made up data sets that either do or do not show a relationship. Hence the following questions.
1) Which test is appropriate for finding the p-value for my model? Is the pseudo-R² from the summary table reliable, or should I take that from another test as well?
2) Should I look for a p-value at all, or is there a better statistic that tells me how good the model is? Which coefficients or statistics should I report when writing up this research?
3) What does the following warning mean, and why does it only sometimes pop up? For example, if I run my entire script first and then run the tests again without removing the objects from the environment, the warning does not appear.
Warning message:
In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) :
no valid starting value for precision parameter found, using 1 instead
4) Why does joint_tests only produce output for the extreme data?
Sorry for the many questions, but I am in over my head here. Can somebody shed some light on this and help me?

joint_tests reports results only for categorical factors, as far as I know. – Sal Mangiafico Dec 13 '19 at 01:45

There is a discussion of how betareg handles weights at this Cross Validated page. It may be that these case weights aren't what you want, but instead that you want to use proportional weights. So, using something like the following, you will get the same coefficients in the results, but different p-values: df_random$Weight = df_random$pointWeight / sum(df_random$pointWeight). – Sal Mangiafico Dec 13 '19 at 15:49
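Following up on that comment, a sketch of the proportional-weights idea. Rescaling all case weights by a constant leaves the maximum-likelihood point estimates unchanged but rescales the information matrix, so the standard errors and p-values change. (The Weight column name comes from the comment; the betareg refit is shown commented out since it assumes the df_random data frame from the question.)

```r
# Proportional weights: rescale the raw counts so they sum to 1.
# This leaves betareg's coefficient estimates unchanged but gives different
# (typically far less extreme) standard errors and p-values than raw counts.
pointWeight <- c(3, 233, 10, 89, 4)       # first few counts from the question
Weight <- pointWeight / sum(pointWeight)  # proportional weights
sum(Weight)  # 1

# Applied to the question's model, this would be:
# df_random$Weight <- df_random$pointWeight / sum(df_random$pointWeight)
# betareg(ProportionBirdsScavenging ~ Random_OverheadCover,
#         data = df_random, weights = Weight, link = "logit")
```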