I fitted a multinomial model with bootstrapping to predict the probability of cows performing different behaviours over a time period

Question

I found an answer that models my data in order to predict behaviours over time. The question it answered is "I have zero inflated data, with discrete variables. Is it possible to use zero inflated poisson model?"

Beyond that, I need to test whether the gaps between the behaviour curves for the different treatments are significant (see the graph on the other question).

So, besides the probability of each behaviour under each treatment over time, I need to find out whether the treatments differ significantly in the hours where the probability graph shows a gap. Is that possible?

Please refer to the other question for the code used.

Thank you.

The emmeans objects generated from the modeled data on the page that you link include p-value estimates for treatment differences for all behaviors and times. The answers there have been updated to illustrate that directly. Is there something else that you need? — EdM, Nov 25 '22 at 16:23
Is there any way to illustrate it on the graph, showing only the moments where we had significant differences? — jkc186, Nov 27 '22 at 07:09
See the last graph in this answer, based on your actual data rather than the synthetic data of the other answer on that page. That graph shows the estimated sh-ns difference as a function of time for the 3 behaviors, along with 95% CI. When the 95% CI don't include 0, the difference can be considered "significant." — EdM, Nov 27 '22 at 14:51
So if it's above 0, the behaviour is significantly higher when comparing treatments, and if it's below 0 it's significantly lower? Did I understand that right? Also, how does the model calculate values at each quarter hour when I didn't have such data, only data per hour? If I change the data set to another farm, do I have to change the parameters of the model? Thank you. — jkc186, Nov 28 '22 at 00:48
The plot shows sh-ns differences. So above zero means the behavior is more probable in sh than in ns. The model in that answer treats time as continuous. In the ref_grid() call I specified predictions at every 0.25 hour, to make a smoother curve. The bootstrap sampling specified the individual cow id values, so you would have to set those values appropriately to the data from a new farm. If you fit a model with a different number of coefficients, you also have to adjust the Terms for the Wald test, as noted in the answer. — EdM, Nov 28 '22 at 01:42
Yeah, I understand the probability, but I am talking about significance; that was the main question. How do I tell the significant differences from the differences graph? Or is it not possible? — jkc186, Nov 29 '22 at 02:06

score 0 · Accepted Answer · answered Dec 05 '22 at 17:14

Think carefully before you jump to evaluating "significance" based on a choice of probability-value cutoff. The basic emmeans vignette provides some useful guidance.

That's particularly tricky in a situation like yours, where you want to make multiple comparisons over time (modeled continuously) and behaviors, but the probabilities of all 3 behaviors are constrained to sum up to 1. You have to make some adjustment for multiple comparisons, and the choice of adjustment will in turn determine the apparent "significance." Furthermore, a "statistically significant" result might have limited practical importance.

The plots of treatment differences in the linked answer show pointwise confidence bands uncorrected for multiple comparisons. To evaluate "significance" properly you have to make careful choices about how to correct for multiple comparisons.
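
As a rough illustration of why pointwise bands overstate "significance," here is a minimal base-R sketch. It assumes, purely for illustration, 24 independent hourly tests and normal critical values; the real hourly estimates come from a smooth curve and are correlated, so the numbers are only indicative.

```r
# Hypothetical setting: 24 hourly tests, each run at the 5% level.
n_tests <- 24
alpha <- 0.05

# Chance of at least one false positive if the 24 tests were independent
# and every true difference were zero (an assumption, not the fitted model).
fwer_uncorrected <- 1 - (1 - alpha)^n_tests

# Two-sided normal critical values: pointwise versus Bonferroni-corrected.
crit_pointwise <- qnorm(1 - alpha / 2)
crit_bonferroni <- qnorm(1 - alpha / (2 * n_tests))

round(c(fwer = fwer_uncorrected,
        pointwise = crit_pointwise,
        bonferroni = crit_bonferroni), 3)
```

Under those assumptions the family-wise error rate is roughly 70%, and a corrected band has to be noticeably wider than a pointwise one.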

Here's an example of how to proceed with one such choice about multiple comparisons. The model, fit as a continuous function of time with a circular spline, will be evaluated at each hourly value. That's how the data were collected. For behaviors, we'll follow the guidance from the emmeans vignette that "it’s usually reasonable to regard each 'by' group as a separate family of tests for purposes of adjustment" and treat the behaviors separately.

Make sure to load packages and run the code as in the linked answer first. In particular, you must have a multinomial regression model mn1, a bootstrap-based variance-covariance matrix cov for that model, and the emmeans package loaded.

Set up a reference grid for the problem restricted to hourly values and do treatment versus control (sh - ns here) calculations for each combination of time and Behavior. This is similar to what was done in the linked answer, but with fewer time values as we don't need a smooth plot here.

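# Reference grid at the 24 hourly values, using the bootstrap covariance matrix
rgH <- ref_grid(mn1, vcov. = cov, at = list(time = seq(0, 23, by = 1)))
# sh - ns contrasts within each time/Behavior combination, unadjusted for now
emmH <- emmeans(rgH, trt.vs.ctrl1 ~ trt | time + Behavior, adjust = "none")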

Then do a multiple-comparison correction. For this type of data modeled continuously in time, the "multivariate-t" correction probably makes the most sense. It's less conservative than the Bonferroni or Šidák corrections. As noted above, corrections are done across all 24 hourly values, but separately within each Behavior. As random sampling from a multivariate distribution is involved in this correction, specify a seed first. Return the output as a data frame.

set.seed(2002)
testH <- test(emmH$contrasts, by="Behavior", adjust="mvt", as.df=TRUE)
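
To get a feel for how the simpler corrections mentioned above behave when applied separately within each Behavior, here is a small base-R sketch. The data frame `toy` and its p-values are made up for illustration and are not output from the model; the multivariate-t correction itself is not reproduced here.

```r
# Hypothetical unadjusted p-values: 24 hourly tests for each of 3 behaviors.
set.seed(1)
toy <- data.frame(
  Behavior = rep(c("Active", "Lying", "Standing"), each = 24),
  time = rep(0:23, times = 3),
  p.raw = runif(72)^3  # skewed toward small values, purely illustrative
)

# Bonferroni within each Behavior family, via base R's p.adjust().
toy$p.bonf <- ave(toy$p.raw, toy$Behavior,
                  FUN = function(p) p.adjust(p, method = "bonferroni"))

# Sidak within each family: 1 - (1 - p)^m, with m tests in the family.
toy$p.sidak <- ave(toy$p.raw, toy$Behavior,
                   FUN = function(p) 1 - (1 - p)^length(p))

# Compare the three versions for the smallest raw p-values.
head(toy[order(toy$p.raw), ])
```

Because each family holds 24 tests, both adjustments use m = 24 rather than 72; pooling all behaviors into one family would make every adjusted p-value larger.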

Then you can display the results restricted to the combinations of time and Behavior that pass the standard p < 0.05 significance level after correction.

print(testH[testH$p.value<0.05,c(1:5,8)],digits=2)
#    contrast time Behavior estimate     SE p.value
# 1   sh - ns    0   Active   -0.145 0.0234 2.3e-05
# 2   sh - ns    1   Active   -0.086 0.0189 2.1e-03
# 6   sh - ns    5   Active    0.050 0.0141 1.9e-02
# 7   sh - ns    6   Active    0.057 0.0143 7.5e-03
# 8   sh - ns    7   Active    0.041 0.0113 1.7e-02
# 19  sh - ns   18   Active    0.049 0.0129 1.2e-02
# 23  sh - ns   22   Active   -0.111 0.0221 5.5e-04
# 24  sh - ns   23   Active   -0.149 0.0244 1.2e-04
# 25  sh - ns    0    Lying    0.116 0.0250 1.7e-03
# 26  sh - ns    1    Lying    0.072 0.0206 2.0e-02
# 43  sh - ns   18    Lying   -0.106 0.0243 2.8e-03
# 44  sh - ns   19    Lying   -0.098 0.0244 6.8e-03
# 48  sh - ns   23    Lying    0.108 0.0257 4.3e-03
# 49  sh - ns    0 Standing    0.029 0.0086 2.7e-02
# 65  sh - ns   16 Standing    0.035 0.0104 2.6e-02
# 66  sh - ns   17 Standing    0.047 0.0124 1.1e-02
# 67  sh - ns   18 Standing    0.057 0.0145 7.2e-03
# 68  sh - ns   19 Standing    0.060 0.0151 6.7e-03
# 69  sh - ns   20 Standing    0.059 0.0140 4.0e-03
# 70  sh - ns   21 Standing    0.055 0.0123 2.7e-03
# 71  sh - ns   22 Standing    0.049 0.0111 2.7e-03
# 72  sh - ns   23 Standing    0.041 0.0102 6.1e-03

The estimate values are the sh - ns differences expressed in terms of probability differences. The displayed p values incorporate the multivariate-t corrections. You need to apply your understanding of the subject matter to determine the practical significance of the "statistically significant" treatment differences.
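
If you want to mark only the significant hours on a graph, one possible starting point is to group them by behavior and direction. The sketch below uses the hours and estimates transcribed from the printed table; the names `sig`, `direction`, and `hours_by_group` are hypothetical, and with the real objects you would start from the filtered rows of testH instead.

```r
# Hours, behaviors and estimates transcribed from the printed table above.
sig <- data.frame(
  Behavior = rep(c("Active", "Lying", "Standing"), times = c(8, 5, 9)),
  time = c(0, 1, 5, 6, 7, 18, 22, 23,
           0, 1, 18, 19, 23,
           0, 16, 17, 18, 19, 20, 21, 22, 23),
  estimate = c(-0.145, -0.086, 0.050, 0.057, 0.041, 0.049, -0.111, -0.149,
               0.116, 0.072, -0.106, -0.098, 0.108,
               0.029, 0.035, 0.047, 0.057, 0.060, 0.059, 0.055, 0.049, 0.041)
)

# Direction of each difference: positive means more probable in sh than in ns.
sig$direction <- ifelse(sig$estimate > 0, "sh > ns", "sh < ns")

# Significant hours grouped by behavior and direction; empty groups dropped.
hours_by_group <- split(sig$time, list(sig$Behavior, sig$direction),
                        drop = TRUE)
hours_by_group
```

Each element of the resulting list holds the hours you could highlight for that behavior, for example with points or shading on the difference plot.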
