0

I am interested in plotting the PDF of a zero-one-inflated beta distribution so that I can overlay an empirical density function of the observed data, with a PDF using the parameters I estimated from a regression fit using the brms package in R.

I understand how to generate a PDF for a discrete zero-inflated distribution but I am having trouble wrapping my head around how this would be possible for a continuous distribution like the beta distribution. Does anyone have any pointers to either an existing implementation of this in R or elsewhere, or could provide some example code for how to do this?

I can provide more specific details if needed, but I think what I generally need is a method to create a PDF for a continuous distribution with one or more "inflation" components, that I could apply to create a zero-one-inflated beta probability density function.

model fit summary

Here is what displays in the console when calling summary(fit). The model allows all parameters of the zero-one-inflated beta to vary as a function of treatment. My goal is to plot a PDF of the modeled response distribution for each treatment.

Family: zero_one_inflated_beta 
  Links: mu = logit; phi = log; zoi = logit; coi = logit 
Formula: Proportion ~ Treatment 
         phi ~ Treatment
         zoi ~ Treatment
         coi ~ Treatment
   Data: simple_data (Number of observations: 360) 
  Draws: 4 chains, each with iter = 4000; warmup = 3000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept -0.91 0.10 -1.09 -0.72 1.00 3242 2746 phi_Intercept 1.58 0.14 1.30 1.85 1.00 3300 2788 zoi_Intercept -1.19 0.22 -1.62 -0.78 1.00 2793 2637 coi_Intercept -6.50 3.10 -14.37 -2.63 1.00 1820 1331 TreatmentAtG -0.28 0.14 -0.54 -0.01 1.00 3470 2874 TreatmentAPG -0.19 0.14 -0.46 0.08 1.00 3213 2973 phi_TreatmentAtG 0.33 0.21 -0.09 0.73 1.00 3265 2826 phi_TreatmentAPG 0.19 0.20 -0.23 0.58 1.00 3019 3015 zoi_TreatmentAtG 0.65 0.28 0.11 1.21 1.00 2969 2872 zoi_TreatmentAPG 0.34 0.29 -0.24 0.92 1.00 2766 2872 coi_TreatmentAtG 2.60 3.25 -2.23 10.50 1.00 1863 1404 coi_TreatmentAPG -0.25 4.53 -9.77 8.95 1.00 1423 1523

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

Here is the output of dput(summary(fit)) for the model that contains, among other things, the median values from the posterior distributions of all the parameters.

structure(list(formula = structure(list(formula = Proportion ~ 
    Treatment, pforms = list(phi = phi ~ Treatment, zoi = zoi ~ 
    Treatment, coi = coi ~ Treatment), pfix = list(), resp = "Proportion", 
    family = structure(list(family = "zero_one_inflated_beta", 
        link = "logit", linkfun = function (mu) 
        link(mu, link = slink), linkinv = function (eta) 
        inv_link(eta, link = slink), dpars = c("mu", "phi", "zoi", 
        "coi"), type = "real", ybounds = c(0, 1), closed = c(TRUE, 
        TRUE), ad = c("weights", "subset", "index"), include = "fun_zero_one_inflated_beta.stan", 
        specials = "sbi_zi_logit", normalized = "", link_phi = "log", 
        link_zoi = "logit", link_coi = "logit"), class = c("brmsfamily", 
    "family")), mecor = TRUE), class = c("brmsformula", "bform"
)), data_name = "simple_data", group = character(0), nobs = 360L, 
    ngrps = NULL, autocor = NULL, prior = structure(list(prior = character(0), 
        class = character(0), coef = character(0), group = character(0), 
        resp = character(0), dpar = character(0), nlpar = character(0), 
        lb = character(0), ub = character(0), source = character(0)), class = c("brmsprior", 
    "data.frame"), row.names = integer(0)), algorithm = "sampling", 
    chains = 4L, iter = 4000L, warmup = 3000L, thin = 1L, sampler = "sample(hmc)", 
    fixed = structure(list(Estimate = c(-0.90528752625, 1.580811685, 
    -1.19292657275, -6.499043965, -0.27649975441075, -0.187851774786015, 
    0.32542871341475, 0.18643767712385, 0.6463566213525, 0.3447500927745, 
    2.60338112814775, -0.2461269624225), Est.Error = c(0.0959046589040605, 
    0.13945355554377, 0.215237395928849, 3.1009864359503, 0.135865577346364, 
    0.136497497315428, 0.207714190070951, 0.204924681244251, 
    0.28040277149041, 0.293531084311733, 3.25075636844655, 4.52740348744564
    ), `l-95% CI` = c(-1.093054, 1.29537825, -1.6244415, -14.3712875, 
    -0.539085225, -0.4575207, -0.08814557, -0.2328505, 0.112203525, 
    -0.235284349999999, -2.2301125, -9.76533725), `u-95% CI` = c(-0.715749925, 
    1.851207, -0.78194595, -2.63277625, -0.0126142324999999, 
    0.0835585625000001, 0.72997525, 0.582225925, 1.2074075, 0.91722755, 
    10.502935, 8.9491615), Rhat = c(1.0015884207194, 0.999917786584758, 
    1.00037997753762, 1.00176907634062, 1.0003540915247, 1.00059282672043, 
    1.00052829336817, 1.00026305806547, 1.00124237305807, 1.00075322541757, 
    1.00089230002969, 1.00201414758976), Bulk_ESS = c(3242.04442608349, 
    3300.32028285236, 2792.94823881214, 1819.53559579282, 3469.94101433756, 
    3212.6066785496, 3264.71227200253, 3018.95934804462, 2968.56213796343, 
    2765.6524712156, 1863.16701706442, 1423.10214690165), Tail_ESS = c(2746.32327946702, 
    2788.22346690149, 2637.07590677314, 1330.5596854849, 2873.96456716223, 
    2973.27203895491, 2825.68060612819, 3014.71081254269, 2871.57181775515, 
    2871.90247589758, 1404.12421696219, 1523.09010264777)), row.names = c("Intercept", 
    "phi_Intercept", "zoi_Intercept", "coi_Intercept", "TreatmentAtG", 
    "TreatmentAPG", "phi_TreatmentAtG", "phi_TreatmentAPG", "zoi_TreatmentAtG", 
    "zoi_TreatmentAPG", "coi_TreatmentAtG", "coi_TreatmentAPG"
    ), class = "data.frame"), spec_pars = structure(list(Estimate = numeric(0), 
        Est.Error = numeric(0), `l-95% CI` = numeric(0), `u-95% CI` = numeric(0), 
        Rhat = numeric(0), Bulk_ESS = numeric(0), Tail_ESS = numeric(0)), row.names = character(0), class = "data.frame"), 
    cor_pars = structure(list(Estimate = numeric(0), Est.Error = numeric(0), 
        `l-95% CI` = numeric(0), `u-95% CI` = numeric(0), Rhat = numeric(0), 
        Bulk_ESS = numeric(0), Tail_ESS = numeric(0)), row.names = character(0), class = "data.frame")), class = "brmssummary")
qdread
  • 295
  • Can you edit your post with the fitted model? Ideally, just paste in the output from dput(your_fitted_model). – Stephan Kolassa Jul 22 '22 at 17:19
  • The fitted model is >5MB in size because of all the posterior samples it contains, so I will add some output with just the summary of the parameter estimates from a simplified version of the model. Will do that in just a minute. – qdread Jul 22 '22 at 17:31
  • 1
    If you have in principle a spike at zero, how are you going to determine the empirical density function? Kernel estimation at least has great difficulty with bounds and even greater difficulty with spikes. – Nick Cox Jul 22 '22 at 17:37
  • @NickCox Yes, that's part of the problem! – qdread Jul 22 '22 at 17:40
  • Ah, thanks. That is indeed rather complicated. Can you instead give us the fitted parameters of the zero inflated beta for a particular predictor setting? Sorry for the confusion. – Stephan Kolassa Jul 22 '22 at 19:38
  • 1
    The following might be helpful: https://stats.stackexchange.com/questions/335547/how-does-one-graph-the-pdf-of-a-variable-having-a-mixed-discrete-continuous-dist. – JimB Jul 22 '22 at 22:18
  • An empirical quantile plot or cumulative distribution plot could help. – Nick Cox Jul 23 '22 at 10:07

0 Answers0