Modeling probability of bird species detection across the day

Question

I am looking for guidance on constructing a statistical model for the following problem, motivated by bird watching.

There is a very large database of bird "checklists". For the sake of this discussion we are ignoring seasonality, so assume each checklist consists of a start time and an end time of day (with 15-minute granularity) plus a binary value for each species indicating whether it was detected. There may have been multiple detection events, but we only know whether zero or more than zero birds were detected. For example, one list could say that from 07:00 to 08:00, American Robin and Blue Jay were detected; another could say that from 07:15 to 07:30 only American Robin was detected.

The goal is to determine the probability of detecting each species during each 15-minute interval of the day. One naive approach would be to say that for each species and each 15-minute interval, the probability is the number of lists overlapping that interval that included the species, divided by the total number of lists overlapping that interval. (Dividing by the number of lists that did not include the species would give the odds rather than a probability.)
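To make the naive estimate concrete, here is a minimal Python sketch. The tuple layout for a checklist and the function names `to_slot` and `naive_rates` are made up for illustration, and the denominator is every list overlapping the interval, so the result is a proportion.

```python
from collections import defaultdict

SLOT_MINUTES = 15  # granularity of the checklist times


def to_slot(hhmm):
    """Convert 'HH:MM' to a 15-minute slot index (0 to 95)."""
    hours, minutes = map(int, hhmm.split(":"))
    return (hours * 60 + minutes) // SLOT_MINUTES


def naive_rates(checklists, species):
    """Per-slot fraction of overlapping checklists that reported the species."""
    detected = defaultdict(int)
    total = defaultdict(int)
    for start, end, seen in checklists:
        # A list from 07:00 to 08:00 covers slots 28, 29, 30 and 31.
        for slot in range(to_slot(start), to_slot(end)):
            total[slot] += 1
            if species in seen:
                detected[slot] += 1
    return {slot: detected[slot] / total[slot] for slot in total}


# The two example lists from the question (hypothetical data layout).
checklists = [
    ("07:00", "08:00", {"American Robin", "Blue Jay"}),
    ("07:15", "07:30", {"American Robin"}),
]
rates = naive_rates(checklists, "Blue Jay")
```

With only these two lists, Blue Jay gets an estimate of 1.0 in three of the four slots from a single checklist, which is exactly the kind of extreme value I would like to avoid.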

There are at least two problems with this approach.

First, we would expect the variance of these values to be much higher when there aren't many checklists available for a given time interval. It would be better to indicate high uncertainty (with a probability distribution) rather than summarize the data with an extreme point estimate.
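One standard way to express that uncertainty, if I understand it correctly, is a Beta-Binomial update: start from a Beta prior on the detection probability and add the counts. The sketch below assumes a uniform Beta(1, 1) prior and treats every overlapping list as one independent yes/no trial for the interval, so it addresses only this first problem. The function name `beta_posterior` is my own.

```python
import math


def beta_posterior(k, n, a0=1.0, b0=1.0):
    """Posterior mean and standard deviation of a detection probability.

    k  : number of overlapping lists that reported the species
    n  : number of overlapping lists in total
    a0, b0 : Beta prior parameters (assumed uniform by default)
    """
    a = a0 + k        # prior pseudo-count plus detections
    b = b0 + (n - k)  # prior pseudo-count plus non-detections
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# One list with a detection versus one hundred lists, all with detections.
few_mean, few_sd = beta_posterior(1, 1)
many_mean, many_sd = beta_posterior(100, 100)
```

Both cases have a naive estimate of 1.0, but the posterior for the single list has mean 2/3 with a wide spread, while the posterior for one hundred lists is concentrated near 1.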

Second, consider the example lists given above. One list detected American Robin over the course of an hour, the other over fifteen minutes. But both lists would contribute equally to the naive measure of detecting American Robin from 07:15 to 07:30. The hour list should contribute less, because it is quite possible that American Robin was only detected from 07:45 to 08:00. We might even want to assume that for lists of duration greater than one period, the probability of detection in any period should be assumed equal until there is evidence to the contrary.
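One way to formalize this second point is to give each interval its own detection probability and say that a list reports the species if it was detected in at least one interval it covers. Assuming detections in different intervals are independent (a strong assumption of mine, not something the data guarantees), the probability for a list covering a set of slots is 1 minus the product of the per-slot miss probabilities. The names `detection_prob` and `log_likelihood` are hypothetical.

```python
import math


def detection_prob(p_by_slot, slots):
    """P(species reported on a list covering `slots`), assuming independent slots."""
    miss = 1.0
    for slot in slots:
        miss *= 1.0 - p_by_slot[slot]  # probability of missing it in every slot
    return 1.0 - miss


def log_likelihood(p_by_slot, observations):
    """Log-likelihood of (slots, detected) pairs for one species.

    Per-slot probabilities must lie strictly between 0 and 1,
    otherwise math.log may receive 0.
    """
    total = 0.0
    for slots, detected in observations:
        q = detection_prob(p_by_slot, slots)
        total += math.log(q if detected else 1.0 - q)
    return total


# Equal per-slot probability of 0.5 across 07:00 to 08:00 (slots 28 to 31).
p = {28: 0.5, 29: 0.5, 30: 0.5, 31: 0.5}
hour_list = detection_prob(p, range(28, 32))  # 1 - 0.5**4
short_list = detection_prob(p, [29])          # just the 07:15 slot
ll = log_likelihood(p, [(range(28, 32), True), ([29], True)])
```

Under this likelihood a detection on the hour-long list is explained almost as well by a high probability in any one of its four slots, so it constrains the 07:15 to 07:30 interval far less than the fifteen-minute list does. The per-slot probabilities could then be estimated by maximizing this likelihood or by placing priors on them.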

Based on my very limited research, this sounds like a problem that could be handled with "Bayesian updating". Is that so? If so, how would one approach this? Is this related to any standard Bayesian inference problem?

The appropriate model seems to be multiple (i.e., one for each species) latent Poisson or NB processes with right-censoring. This is definitely something that is doable with JAGS or Stan. For a textbook treatment I'd recommend chapter 4.4 of Cameron & Trivedi. — Durden, Mar 23 '24 at 20:00
Whether you use a Bayesian or a Frequentist approach, you'll need an underlying model for which you can then assess the contribution of each observation. Your "naive" approach (using your labeling) is a data manipulation recipe and not a model. Are you aware of the literature associated with the R package "unmarked"? — JimB, Mar 23 '24 at 21:00
