
My problem

I would like to infer the relative importance of "habitat connectivity" compared to "habitat quality" in explaining tit breeding success in an urban area (more details below). My dataset contains $n_{total}$ = 426 observations (one observation is a [great or blue] tit reproduction event observed in a given nestbox in a given year), which is a fairly decent sample size for an ecological study. However, I have doubts regarding the independence of my observations:

  • Some observations ($n$ = 188) were made in the same nestbox but in different years and involved different tit pairs. Theoretically, I should thus add nestbox_id as a random effect in my models. But since the birds aren't the same, I wonder whether it's relevant (all nestboxes are identical, so their characteristics should not cause strong variations in tit breeding success).
  • Some of my nestboxes are clustered (mostly for logistical reasons). A cluster_id random factor could thus be useful as well. Yet I do not think nestbox clustering is a problem because i) my covariates should account for most locally correlated environmental effects, and ii) I am to some extent interested in spatial autocorrelation, as I am studying the effect of "habitat connectivity" (which is partially linked to spatial and environmental proximity).
  • As my observations were made across different years, possible year effects should definitely be accounted for. But since year only has 4 levels, I should include it as a fixed effect. To avoid losing too many degrees of freedom, I wanted to split year into a period factor that would indicate whether the reproduction event occurred at the beginning, the middle or the end of the breeding season for a given year (e.g. "middle_2022" or "start_2019"). I would thus have enough groups to use period as a random effect, but I would lose some information regarding the year effect outside of these 3 periods of the breeding season, is that right?
  • Finally, you should know that all my explanatory variables and covariates are static* in time (except one, temperature), meaning that for a given nestbox, the values of all these variables will be the same every year. Is this pseudoreplication? This post makes me think my data are not truly pseudoreplicated, but I would like to be sure.
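
To make the period idea concrete, here is a minimal sketch of how such a factor could be built from a laying date. The cutoff days and the name `period_label` are placeholders for illustration, not values or code from my study; with 4 years and 3 periods per year, this would give at most 12 levels.

```python
from datetime import date

# Hypothetical cutoffs (day of year); placeholders, not values from the study.
START_END_DOY = 105   # before this day of the year: "start"
MIDDLE_END_DOY = 135  # before this day of the year: "middle", otherwise "end"

def period_label(laying_date: date) -> str:
    """Return a label such as 'middle_2022' for one reproduction event."""
    doy = laying_date.timetuple().tm_yday
    if doy < START_END_DOY:
        part = "start"
    elif doy < MIDDLE_END_DOY:
        part = "middle"
    else:
        part = "end"
    return f"{part}_{laying_date.year}"

# Three made-up events, one per part of the season.
events = [date(2019, 4, 2), date(2022, 5, 1), date(2021, 6, 10)]
labels = [period_label(d) for d in events]
print(labels)
```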

Given these elements, and considering that some think we include too many terms in regression models (see also here), could you help me decide which effects (among nestbox_id, cluster_id, year and period) I should include in my models (in addition to the variables I am truly interested in)?
Any helpful comment/answer regarding my question or other aspects of my study will be appreciated.

*: having temporally static variables is certainly a limitation, but we are fairly confident that most environmental variables did not vary much across the 4 consecutive years of the study, and collecting environmental variables every year would have been prohibitively expensive.

Study context

I'm studying some of the consequences urban life has on the population dynamics of two closely related bird species (great tits and blue tits).
To do so, we installed a network of 372 nestboxes throughout a large city. For 4 years, we monitored all nestboxes during the breeding period and, whenever a nestbox was occupied by a tit pair, we measured variables related to their reproductive success (e.g. number of eggs laid, number of nestlings, number of fledglings, nestling morphometric features). These are my response variables.
For each nestbox, we also collected variables that are theoretically important to explain tit reproductive success. These variables are either related to the local "habitat quality" (e.g. vegetation quantity; air, noise and light pollution; minimal temperature; disturbances) or to the larger-scale "habitat connectivity" (i.e. whether individuals can easily move through the landscape). These (proxies) are my explanatory variables or covariates.

My goal is to test whether habitat connectivity increases urban tit breeding success by using a likelihood-ratio (LR) test to compare nested models that include "habitat quality" variables with or without "habitat connectivity" proxies. At least, that's the general idea.
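
To show the mechanics of that comparison, here is a minimal sketch on simulated data. It is a deliberate simplification: a Gaussian response, a single quality variable, a single connectivity proxy, and no random effects at all, so it does not settle which of nestbox_id, cluster_id, year or period belongs in the model. All variable names and effect sizes are made up.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 426  # same size as my dataset; the values themselves are simulated

quality = rng.normal(size=n)       # stand-in for one "habitat quality" variable
connectivity = rng.normal(size=n)  # stand-in for one "habitat connectivity" proxy
y = 1.0 + 0.5 * quality + 0.3 * connectivity + rng.normal(size=n)

def gaussian_loglik(X, y):
    """Maximised Gaussian log-likelihood of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)  # ML estimate of the residual variance
    return -0.5 * len(y) * (math.log(2 * math.pi * sigma2) + 1)

ones = np.ones(n)
X_reduced = np.column_stack([ones, quality])             # quality only
X_full = np.column_stack([ones, quality, connectivity])  # quality + connectivity

# LR statistic: twice the gain in log-likelihood from adding connectivity.
lr_stat = 2 * (gaussian_loglik(X_full, y) - gaussian_loglik(X_reduced, y))
# One extra parameter, so the reference distribution is chi-square with 1 df,
# whose upper tail probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2))
print(f"LR = {lr_stat:.2f}, p = {p_value:.4g}")
```

With several connectivity proxies, the degrees of freedom would equal the number of added parameters, and the 1-df shortcut used for the p-value would no longer apply.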

Fanfoué
