I have proportion data that include 0s and 1s (the ri column below, calculated as ud / days_lib). I have found a few threads on modelling this kind of data. Several of them (e.g., this thread) suggest transforming the data with the formula (y * (n − 1) + 0.5) / n, where n is the sample size, from Smithson & Verkuilen (2006).
I'm not entirely sure how to apply this, however. Is it applied to all proportions, or just to the 0s and 1s? I'm also unsure what "sample size" actually means here: is it the number of rows in the data set, or the denominator used to create each proportion?
I have included a sample of my dataset below and code for a reproducible example.
In the example below, would I use 6 (the number of rows) as the sample size, or the days_lib value of each row (31, 30 or 28), to transform ri?
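To make the two readings concrete, here is a minimal sketch of what each would compute on the six sample rows. The helper name `squeeze` and the variable names `ri_rows` and `ri_days` are made up for illustration and do not come from any package; I am not claiming either reading is the correct one.

```r
# Values copied from the six sample rows shown in this post
ud       <- c(3, 27, 22, 28, 24, 2)
days_lib <- c(31, 31, 30, 28, 30, 31)
ri       <- ud / days_lib

# Transformation as quoted above: (y * (n - 1) + 0.5) / n
# `squeeze` is a hypothetical helper name used only for this example
squeeze <- function(y, n) (y * (n - 1) + 0.5) / n

# Reading 1: n is the number of rows in the data set (6 in this sample)
ri_rows <- squeeze(ri, length(ri))

# Reading 2: n is the denominator of each proportion (days_lib)
ri_days <- squeeze(ri, days_lib)

# Compare the original and the two transformed versions side by side
print(round(cbind(ri, ri_rows, ri_days), 4))
```

In both cases the transformation is applied to every value, not only to the 0s and 1s, and the results differ noticeably between the two readings, which is why I would like to know which one is intended.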
If there have been any further developments in modelling such data without transformation, I would be keen to hear about them, ideally with a package that works with dredge in MuMIn, as I will be using an information-theoretic approach with these data.
     station monthyear ud days_lib     ri         species SE_score
372     PB03   2016/01  3       31 0.0968 Silvertip Shark     0.35
2054 SAUWM01   2018/01 27       31 0.8710 Grey Reef Shark     0.15
1054    PB26   2014/11 22       30 0.7333 Grey Reef Shark     0.17
1847    SA06   2015/02 28       28 1.0000 Silvertip Shark     0.30
1055    PB26   2014/11 24       30 0.8000 Silvertip Shark     0.17
316     PB02   2016/01  2       31 0.0645 Grey Reef Shark     0.54
Data set below
structure(list(station = structure(c(13L, 53L, 35L, 50L, 35L,
12L), levels = c("BE01", "BE02", "BEUWM01", "BL01", "BL02", "GCB01",
"GCB02", "GCB03", "NI01", "NI01b", "PB01", "PB02", "PB03", "PB04",
"PB05", "PB06", "PB07", "PB09", "PB10", "PB11", "PB12", "PB13",
"PB14", "PB15", "PB16", "PB17", "PB18", "PB19", "PB20", "PB21",
"PB22", "PB23", "PB24", "PB25", "PB26", "PB27", "PB28", "PB29",
"PB30", "PB4G01", "PB4G02", "PBUWM01", "PBUWM02", "SA01", "SA02",
"SA02b", "SA03", "SA04", "SA05", "SA06", "SA07", "SA11", "SAUWM01",
"SB01", "SB02/AR02", "SB04/AR06", "VB01", "VB02", "VB03", "VB04"
), class = "factor"), monthyear = structure(c(25L, 49L, 11L,
14L, 11L, 25L), levels = c("2014/01", "2014/02", "2014/03", "2014/04",
"2014/05", "2014/06", "2014/07", "2014/08", "2014/09", "2014/10",
"2014/11", "2014/12", "2015/01", "2015/02", "2015/03", "2015/04",
"2015/05", "2015/06", "2015/07", "2015/08", "2015/09", "2015/10",
"2015/11", "2015/12", "2016/01", "2016/02", "2016/03", "2016/04",
"2016/05", "2016/06", "2016/07", "2016/08", "2016/09", "2016/10",
"2016/11", "2016/12", "2017/01", "2017/02", "2017/03", "2017/04",
"2017/05", "2017/06", "2017/07", "2017/08", "2017/09", "2017/10",
"2017/11", "2017/12", "2018/01", "2018/02", "2018/03", "2018/04",
"2018/05", "2018/06", "2018/07", "2018/08", "2018/09", "2018/10",
"2018/11", "2018/12"), class = "factor"), ud = c(3L, 27L, 22L,
28L, 24L, 2L), days_lib = c(31, 31, 30, 28, 30, 31), ri = c(0.0968,
0.871, 0.7333, 1, 0.8, 0.0645), species = structure(c(2L, 1L,
1L, 2L, 2L, 1L), levels = c("Grey Reef Shark", "Silvertip Shark"
), class = "factor"), SE_score = c(0.35, 0.15, 0.17, 0.3, 0.17,
0.54), region = structure(c(5L, 6L, 5L, 6L, 5L, 5L), levels = c("Benares",
"Blenheim", "Grand Chagos Bank", "Nelsons Island", "Peros Banhos",
"Saloman", "Speakers Bank", "Victory Bank"), class = "factor"),
month = c(1, 1, 11, 2, 11, 1), season = structure(c(2L, 2L,
2L, 2L, 2L, 2L), levels = c("dry.season", "wet.season"), class = "factor"),
year = structure(c(3L, 5L, 1L, 2L, 1L, 3L), levels = c("2014",
"2015", "2016", "2017", "2018"), class = "factor")), row.names = c(372L,
2054L, 1054L, 1847L, 1055L, 316L), class = "data.frame")