I am trying to determine whether some anthropogenic variables (e.g., population density, population growth, and roads) explain animals' distributions. My dependent variable is the percentage of area occupied (continuous, ranging from 0 to 100; e.g., 87.9). Each species also has a status (binary: 0/1) and an Order (several species can belong to the same Order, e.g., Rodentia), and I expect the effect of each independent variable to differ depending on the species' status. For instance, for a species with status 0 I expect no effect of Roads, while for a species with status 1 I expect Roads to have an effect.
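An expectation like "Roads matters for status 1 but not for status 0" is what an interaction term between Roads and status encodes. A minimal sketch, assuming status is the Establishment column of the sample data shown further down and assuming a Gamma GLM with a log link (the link is an illustrative choice, not part of the question):

```r
# Sample data copied from the question (10 rows)
myData <- data.frame(
  Binomial = c("Apodemus_sylvaticus", "Apodemus_sylvaticus", "Axis_axis",
               "Callosciurus_erythraeus", "Callosciurus_erythraeus",
               "Callosciurus_finlaysonii", "Capra_aegagrus", "Capra_aegagrus",
               "Capra_ibex", "Capra_ibex"),
  Order = c("Rodentia", "Rodentia", "Cetartiodactyla", "Rodentia", "Rodentia",
            "Rodentia", "Cetartiodactyla", "Cetartiodactyla",
            "Cetartiodactyla", "Cetartiodactyla"),
  Establishment = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0),
  range_perc = c(20.04902, 100, 100, 97.82241, 100, 100, 24.25978, 100,
                 77.20798, 100),
  PopdensityAvg = c(4.908014, 36.353451, 61.042892, 329.174194, 338.821411,
                    155.620636, 142.892624, 78.786888, 27.929804, 26.661007),
  PopgrowthAvg = c(0.2019391, 1.2507490, 0.2815404, 10.8665762, 11.4654551,
                   1.4710270, 13.3291273, 5.9212909, -0.3536499, -0.1254798),
  Railways = c(0, 0.2885747, 0, 1.5212460, 1.5289692, 2.0869565, 0.5820163,
               0.1487832, 0.2243542, 0.3145299),
  Roads = c(2.983818, 2.728217, 4.964841, 2.964914, 3.000512, 6.450338,
            2.345860, 3.228113, 2.430885, 2.327852)
)

# Roads * factor(Establishment) expands to
# Roads + factor(Establishment) + Roads:factor(Establishment),
# so the Roads slope is allowed to differ between status 0 and status 1
fit <- glm(range_perc ~ Roads * factor(Establishment),
           family = Gamma(link = "log"), data = myData)
coef(fit)
# Roads slope for status 0: "Roads"
# Roads slope for status 1: "Roads" + "Roads:factor(Establishment)1"
```

With only 10 rows this is illustrative only; the same `predictor * factor(Establishment)` pattern extends to the other predictors.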
The independent variables are measured across each species' distribution range. Some of them can take negative values (e.g., population growth).
I first looked into GLMMs, as I thought they were more flexible. However, I don't really understand what my grouping variable or random effect(s) could be. If I understood correctly, a grouping variable is something that "groups" the data, such that the response variable may behave differently from one group to another. In my case it could be Country (different countries may have different population densities or numbers of roads), but I don't have this information, because species' distributions span more than one country. Moreover, the independent variables are not measured at the country level but are averaged over each species' distribution range.
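Even without Country, the sample data shown further down contains two columns whose levels recur across rows, which is the basic requirement for a grouping factor: Binomial (most species appear twice, once per status) and Order. A base-R sketch to inspect them; whether either belongs in a GLMM as a random effect is a modeling decision, and the lme4 call in the comment is a hypothetical illustration that is not run here:

```r
# Sample data copied from the question (10 rows)
myData <- data.frame(
  Binomial = c("Apodemus_sylvaticus", "Apodemus_sylvaticus", "Axis_axis",
               "Callosciurus_erythraeus", "Callosciurus_erythraeus",
               "Callosciurus_finlaysonii", "Capra_aegagrus", "Capra_aegagrus",
               "Capra_ibex", "Capra_ibex"),
  Order = c("Rodentia", "Rodentia", "Cetartiodactyla", "Rodentia", "Rodentia",
            "Rodentia", "Cetartiodactyla", "Cetartiodactyla",
            "Cetartiodactyla", "Cetartiodactyla"),
  Establishment = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0),
  range_perc = c(20.04902, 100, 100, 97.82241, 100, 100, 24.25978, 100,
                 77.20798, 100),
  Roads = c(2.983818, 2.728217, 4.964841, 2.964914, 3.000512, 6.450338,
            2.345860, 3.228113, 2.430885, 2.327852)
)

# Rows per level: a grouping factor needs levels that repeat across rows
species_counts <- table(myData$Binomial)
order_counts <- table(myData$Order)
print(species_counts)
print(order_counts)

# Hypothetical illustration only (not run; requires the lme4 package):
# lme4::glmer(range_perc ~ Roads * factor(Establishment) + (1 | Binomial),
#             family = Gamma(link = "log"), data = myData)
```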
I then switched to a GLM (`glm()` from the stats package; I don't know whether there are better options), thinking it would be more appropriate. Based on the response variable (which is continuous and > 0), I think I should use a Gamma distribution, but I am wondering whether my choices are correct and how to deal with negative values in the independent variables.
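On the two doubts above: in `glm()` the Gamma family constrains only the response, which must be strictly positive; predictors can take any real value, so negative PopgrowthAvg values need no special handling. R's `Gamma` family uses the inverse link by default, under which the fitted mean 1/eta must stay positive, and violating that can trigger the "please supply starting values" error. With a log link the fitted mean exp(eta) is positive for any eta. A minimal sketch on the sample data, with the log link as an assumption rather than the question's choice:

```r
# Sample data copied from the question (10 rows, predictors and response only)
myData <- data.frame(
  range_perc = c(20.04902, 100, 100, 97.82241, 100, 100, 24.25978, 100,
                 77.20798, 100),
  PopdensityAvg = c(4.908014, 36.353451, 61.042892, 329.174194, 338.821411,
                    155.620636, 142.892624, 78.786888, 27.929804, 26.661007),
  PopgrowthAvg = c(0.2019391, 1.2507490, 0.2815404, 10.8665762, 11.4654551,
                   1.4710270, 13.3291273, 5.9212909, -0.3536499, -0.1254798),
  Railways = c(0, 0.2885747, 0, 1.5212460, 1.5289692, 2.0869565, 0.5820163,
               0.1487832, 0.2243542, 0.3145299),
  Roads = c(2.983818, 2.728217, 4.964841, 2.964914, 3.000512, 6.450338,
            2.345860, 3.228113, 2.430885, 2.327852)
)

# Gamma requires a strictly positive response; predictors may be negative
stopifnot(all(myData$range_perc > 0))

# Log link: fitted mean = exp(linear predictor) > 0, so `start` is usually
# unnecessary
fit_log <- glm(range_perc ~ PopdensityAvg + PopgrowthAvg + Railways + Roads,
               family = Gamma(link = "log"), data = myData)
summary(fit_log)
```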
So far I've tried this:
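```r
# Gamma GLM with R's default (inverse) link.
# `start` supplies starting values for the 5 coefficients (intercept + 4 slopes);
# here they are the minima of the data columns (min(x) is range(x)[1]),
# not guesses of the coefficients themselves.
glm(range_perc ~ PopdensityAvg + PopgrowthAvg + Railways + Roads,
    start = c(min(myData$range_perc),
              min(myData$PopdensityAvg),
              min(myData$PopgrowthAvg),
              min(myData$Railways),
              min(myData$Roads)),
    data = myData,
    family = Gamma)
```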
This is a sample of my data:
```text
   Binomial                 Order           Establishment range_perc PopdensityAvg PopgrowthAvg  Railways    Roads
 1 Apodemus_sylvaticus      Rodentia                    1   20.04902      4.908014    0.2019391 0.0000000 2.983818
 2 Apodemus_sylvaticus      Rodentia                    0  100.00000     36.353451    1.2507490 0.2885747 2.728217
 3 Axis_axis                Cetartiodactyla             0  100.00000     61.042892    0.2815404 0.0000000 4.964841
 4 Callosciurus_erythraeus  Rodentia                    1   97.82241    329.174194   10.8665762 1.5212460 2.964914
 5 Callosciurus_erythraeus  Rodentia                    0  100.00000    338.821411   11.4654551 1.5289692 3.000512
 6 Callosciurus_finlaysonii Rodentia                    0  100.00000    155.620636    1.4710270 2.0869565 6.450338
 7 Capra_aegagrus           Cetartiodactyla             1   24.25978    142.892624   13.3291273 0.5820163 2.345860
 8 Capra_aegagrus           Cetartiodactyla             0  100.00000     78.786888    5.9212909 0.1487832 3.228113
 9 Capra_ibex               Cetartiodactyla             1   77.20798     27.929804   -0.3536499 0.2243542 2.430885
10 Capra_ibex               Cetartiodactyla             0  100.00000     26.661007   -0.1254798 0.3145299 2.327852
```
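To make the sample above reproducible, it can be rebuilt in R as a data frame. The values are copied from the printed sample; keeping Establishment as numeric 0/1 and Binomial/Order as character columns is an assumption about the original column types:

```r
# Sample data copied from the question (10 rows)
myData <- data.frame(
  Binomial = c("Apodemus_sylvaticus", "Apodemus_sylvaticus", "Axis_axis",
               "Callosciurus_erythraeus", "Callosciurus_erythraeus",
               "Callosciurus_finlaysonii", "Capra_aegagrus", "Capra_aegagrus",
               "Capra_ibex", "Capra_ibex"),
  Order = c("Rodentia", "Rodentia", "Cetartiodactyla", "Rodentia", "Rodentia",
            "Rodentia", "Cetartiodactyla", "Cetartiodactyla",
            "Cetartiodactyla", "Cetartiodactyla"),
  Establishment = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0),
  range_perc = c(20.04902, 100, 100, 97.82241, 100, 100, 24.25978, 100,
                 77.20798, 100),
  PopdensityAvg = c(4.908014, 36.353451, 61.042892, 329.174194, 338.821411,
                    155.620636, 142.892624, 78.786888, 27.929804, 26.661007),
  PopgrowthAvg = c(0.2019391, 1.2507490, 0.2815404, 10.8665762, 11.4654551,
                   1.4710270, 13.3291273, 5.9212909, -0.3536499, -0.1254798),
  Railways = c(0, 0.2885747, 0, 1.5212460, 1.5289692, 2.0869565, 0.5820163,
               0.1487832, 0.2243542, 0.3145299),
  Roads = c(2.983818, 2.728217, 4.964841, 2.964914, 3.000512, 6.450338,
            2.345860, 3.228113, 2.430885, 2.327852)
)
str(myData)
```

Running `dput()` on the full dataset would produce an equivalent, exact snippet.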
I am using R 4.0.3.
Comments by LT17:

… `myData$perc_range` have a value of 100). From an answer here, https://stats.stackexchange.com/questions/31300/dealing-with-0-1-values-in-a-beta-regression : "if y also assumes the extremes 0 and 1, a useful transformation in practice is (y * (n−1) + 0.5) / n where n is the sample size", but this transforms my 100% values of `myData$range_perc` into 95%… (Aug 11 '22 at 08:50)

… `myData$range_perc`. But isn't logistic regression for 0/1 response variables (success/failure, Y/N, male/female, ...)? … `myData$Establishment` isn't my response variable. I could try to transform `myData$range_perc` into a proportion (ranging from 0 to 1 inclusive) and fit a GLM with a binomial distribution, but even though it is a proportion ranging from 0 to 1, it is not a proportion of successful cases, so I guess it is not appropriate… (Aug 11 '22 at 09:43)
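The 95% in the first comment follows directly from the quoted transformation: with the 10-row sample (n = 10), a value of 100% (y = 1) becomes (1 × 9 + 0.5) / 10 = 0.95. A small sketch reproducing this; `squeeze` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper implementing the quoted transformation
# (y * (n - 1) + 0.5) / n, which maps [0, 1] into (0, 1)
squeeze <- function(y, n = length(y)) (y * (n - 1) + 0.5) / n

# range_perc from the 10-row sample, rescaled from 0-100 to 0-1
y <- c(20.04902, 100, 100, 97.82241, 100, 100, 24.25978, 100,
       77.20798, 100) / 100
y_squeezed <- squeeze(y)  # n = 10 here
print(y_squeezed)
# A value of 100% becomes (1 * 9 + 0.5) / 10 = 0.95
```

Since a value of 1 maps to 1 - 0.5 / n, the shrinkage depends on the sample size: with the full dataset (larger n), 100% values would land much closer to 1 than they do with this 10-row sample.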