I have come across a statistical modelling approach in which changes in nesting date over time are compared among nine distinct populations of a single species. The method is described like this:
"we used simple linear regression of nest initiation date as a function of year to determine whether there were overall unidirectional trends in nesting date over time."
Looking at the datasets, I noticed that the distribution of observations per day has a completely different shape from one population to another, e.g.:
[Histogram, Population 1: n=230 observations]
versus
[Histogram, Population 2: n=200 observations]
The histograms show the number of observations (y-axis) per day (x-axis) over the whole dataset period.
Do these structural differences affect the comparison of results between the two populations when using a linear regression approach? Why or why not?
One small comment, even though it is not about maths: it also seems strange to me that the same biological event looks so different in structure between two populations of the same species.
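To make the question concrete, here is a minimal sketch with simulated data. It is not the real data: the 20-year period, the sample size per year, the peak positions and the spreads are all assumptions chosen for illustration. It assumes one population has a single nesting peak and the other has two peaks, with no true trend over years in either, and compares what `lm` reports for each.

```r
# Hypothetical simulation: none of these numbers come from the real datasets.
set.seed(1)

years      <- 2000:2019   # assumed study period
n_per_year <- 10          # assumed number of nests recorded per year

sim_pop <- function(bimodal) {
  year <- rep(years, each = n_per_year)
  n    <- length(year)
  if (bimodal) {
    # two nesting peaks about 160 days apart, no trend over years
    nestDay <- ifelse(runif(n) < 0.5, rnorm(n, 100, 10), rnorm(n, 260, 10))
  } else {
    # a single nesting peak, no trend over years
    nestDay <- rnorm(n, 180, 10)
  }
  data.frame(Year = year, nestDay = nestDay)
}

fit_uni <- lm(nestDay ~ Year, data = sim_pop(bimodal = FALSE))
fit_bi  <- lm(nestDay ~ Year, data = sim_pop(bimodal = TRUE))

# Residual standard error and standard error of the Year slope for each shape
res <- data.frame(
  shape    = c("unimodal", "bimodal"),
  sigma    = c(sigma(fit_uni), sigma(fit_bi)),
  slope_se = c(coef(summary(fit_uni))["Year", "Std. Error"],
               coef(summary(fit_bi))["Year", "Std. Error"])
)
print(res)
```

In this sketch the two-peak population gets a much larger residual standard error and slope standard error, so the same true trend would be far harder to detect there. Whether that is what happens in the real populations is exactly what I am asking.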
Update:
For both populations, the linear model was fitted as follows:
Population 1 (n=230)
> summary(lm(nestDay~Year, data=.))
Call:
lm(formula = nestDay ~ Year, data = .)
Residuals:
    Min      1Q  Median      3Q     Max
-117.54  -81.01  -11.20   81.94  146.98
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 5813.447   4822.082   1.206    0.229
Year          -2.788      2.396  -1.164    0.246
Residual standard error: 83.6 on 228 degrees of freedom
Multiple R-squared: 0.005905, Adjusted R-squared: 0.001545
F-statistic: 1.354 on 1 and 228 DF, p-value: 0.2458
Then:
resid=residuals(lm(nestDay~Year, data=.))
xyplot(resid~data$Year, panel=function(x,y){panel.loess(x,y,span=0.5,col=1); panel.xyplot(x,y,col=)})
Finally:
> AIC(lm(nestDay~Year, data=.))
[1] 2692.674
> AIC(lm(nestDay~1, data=.))
[1] 2692.036
versus
Population 2 (n=200)
> summary(lm(nestDay~Year, data=.))
Call:
lm(formula = nestDay ~ Year, data = .)
Residuals:
    Min      1Q  Median      3Q     Max
-22.730  -6.730  -1.280   5.507  43.619
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -433.4591   571.0403  -0.759    0.449
Year           0.2751     0.2839   0.969    0.334
Residual standard error: 10.19 on 198 degrees of freedom
Multiple R-squared: 0.004719, Adjusted R-squared: -0.0003073
F-statistic: 0.9389 on 1 and 198 DF, p-value: 0.3338
Then:
FJresid=residuals(lm(nestDay~Year, data=.))
xyplot(FJresid~data$Year, panel=function(x,y){panel.loess(x,y,span=0.5,col=1); panel.xyplot(x,y,col=)})
Finally:
> AIC(lm(nestDay~Year, data=.))
[1] 1499.997
> AIC(lm(nestDay~1, data=.))
[1] 1498.943
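To put the two sets of output side by side, here is a small sketch that works only from the numbers printed above. It computes the AIC difference between the Year model and the intercept-only model, and an approximate 95% confidence interval for each Year slope from the reported estimate, standard error and residual degrees of freedom. The data frame and column names are my own and purely illustrative.

```r
# AIC values copied from the AIC() output above
aic <- data.frame(
  population = c("Population 1", "Population 2"),
  year_model = c(2692.674, 1499.997),
  null_model = c(2692.036, 1498.943)
)
# Positive delta means the intercept-only model has the lower AIC
aic$delta <- aic$year_model - aic$null_model
print(aic)

# Year slope estimates, standard errors and residual df from summary(lm(...))
slope <- data.frame(
  population = c("Population 1", "Population 2"),
  estimate   = c(-2.788, 0.2751),
  se         = c(2.396, 0.2839),
  df         = c(228, 198)
)
# Approximate 95% confidence interval based on the t distribution
t_crit      <- qt(0.975, slope$df)
slope$lower <- slope$estimate - t_crit * slope$se
slope$upper <- slope$estimate + t_crit * slope$se
print(slope)
```

The AIC differences are 0.638 and 1.054, so in both populations the intercept-only model has the slightly lower AIC, and both slope intervals include zero. The residual standard errors (83.6 versus 10.19 days) differ by a factor of about eight, which is the part I am unsure how to handle when comparing populations.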



