I am well aware, that this is a FAQ, but other questions could not provide me answers to my question. Also, I hope this will not be considered a double post, since I have posted this issue with a different question (see here).
I have data that I fit into a gls model from the nlme package in R. This is what the data looks like
fid|Treatment|NDVI_mean|date |Management1
1 G 0.7918969 2018-10-11 Control
2 J 0.1899928 2018-10-11 Clearcut
3 H 0.2378630 2018-10-11 Clearcut
4 C 0.3631807 2018-10-11 PR1/2
5 D 0.3294372 2018-10-11 PR1/2
6 E 0.4611494 2018-10-11 PR3/4
7 F 0.3681659 2018-10-11 PR3/4
8 I 0.3020894 2018-10-11 Clearcut
9 B 0.3247002 2018-10-11 NR
10 K 0.3109387 2018-10-11 NR
11 L 0.8071204 2018-10-11 Control
12 G 0.8191044 2018-11-28 Control
13 J 0.1952865 2018-11-28 Clearcut
14 H 0.2209233 2018-11-28 Clearcut
15 C 0.3942993 2018-11-28 PR1/2
16 D 0.3285107 2018-11-28 PR1/2
17 E 0.4192142 2018-11-28 PR3/4
18 F 0.3856550 2018-11-28 PR3/4
19 I 0.2522038 2018-11-28 Clearcut
20 B 0.3300189 2018-11-28 NR
21 K 0.2990551 2018-11-28 NR
22 L 0.8361506 2018-11-28 Control
23 G 0.8274411 2018-12-05 Control
...
479 E 0.5348337 2022-06-15 PR3/4
480 F 0.4959483 2022-06-15 PR3/4
481 I 0.5295775 2022-06-15 Clearcut
482 B 0.4168112 2022-06-15 NR
483 K 0.4540260 2022-06-15 NR
484 L 0.6940871 2022-06-15 Control
The data recorded a NDVI (vegetation index i.e. greenness/vigor) time series over the span of four years. The Treatment are different study sites and Mangement1 different treatments. Here is a plot for a better understanding:
I want to find out, how Management1 is influencing the NDVI_mean values over time. Therefore, I have fit the data in a gls model:
gls_model <- gls(NDVI_mean ~ Management1 + date, correlation = corCAR1(form = ~date | Treatment), data = Data_2018_2022)
Generalized least squares fit by REML
Model: NDVI_mean ~ Management1 + date
Data: Data_2018_2022
AIC BIC logLik
-1337.034 -1303.677 676.517
Correlation Structure: Continuous AR(1)
Formula: ~date | Treatment
Parameter estimate(s):
Phi
0.9850594
Coefficients:
Value Std.Error t-value p-value
(Intercept) -1.7199592 0.28986021 -5.933754 0
Management1NR -0.4107973 0.02250663 -18.252277 0
Management1PR1/2 -0.4161073 0.02250663 -18.488207 0
Management1PR3/4 -0.3806264 0.02250663 -16.911743 0
Management1Clearcut -0.4365251 0.02054565 -21.246592 0
date 0.0001345 0.00001567 8.588285 0
Correlation:
(Intr) Mng1NR M1PR1/ M1PR3/ Mngm1C
Management1NR -0.039
Management1PR1/2 -0.039 0.500
Management1PR3/4 -0.039 0.500 0.500
Management1Clearcut -0.043 0.548 0.548 0.548
date -0.998 0.000 0.000 0.000 0.000
Standardized residuals:
Min Q1 Med Q3 Max
-2.7290950 -0.7069599 -0.1141165 0.6488433 3.2727331
Residual standard error: 0.07468751
Degrees of freedom: 484 total; 478 residual
I find that the estimates look somewhat unreasonable looking at the plots of my data.
If I fit the model in the following way: gls_model <- gls(NDVI_mean ~ Management1 + fid, correlation = corCAR1(form = ~ date | Treatment), data = Data_2018_2022)
The summary() seems to be more accurate for what I understand:
Generalized least squares fit by REML
Model: NDVI_mean ~ Management1 + fid
Data: Data_2018_2022
AIC BIC logLik
-1322.891 -1289.534 669.4456
Correlation Structure: Continuous AR(1)
Formula: ~date | Treatment
Parameter estimate(s):
Phi
0.9870169
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.6889661 0.021376286 32.23039 0
Management1NR -0.4110561 0.025436529 -16.16007 0
Management1PR1/2 -0.4148831 0.025436178 -16.31075 0
Management1PR3/4 -0.3794014 0.025436108 -14.91586 0
Management1Clearcut -0.4346909 0.023219982 -18.72055 0
fid 0.0002791 0.000042286 6.60065 0
Correlation:
(Intr) Mng1NR M1PR1/ M1PR3/ Mngm1C
Management1NR -0.592
Management1PR1/2 -0.596 0.500
Management1PR3/4 -0.595 0.500 0.500
Management1Clearcut -0.653 0.548 0.548 0.548
fid -0.540 -0.006 0.002 -0.001 0.003
Standardized residuals:
Min Q1 Med Q3 Max
-2.80656156 -0.80840445 -0.07411791 0.62982805 3.34187612
Residual standard error: 0.0794082
Degrees of freedom: 484 total; 478 residual
Also, if I fit the model gls_model <- gls(NDVI_mean ~ Management1, correlation = corCAR1(form = ~date | Treatment), data = Data_2018_2022), the output of the summary() seem to be valid, however, the standardized residuals vs. fitted values plot has a pattern.
Question 1: Do I interpret the output of the second model correctly by saying, that the (Intercept) value is the mean value of the NDVI_mean in Management1Control when Management1 was equal to 0 and e.g. Management1NR would be equal to the Intercept minus -0.4227368 ?
Question 2: How is it possible, that the model seems to produce a (from my understanding) valid output only if I include the fid in the model ?
EDIT
I followed the suggestion of @EdM and fitted the model gls_model <- gls(NDVI_mean ~ Management1 * date, correlation = corCAR1(form = ~date | Treatment), data = Data_2018_2022)
The summary() now is
eneralized least squares fit by REML
Model: NDVI_mean ~ Management1 * date
Data: Data_2018_2022
AIC BIC logLik
-1289.247 -1239.312 656.6233
Correlation Structure: Continuous AR(1)
Formula: ~date | Treatment
Parameter estimate(s):
Phi
0.9801462
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.598611 0.5475773 1.093199 0.2749
Management1NR -2.691958 0.7743913 -3.476225 0.0006
Management1PR1/2 -2.344346 0.7743913 -3.027341 0.0026
Management1PR3/4 -2.427570 0.7743913 -3.134811 0.0018
Management1Clearcut -4.913910 0.7069193 -6.951162 0.0000
date 0.000009 0.0000296 0.306608 0.7593
Management1NR:date 0.000123 0.0000419 2.944852 0.0034
Management1PR1/2:date 0.000104 0.0000419 2.488846 0.0132
Management1PR3/4:date 0.000111 0.0000419 2.640983 0.0085
Management1Clearcut:date 0.000242 0.0000383 6.331952 0.0000
Correlation:
(Intr) Mng1NR Mn1PR1/2 Mn1PR3/4 Mngm1C date Mn1NR: M1PR1/2: M1PR3/4:
Management1NR -0.707
Management1PR1/2 -0.707 0.500
Management1PR3/4 -0.707 0.500 0.500
Management1Clearcut -0.775 0.548 0.548 0.548
date -1.000 0.707 0.707 0.707 0.774
Management1NR:date 0.707 -1.000 -0.500 -0.500 -0.548 -0.707
Management1PR1/2:date 0.707 -0.500 -1.000 -0.500 -0.548 -0.707 0.500
Management1PR3/4:date 0.707 -0.500 -0.500 -1.000 -0.548 -0.707 0.500 0.500
Management1Clearcut:date 0.774 -0.548 -0.548 -0.548 -1.000 -0.775 0.548 0.548 0.548
Standardized residuals:
Min Q1 Med Q3 Max
-2.44410198 -0.65724392 -0.07158163 0.69272402 2.54105053
Residual standard error: 0.06669597
Degrees of freedom: 484 total; 474 residual

Treatmentvariable.Treatment, notdate, should be after the "|" symbol in your formulas. To handle autocorrelations over time, you might include something like~ date|Treatment, withdatekept as a fixed predictor. For the best ways to formulateglsmodels, see Chapter 7 of Frank Harrell's Regression Modeling Strategies. – EdM Feb 29 '24 at 15:49datedoesn't allow for the slope with respect todateto differ amongManagement1values, yet the plots show very different slopes. You need aManagement1*dateinteraction term for that. Thefidnumbers are just arbitrary indices of the order of measurements and shouldn't be used at all. – EdM Feb 29 '24 at 16:56