
I have run 16 parsimonious models to understand how a combination of different land management practices affects different soil health measures. As a result, I have a spreadsheet of AIC values, some of which are negative and some of which are positive. I'm unsure how to interpret my results. The guidance I have seen is that I should report the differences between AIC values, which I have calculated from the actual (signed) values, not the absolute values. Someone advised me to use absolute values, but another reference advised me to use the actual values.

Question 1: Do I use the actual or the absolute AIC values to make inferences in my analysis?

Assuming I should use the actual values, I created the following data frame that shows the differences between the lowest value and all other values within each row. This is the first time I'm running this kind of assessment, and I'm not sure what to make of the results.

Question 2: How should I interpret these results?

AICData <- structure(list(Model.. = 1:16, Model.Elements = c("YrsProd", 
"YrsProd", "YrsProd", NA, "YrsProd", NA, "YrsProd", NA, NA, "YrsProd", 
NA, NA, NA, "BacBio", "BacBio", "BacBio"), X = c("SBScore", "SBScore", 
NA, "SBScore", "SBScore", "SBScore", NA, NA, "SBScore", "SBScore", 
NA, NA, NA, "FunBio", "FunBio", "FunBio"), NA. = c(NA, "pH", 
"pH", "pH", NA, NA, NA, "pH", "pH", "pH", NA, NA, NA, NA, NA, 
NA), NA..1 = c(NA, NA, NA, NA, "Perennial", "Perennial", "Perennial", 
"Perennial", "Perennial", "Perennial", NA, NA, NA, "BenProt", 
NA, NA), NA..2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "SOM", 
"SOM", NA, "SOM", "SOM", NA), NA..3 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, "Com", NA, "Com", NA, NA, NA), SOM = c(1314.472767, 
1095.132025, 1096.183549, 1130.296273, 554.3351257, 548.7542621, 
553.7356533, 360.7382493, 360.8527872, 360.9928373, NA, NA, NA, 
NA, NA, 428.9640882), POXC = c(1947.105014, 1720.483655, 1719.949753, 
1777.700697, 698.0485938, 692.9240173, 697.2242748, 508.0114938, 
507.0777913, 506.8496062, 1731.421306, 1737.507492, 1782.597714, 
604.9772789, 605.6472327, 621.9737525), BacBio = c(1165.911856, 
950.049657, 952.1720648, 981.5008085, 487.6220312, 482.5516256, 
486.2963584, 296.6523054, 296.9486512, 296.4908118, 973.6762375, 
976.5909365, 980.0057965, NA, NA, NA), FunBio = c(1076.981965, 
854.9213743, 854.9612835, 883.2867531, 466.6207348, 461.058372, 
465.2552243, 273.9343312, 273.7010246, 272.9846919, 884.1551883, 
885.8864885, 884.2480296, NA, NA, NA), BenProt = c(1370.360309, 
1150.892243, 1149.013681, 1192.804548, 553.3829013, 547.8081346, 
554.66664, 361.1708171, 359.8737408, 359.5486555, 1188.534113, 
1188.872887, 1190.34951, NA, 981.1080351, 983.6783776), Com = 
c(1544.896872, 
1328.54915, 1326.561632, 1374.90775, 584.131621, 579.8991869, 
582.7350149, 393.9127021, 394.1995661, 392.9527655, NA, NA, NA, 
NA, NA, NA), NH4N = c(1202.98266, 979.5922112, 977.9649284, 1010.272336, 
531.4933145, 526.398481, 530.0851454, 334.7520171, 334.7021201, 
333.8647815, 994.1367046, 1006.445894, 1005.598376, 398.1351779, 
399.2918983, 400.06385), NO3N = c(1420.656559, 1203.959615, 1201.995317, 
1242.990423, 548.4628961, 543.4650557, 548.7632352, 358.518352, 
357.8238832, 357.4414132, 1226.24666, 1230.067054, 1233.105651, 
446.1347363, 445.8043376, 449.177169), S = c(46.6842825, 29.2123693, 
29.4736843, 30.2041342, 135.1626264, 134.5115966, 133.7507888, 
74.14012141, 72.94037996, 73.03398012, 30.4361496, 30.4086801, 
30.432426, 16.8381396, 16.5618659, 18.5675833), k = c(215.1684396, 
182.39603, 180.4375982, 186.9924906, 0, 0, 0, 0, 0, 0, 181.7135375, 
182.3013514, 181.8805866, 0, 0, 0), Bg = c(0, 0, 0, 0, 115.3009234, 
115.2786248, 113.9470485, 66.63794081, 65.55926991, 65.38614804, 
0, 0, 0, 9.2854468, 9.7059587, 12.2505873), Br = c(194.4223058, 
115.3166956, 114.1268919, 126.9869616, 184.6176987, 182.2309577, 
185.4722954, 80.71052635, 80.63817699, 80.30511454, 131.9172933, 
133.731912, 131.7243248, 86.22643541, 85.81626691, 85.85771194
)), class = "data.frame", row.names = c(NA, -16L))

Here is the code I used to calculate the differences in AIC values:

    # Install dplyr once if needed: install.packages("dplyr")
    library(dplyr)

    # AIC columns (one per soil health measure) to calculate differences for
    numeric_columns <- c("SOM", "POXC", "BacBio", "FunBio", "BenProt", "Com",
                         "NH4N", "NO3N", "S", "k", "Bg", "Br")

    # Differences between consecutive models within each column, using dplyr
    diff_AIC <- AICData %>%
      mutate(across(all_of(numeric_columns), list(Difference = ~ c(NA, diff(.x)))))

    # Minimum AIC within each row, using only the AIC columns
    # (the model number and predictor-name columns are left out)
    min_values <- apply(AICData[numeric_columns], 1, min, na.rm = TRUE)

    # Within-row differences: each AIC value minus its row minimum (NA stays NA)
    within_row_diff_AIC <- AICData[numeric_columns] - min_values

    # Print the data frame with within-row differences
    print(within_row_diff_AIC)

EBH

Here are my thoughts on your problem:

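AIC is computed from the log-likelihood (AIC = 2k - 2 ln L̂, where k is the number of estimated parameters and L̂ is the maximized likelihood), and the log-likelihood can be negative, so AIC can take either sign. A log-likelihood is interpreted relative to other models fit to the same data, not as an absolute number, and the same applies to AIC.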
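Written out with the standard definition ($k_i$ estimated parameters and maximized likelihood $\hat{L}_i$ for model $i$), the argument looks like this:

```latex
\begin{aligned}
\mathrm{AIC}_i &= 2k_i - 2\ln\hat{L}_i \\
\Delta_i &= \mathrm{AIC}_i - \min_j \mathrm{AIC}_j \quad (\Delta_i \ge 0) \\
\exp\!\left(-\tfrac{\Delta_i}{2}\right) &= \frac{\exp(-\mathrm{AIC}_i/2)}{\exp(-\min_j \mathrm{AIC}_j/2)}
\end{aligned}
```

Adding the same constant $c$ to every $\ln\hat{L}_i$ (which can happen, for example, when a continuous response for models fit to the same observations is recorded in different units) subtracts $2c$ from every AIC. Each $\Delta_i$ stays the same, but the order of the $|\mathrm{AIC}_i|$ can change, so absolute values carry no meaning. The last line is the relative likelihood of model $i$ compared with the best model, which is the sense in which an AIC difference behaves like a ratio of likelihoods.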

See, e.g.: https://www.codecademy.com/learn/how-to-choose-a-linear-regression-model-course/modules/choosing-a-linear-regression-model-course/cheatsheet

See also the question about whether log likelihood can be negative: https://stats.stackexchange.com/a/346726/364471

So to answer your questions:

Question 1: Use actual (relative) AIC values.

"Someone advised me that I should use absolute numbers." Ask this person for the rationale, or a proof, for looking at absolute AIC values when assessing the goodness of your models.

In general, if you are comparing 16 models fit to the same data and you have the AIC for each of them, the model with the lowest AIC (sign included) is the best model, e.g.:

Model | AIC
----- | ---
M1    | -4
M2    | -2
M3    |  1
M4    |  5

Then M1 is the best one.
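A minimal R sketch using the four example AIC values from the table above shows why the sign matters: taking absolute values changes which model looks best, while the signed differences (ΔAIC = AIC minus the minimum AIC) keep the ranking intact. The relative likelihood `exp(-delta / 2)` is included only as an optional extra summary.

```r
# AIC values from the example table (M1 to M4)
aic <- c(M1 = -4, M2 = -2, M3 = 1, M4 = 5)

# Correct: the lowest signed AIC is the best model
best_signed <- names(which.min(aic))        # "M1"

# Wrong: absolute values reorder the models (|1| < |-2| < |-4| < |5|)
best_abs <- names(which.min(abs(aic)))      # "M3"

# Differences from the best model; only these are interpretable
delta <- aic - min(aic)                     # 0, 2, 5, 9

# Relative likelihood of each model versus the best one
rel_lik <- exp(-delta / 2)

print(data.frame(AIC = aic, delta = delta, rel_lik = round(rel_lik, 3)))
```

The differences are unchanged if every AIC is shifted by the same amount, which is exactly the property the absolute values lack.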

Question 2: See the answer to Question 1. My general advice is to use a bar plot of the AIC for each model. It will show the differences between the models and make it easier to decide which model is best, and by how much.
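One way to prepare that comparison in R is sketched below. It assumes, as AIC requires, that models are only compared when they share the same response variable and were fit to exactly the same observations, so ΔAIC is computed down each soil health column (across models) rather than across a row. The small data frame copies values from the question's `SOM` and `POXC` columns for models 8, 9, 10 and 16; `delta_aic` is a hypothetical helper name.

```r
# A few AIC values copied from the question (models 8, 9, 10 and 16);
# each column is a different response variable
aic_tab <- data.frame(
  model = c(8, 9, 10, 16),
  SOM   = c(360.7382493, 360.8527872, 360.9928373, 428.9640882),
  POXC  = c(508.0114938, 507.0777913, 506.8496062, 621.9737525)
)

# Hypothetical helper: AIC minus the smallest AIC in the same column
delta_aic <- function(x) x - min(x, na.rm = TRUE)

# Apply it to each response column separately (never across columns)
deltas <- aic_tab
deltas[-1] <- lapply(aic_tab[-1], delta_aic)
print(deltas)

# A bar plot per response makes the gaps easy to see, e.g.:
# barplot(deltas$SOM, names.arg = deltas$model, ylab = "Delta AIC (SOM)")
```

With these values, models 8 to 10 are within about 1.2 AIC units of each other for both responses, and a widely used rule of thumb treats differences under about 2 as weak grounds for preferring one model over another; model 16 is far behind.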

  • I agree that taking the absolute value of AIC would make the results meaningless (if any were negative to start with). But I worry the word "relative" will be misunderstood. You want to look at the differences (subtraction) between AIC values, not ratios (division). Some might misinterpret "relative value" to mean the ratio of two AIC values. – Harvey Motulsky Oct 20 '23 at 19:40
  • AICs are on a log scale so differences are ratio-like. I think of them as log-likelihoods in sheep’s clothing. – DWin Oct 21 '23 at 02:39
  • @HarveyMotulsky, I completely agree, of course I was thinking about subtraction, not division, by all means. Thank you for spotting the potential ambiguity and for clarifying the interpretation. – Mikolaj Buchwald Oct 21 '23 at 20:40
  • @DWin, you are also right, but these ratio-like measures should not be further compared with each other in a ratio-like style (e.g., by division), but one should rather look at subtraction differences between AICs, as Harvey pointed out in his comment. – Mikolaj Buchwald Oct 21 '23 at 20:43
  • Agree completely. Your expression of the principles is clearer than mine was. If you are on the log-likelihood scale then only consider differences. AICs are likewise on a log scale. – DWin Oct 22 '23 at 04:00