I am running a linear regression model in R where soya bean yield is predicted by soya variety (spike, maksoy, uyole and local), type of fertilizer used (DAP, NPK, unfertilized) and use of an inoculant for improved nitrogen fixation.
The model runs, but the summary reports "Coefficients: (2 not defined because of singularities)". A little research tells me that this is due to multicollinearity.
Can anyone explain this in layman's terms in the context of my data? Below is a runnable version of my code.
# Read the demo data straight from GitHub
soya_data = readr::read_csv(paste0(
  "https://raw.githubusercontent.com/datakilimba/S38---Soya-Demos/master/",
  "soya_demo_data.csv"))
# Regress yield on every variety, fertilizer and inoculant indicator column
soya_yield_model = lm(yield ~ spike + maksoy + uyole + local + DAP + NPK +
  unfertilized + inoculant, data = soya_data)
summary(soya_yield_model)
summary has already answered your question by identifying two such variables. When your software has a definite error message, it's always a good idea to search for it. That shows you other cases where the same message appeared and the question has an answer. – whuber Apr 19 '22 at 14:39

variety and fertilizer_type separately. When you have predictors like that, combine all levels into a single multi-level categorical predictor. – EdM Apr 19 '22 at 14:54
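
To put the comments above in plain terms: if a plot is not spike, maksoy or uyole, it has to be local, so the local column tells the model nothing it could not already work out from the other three. The same goes for unfertilized once DAP and NPK are known. Here is a minimal sketch of that idea on simulated data. I have not inspected the real CSV, so the data frame `sim`, its values and the assumption that the predictors are 0/1 indicator columns are all hypothetical stand-ins.

```r
# Simulated stand-in for the soya data (all values are made up).
# A fully crossed design: every variety x fertilizer x inoculant combination, 5 times.
design <- expand.grid(
  variety         = c("spike", "maksoy", "uyole", "local"),
  fertilizer_type = c("DAP", "NPK", "unfertilized"),
  inoculant       = c(0, 1),
  replicate       = 1:5,
  stringsAsFactors = FALSE
)

# Assumption: the original predictors are 0/1 indicator (dummy) columns.
sim <- data.frame(
  spike        = as.integer(design$variety == "spike"),
  maksoy       = as.integer(design$variety == "maksoy"),
  uyole        = as.integer(design$variety == "uyole"),
  local        = as.integer(design$variety == "local"),
  DAP          = as.integer(design$fertilizer_type == "DAP"),
  NPK          = as.integer(design$fertilizer_type == "NPK"),
  unfertilized = as.integer(design$fertilizer_type == "unfertilized"),
  inoculant    = design$inoculant
)

set.seed(1)
sim$yield <- 1000 + 200 * sim$spike + 150 * sim$DAP + 100 * sim$inoculant +
  rnorm(nrow(sim), sd = 50)

# Each plot has exactly one variety and one fertilizer type, so each group of
# indicator columns adds up to 1 in every row, which duplicates the intercept.
variety_sum    <- with(sim, spike + maksoy + uyole + local)
fertilizer_sum <- with(sim, DAP + NPK + unfertilized)

# Same model form as in the question: one indicator per group is redundant.
dummy_fit <- lm(yield ~ spike + maksoy + uyole + local + DAP + NPK +
                  unfertilized + inoculant, data = sim)
n_undefined <- sum(is.na(coef(dummy_fit)))

# Alternative: one multi-level factor per group. lm() then drops a reference
# level for each factor by itself, so no coefficient is left undefined.
sim$variety         <- factor(design$variety)
sim$fertilizer_type <- factor(design$fertilizer_type)
factor_fit <- lm(yield ~ variety + fertilizer_type + inoculant, data = sim)

print(coef(dummy_fit))
print(coef(factor_fit))
```

Both fits give the same fitted values; only the bookkeeping differs. In the factor version each coefficient reads as a difference from the reference level of that factor. The column names `variety` and `fertilizer_type` are taken from the comment above; whether the real data already contains them is something to check rather than assume.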