I am running a linear regression model in R in which soya bean yield is predicted by soya variety (spike, maksoy, uyole and local), the type of fertilizer used (DAP, NPK, unfertilized) and the use of an inoculant for improved nitrogen fixation.

The model runs, but the summary reports "Coefficients: (2 not defined because of singularities)". A little research tells me that this is due to multicollinearity.

Can anyone explain this in layman's terms in the context of my data? Below is a runnable version of my code.

soya_data = readr::read_csv(paste0(
  "https://raw.githubusercontent.com/datakilimba/S38---Soya-Demos/master/",
  "soya_demo_data.csv"))

soya_yield_model = lm(
  yield ~ spike + maksoy + uyole + local + DAP + NPK + unfertilized + inoculant,
  data = soya_data)

summary(soya_yield_model)

  • This isn't multicollinearity: it is collinearity, pure and simple. Two of your variables are perfect linear combinations of the remaining ones. The output of summary has already answered your question by identifying two such variables. When your software has a definite error message, it's always a good idea to search for it. That shows you other cases where the same message appeared and the question has an answer. – whuber Apr 19 '22 at 14:39
  • You apparently dummy coded all levels for each of variety and fertilizer_type separately. When you have predictors like that, combine all levels into a single multi-level categorical predictor. – EdM Apr 19 '22 at 14:54
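
The two comments above can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the data frame `toy`, its simulated yields, and the column names `variety` and `fertilizer` are invented here and are not read from soya_demo_data.csv. The idea is that every plot has exactly one variety and exactly one fertilizer treatment, so each set of 0/1 columns adds up to 1 in every row, which duplicates the intercept. One column from each set therefore carries no new information, and `lm` drops it.

```r
# Synthetic stand-in for the real data (all values are made up)
set.seed(1)
n <- 120
toy <- data.frame(
  variety    = factor(sample(c("spike", "maksoy", "uyole", "local"), n, replace = TRUE)),
  fertilizer = factor(sample(c("DAP", "NPK", "unfertilized"), n, replace = TRUE)),
  inoculant  = rbinom(n, 1, 0.5)
)
toy$yield <- 1000 + 200 * (toy$variety == "spike") +
  150 * (toy$fertilizer == "DAP") + 100 * toy$inoculant + rnorm(n, sd = 50)

# One 0/1 column per level, mirroring the columns used in the question's formula
for (v in levels(toy$variety))    toy[[v]] <- as.integer(toy$variety == v)
for (f in levels(toy$fertilizer)) toy[[f]] <- as.integer(toy$fertilizer == f)

# Each set of dummies sums to 1 in every row, i.e. it reproduces the intercept
variety_sum    <- toy$spike + toy$maksoy + toy$uyole + toy$local
fertilizer_sum <- toy$DAP + toy$NPK + toy$unfertilized
all(variety_sum == 1)
all(fertilizer_sum == 1)

# Same structure as the question's model: one coefficient per set cannot be estimated
dummy_fit <- lm(yield ~ spike + maksoy + uyole + local + DAP + NPK +
                  unfertilized + inoculant, data = toy)
n_undefined <- sum(is.na(coef(dummy_fit)))
n_undefined

# Fix suggested in the comments: one multi-level factor per treatment.
# R then picks a reference level for each factor and no coefficient is dropped.
factor_fit <- lm(yield ~ variety + fertilizer + inoculant, data = toy)
anyNA(coef(factor_fit))
```

In this sketch the two fits give the same fitted values; only the bookkeeping differs. With the factor version, each reported coefficient is the difference from that factor's reference level (by default the alphabetically first level), and `relevel()` can be used to choose a different baseline such as unfertilized.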
