I'm trying to fit a robust logistic regression model because I detected outliers in my data, but I get this error:

Error in solve.default(crossprod(X, DiagB * X)/nobs, EEq) : 
  system is computationally singular: reciprocal condition number = 1.38186e-16

I have searched for this error, and many people say it is caused by multicollinearity between variables. However, even after I removed a variable with a correlation of -0.82, the error persists.
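
One possible reason why dropping a single highly correlated variable does not help: an exact linear dependence among three or more columns need not show up as a large pairwise correlation. Below is a minimal base R sketch on made-up data. The factor `agent` and the indicator names `A` to `D` are hypothetical and are not taken from the data in this question.

```r
# Hypothetical toy data: one 4-level categorical variable split into 0/1 indicators
set.seed(1)
agent <- factor(sample(c("A", "B", "C", "D"), 200, replace = TRUE))
d <- data.frame(
  A = as.integer(agent == "A"),
  B = as.integer(agent == "B"),
  C = as.integer(agent == "C"),
  D = as.integer(agent == "D")
)

# Pairwise correlations between the indicators are only moderate
round(cor(d), 2)

# The indicators sum to 1 in every row, which duplicates the intercept column
X <- model.matrix(~ A + B + C + D, data = d)
rank_X <- qr(X)$rank

# A rank smaller than the number of columns means exact collinearity
c(columns = ncol(X), rank = rank_X)
```

If the real predictors behave like this, `glm()` would usually still run, because it reports `NA` for aliased coefficients, whereas a routine that calls `solve()` directly on the cross-product matrix would stop with a singularity error.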

  • I'm fitting two models because I want to analyse two different outcomes and their predictors.

My code:


dadosSRAG_final <- subset(dados_SRAGmut, select = c(UTI, OBITO, 
    VACINA, VACINA_COV, ANTIVIRAL, INFLUENZA, OUTROS_VIRUS, 
    OUTRO_AGENTE, NAO_ESPECIF, COVID))
# __________________ Analyses ____________________________________

# Dependent variables / outcomes: UTI (ICU) and OBITO (death)

# Frequencies of the dependent-variable categories
table(dadosSRAG_final$UTI)
table(dadosSRAG_final$OBITO)

summary(dadosSRAG_final)

# Check the reference categories
levels(dadosSRAG_final$UTI)
levels(dadosSRAG_final$OBITO)

levels(dadosSRAG_final$VACINA)
levels(dadosSRAG_final$VACINA_COV)
levels(dadosSRAG_final$ANTIVIRAL)
levels(dadosSRAG_final$INFLUENZA)
levels(dadosSRAG_final$OUTROS_VIRUS)
levels(dadosSRAG_final$OUTRO_AGENTE)
levels(dadosSRAG_final$NAO_ESPECIF)
levels(dadosSRAG_final$COVID)

# -- NÃO (no) --

# ---- Checking the assumptions ----

# 1 - Dichotomous dependent variable

# 2 - Independence of observations

# Model fitting

modUTI <- glm(UTI ~ VACINA + VACINA_COV + ANTIVIRAL + INFLUENZA +
                OUTROS_VIRUS + OUTRO_AGENTE + NAO_ESPECIF + COVID,
              family = binomial(link = 'logit'), data = dadosSRAG_final)

modOBITO <- glm(OBITO ~ VACINA + VACINA_COV + ANTIVIRAL + INFLUENZA +
                  OUTROS_VIRUS + OUTRO_AGENTE + NAO_ESPECIF + COVID,
                family = binomial(link = 'logit'), data = dadosSRAG_final)

# 3 - Absence of outliers (not met: outliers MAINLY in the death model)

library(MASS)   # stdres()
library(car)    # influencePlot(), vif()
library(psych)  # pairs.panels()

plot(modOBITO, which = 5)
plot(modUTI, which = 5)

summary(stdres(modOBITO)) # max > 11
summary(stdres(modUTI))   # max 3.43

influencePlot(modOBITO) # 6 influential points
influencePlot(modUTI)   # 4 influential points

# 4 - Absence of multicollinearity

vif(modOBITO)
vif(modUTI)

influence.measures(modOBITO)
influence.measures(modUTI)

pairs.panels(dadosSRAG_final)

# 5 - Linear relationship between continuous independent variables and the logit of the dependent variable

# PS: all independent variables in the model are categorical

# ------ Robust model in the presence of outliers -----

install.packages("robustbase")
library(robustbase)

# Fit the model with the Huber-type estimator

modUTI_rob <- glmrob(UTI ~ VACINA + VACINA_COV + ANTIVIRAL + INFLUENZA +
                       OUTROS_VIRUS + OUTRO_AGENTE + NAO_ESPECIF + COVID,
                     family = binomial, data = dadosSRAG_final,
                     method = "Mqle", control = glmrobMqle.control(tcc = 1.5))

modOBITO_rob <- glmrob(OBITO ~ VACINA + VACINA_COV + ANTIVIRAL + INFLUENZA +
                         OUTROS_VIRUS + OUTRO_AGENTE + NAO_ESPECIF + COVID,
                       family = binomial, data = dadosSRAG_final,
                       method = "Mqle", control = glmrobMqle.control(tcc = 1.5))

This is my first time working with regression models, so I would really appreciate suggestions and references in the area. Is this problem caused by correlation between the variables? Is there another robust regression method that I could try?

Lays
  • https://stats.stackexchange.com/questions/76488/error-system-is-computationally-singular-when-running-a-glm?rq=1 – Steven Gubkin Jul 25 '23 at 19:06
  • If I had to guess, crossprod(X, DiagB * X) is the Hessian of the loss function, which is being used as part of an algorithm like https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm. This matrix is not invertible if you have multicollinearity. Are you sure your design matrix has linearly independent columns? – Steven Gubkin Jul 25 '23 at 19:09
  • Robust regressions don't handle collinearity. Just looking at the names of your variables, it sure seems like you have highly related variables. You can look at condition indexes and variance explained to find out where it is. – Peter Flom Jul 25 '23 at 19:26
  • These variables that are possibly correlated are the result of a single categorical variable that I had to transform, but they are not relevant to the study. Thank you all for the comments. – Lays Jul 25 '23 at 21:46
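
The condition indexes mentioned in the comments can be sketched in base R roughly as follows. This uses the usual recipe (scale each column of the design matrix to unit length, take the singular values, divide the largest by each one) on made-up data. `cond_index` is a hypothetical helper written for this illustration, not a function from any package, and the factor `agent` is invented. Values above roughly 30 are commonly read as a sign of serious collinearity.

```r
# Hypothetical helper: condition indexes of a design matrix
cond_index <- function(X) {
  X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")  # unit-length columns
  sv <- svd(X_scaled)$d                             # singular values
  max(sv) / sv                                      # one index per singular value
}

# Made-up data: intercept plus one indicator per level of a single factor
set.seed(2)
agent <- factor(sample(c("A", "B", "C", "D"), 200, replace = TRUE))
dummies <- sapply(levels(agent), function(l) as.numeric(agent == l))
X_full <- cbind(Intercept = 1, dummies)

# With every level coded, the largest index is huge (or infinite)
ci_full <- cond_index(X_full)
ci_full

# Dropping one indicator (the reference level) brings the indexes back down
ci_reduced <- cond_index(X_full[, -ncol(X_full)])
ci_reduced
```

If indicator variables that came from one categorical variable are all kept in the model, leaving one of them out as the reference level (or using the original factor directly) is the usual way to restore a full-rank design matrix.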

0 Answers