
Suppose I have data on which I want to run many linear regression models.

Data: https://www.img.in.th/image/TNHdEq

Column C1 is the y variable.

The x variable is column C4, which is built from columns C2 and C3. Model1 uses the first row of C2 and the remaining 8 rows of C3; Model2 uses the first 2 rows of C2 and the remaining 7 rows of C3; and so on, up to Model9, which uses the first 8 rows of C2 and the last row of C3.

Example x variable:

model1 : { b, d, i,...,z}

model2 : { b, f, i,..., z}

.

.

.

model9 : {b, f, h,..., z}
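The splicing rule described above can be sketched in Python as follows. This is only an illustration under assumptions: the helper name `build_c4` and the placeholder values are hypothetical, not the data from the linked image, and the number of candidates follows from the number of rows (n rows give n - 1 split points).

```python
def build_c4(c2, c3, k):
    """Return the spliced column: the first k values of c2, then the rest from c3."""
    if len(c2) != len(c3):
        raise ValueError("c2 and c3 must have the same length")
    if not 1 <= k < len(c2):
        raise ValueError("k must be between 1 and len(c2) - 1")
    return list(c2[:k]) + list(c3[k:])


# Hypothetical placeholder columns with 9 rows each.
c2 = [f"C2_{i}" for i in range(1, 10)]
c3 = [f"C3_{i}" for i in range(1, 10)]

# One candidate x column per split point k = 1, ..., n - 1.
candidates = {k: build_c4(c2, c3, k) for k in range(1, len(c2))}

print(candidates[1])  # first row from C2, remaining rows from C3
print(candidates[8])  # first 8 rows from C2, last row from C3
```

Keeping the candidates in a dictionary keyed by the split point makes it easy to look up which split produced the best model later.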

Then select the model with the maximum R squared.
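A minimal Python sketch of the selection step, assuming a single numeric predictor so that R squared equals the squared correlation between x and y. The function names `r_squared` and `best_split` and the data are made up for illustration. For the ordered probit case mentioned in the postscript, ordinary R squared does not apply, so a different fit measure (for example the log-likelihood) would have to replace it.

```python
from statistics import mean


def r_squared(x, y):
    """R squared of the least-squares line y = a + b * x (one numeric predictor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy ** 2 / (sxx * syy)


def best_split(y, c2, c3):
    """Score every split point k and return the best k with all scores."""
    scores = {}
    for k in range(1, len(y)):
        x = list(c2[:k]) + list(c3[k:])  # first k rows of C2, rest from C3
        try:
            scores[k] = r_squared(x, y)
        except ValueError:
            continue  # skip splits where the model cannot be fitted
    return max(scores, key=scores.get), scores


# Made-up numeric data, only to exercise the functions.
y = [1, 2, 3, 4, 5, 6]
c2 = [1, 2, 3, 9, 9, 9]
c3 = [7, 7, 7, 4, 5, 6]

best_k, scores = best_split(y, c2, c3)
print(best_k, scores[best_k])
```

With a real regression library, only `r_squared` would need to be swapped for a call that fits the model and returns its fit statistic.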

Question: How can I code this? With a loop?

I would like solutions in both R and Python.

P.S. In practice I use an ordered probit model, and I have many rows (100+).

Thank you.

nitishagar
    See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) how to ask a good question. And see [here](https://r4ds.had.co.nz/many-models.html) how to run multiple models in r –  Apr 16 '20 at 11:08

1 Answer


Running many models can be done with *apply loops, with the results stored in a list. Here the loop variable is the row number i, which runs from 1 to nrow(df1) - 1.

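n <- nrow(df1)
# One model per split point i = 1, ..., n - 1
probit_list <- lapply(seq.int(n)[-n], function(i){
  # First i rows come from C2, the remaining rows from C3
  C4 <- c(df1$C2[seq.int(i)], df1$C3[-seq.int(i)])
  C4 <- ordered(C4, levels = levels(df1$C2))
  dftmp <- data.frame(C1 = df1$C1, C4)
  # Binary probit, matching the 0/1 response in the test data.
  # For an ordered response, MASS::polr(..., method = "probit") may be a better fit.
  # On failure, return the error object instead of stopping the loop.
  tryCatch(glm(C1 ~ C4, data = dftmp, family = binomial(link = "probit")),
           error = function(e) e)
})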

To see how many fits raised an error, run

err <- sapply(probit_list, inherits, "error")
sum(err)
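Since the question also asks for Python, the same pattern (fit every split, keep the error object when a fit fails, then count the failures) might look like the sketch below. The names `fit_all` and `toy_fit` are hypothetical; `toy_fit` is only a stand-in for a real estimator and fails whenever the predictor is constant.

```python
def fit_all(y, c2, c3, fit):
    """Fit one model per split point; store the exception instead of stopping."""
    results = []
    for k in range(1, len(y)):
        x = list(c2[:k]) + list(c3[k:])  # first k rows of C2, rest from C3
        try:
            results.append(fit(y, x))
        except Exception as exc:  # plays the role of tryCatch(..., error = function(e) e)
            results.append(exc)
    return results


def toy_fit(y, x):
    """Hypothetical stand-in for a real estimator: fails when x is constant."""
    if len(set(x)) < 2:
        raise ValueError("predictor is constant")
    return {"n_levels": len(set(x))}


# Made-up data in which the first two splits give a constant predictor.
y = [0, 1, 0, 1]
c2 = [1, 1, 2, 2]
c3 = [0, 1, 1, 1]

results = fit_all(y, c2, c3, toy_fit)
failed = [isinstance(r, Exception) for r in results]
print(sum(failed), "of", len(results), "fits raised an error")
```

Passing the estimator in as the `fit` argument keeps the loop independent of the model type, so a linear regression or an ordered probit fit from a statistics library could be dropped in without changing the loop.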

Test data

set.seed(1234)
n <- 9
df1 <- data.frame(
  C1 = rbinom(n, 1, prob = c(0.4, 0.6)),
  C2 = ordered(sample(1:4, n, TRUE), levels = 1:4),
  C3 = ordered(sample(1:4, n, TRUE), levels = 1:4)
)
Rui Barradas