We study the admission of students to university according to different criteria. The parameters x are quantitative values. The parameter y is a binary value, the outcome: y equals 1 if the student is admitted and 0 if not. I'm trying to figure out which parameter, x1, x2, x3 or x4, has the biggest influence on y being 1. It would be useful to know the weight of each parameter on y.

For that, I'm using rcorr as in the example below. I'm not sure that I can use rcorr, because my y parameter is binary. If not, which function should I use?

library(Hmisc)
x1 <- runif(50, min=0, max=100)
x2 <- runif(50, min=0, max=100)
x3 <- runif(50, min=0, max=100)
x4 <- runif(50, min=0, max=100)
y <- sample(0:1, 50, replace = TRUE)

d <- data.frame(x1,x2,x3,x4,y)
m <- as.matrix(d)
rcorr(m, type=c("pearson","spearman"))
Tali

2 Answers


In this case you may be better off using logistic regression rather than correlations for evaluating the relations between your continuous predictor variables and outcome. That will allow you to examine how all of the predictor variables together are related to admission success, and makes it possible to examine how interactions among the predictors might also be related to outcome.
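As a minimal sketch of that suggestion, assuming the simulated data from the question (the seed, the data frame layout and the object name `fit` are illustrative assumptions, not part of the original post), a logistic regression with all four predictors could be fitted with base R's `glm`:

```r
# Simulated data as in the question; set.seed is an added assumption
# so that the example is reproducible.
set.seed(1)
n <- 50
d <- data.frame(
  x1 = runif(n, min = 0, max = 100),
  x2 = runif(n, min = 0, max = 100),
  x3 = runif(n, min = 0, max = 100),
  x4 = runif(n, min = 0, max = 100),
  y  = sample(0:1, n, replace = TRUE)
)

# Logistic regression: models the log-odds of admission (y = 1)
# as a linear function of all four predictors together.
fit <- glm(y ~ x1 + x2 + x3 + x4, data = d, family = binomial)

# Coefficient table: estimates, standard errors, z values, p-values.
print(summary(fit)$coefficients)

# Exponentiated coefficients are odds ratios per one-unit increase in
# each predictor, holding the other predictors fixed.
print(exp(coef(fit)))
```

Because y is random here, none of the coefficients should be expected to mean anything; with real admission data the same call gives each predictor's estimated association with admission while accounting for the others.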

Your search for "which parameter ... has the biggest influence on y in order to have 1," while understandable, may be dangerous. In general, trying to find a single predictor variable throws away the useful information from the other variables and can lead to severe problems with reliability. In particular, if some of your predictors are correlated then the particular one most highly related to outcome in your present data sample may not be so closely related when you try to apply your model to new cases. This Cross Validated page discusses the problems with model selection in logistic regression, and contains links to similar discussion in other contexts.

EdM

You need to tell R which package the function comes from. I guess you want to use the Hmisc package, so

library(Hmisc)

will have to be called before your code works. It then produces neither warnings nor errors. You will have ties, but the manual describes how the function deals with them ('midranks').

library(Hmisc)
help(rcorr)

If you are worried about ties, this may be of interest: https://stackoverflow.com/questions/10711395/spearman-correlation-and-ties
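To illustrate what midranks mean for a binary y, here is a small sketch. It uses base R's `cor` and `rank` rather than `rcorr`, so that it runs without Hmisc; the seed and variable names are assumptions added for illustration only.

```r
# One continuous predictor and a binary outcome, as in the question;
# set.seed is an added assumption for reproducibility.
set.seed(1)
x <- runif(50, min = 0, max = 100)
y <- sample(0:1, 50, replace = TRUE)

# rank() uses ties.method = "average" by default, i.e. midranks:
# every 0 gets the same rank, and every 1 gets the same rank.
r_y <- rank(y)
print(unique(r_y))

# Spearman's rho is the Pearson correlation of the (mid)ranks.
rho_spearman <- cor(x, y, method = "spearman")
rho_midrank  <- cor(rank(x), rank(y), method = "pearson")
print(all.equal(rho_spearman, rho_midrank))
```

With only two distinct values of y, the ranks of y carry the same information as y itself, so the rank correlation here mostly reflects the ranking of x.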

Cheers, Bernhard

Bernhard