Find a model for two continuous predictors of a single categorical DV

Question

Suppose I ask subjects to place a value on two cups, A and B, on a scale from -10 to +10.

One subject says A is worth -4 and B is worth 9.

Now I say 'OK, pick one.' She picks one. Then I ask 300 people the same thing.

Put simply, I want to model how their valuations of the options predict which cup they pick (e.g., the greater the value they assign to one over the other, the more likely they are to pick it).

What's the model?

(I want to use a linear model because I want to control for other factors.)

Thank you!

user2974951 · Answer 1 · 2019-02-06T15:38:36.943

Here is my quick example of how I would do it, using randomly generated data. Basically, you take the difference of the scores of A and B and use it as the predictor in a logistic GLM.
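Written out as an equation, this is a logistic regression on the rating difference. A sketch of what the glmer call fits, where s_A and s_B are the two ratings for observation i and u_j is the random intercept for person j (drop u_j for a plain glm). Because cup is a factor with levels A and B, R models the probability of the second level, B, so if people tend to pick the higher-rated cup, beta_1 should come out negative under this coding:

```latex
\operatorname{logit} P(\text{cup}_i = B)
  = \log\frac{P(\text{cup}_i = B)}{1 - P(\text{cup}_i = B)}
  = \beta_0 + \beta_1\,(s_{A,i} - s_{B,i}) + u_{j[i]}
```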

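> library(lme4)
> # Simulated data: value = score(A) - score(B) for each trial
> df <- data.frame(thot = rep(c("Alice", "Barbara"), each = 50),
>                  value = runif(100, -10, 10) - runif(100, -10, 10))
> # Cup chosen at random, so it is unrelated to value in this example
> df$cup <- factor(sample(rep(c("A", "B"), each = 50)))
> # Logistic mixed model with a random intercept per person (thot)
> summary(glmer(cup ~ value + (1 | thot), data = df, family = binomial(link = "logit")))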

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: cup ~ value + (1 | thot)
   Data: df

     AIC      BIC   logLik deviance df.resid 
   142.9    150.7    -68.4    136.9       97 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.33077 -1.00604  0.00095  0.99804  1.31654 

Random effects:
 Groups Name        Variance Std.Dev.
 thot   (Intercept) 0        0       
Number of obs: 100, groups:  thot, 2

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.02589    0.20274  -0.128    0.898
value        0.03221    0.02444   1.318    0.187

Correlation of Fixed Effects:
      (Intr)
value -0.097

Edit: sample of the data

> head(df)
   thot     value cup
1 Alice  2.899886   B
2 Alice  9.030560   B
3 Alice  1.281790   B
4 Alice 16.263392   B
5 Alice -5.369385   A
6 Alice  6.438415   A

edited Feb 06 '19 at 15:38

answered Feb 06 '19 at 14:34

user2974951

Great, thanks! So here's a dumb follow-up(!):
First, what does the data frame look like? Is it one column for 'Alice' or 'Barbara', and another column for the 'difference'? So one column looks like 'alice, alice, alice, barbara, barbara, alice...', and the other looks like '-2, -5, 6, -1, 4...'?

Second, what's 'thot' here?

Thanks again... (I'm new-ish to models)
– cathalcom Feb 06 '19 at 15:35
@cathalcom I posted the whole code so you can check the whole process and data yourself in R, and I added a snippet of the data to the answer. thot is the column for the people (all women in this case, Alice and Barbara), value is the difference in the scores of A and B, and cup is the cup that was selected. – user2974951 Feb 06 '19 at 15:38
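A minimal sketch of building that layout from raw ratings, assuming hypothetical columns rating_A, rating_B and choice, one row per participant (these names are illustrative, not from the answer; subject plays the role of the answer's thot column). The first row reuses the question's example, A = -4 and B = 9:

```r
# Hypothetical raw data: one row per participant with both ratings and the choice
raw <- data.frame(
  subject  = c("Alice", "Barbara", "Carol"),
  rating_A = c(-4, 7, 2),
  rating_B = c(9, 1, 2),
  choice   = c("B", "A", "A")
)

# Layout used in the answer: value is always score(A) - score(B)
df <- data.frame(
  subject = raw$subject,
  value   = raw$rating_A - raw$rating_B,
  cup     = factor(raw$choice, levels = c("A", "B"))  # "B" is the modeled level
)
df
```

The direction of value never depends on which cup was picked; it is always A minus B.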
Ah, right, got it. Checked. Thanks. So if, for example, all the scores were negative, would the model take that to be a significant bias toward picking the cup they ranked negatively? (That's what I'm predicting in my case.) Also, I only have one observation per participant; will that affect things? – cathalcom Feb 06 '19 at 15:57
Finally, is the direction of the difference based on the cup in column 'cup'? So in row 5 of the snippet, A is 5 points lower than B, while in row 4, B is 16 points higher than A. Is that right? So if Alice had picked B in trial 5, that would read 'Alice, +5, B'. – cathalcom Feb 06 '19 at 17:36
@cathalcom I thought your objective was to predict which cup they were going to pick. As for what happens if all the scores are negative, I really can't tell. If you have only one observation per participant, you do not need a mixed model like the one I used; a regular glm would work. The score is always computed the same way, A - B, which means that if the difference is positive, A had the higher score; otherwise A had the lower score. – user2974951 Feb 07 '19 at 07:47
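With one observation per participant, a sketch of the plain glm version might look like this. The data are simulated with no real relation between value and cup, mirroring the answer's random example; n = 300 matches the question, and the seed is arbitrary:

```r
set.seed(42)
n <- 300

# Hypothetical data: one row per participant, value = score(A) - score(B)
df <- data.frame(value = runif(n, -10, 10) - runif(n, -10, 10))

# Random choices, so no true effect of value (as in the answer's example)
df$cup <- factor(sample(c("A", "B"), n, replace = TRUE), levels = c("A", "B"))

# Ordinary logistic regression: no random effect needed with one row per person
fit <- glm(cup ~ value, family = binomial(link = "logit"), data = df)
summary(fit)
```

Other control variables would simply be added to the right-hand side of the formula.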
Thanks again for your help. Right, I get that about glmer, great. However, I'm still missing something conceptually here: yes, I want the scores to predict the choices, so the difference column predicts the 'cup' column, perfect. But under what circumstances will this model find the results significant? E.g., if all the scores are positive (so A is always rated higher than B, and A is overwhelmingly chosen in the cup column), would the model say "ah, that's significant!", implying that the participants significantly preferred the higher-rated cup? – cathalcom Feb 07 '19 at 08:54
@cathalcom Logistic regression tests statistically whether there is a relation between the chosen cup and the difference in scores. That relation is summarized by the estimate for value under Fixed effects in my example, 0.03221, which is on the log-odds scale. It tells you by how much the log odds of picking B (the second level of the factor cup, which is the outcome the binomial model predicts) change when the difference increases by 1 unit. For more information, see this site's answers on interpreting logistic regression: https://stats.stackexchange.com/questions/86351/interpretation-of-rs-output-for-binomial-regression/86375#86375 – user2974951 Feb 07 '19 at 09:22
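To make the log-odds scale concrete, here is a small sketch that plugs in the fixed-effect estimates printed in the answer. It uses only the fixed effects (the random-intercept variance in that output was 0), and the chosen differences of -10, 0 and 10 are arbitrary:

```r
# Fixed-effect estimates copied from the glmer output above (log-odds scale)
b_int   <- -0.02589
b_value <-  0.03221

# Odds ratio: multiplicative change in the odds of picking B per 1-point increase in A - B
odds_ratio <- exp(b_value)

# Predicted probability of picking B at a few values of the difference A - B
diffs  <- c(-10, 0, 10)
prob_B <- plogis(b_int + b_value * diffs)

round(odds_ratio, 4)
round(prob_B, 3)
```

Here the estimate was not significant (p = 0.187), as expected for random data, so these probabilities stay close to 0.5.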
OK, thanks. So if the difference was always positive, would the model likely take that to indicate that the participants significantly preferred the higher-rated of the two cups? – cathalcom Feb 07 '19 at 09:40
@cathalcom Yes, if the effect is strong enough. – user2974951 Feb 07 '19 at 11:17
Ah... I'm starting to see the light. So, if there was a bias to pick the higher one, then if A>B they'll pick A, but if A<B they'll pick B. As a result, low scores would correlate with Bs (because when A<B, A-B is a low score), and high scores would correlate with As (because when A>B, A-B is a high score)... right? – cathalcom Feb 07 '19 at 11:45
@cathalcom Yes. We always compute the difference in the same order, A-B, so if A>B the difference is positive, meaning A had the higher score, and if A<B the difference is negative, meaning A had the lower score. This would translate to positive values being related to picking A and negative values being related to picking B, if this is indeed true (that is, if these scores influence the choice of cup). – user2974951 Feb 07 '19 at 11:55
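A small simulation can check this reasoning. It is a sketch under a made-up assumption: people tend to pick the higher-rated cup, with an effect size of 0.5 per rating point. Because glm models the second factor level (B), the coefficient on A - B comes out negative; releveling so that A is the modeled outcome flips the sign:

```r
set.seed(1)
n <- 300

# Hypothetical scores for cups A and B on the -10..+10 scale
score_A <- runif(n, -10, 10)
score_B <- runif(n, -10, 10)
value   <- score_A - score_B  # always computed as A - B

# Assumed behavior: a larger A - B makes picking A more likely
p_pick_A <- plogis(0.5 * value)
cup <- factor(ifelse(runif(n) < p_pick_A, "A", "B"), levels = c("A", "B"))

# Default coding: models P(cup == "B"), so the slope should be negative
fit <- glm(cup ~ value, family = binomial)
coef(fit)["value"]

# With "B" as the reference level the model predicts P(cup == "A"): same size, opposite sign
fit_A <- glm(relevel(cup, ref = "B") ~ value, family = binomial)
coef(fit_A)["value"]
```

Either coding tests the same relation; only the sign convention changes.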
Brilliant, I get it. To clarify: on this approach it also doesn't matter, from trial to trial, whether B is valued higher or lower than A. If in half the trials B is valued higher than A, and in the other half B is valued lower, then as long as the participants always prefer the higher-valued cup, Bs will still correlate with lower scores than As. Yes? (And if they preferred the 'lesser' of the two, this would simply be reversed: As would correlate with lower scores than Bs.) – cathalcom Feb 07 '19 at 12:03
Thanks again. Since you gave such a good answer, I want to ask you about adding a control to the same setup. I want to ensure that the 'absolute' value of the ratings (their distance from zero, whether negative or positive) doesn't explain the data. How would you go about adding that? – cathalcom Feb 07 '19 at 18:14
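One possible way to add such a control, not something proposed in the thread, is to enter each cup's distance from zero as extra predictors alongside the difference. A sketch on simulated data, assuming hypothetical columns score_A and score_B and choices that depend only on the difference:

```r
set.seed(7)
n <- 300

# Hypothetical raw ratings of each cup on the -10..+10 scale
df <- data.frame(score_A = runif(n, -10, 10), score_B = runif(n, -10, 10))
df$value <- df$score_A - df$score_B  # difference, as in the answer

# Simulated choices that depend only on the difference (assumption)
df$cup <- factor(ifelse(runif(n) < plogis(0.5 * df$value), "A", "B"),
                 levels = c("A", "B"))

# Control covariates: each cup's distance from zero
fit <- glm(cup ~ value + abs(score_A) + abs(score_B),
           family = binomial, data = df)
summary(fit)$coefficients
```

If value stays clearly non-zero with the absolute-value terms in the model, the magnitude of the ratings alone does not account for the choices in these data.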
@cathalcom I do not fully understand your last two questions. Regardless, this discussion has been going on for too long, and since you are now asking new questions, you should post them as a new question. – user2974951 Feb 08 '19 at 08:19
OK, I'll open another thread. Thanks for your help! – cathalcom Feb 08 '19 at 12:04
