How do I analyze a mixture of yes/no and success/failure data?

Question

My study is looking at different campaigns, what strategies each campaign used and if the campaign was successful or not. I want to try to figure out if a variable has more of an impact or if a mixture of the variables is more likely to result in a success.

I essentially want to be able to say including "this strategy" means it is more likely that your campaign will be successful.

Please help

(It would be better to provide your actual data, it doesn't seem to be a very large dataset.) What is the point of this analysis? — gung - Reinstate Monica, Aug 13 '19 at 20:22
There's not really any difference between "yes/no" and "success/failure" except semantics. — AdamO, Aug 13 '19 at 20:23
Is the question just asking what techniques could be used to predict the response variable SuccessOrFailure (which actually has three levels) using the other binary variables as predictors? — Joe, Aug 13 '19 at 20:39
Yes I want to know what statistical test I can do to predict the response/dependent variable (Success/Failure) using the binary variables. (Sorry I have changed between writing that as success/fail and Success/fail/mixed) — Robin, Aug 13 '19 at 20:42
Maybe if you edit your question to ask something like which techniques are suitable for predicting a categorical target using binary features, it will be taken off hold. In the meantime I’d recommend looking at decision trees, or even better random forests. But I don’t know them well enough myself to give an answer. — Joe, Aug 14 '19 at 00:39

score 1 · Accepted Answer · answered Aug 14 '19 at 10:43


In case you are interested in the independent effect of each separate yes/no variable on your outcome, a simple logistic regression should suffice to give you some initial insights. For example, if you code each campaign choice as zero for 'No' and one for 'Yes', you can add all campaign-choice columns as independent variables, use your outcome as the dependent variable, and evaluate the coefficients.

Note that this assumes there is no compound effect from using different campaign choices jointly (e.g. that Mass Media has a different effect when a campaign includes Audits than when it doesn't; I just chose two variables in your set at random for this example, no idea if this is a reasonable one). This analysis would 'just' yield insights into whether switching from no to yes on one of your variables is associated with an increase in the chance of success you defined, ceteris paribus (everything else being equal). In other words, all variables affect your dependent variable independently of one another and in an additive manner (additive on the log-odds scale, since this is a logistic regression).
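To make the additive setup concrete, here is a minimal sketch of such a logistic regression using only the Python standard library. The data rows, the column names `mass_media` and `audits`, and the helper `fit_logistic` are all made up for illustration; in practice you would more likely use a statistics package, which also reports standard errors and p-values.

```python
import math

# Hypothetical toy data, invented for illustration only.
# Each row is (mass_media, audits, success), all coded 0 = No, 1 = Yes.
rows = [
    (1, 1, 1), (1, 0, 1), (1, 1, 1), (1, 0, 0),
    (0, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0),
    (1, 1, 1), (0, 0, 1), (1, 0, 1), (0, 1, 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, steps=5000, lr=0.5):
    """Fit an intercept plus one coefficient per predictor
    by plain gradient ascent on the log-likelihood."""
    n_features = len(rows[0]) - 1
    coefs = [0.0] * (n_features + 1)  # coefs[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(coefs)
        for *xs, y in rows:
            x = [1.0] + list(xs)  # prepend the intercept term
            p = sigmoid(sum(c * xi for c, xi in zip(coefs, x)))
            for j, xi in enumerate(x):
                grad[j] += (y - p) * xi
        coefs = [c + lr * g / len(rows) for c, g in zip(coefs, grad)]
    return coefs

intercept, b_mass_media, b_audits = fit_logistic(rows)

# exp(coefficient) is the odds ratio for switching that strategy from No to Yes,
# holding the other strategy fixed.
odds_ratio_mass_media = math.exp(b_mass_media)
odds_ratio_audits = math.exp(b_audits)

print("mass_media: coef", round(b_mass_media, 3), "odds ratio", round(odds_ratio_mass_media, 3))
print("audits:     coef", round(b_audits, 3), "odds ratio", round(odds_ratio_audits, 3))
```

A positive coefficient (odds ratio above 1) means that including the strategy is associated with higher odds of success, everything else being equal. With a dataset this small the estimates are very noisy, so treat them as descriptive rather than conclusive.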

If you are interested in just predicting the outcome, perhaps classification models like Random Forests would be more successful.


Mark Verhagen


Thank you! This was super helpful. I was trying to do logistic regression, but it wasn't working; however, your comment has shown me what I was doing wrong. I will likely use simple logistic regression, but out of curiosity, is there a way to look at the compound effect? – Robin Aug 14 '19 at 15:42
It all depends on what you feed the model. If it is simply additive, e.g. Outcome = Paper_Media + Mass_Media, the two effects will be independent. You could also say: Outcome = Mass_Media + Paper_Media + Paper_Media_and_Mass_Media. In this case you have the separate effects of Mass_Media and Paper_Media, plus an additional additive effect when they are both present. You would construct another variable which is one if both Paper_Media and Mass_Media are present and zero otherwise. In other words: just think about it as a simple equation. – Mark Verhagen Aug 14 '19 at 19:06
Thank you for your help!! – Robin Aug 14 '19 at 23:57
If your question is answered, I think you should mark it as such so that it doesn't remain open – Mark Verhagen Aug 15 '19 at 07:07
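As a rough illustration of the compound effect discussed in the comments above, the sketch below builds the extra "both present" variable and reads off what a logistic model with two yes/no strategies plus their interaction would estimate. With only these three terms the model is saturated, so (as long as every combination has both successes and failures) its maximum-likelihood coefficients can be computed directly from the observed success rate of each combination. The data and the lowercase names `paper_media`, `mass_media` and `both` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data, invented for illustration only.
# Each row is (paper_media, mass_media, success), all coded 0 = No, 1 = Yes.
rows = [
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1),
    (1, 0, 0), (1, 0, 0), (1, 0, 1), (1, 0, 1),
    (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0),
]

# The constructed variable: one if both strategies are present, zero otherwise.
both = [paper * mass for paper, mass, _ in rows]

def logit(p):
    return math.log(p / (1.0 - p))

# Count successes and campaigns for each combination of strategies.
counts = defaultdict(lambda: [0, 0])
for paper, mass, success in rows:
    counts[(paper, mass)][0] += success
    counts[(paper, mass)][1] += 1
rate = {cell: s / n for cell, (s, n) in counts.items()}

# Coefficients of Outcome = paper_media + mass_media + both on the log-odds scale.
b0 = logit(rate[(0, 0)])                          # neither strategy used
b_paper = logit(rate[(1, 0)]) - b0                # paper media alone
b_mass = logit(rate[(0, 1)]) - b0                 # mass media alone
b_both = logit(rate[(1, 1)]) - b0 - b_paper - b_mass  # extra effect when combined

print("success rate per combination:", dict(rate))
print("paper:", round(b_paper, 3), "mass:", round(b_mass, 3), "both:", round(b_both, 3))
```

A `b_both` clearly different from zero suggests the combination does more (or less) than the two separate effects added together. With more than two strategies, or with combinations that never fail or never succeed, this shortcut no longer applies and you would fit the model with the constructed column as an ordinary predictor.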
