I’m working on a dataset collected from a (kind of) two-way ANOVA design with a binary response. The purpose of the experiment is to explore how different types of email marketing incentives affect customers’ propensity to buy certain fruits.

  • The dependent variable is Purchase rate (Binary; if a customer purchased the offered fruit or not)

  • Factor A has 3 Levels:

    1. Silent Communication (no incentive offered; used to observe customers’ natural propensity to buy any given fruit)
    2. Email explaining the benefits of consuming the fruit being offered
    3. Same as 2 but with a 15% discount on the fruit offered
  • Factor B has 5 levels. Customers in these groups are incentivized to consume:

    1. Bananas
    2. Apples
    3. Oranges
    4. Kiwis
    5. Fruit of choice (Of the 4 above)

A sample of 150 000 subscribers was randomly allocated across the 15 groups, with 10k subscribers in each group.

The questions I have are the following:

  • Which method or technique should I use to account for the binary response variable?
  • When testing the main effect for Factor B, I do not want to test the difference in purchase rates in Table 1 (which is what SAS computes by default). Rather, I’m interested in comparing the differences in purchase uplifts between the groups, i.e. the numbers in Table 2 highlighted in green.

I’m using SAS and would appreciate some guidance and advice on how to handle the two issues described.

[Image: Table 1 (purchase rates) and Table 2 (purchase uplifts); not reproduced here]

Bilal
  • Is this a question from a course or textbook? If so, please add the [tag:self-study] tag & read its wiki. – kjetil b halvorsen Feb 23 '23 at 16:06
  • It's not a course or textbook case, it's a real scenario; the only difference is that the objects actually studied are not fruits. – Bilal Feb 23 '23 at 20:30
  • Do you have individual responses, that is 0/1 or yes/no? In that case, look into logistic regression. If you have proportions in groups, see https://stats.stackexchange.com/questions/216122/what-is-the-difference-between-logistic-regression-and-fractional-response-regre – kjetil b halvorsen Mar 01 '23 at 01:44

1 Answer

Which method or technique should I use to account for the binary response variable?

For a binary response you can use logistic regression. In SAS that would be PROC LOGISTIC. You can also use PROC GENMOD with a binomial distribution and logit link (DIST=BINOMIAL LINK=LOGIT); however, PROC LOGISTIC is designed specifically for fitting logistic models and has extensive options for modelling binary, ordinal, and nominal response data, including various link functions, model selection methods, and diagnostic statistics (such as ROC analysis).
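To make concrete what such a model estimates: with a single two-level predictor (say, email vs. silent communication for one fruit), the fitted logistic coefficient equals the log odds ratio of the corresponding 2×2 table, and its Wald standard error has a closed form. The sketch below is in plain Python rather than SAS, and the counts are made up purely for illustration; they are not taken from the question.

```python
import math

def log_odds_ratio(buy_t, n_t, buy_c, n_c):
    """Log odds ratio (treated vs. control) and its Wald standard error.

    This is the coefficient a logistic regression with one binary
    predictor would return for the same 2x2 table.
    """
    a, b = buy_t, n_t - buy_t   # treated: purchasers, non-purchasers
    c, d = buy_c, n_c - buy_c   # control: purchasers, non-purchasers
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se

# Hypothetical counts (assumption): 600 of 10,000 emailed customers bought,
# versus 500 of 10,000 in the silent group.
lor, se = log_odds_ratio(600, 10_000, 500, 10_000)
z = lor / se                                    # Wald z statistic
ci = (math.exp(lor - 1.96 * se), math.exp(lor + 1.96 * se))  # 95% CI for the odds ratio
print(f"log OR = {lor:.4f}, SE = {se:.4f}, z = {z:.2f}, OR 95% CI = {ci[0]:.3f}..{ci[1]:.3f}")
```

The full two-factor model extends this idea with terms for both factors and their interaction; note that the interaction is then expressed on the log-odds scale, not as a difference in percentage points.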

When testing the main effect for Factor B, I do not want to test the difference in purchase rates in Table 1 (which is what SAS computes by default). Rather, I’m interested in comparing the differences in purchase uplifts between the groups, i.e. the numbers in Table 2 highlighted in green.

If I understand correctly, there are several approaches for proportional/fractional outcomes: Tobit (with PROC QLIM or NLMIXED), NLS (nonlinear least squares, with PROC NLMIXED), fractional logit (PROC NLMIXED or GLIMMIX), beta (PROC GLIMMIX or NLMIXED), and simplex (PROC NLMIXED). If you have negative uplifts, or uplifts greater than 100%, this may limit your options.
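As a simpler point of reference, if an uplift is read as the difference in purchase rate between an incentive group and the silent group for the same fruit, then comparing uplifts between two fruits is a difference of differences between four independent proportions. Below is a minimal sketch of a Wald test for that contrast, again in plain Python and with hypothetical counts (the real figures from the tables are not used). It relies on a normal approximation and on the groups being independent, which the random allocation supports.

```python
import math

def uplift(buy_t, n_t, buy_c, n_c):
    """Uplift (treated rate minus control rate) and its variance."""
    p_t, p_c = buy_t / n_t, buy_c / n_c
    var = p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c
    return p_t - p_c, var

def compare_uplifts(u1, u2):
    """Wald z-test for the difference between two independent uplifts."""
    (e1, v1), (e2, v2) = u1, u2
    diff = e1 - e2
    se = math.sqrt(v1 + v2)                     # variances add across independent groups
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return diff, se, z, p

# Hypothetical counts (assumption), 10,000 customers per group:
bananas = uplift(700, 10_000, 500, 10_000)      # emailed vs. silent
apples = uplift(650, 10_000, 550, 10_000)       # emailed vs. silent

diff, se, z, p = compare_uplifts(bananas, apples)
print(f"uplift difference = {diff:.4f}, SE = {se:.5f}, z = {z:.2f}, p = {p:.4f}")
```

This compares uplifts in percentage points. The interaction term of a logistic model answers a related but different question, because it compares the groups on the odds-ratio scale.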

Robert Long