
I have 100K customers from a population with a KPI named "spent", which is lognormally distributed.

The customers were split into 2 groups:

  • 1st group (control): customers with an id ending in 0
  • 2nd group (treatment): customers with an id ending in 1-9

The treatment group got 20 marketing campaigns during 2022. During 2021 no marketing campaigns were run for either group.

When comparing the groups, I see a 10% uplift in total spend (after multiplying the control total by 9 to overcome the size difference). This uplift already existed between the groups even before the campaigns, and it traces to less than 0.1% of the customers - some customers spend a lot.
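A quick simulation sketches the setup (all numbers here are illustrative, not my real data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical illustration: 100K customers with lognormal spend,
# split by the last digit of their id. A handful of heavy spenders
# can shift the scaled totals even with no treatment at all.
n = 100_000
spend = rng.lognormal(mean=3.0, sigma=1.5, size=n)
ids = np.arange(n)
control = spend[ids % 10 == 0]      # ids ending in 0
treatment = spend[ids % 10 != 0]    # ids ending in 1-9

scaled_control_total = control.sum() * 9   # overcome the 10/90 split
uplift = treatment.sum() / scaled_control_total - 1
print(f"apparent 'uplift' with no treatment: {uplift:+.1%}")
```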

Does it make sense to compare total spend in order to evaluate the effectiveness of the marketing campaigns? If not, why?

  • Why would you multiply the observations in one group by 9 just because the other group is nine times as large? What kind of comparison between the two groups did you have in mind? What is the treatment you plan on applying to one group? – Stephan Kolassa Jan 30 '23 at 21:43
  • I measure the total spend each group has made. The treatment group gets some marketing campaigns while the control group doesn't. Since I'm measuring total spend and not an average/median, it's necessary to overcome the differences in the group sizes (I can't change the group allocation/size). – Moti laluom Jan 31 '23 at 07:53
  • Please edit the body of your question to add relevant information, rather than in comments. It's not clear whether you have performed the experiment or plan to do so. And you mention "some marketing campaigns" (plural), not "campaign". How many campaigns are there? – dipetkov Jan 31 '23 at 08:28
  • If you only look at total spend, then the distribution is not all that relevant (except insofar as it makes total spend more variable). You are comparing only two data points. If that is what you want to do, then yes, you can scale total spend. Which is actually the precise same thing as calculating and comparing average spend in the two groups, which you would calculate by dividing each group's total spend by the number of individuals in each group. – Stephan Kolassa Jan 31 '23 at 10:04
  • I edited my question, waiting for approval. – Moti laluom Jan 31 '23 at 18:26
  • @StephanKolassa - there is an uplift for the treatment group, even before marketing campaigns. this uplift is due to less than 0.1% of the users in the treatment group. Those kind of customers are rare and they shift every KPI. I was wondering that maybe it's not correct to compare the total spent in such a case (and maybe it's related to the distribution) – Moti laluom Jan 31 '23 at 18:28
  • But you won't see that this is due to 0.1% of your customers if all you look at is total spend per group. That is why I believe that looking at total spend is too coarse. Your unit of measurement should be your separate customers. – Stephan Kolassa Jan 31 '23 at 19:35
  • I personally think you have to back up and describe your problem in much more detail (ideally the A/B test should be rerun differently). Given that there is a difference in the groups' spending even before the treatment, perhaps a difference-in-differences analysis [https://en.wikipedia.org/wiki/Difference_in_differences] would help. – seanv507 Jan 31 '23 at 20:15
  • You can calculate a difference between the total revenue as if all customers were treated versus the total revenue as if no customers were treated. This is an apples-to-apples comparison since it keeps group size constant. Here is an example with monthly users, but the same approach could be applied to revenue (just don't log revenue with the het-robust Poisson). You can control for pre-test expenditure as a covariate. – dimitriy Jan 31 '23 at 20:25

1 Answer


A few things here:

The customers were split into 2 groups: 1st group (control): customers with an id ending in 0; 2nd group (treatment): customers with an id ending in 1-9.

This is not proper randomization. It is probably fine for a one-off experiment, but if you use this scheme in repeated experiments, then you will be testing systematic differences between those with an ID ending in 0 and everyone else. It's best to literally randomize using a pRNG or similar. Additionally, by only putting users with an ID ending in 0 in the control, you've hurt yourself with respect to power: the most efficient designs are 50/50 designs. Aside from that...
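A minimal sketch of what pRNG-based assignment could look like (the customer IDs and the 50/50 split here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=2023)  # fixed seed makes assignment reproducible

# Hypothetical customer IDs; in practice these come from your database.
customer_ids = np.arange(100_000)

# Independent 50/50 assignment via a pRNG, rather than keying on ID digits.
assignment = rng.random(customer_ids.size) < 0.5
treatment_ids = customer_ids[assignment]
control_ids = customer_ids[~assignment]

print(treatment_ids.size, control_ids.size)
```

This keeps the groups balanced in expectation and avoids tying the split to any structure hidden in the ID digits.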

When comparing the difference between the groups I can see 10% uplift in terms of total spent (after multiplying by 9 to overcome the size difference).

I understand why you did this, and I think it's a bad idea. The total amount spent is simply the product of the sample mean and the sample size. Since we know a lot about the sample mean, I think it's better to analyze that.
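To see why: the scaled-total comparison is algebraically identical to a comparison of means, but working with the per-customer observations also gives you a standard error. A sketch with simulated data (Welch's t-test here is purely illustrative; with heavy-tailed spend you may prefer something more robust):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated spend, mimicking the 10/90 split from the question.
control = rng.lognormal(3.0, 1.5, size=10_000)
treatment = rng.lognormal(3.0, 1.5, size=90_000)

# Scaling the control total by 9 and comparing totals is exactly
# the same as comparing the two sample means:
scaled_total_ratio = treatment.sum() / (control.sum() * 9)
mean_ratio = treatment.mean() / control.mean()
assert np.isclose(scaled_total_ratio, mean_ratio)

# But the per-customer data additionally yields an uncertainty estimate:
t, p = stats.ttest_ind(treatment, control, equal_var=False)
print(f"difference in means: t={t:.2f}, p={p:.3f}")
```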

The uplift already existed between the groups even before the campaigns, and it traces to less than 0.1% of the customers - some customers spend a lot.

This is unsurprising to me; many types of companies have so-called "whales": users who spend thousands of dollars and drive the majority of the revenue.

What I might suggest is that you use log spend instead. This can rein in the spend of whales so that OLS or other regression methods are more tenable. Then, you can use something like a difference-in-differences analysis, as seanv507 notes. However, if there are customers with spend = 0, then this won't work for obvious reasons (and using log(1 + spend) doesn't give you valid estimates of the ATE either [1]).
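A sketch of the difference-in-differences idea on log spend, using simulated positive spend (the true 10% effect, the group sizes, and the use of statsmodels' formula API are all illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 20_000
treated = rng.random(n) < 0.9  # mimic the 90/10 split

# Simulated positive spend for 2021 (pre) and 2022 (post);
# treatment adds a 10% multiplicative effect in the post period only.
base = rng.lognormal(3.0, 1.0, size=n)
pre = base * rng.lognormal(0, 0.1, size=n)
post = base * rng.lognormal(0, 0.1, size=n) * np.where(treated, 1.10, 1.0)

df = pd.DataFrame({
    "log_spend": np.log(np.concatenate([pre, post])),
    "post": np.repeat([0, 1], n),
    "treated": np.tile(treated.astype(int), 2),
})

# The post:treated interaction is the diff-in-diff estimate of the
# campaign effect on log spend; it should land near log(1.10) ~ 0.095.
fit = smf.ols("log_spend ~ post * treated", data=df).fit()
print(fit.params["post:treated"])
```

The interaction term nets out both the pre-existing level difference between the groups and the common year-over-year shift, which is exactly what the raw total-spend comparison cannot do.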

Does it make sense to compare total spend in order to evaluate the effectiveness of the marketing campaigns? If not, why?

As I mentioned above: no, because we know more about the properties of the mean, and the mean is just a rescaled total spend.

  • Thanks for the helpful comment. I'll explain a bit better: the method I described is already used in the organization. I understand it's not optimal (or even correct), and I need to explain and provide proof of why it's not optimal/correct (mainly because of the current numbers - an uplift for the treatment group suggests that the marketing campaigns are working). Once I've explained the problems with the method, I'll be able to suggest a better one. – Moti laluom Feb 01 '23 at 14:41