
I am currently experimenting with a TV attribution approach proposed by Google: Liu, Y., Schwarzkopf, Y., & Koehler, J. (2017). TV Impact on Online Searches.

They propose comparing website traffic after TV spots to an estimated baseline. Because spots may overlap, they are first pooled into spotgroups, and the spotgroup uplifts are later disaggregated to the individual spots based on channel, impressions, and hour of day.
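
As a rough illustration of the pooling step, here is a minimal sketch that assigns spots to spotgroups by airing time. The airing times and the 5-minute `window` are hypothetical, and the gap rule is only an assumption for illustration, not necessarily the rule used in the paper.

```r
# Hypothetical airing times in minutes since midnight (not real data).
air_time <- c(600, 603, 640, 700, 704, 709, 900)

# Assumed threshold: spots at most `window` minutes apart are treated as overlapping.
window <- 5

# Start a new spotgroup whenever the gap to the previous spot exceeds the window.
spotgroup <- cumsum(c(TRUE, diff(air_time) > window))
spotgroup
#> [1] 1 1 2 3 3 3 4
```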

The approach is outlined quite clearly (see Section 2.2 on p. 5f in the paper above), but I am not sure how to implement the model in R. My current attempt with some toy data is included below.

Basically this boils down to questions related to linear regression:

  • How can I implement the proposed multiplication between channel and impressions? Would this be a simple interaction effect (e.g. ch1:imp1 in R)?
  • How can I account for a flexible number of spots within each spotgroup (cf. the sum over m in formula (3))?

Many spotgroups contain only a single spot. There are also quite a few different channels (around 20). How can I best model this? Do I have to create dummy variables manually?

(A spotgroup may also contain multiple spots on the same channel.)
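
One possible way to handle both points is to keep the data in long format (one row per spot), let `model.matrix()` build the channel-by-impressions columns, and then sum those columns within each spotgroup. The sketch below assumes a purely additive model in which the spotgroup uplift is the sum over its spots of a channel-specific coefficient times impressions; this is a simplification and not necessarily formula (3) from the paper, and hour of day is left out. The data frames `spots` and `uplift` are hypothetical toy data.

```r
# Hypothetical long-format toy data: one row per spot.
spots <- data.frame(
  spotgroup   = c(1, 1, 2, 3, 3, 3, 4, 5, 5, 6),
  channel     = factor(c("ch1", "ch3", "ch2", "ch1", "ch1",
                         "ch2", "ch3", "ch2", "ch3", "ch1")),
  impressions = c(5000, 300, 4000, 1000, 2000, 500, 1000, 10000, 500, 8000)
)
# Hypothetical observed uplift per spotgroup.
uplift <- data.frame(spotgroup = 1:6, uplift = c(200, 50, 90, 80, 400, 100))

# One column per channel holding that spot's impressions (0 elsewhere).
# No dummy variables have to be created by hand.
X_spot <- model.matrix(~ 0 + channel:impressions, data = spots)

# Sum the spot-level columns within each spotgroup (the "sum over m").
# Spotgroups of any size, including repeated channels, collapse to one row.
X_group <- rowsum(X_spot, group = spots$spotgroup)

# Align the response with the aggregated design matrix and fit without intercept.
y <- uplift$uplift[match(rownames(X_group), uplift$spotgroup)]
fit_df <- data.frame(uplift = y, X_group)
fit <- lm(uplift ~ 0 + ., data = fit_df)
coef(fit)
```

With around 20 channels this yields about 20 columns regardless of how many spots a spotgroup contains, so the wide ch1/imp1, ch2/imp2, ... layout is not needed.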

Desired Model

Proposed Model

Current Attempts

uplift_df <- tibble::tribble(
  ~uplift, ~ch1, ~imp1, ~hour1, ~ch2, ~imp2, ~hour2, ~ch3, ~imp3, ~hour3,
  200,    1,  5000,     19,    0,     0,      0,    1,   300,     19,
  50,    1,  4000,     22,    1,   500,     22,    0,     0,      0,
  400,    0,     0,      0,    1, 10000,     14,    1,   500,     14,
  80,    0,     0,      0,    0,     0,      0,    1,  1000,     21,
  10,    1,  1000,     12,    1,  2000,     13,    1,   500,     12,
  100,    1,  8000,     14,    0,     0,      0,    1,   300,     14,
  90,    1,  4000,     12,    1,   500,     12,    0,     0,      0,
  250,    0,     0,      0,    1, 10000,     14,    1,   500,     14,
  50,    0,     0,      0,    0,     0,      0,    1,  1000,     21,
  20,    1,  4000,     12,    1,  2000,     13,    1,   500,     12
)

lm(uplift ~ ch1:imp1 + hour1 + ch2:imp2 + hour2 + ch3:imp3 + hour3, data = uplift_df)
#> 
#> Call:
#> lm(formula = uplift ~ ch1:imp1 + hour1 + ch2:imp2 + hour2 + ch3:imp3 + 
#>     hour3, data = uplift_df)
#> 
#> Coefficients:
#> (Intercept)        hour1        hour2        hour3     ch1:imp1  
#>  278.551180   -43.146585    31.957221    64.208258    -0.001157  
#>    ch2:imp2     ch3:imp3  
#>   -0.051877    -1.575838

Created on 2019-02-20 by the reprex package (v0.2.1)
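
A note on the attempt above: since ch1 to ch3 are numeric 0/1 indicators and the impressions are 0 whenever the indicator is 0, each `ch:imp` term appears to be numerically identical to the raw impressions column. The small check below uses the ch1 and imp1 values copied from the toy data; it is only a sanity check under that coding, not a statement about the paper's model.

```r
# ch1 and imp1 copied from the toy data in uplift_df.
ch1  <- c(1, 1, 0, 0, 1, 1, 1, 0, 0, 1)
imp1 <- c(5000, 4000, 0, 0, 1000, 8000, 4000, 0, 0, 4000)

# With two numeric columns, `ch1:imp1` in a formula is their elementwise product.
X <- model.matrix(~ ch1:imp1, data = data.frame(ch1, imp1))

# TRUE means the interaction column carries no information beyond imp1 itself.
same <- all(X[, "ch1:imp1"] == imp1)
same
```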

stats-hb