
This is my dataset:

PageID  Hits    ClickThroughs    CTR
  1     1000     400             40%
  2     10       8               80%

I'm trying to calculate click-through rate (CTR) as a measure of how good the page content is.

However, clearly page 1 has much more reliable data than page 2 (1000 vs 10 hits).

What is a good way of weighting this data to reflect the amount of data behind each estimate? Multiplying the CTR by the number of hits doesn't seem right, since the result is unbounded.

  • Could you elaborate on how weighting the data would address the question of measuring the quality of page content? Wouldn't you rather want to test whether the CTRs differ significantly between the two pages? – whuber Feb 25 '15 at 21:45

2 Answers


I'm currently trying to do something very similar.

I have decided to use the lower bound of the Wilson score confidence interval for a Bernoulli parameter, as described by Evan Miller here.

In your example above, the lower bounds would be:

PageID  Hits     ClickThroughs   lowerBoundCTR    
  1     1000     400             37%
  2     10       8               49%

If we had 10 hits and 4 click-throughs (the same CTR as PageID 1), it would look like this:

PageID  Hits    ClickThroughs    lowerBoundCTR    
  N     10      4                17%

I am using a 95% confidence interval.
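
For reference, here is a minimal Python sketch of that lower bound (the function name is my own; the formula is the standard Wilson score lower bound from Miller's post). It reproduces the numbers in the tables above:

    import math

    def wilson_lower_bound(clicks, hits, z=1.96):
        # Lower bound of the Wilson score interval for a Bernoulli
        # proportion; z = 1.96 corresponds to a 95% confidence level.
        if hits == 0:
            return 0.0
        phat = clicks / hits
        centre = phat + z**2 / (2 * hits)
        spread = z * math.sqrt((phat * (1 - phat) + z**2 / (4 * hits)) / hits)
        return (centre - spread) / (1 + z**2 / hits)

    print(wilson_lower_bound(400, 1000))  # ~0.37
    print(wilson_lower_bound(8, 10))      # ~0.49
    print(wilson_lower_bound(4, 10))      # ~0.17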

I am also using a prior to help when there are no click-throughs on a low sample count, as discussed here.
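
One common way to apply such a prior is to add pseudo-counts, i.e. take the posterior mean under a Beta prior. A sketch (the alpha and beta values here are my own illustrative choices, not necessarily those in the linked post):

    def smoothed_ctr(clicks, hits, alpha=1.0, beta=19.0):
        # Posterior mean of the CTR under a Beta(alpha, beta) prior:
        # equivalent to alpha + beta pseudo-hits at a prior CTR of
        # alpha / (alpha + beta) = 5%. Pages with few hits stay near
        # the prior; pages with many hits converge to their raw CTR.
        return (clicks + alpha) / (hits + alpha + beta)

    print(smoothed_ctr(0, 3))       # ~0.04 rather than 0%
    print(smoothed_ctr(400, 1000))  # ~0.39, close to the raw 40%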


This post involves some related issues: Linear Regression Coefficients and Ratios

Let N = clicks and D = hits. A positive coefficient on N (clicks) and a negative coefficient on D (hits) suggest that CTR is moving up; however, the coefficient on the ratio N/D can nonetheless come out negative.
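
The simulation below is a constructed toy of my own (not from the linked post), just to show how that sign pattern can arise: two traffic regimes, where the regime with the higher ratio also has the lower outcome.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 100

    # Regime 1: low-traffic pages, CTR around 80/100 = 0.80
    D1 = rng.uniform(95, 105, m)    # hits
    N1 = rng.uniform(76, 84, m)     # clicks
    # Regime 2: high-traffic pages, CTR around 850/1000 = 0.85
    D2 = rng.uniform(950, 1050, m)
    N2 = rng.uniform(840, 860, m)

    N = np.concatenate([N1, N2])
    D = np.concatenate([D1, D2])
    # Toy outcome built so the true effects are +1 on N and -1 on D
    y = N - D + rng.normal(0, 1, 2 * m)

    # y ~ 1 + N + D: coef(N) > 0 and coef(D) < 0, as described above
    X1 = np.column_stack([np.ones_like(y), N, D])
    b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(f"coef(N) = {b1[1]:+.2f}, coef(D) = {b1[2]:+.2f}")

    # y ~ 1 + N/D: the ratio's coefficient is negative, because the
    # higher-ratio regime is the one with the lower outcome
    X2 = np.column_stack([np.ones_like(y), N / D])
    b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
    print(f"coef(N/D) = {b2[1]:+.2f}")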
