
Let's say, for example, on an e-commerce website I create a kernel density estimate for all sold items at their price points. I also create another KDE for all listed items at their price points.

Does it make sense to divide the first KDE by the second to interpolate the conversion rate at every price point? Is it even a valid method?
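A minimal sketch of what I have in mind, assuming the prices live in two hypothetical 1-D NumPy arrays (`scipy.stats.gaussian_kde` uses Scott's rule for the bandwidth by default):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical data: prices of all listed items and of the subset that sold.
rng = np.random.default_rng(0)
listed_prices = rng.lognormal(mean=3.0, sigma=0.5, size=5000)
sold_prices = listed_prices[rng.random(listed_prices.size) < 0.3]

# One KDE per group; gaussian_kde picks the bandwidth with Scott's rule.
kde_sold = gaussian_kde(sold_prices)
kde_listed = gaussian_kde(listed_prices)

# Evaluate both densities on a common price grid and take the ratio.
grid = np.linspace(listed_prices.min(), listed_prices.max(), 200)
density_ratio = kde_sold(grid) / kde_listed(grid)

# Each KDE integrates to 1, so the raw ratio is only a relative density
# ratio; rescaling by the overall conversion fraction turns it into an
# estimate of P(sold | price) via Bayes' rule.
conversion_rate = density_ratio * (sold_prices.size / listed_prices.size)
```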

mitbal
  • I am afraid the results will depend to a large extent on bandwidth selection and a number of other factors. What exactly do you want to learn from this data? Why do you want KDEs rather than empirical CDFs? – Tim Jun 13 '17 at 08:35
  • Hi Tim, the goal is to calculate the conversion rate. At first, I used a histogram to bin the data and then calculated the conversion rate for each bin. However, that raises the problem of choosing the number and width of the bins. Therefore, I believe using KDE would be better. However, I'm not sure if I can still just divide them like before, because it's continuous now.

    I'm not well versed in empirical CDFs; would you mind elaborating a bit more? Thanks!

    – mitbal Jun 13 '17 at 08:59
  • 1
    But in KDE instead of choosing the number of width of bins, you choose bandwidth that is basically the same as bin width, so it does not solve your problem... – Tim Jun 13 '17 at 09:05
  • The bandwidth is computed automatically using a rule of thumb, in this case Scott's rule. Also, the result is smooth compared to a histogram, which is better for my case in my opinion, because I need to interpolate the rate at new price points. – mitbal Jun 13 '17 at 09:13
  • But you can compute that automatically for histograms as well (in fact using the same rules of thumb, as is commonly done). – Tim Jun 13 '17 at 09:20
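For comparison, a sketch of the binned approach discussed in the comments above, using the same hypothetical price arrays; `np.histogram_bin_edges` can pick the bin width automatically with the same Scott's rule:

```python
import numpy as np

# Choose bin edges automatically (Scott's rule) from the listed-item prices,
# then count sold and listed items in the same bins.
edges = np.histogram_bin_edges(listed_prices, bins="scott")
sold_counts, _ = np.histogram(sold_prices, bins=edges)
listed_counts, _ = np.histogram(listed_prices, bins=edges)

# Conversion rate per bin; bins with no listed items are left as NaN.
conversion_per_bin = np.full(len(sold_counts), np.nan)
np.divide(sold_counts, listed_counts, out=conversion_per_bin,
          where=listed_counts > 0)

bin_centers = 0.5 * (edges[:-1] + edges[1:])
```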

0 Answers