2

I use Bayes' theorem to estimate the impact of a salesperson on a customer's decision to buy a product.

$$ P(buy|salesperson) = \frac{P(salesperson|buy)\, P(buy)}{P(salesperson)} $$

Naturally, some salespersons are more active than others, which is accounted for by the denominator. However, I'm still having a hard time interpreting similar values of $P(buy|salesperson)$ for two representatives with vastly different values of $P(salesperson)$. Take, for example, the following situation:

  • A dataset contains 1000 interactions between sales representatives and potential customers
  • Salesperson A performed 200 interactions ($P(salesperson A) = 0.2$)
  • Salesperson B performed 2 interactions ($P(salesperson B) = 0.002$)
  • The posterior probability $P(buy|salesperson)$ of the two representatives is identical

Am I less confident in the posterior probability calculated for salesperson B (I assume I am)? How do I communicate this confidence?
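For concreteness, here is one hypothetical set of counts (made up for illustration, not taken from my actual data) under which this happens: suppose there are 400 purchases overall, salesperson A closes 100 of the 200 interactions, and salesperson B closes 1 of the 2. Then

$$ P(buy|A) = \frac{(100/400)\,(400/1000)}{200/1000} = 0.5, \qquad P(buy|B) = \frac{(1/400)\,(400/1000)}{2/1000} = 0.5 $$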

NOTE: there are more salespersons than A and B, and there are more buying events than those involving A and B. At this point I'm only concerned with comparing the probabilities for these two.

David D
  • 727

3 Answers

2

I suggest that perhaps you should adopt a different approach for this problem, since you seem to be interested not only in estimating the effectiveness of each salesperson but also in the precision with which that effectiveness has been measured. The logistic regression framework suits this problem well, and you can fit it with either frequentist or Bayesian methods.

What you will want to do is use salesperson as a factor variable ($m$ is the number of salespersons) and estimate a coefficient for each individual salesperson.

\begin{align} Y_{i} &\sim Bernoulli( p_{i}) \\ \log{\left(\frac{p_{i}}{1-p_{i}}\right)} &= \mu + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_m x_{im} \end{align}

In this setup $Y_{i}$ is the random variable indicating whether customer $i$ makes a purchase, $p_{i}$ is the probability that customer $i$ makes the purchase, $\mu$ is the intercept (the log-odds of a purchase for the reference salesperson), $\beta_j$ is a performance coefficient for salesperson $j$, and $x_{ij}$ is $1$ if salesperson $j$ attempted to sell to customer $i$ and $0$ otherwise. Since the $x_{ij}$ are collinear if all are included, you will want to omit one of them (common choices are salesperson $1$, salesperson $m$, or whichever has the most sales); the values of $\beta_j$ then give sales performance relative to the omitted salesperson.

If you estimate this in a frequentist statistics package you will get standard errors and confidence intervals for the parameters $\mu$ and $\beta_j$. In a Bayesian package you will need to specify vague priors for them, and you will then get a posterior distribution from which you can obtain standard deviations and credible intervals. In a simple application there is not much difference between the two approaches, except perhaps that in a Bayesian package it might be easier to compute rank probabilities (i.e., the probability that salesperson $j$ is the best, second-best, third-best, etc.).

Things get more complicated if you have multiple attempts to sell to the same customer (potentially with multiple visits from multiple salespersons), in which case you will need to look at a mixed model approach.
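As a rough sketch of that extension (not something the toy example below needs), a random intercept per customer can be added with the lme4 package in R; the column names sale, salesperson and customer_id here are my assumptions:

library(lme4)

# Logistic mixed model: fixed salesperson effects plus a random intercept
# per customer, to handle repeated sales attempts on the same customer.
glmer.out <- glmer(sale ~ salesperson + (1 | customer_id),
                   family = binomial, data = df)
summary(glmer.out)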

Here are example commands and output in Stata and R, based on a toy dataset with four salespersons (Alice has 3 sales out of 7 customers, Bob 2 out of 5, Charles 1 out of 2, Danny 2 out of 6).
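One way to construct this toy dataset in R is sketched below (the column names sale and salesperson are simply my choices to match the calls that follow; the Stata dataset would be analogous):

# One row per customer; sale = 1 if a purchase was made, 0 otherwise
df <- data.frame(
  salesperson = rep(c("Alice", "Bob", "Charles", "Danny"), times = c(7, 5, 2, 6)),
  sale        = c(rep(1, 3), rep(0, 4),   # Alice:   3 sales out of 7
                  rep(1, 2), rep(0, 3),   # Bob:     2 out of 5
                  1, 0,                   # Charles: 1 out of 2
                  rep(1, 2), rep(0, 4))   # Danny:   2 out of 6
)

Alice is alphabetically first, so she becomes the reference level in the R fit (and she is also the base level in the Stata output below).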

In Stata:

. logistic sale i.salesperson_id

Logistic regression                               Number of obs   =         20
                                                  LR chi2(3)      =       0.22
                                                  Prob > chi2     =     0.9745
Log likelihood = -13.350794                       Pseudo R2       =     0.0081

--------------------------------------------------------------------------------
          sale | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
salesperson_id |
          Bob  |   .8888889   1.057989    -0.10   0.921     .0862412    9.161782
      Charles  |   1.333333   2.143034     0.18   0.858     .0571247    31.12102
        Danny  |   .6666667   .7698004    -0.35   0.725     .0693467     6.40902
               |
         _cons |        .75    .572822    -0.38   0.706     .1678593    3.351021
--------------------------------------------------------------------------------

In R:

> glm.out <- glm(sale ~ salesperson, family=binomial, data=df)
> summary(glm.out)

Call:
glm(formula = sale ~ salesperson, family = binomial, data = df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1774  -1.0226  -0.9005   1.3018   1.4823  

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         -0.2877     0.7638  -0.377    0.706
salespersonBob      -0.1178     1.1902  -0.099    0.921
salespersonCharles   0.2877     1.6073   0.179    0.858
salespersonDanny    -0.4055     1.1547  -0.351    0.725

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.920  on 19  degrees of freedom
Residual deviance: 26.702  on 16  degrees of freedom
AIC: 34.702

Number of Fisher Scoring iterations: 4

Note that the estimates from Stata are in the form of odds ratios (use logit instead of logistic to get coefficients on the log-odds scale instead), which are exponentiated versions of the R output. You can confirm, for example, that for Bob Stata gives an odds ratio of 0.8888889, which is equal to exp(-0.1178) from R (up to rounding error).
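As a quick check (these two lines are my addition, not part of the session above), you can exponentiate the R results to put them on Stata's odds-ratio scale:

exp(coef(glm.out))              # odds ratios, matching Stata's point estimates
exp(confint.default(glm.out))   # Wald 95% CIs on the odds-ratio scale, matching Stata's intervals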

You will also see that in both Stata and R the precision of the estimate is less (i.e., SE is greater) for Charles (only 2 customers) than for Bob and Danny.

Hope this helps!

tristan
  • 1,180
0

Bayes' theorem states that:

$$ P(A|B) = \frac{ P(B|A)\,P(A)}{ P(B|A) P(A) + P(B|\neg A) P(\neg A) } = \frac{ P(B|A)\,P(A)}{ \sum_A P(B|A) P(A) } $$

so if in your case $buy$ stays the same, there is no way $P(buy|salesperson)$ could be the same for both cases, because the numerator changes while the denominator does not. The denominator is a total probability, which is constant (it is what makes the probabilities sum to $1$). By total probability we mean here the sum of $P(B|A)\,P(A)$ over all possible $A$'s, i.e., in your case, salesperson interactions given all possible buy outcomes.
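Written out in the question's notation, that total-probability denominator is simply

$$ P(salesperson) = P(salesperson|buy)\,P(buy) + P(salesperson|\neg buy)\,P(\neg buy) $$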

Tim
  • 138,066
  • I was talking about the situation where two probabilities do have the same values due to different numerators and denominators. Also, see the note that I added to the question – David D Apr 02 '15 at 13:09
  • @DavidD denominators do not change, and the example extends to more situations than only two events. – Tim Apr 02 '15 at 13:12
0

Your data are 200 interactions for salesperson A and 2 interactions for salesperson B. You state "the posterior probability of the two representatives is identical", but this isn't a posterior probability. What you really mean is that the empirical sale proportion is identical.

To introduce notation, let $y_A$ be the number of sales for salesperson A out of $n_A=200$ interactions and $y_B$ be the number of sales for salesperson B out of $n_B=2$ interactions. What you have told us is that $y_A/n_A=y_B/n_B$.

Now, let $\theta_A$ and $\theta_B$ be the true sale probabilities for salespersons A and B, respectively, and let's assume independent Jeffreys priors for $\theta_A$ and $\theta_B$, i.e. both independently have a $\text{Beta}(1/2,1/2)$ prior. Then (using Bayes' theorem) the posterior for salesperson A's true sale probability is $\text{Beta}(1/2+y_A,\ 1/2+n_A-y_A)$ and the posterior for salesperson B's is $\text{Beta}(1/2+y_B,\ 1/2+n_B-y_B)$. So yes, you have much less uncertainty about salesperson A's true sale probability than about salesperson B's.
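As a small numerical sketch in R (the sale counts below are hypothetical, chosen only so that the two observed proportions match), the 95% credible intervals come straight from the Beta quantile function:

# Jeffreys Beta(1/2, 1/2) prior, so the posterior is Beta(1/2 + y, 1/2 + n - y)
y_A <- 100; n_A <- 200   # hypothetical: 100 sales in 200 interactions
y_B <- 1;   n_B <- 2     # hypothetical: 1 sale in 2 interactions

qbeta(c(0.025, 0.975), 0.5 + y_A, 0.5 + n_A - y_A)   # A: roughly (0.43, 0.57)
qbeta(c(0.025, 0.975), 0.5 + y_B, 0.5 + n_B - y_B)   # B: roughly (0.06, 0.94)

The much wider interval for B is the extra uncertainty you were asking how to communicate.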

jaradniemi
  • 4,691
  • Notice that the question is not about using priors but about calculating probabilities using observed proportions ($buy$ is not a prior, but a proportion of observed purchases). – Tim Apr 02 '15 at 12:10
  • I did not say that $y_A/n_A = y_B / n_B$. I said that the result of the entire calculation is the same. Also, I noticed that I wasn't too clear in my question; see the note at the end – David D Apr 02 '15 at 13:14