A dynamic program will make short work of this.
Suppose we administer all questions to the students and then randomly select a subset $\mathcal{I}$ of $k=10$ out of all $n=100$ questions. Let's define a random variable $X_i$ to compare the two students on question $i:$ set it to $1$ if student A is correct and student B not, $-1$ if student B is correct and student A not, and $0$ otherwise. The total
$$X_\mathcal{I} = \sum_{i\in\mathcal{I}} X_i$$
is the difference in scores for the questions in $\mathcal I.$ We wish to compute $\Pr(X_\mathcal{I} \gt 0).$ This probability is taken over the joint distribution of $\mathcal I$ and the $X_i.$
The distribution of $X_i$ is readily calculated under the assumption that the students respond independently:
$$\eqalign{
\Pr(X_i=1) &= P_{ai}(1-P_{bi}) \\
\Pr(X_i=-1) &= P_{bi}(1-P_{ai}) \\
\Pr(X_i=0) &= 1 - \Pr(X_i=1) - \Pr(X_i=-1).
}$$
As a shorthand, let us call these probabilities $a_i,$ $b_i,$ and $d_i,$ respectively. Write
$$f_i(x) = a_i x + b_i x^{-1} + d_i.$$
This polynomial is a probability generating function for $X_i.$
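As a quick sanity check (the probabilities here are made up purely for illustration), the coefficients of $f_i$ sum to unity, and $f_i'(1) = a_i - b_i$ recovers the expected score difference on question $i$:

```r
# Hypothetical success probabilities for one question (illustrative values)
P.a <- 0.8                # student A's chance of answering correctly
P.b <- 0.5                # student B's chance of answering correctly

a <- P.a * (1 - P.b)      # Pr(X_i =  1)
b <- P.b * (1 - P.a)      # Pr(X_i = -1)
d <- 1 - a - b            # Pr(X_i =  0)

f <- function(x) a * x + b / x + d  # the generating function f_i(x)

f(1)                      # = 1: the coefficients sum to unity
a - b                     # = f_i'(1), the expectation of X_i
```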
Consider the rational function
$$\psi_n(x,t) = \prod_{i=1}^n \left(1 + t f_i(x)\right).$$
(Strictly speaking $\psi_n$ is a rational function of $x$, but $x^n\psi_n(x,t)$ is a polynomial, so it is about as simple as a rational function can be.)
When $\psi_n$ is expanded as a polynomial in $t$, the coefficient of $t^k$ is the sum of all possible products of $k$ distinct $f_i(x).$ This is a rational function of $x$ with nonzero coefficients only for powers of $x$ from $x^{-k}$ through $x^k.$ Because $\mathcal{I}$ is selected uniformly at random, these coefficients, when normalized to sum to unity, give the probability generating function for the difference in scores. More generally, since the powers of $t$ track the size of $\mathcal{I},$ the coefficient of each $t^m$ yields (after normalization) the generating function for an exam of $m$ questions.
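To see the mechanism in miniature (with invented probabilities, not the question's data): take $n=2$ and $k=1.$ The coefficient of $t^1$ in $\psi_2$ is $f_1(x)+f_2(x),$ whose coefficients sum to $2$; normalizing gives the equal mixture $(f_1+f_2)/2,$ which is exactly the distribution of $X_\mathcal{I}$ when the single question is chosen uniformly:

```r
# Two questions with made-up success probabilities
p <- c(0.9, 0.6)                         # student A's chances
q <- c(0.4, 0.3)                         # student B's chances
a <- p * (1 - q); b <- q * (1 - p); d <- 1 - a - b

# Coefficient of t^1 in (1 + t f1(x))(1 + t f2(x)) is f1(x) + f2(x);
# list its coefficients in ascending powers of x: x^-1, x^0, x^1.
coef.sum <- c(b[1] + b[2], d[1] + d[2], a[1] + a[2])
sum(coef.sum)                            # = 2 = choose(2, 1)
coef.sum / sum(coef.sum)                 # the mixture (f1 + f2)/2
```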
The point of this analysis is that we may compute $\psi_n(x,t)$ easily and with reasonable efficiency: simply multiply the $n$ factors sequentially. Doing this requires retaining only the coefficients of $1, t, \ldots, t^k$ in the partial products $\psi_j(x,t)$ for $j=0, 1, \ldots, n$ (we may of course ignore all higher powers of $t$ that appear in any of these partial products). Accordingly, all the necessary information carried by $\psi_j(x,t)$ can be represented by a $(2k+1)\times(k+1)$ matrix, with rows indexed by the powers of $x$ (from $-k$ through $k$) and columns indexed by the powers of $t$ (from $0$ through $k$).
Each step of the computation requires work proportional to the size of this matrix, scaling as $O(k^2).$ Accounting for the number of steps, this is an $O(k^2n)$-time algorithm requiring $O(k^2)$ space for the matrix (beyond the $O(n)$ input). That makes it quite fast for small $k.$ I have run it in R (not known for excessive speed) for $k$ up to $100$ and $n$ up to $10^5,$ where it takes nine seconds (on a single core). In the setting of the question with $n=100$ and $k=10,$ the computation takes $0.03$ seconds.
Here is an example where the $P_{ai}$ are uniform random values between $0$ and $1$ and the $P_{bi}$ are their squares (which are always less than the $P_{ai}$, thereby strongly favoring student A). I simulated 100,000 examinations, as summarized by this histogram of the net scores:
The blue bars indicate those results in which student A got a better score than B. The red dots are the result of the dynamic program. They agree beautifully with the simulation ($\chi^2$ test, $p=51\%$). Summing all the positive probabilities gives the answer in this case, $0.7526\ldots.$
Note that this calculation yields more than asked for: it produces the entire probability distribution of the difference in scores for all exams of $k$ or fewer randomly selected questions.
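For instance, here is a self-contained miniature (the same recursion as the full code below, but with made-up probabilities and a smaller $n$ and $k$, to be run separately) showing how any column of the matrix yields the distribution for a shorter exam:

```r
set.seed(1)
n <- 20; k <- 4
p <- runif(n); q <- p^2                  # hypothetical success probabilities

P <- matrix(0, 2*k + 1, k + 1)           # rows: x^-k..x^k; columns: t^0..t^k
P[k + 1, 1] <- 1                         # psi_0 = 1 (coefficient of x^0 t^0)
for (i in 1:n) {
  a <- p[i] * (1 - q[i]); b <- q[i] * (1 - p[i]); d <- 1 - a - b
  P[, 1:k + 1] <- P[, 1:k + 1] +         # `1:k+1` parses as columns 2:(k+1)
    a * rbind(0, P[-(2*k + 1), 1:k]) +   # a*x: shift the x-power up
    b * rbind(P[-1, 1:k], 0) +           # b/x: shift the x-power down
    d * P[, 1:k]                         # d: no shift
}
P <- apply(P, 2, function(x) x / sum(x)) # normalize each column

m <- 2                                   # any exam size up to k
P[, m + 1]                               # score-difference distribution, m questions
sum(P[(1:k) + k + 1, m + 1])             # chance A outscores B on such an exam
```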
For those who would like a working implementation to use or port, here is the R code that produced the simulation (stored in the vector Simulation) and executed the dynamic program (with results in the array P). The repeat block at the end is there only to aggregate all unusually rare outcomes so that the $\chi^2$ test becomes obviously reliable. (In most situations this doesn't matter, but it keeps the software from complaining.)
n <- 100
k <- 10
p <- runif(n) # Student A's chances of answering correctly
q <- p^2 # Student B's chances of answering correctly
#
# Compute the full distribution.
#
system.time({
  P <- matrix(0, 2*k+1, k+1) # Indexing from (-k,0) to (k,k)
  rownames(P) <- (-k):k
  colnames(P) <- 0:k
  P[k+1, 1] <- 1             # psi_0 = 1: the coefficient of x^0 t^0
  for (i in 1:n) {
    a <- p[i] * (1 - q[i])   # Pr(X_i =  1)
    b <- q[i] * (1 - p[i])   # Pr(X_i = -1)
    d <- (1 - a - b)         # Pr(X_i =  0)
    # Multiply the partial product by (1 + t*f_i(x)).  Note that `1:k+1`
    # parses as `(1:k)+1`, i.e. columns 2:(k+1), the powers t^1..t^k.
    P[, 1:k+1] <- P[, 1:k+1] +
      a * rbind(0, P[-(2*k+1), 1:k]) +   # a*x: shift the x-power up
      b * rbind(P[-1, 1:k], 0) +         # b/x: shift the x-power down
      d * P[, 1:k]                       # d: no shift
  }
  P <- apply(P, 2, function(x) x / sum(x)) # Normalize each column
})
#
# Simulation to check.
#
n.sim <- 1e5
set.seed(17)
system.time(
  Simulation <- replicate(n.sim, {
    i <- sample.int(n, k)  # Choose k of the n questions uniformly at random
    sum(sign((runif(k) <= p[i]) - (runif(k) <= q[i]))) # Difference in scores, A-B
  })
)
#
# Test the calculation.
#
counts <- tabulate(Simulation+k+1, nbins=2*k+1)
n <- sum(counts)
k.min <- 5
repeat {
  probs <- P[, k+1]
  i <- probs * n.sim >= k.min
  z <- sum(probs[!i])
  if (z * n >= 5) break
  if (k.min * (2*k+1) >= n) break
  k.min <- ceiling(k.min * 3/2)
}
probs <- c(z, probs[i])
counts <- c(sum(counts[!i]), counts[i])
chisq.test(counts, p=probs)
#
# The answer.
#
sum(P[(1:k) + k+1, k+1]) # Chance that A-B is positive
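As an independent correctness check (not part of the original solution), the dynamic program can be compared against brute force on a tiny instance: with $n=4$ and $k=2,$ average the exact convolution $f_i f_j$ over all pairs $i<j$ and compare with the normalized $t^2$ column. (This sketch uses its own small n, k, p, and q; run it in a fresh session.)

```r
set.seed(2)
n <- 4; k <- 2
p <- runif(n); q <- runif(n)             # made-up success probabilities
a <- p * (1 - q); b <- q * (1 - p); d <- 1 - a - b

# Dynamic program, exactly as above.
P <- matrix(0, 2*k + 1, k + 1)
P[k + 1, 1] <- 1
for (i in 1:n) {
  P[, 1:k + 1] <- P[, 1:k + 1] +
    a[i] * rbind(0, P[-(2*k + 1), 1:k]) +
    b[i] * rbind(P[-1, 1:k], 0) +
    d[i] * P[, 1:k]
}
P <- apply(P, 2, function(x) x / sum(x))

# Brute force: average the polynomial product f_i * f_j over all pairs i < j.
brute <- numeric(2*k + 1)
pairs <- combn(n, 2)
for (s in seq_len(ncol(pairs))) {
  i <- pairs[1, s]; j <- pairs[2, s]
  fi <- c(b[i], d[i], a[i])              # coefficients of x^-1, x^0, x^1
  fj <- c(b[j], d[j], a[j])
  brute <- brute + convolve(fi, rev(fj), type = "open")  # polynomial product
}
brute <- brute / ncol(pairs)

max(abs(brute - P[, k + 1]))             # agreement up to rounding error
```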