Difference between two binomial random variables (the Danish Mask Study)

The Danish Mask Study presents an interesting probability problem: the odds of getting 5 infections in a group of 2470 vs. 0 in a group of 2398. It warrants its own test statistic, one that lets us look at all the conditional probabilities. Since we are dealing with tail probabilities, normal approximations are entirely out of place. Further, we have no idea at the outset whether the sample size is sufficient to draw conclusions from such a discrepancy (it is).
There appears to be no exact distribution in the literature for Y=X_1-X_2 when X_1 and X_2 are binomially distributed with different probabilities. Let's derive it.

Let X_1 \sim \mathcal{B}\left(n_1,p_1\right), X_2 \sim \mathcal{B}\left(n_2,p_2\right), both independent.

The joint probability mass, constrained to the support, is \mathbb{P}\left(X_1=x_1,X_2=x_2\right)=f\left(x_1,x_2\right), where

f\left(x_1,x_2\right)= p_1^{x_1} p_2^{x_2} \binom{n_1}{x_1} \binom{n_2}{x_2} \left(1-p_1\right)^{n_1-x_1} \left(1-p_2\right)^{n_2-x_2} ,

with 0\leq x_1\leq n_1 and 0\leq x_2\leq n_2, and f=0 outside this range.
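A minimal Python sketch of f(x_1,x_2), assuming only the independence stated above, so that the joint mass factors into the two binomial masses. The helper names `binom_pmf` and `joint_pmf` are hypothetical, and evaluating the binomial coefficient in log space with `math.lgamma` is a choice made here because \binom{n}{x} overflows a float for n in the thousands, as in the study.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p); zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    # Work in logs: comb(n, k) overflows a float for n in the thousands.
    log_mass = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_mass)


def joint_pmf(x1: int, x2: int, n1: int, p1: float, n2: int, p2: float) -> float:
    """f(x1, x2) = P(X1 = x1, X2 = x2) for independent binomials."""
    return binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)


# The mass over the whole lattice should total 1.
total = sum(joint_pmf(a, b, 6, 0.3, 4, 0.5) for a in range(7) for b in range(5))
print(total)  # 1 up to rounding
```

Points off the lattice simply get mass zero, which is what lets the sums that follow run over generous index ranges.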

For each value y of the difference, we sum the joint mass over all the lattice points (x_1,x_2) that produce it, i.e., all pairs with x_1-x_2=y; how many such points there are depends on y and on n_1, n_2. For instance:

Condition for Y \geq 0 :

\mathbb{P} (Y=0)=f(0,0)+f(1,1)+\ldots +f\left(n_1,n_1\right),

\mathbb{P} (Y=1)=f(1,0)+f(2,1)+\ldots +f\left(n_1,n_1-1\right),

so

\mathbb{P} (Y=y)=\sum _{k=y}^{n_1} f(k,k-y).
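The sum for Y\geq 0 translates directly into code. A sketch under the same conventions (hypothetical helper names; f is taken as zero off the support, so terms with k-y>n_2 contribute nothing), checked against brute-force enumeration of the lattice:

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p); zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    log_mass = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_mass)


def prob_diff_nonneg(y: int, n1: int, p1: float, n2: int, p2: float) -> float:
    """P(Y = y) for y >= 0, Y = X1 - X2, via sum_{k=y}^{n1} f(k, k - y)."""
    if y < 0:
        raise ValueError("this sum covers y >= 0 only")
    # f(k, k - y) = P(X1 = k) P(X2 = k - y); zero whenever k - y > n2.
    return sum(binom_pmf(k, n1, p1) * binom_pmf(k - y, n2, p2)
               for k in range(y, n1 + 1))


# Brute force: add up f over every lattice point with x1 - x2 = 1.
brute = sum(binom_pmf(a, 5, 0.2) * binom_pmf(b, 3, 0.6)
            for a in range(6) for b in range(4) if a - b == 1)
print(prob_diff_nonneg(1, 5, 0.2, 3, 0.6), brute)  # the two agree
```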

Condition for Y < 0:

\mathbb{P} (Y=-1)=f(0,1)+f(1,2)+\ldots +f\left(n_2-1,n_2\right),

so

\mathbb{P} (Y=y)=\sum _{k=0}^{n_2+y} f(k,k-y).
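Combining the two cases gives the full mass of Y on -n_2,\ldots,n_1. A minimal Python sketch (helper names hypothetical) that precomputes each marginal once, so that f(k,k-y) becomes a product of two table lookups, and trims each sum to the lattice points where both arguments are in range:

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p); zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    # Log space avoids float overflow of comb(n, k) for large n.
    log_mass = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_mass)


def pmf_difference(n1: int, p1: float, n2: int, p2: float) -> dict[int, float]:
    """Return {y: P(Y = y)} for Y = X1 - X2 and every y in -n2, ..., n1."""
    # Marginal tables, so f(k, k - y) = b1[k] * b2[k - y].
    b1 = [binom_pmf(k, n1, p1) for k in range(n1 + 1)]
    b2 = [binom_pmf(k, n2, p2) for k in range(n2 + 1)]
    pmf = {}
    for y in range(-n2, n1 + 1):
        if y >= 0:
            # sum_{k=y}^{n1} f(k, k - y), stopping where k - y would exceed n2
            ks = range(y, min(n1, n2 + y) + 1)
        else:
            # sum_{k=0}^{n2+y} f(k, k - y), stopping where k would exceed n1
            ks = range(0, min(n1, n2 + y) + 1)
        pmf[y] = sum(b1[k] * b2[k - y] for k in ks)
    return pmf


pmf = pmf_difference(4, 0.3, 3, 0.7)
print(sum(pmf.values()))  # 1 up to rounding
```

For the study's sizes one would call `pmf_difference(2470, p, 2398, p)` with a common rate p picked under a hypothetical null of equal probabilities (the choice of p is an assumption, not something fixed here), then add the entries with y\geq 5 to get the upper tail. The double sum then has a few million terms, which pure Python handles, if not instantly.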

The characteristic function:

\varphi(t)= \left(1-p_1+p_1 e^{i t}\right)^{n_1} \left(1-p_2+p_2 e^{-i t}\right)^{n_2}
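A quick numerical check of the closed-form \varphi(t) against \mathcal{E}\left(e^{i t Y}\right) summed over the whole lattice. This is a sketch for small n_1, n_2 only (exact `math.comb`, hypothetical function names), not an efficient evaluator:

```python
import cmath
import math


def char_fn(t: float, n1: int, p1: float, n2: int, p2: float) -> complex:
    """Closed form: (1 - p1 + p1 e^{it})^{n1} (1 - p2 + p2 e^{-it})^{n2}."""
    return ((1 - p1 + p1 * cmath.exp(1j * t)) ** n1
            * (1 - p2 + p2 * cmath.exp(-1j * t)) ** n2)


def char_fn_by_enumeration(t: float, n1: int, p1: float, n2: int, p2: float) -> complex:
    """E[e^{itY}] summed over every lattice point (small n only)."""
    total = 0j
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            f = (math.comb(n1, x1) * p1**x1 * (1 - p1) ** (n1 - x1)
                 * math.comb(n2, x2) * p2**x2 * (1 - p2) ** (n2 - x2))
            total += f * cmath.exp(1j * t * (x1 - x2))
    return total


for t in (0.0, 0.7, 2.5):
    print(t, char_fn(t, 5, 0.3, 4, 0.6), char_fn_by_enumeration(t, 5, 0.3, 4, 0.6))
```

The second factor carries e^{-i t} because it is the characteristic function of -X_2; independence makes the product exact.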

Hence, the expectation: \mathcal{E}(Y)= n_1 p_1-n_2 p_2
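The expectation can be read off the characteristic function through \mathcal{E}(Y)=-i\,\varphi'(0); differentiating the product once and setting t=0, where both bases equal 1:

```latex
\begin{aligned}
\varphi'(t) &= i n_1 p_1 e^{i t}\left(1-p_1+p_1 e^{i t}\right)^{n_1-1}\left(1-p_2+p_2 e^{-i t}\right)^{n_2}
             - i n_2 p_2 e^{-i t}\left(1-p_1+p_1 e^{i t}\right)^{n_1}\left(1-p_2+p_2 e^{-i t}\right)^{n_2-1},\\
\varphi'(0) &= i\left(n_1 p_1-n_2 p_2\right)
\quad\Longrightarrow\quad
\mathcal{E}(Y) = -i\,\varphi'(0) = n_1 p_1-n_2 p_2 .
\end{aligned}
```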

The variance: \mathcal{V}(Y)= n_1 p_1 \left(1-p_1\right)+n_2 p_2 \left(1-p_2\right), i.e., the sum of the two binomial variances, as independence requires.
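Since X_1 and X_2 are independent, \mathcal{V}(Y) should equal the sum of the two binomial variances, n_1 p_1(1-p_1)+n_2 p_2(1-p_2). A small Python check (hypothetical names; exact enumeration, so small n only) computes the mean and variance straight from the joint mass and compares them with the closed forms:

```python
import math


def diff_moments(n1: int, p1: float, n2: int, p2: float) -> tuple[float, float]:
    """Mean and variance of Y = X1 - X2 computed directly from the joint mass."""
    mean = 0.0
    second = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            f = (math.comb(n1, x1) * p1**x1 * (1 - p1) ** (n1 - x1)
                 * math.comb(n2, x2) * p2**x2 * (1 - p2) ** (n2 - x2))
            y = x1 - x2
            mean += f * y
            second += f * y * y
    return mean, second - mean**2


n1, p1, n2, p2 = 7, 0.25, 5, 0.4
mean, var = diff_moments(n1, p1, n2, p2)
print(mean, n1 * p1 - n2 * p2)                        # expectation, both ways
print(var, n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2))   # variance, both ways
```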

The kurtosis:

\mathcal{K}=\frac{n_1 p_1 \left(1-p_1\right){}^{n_1-1} \, _4F_3\left(2,2,2,1-n_1;1,1,1;\frac{p_1}{p_1-1}\right)-\frac{n_2 p_2 \left(\left(1-p_2\right){}^{n_2} \, _4F_3\left(2,2,2,1-n_2;1,1,1;\frac{p_2}{p_2-1}\right)+n_2 \left(p_2-1\right) p_2 \left(\left(n_2^2-6 n_2+8\right) p_2^2+6 \left(n_2-2\right) p_2+4\right)\right)+n_1^4 \left(p_2-1\right) p_1^4-6 n_1^3 \left(1-p_1\right) \left(1-p_2\right) p_1^3-4 n_1^2 \left(2 p_1^2-3 p_1+1\right) \left(1-p_2\right) p_1^2+6 n_1 n_2 \left(1-p_1\right) \left(1-p_2\right){}^2 p_2 p_1}{p_2-1}}{\left(n_1 p_1 \left(1-p_1\right)+n_2 p_2 \left(1-p_2\right)\right)^2}
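An independent cross-check of \mathcal{K}, by a different route than the hypergeometric expression: cumulants of independent variables add, the fourth cumulant of \mathcal{B}(n,p) is n p(1-p)\left(1-6p(1-p)\right), and \kappa_4(-X_2)=\kappa_4(X_2), so \mathcal{K}=3+\left(\kappa_4(X_1)+\kappa_4(X_2)\right)/\left(n_1 p_1(1-p_1)+n_2 p_2(1-p_2)\right)^2. The sketch below (hypothetical names, small n only) compares that with \mu_4/\mu_2^2 computed from the exact joint mass:

```python
import math


def kurtosis_by_enumeration(n1: int, p1: float, n2: int, p2: float) -> float:
    """mu_4 / mu_2^2 of Y = X1 - X2 from the exact joint mass (small n only)."""
    ys, fs = [], []
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            fs.append(math.comb(n1, x1) * p1**x1 * (1 - p1) ** (n1 - x1)
                      * math.comb(n2, x2) * p2**x2 * (1 - p2) ** (n2 - x2))
            ys.append(x1 - x2)
    mean = sum(f * y for f, y in zip(fs, ys))
    mu2 = sum(f * (y - mean) ** 2 for f, y in zip(fs, ys))
    mu4 = sum(f * (y - mean) ** 4 for f, y in zip(fs, ys))
    return mu4 / mu2**2


def kurtosis_by_cumulants(n1: int, p1: float, n2: int, p2: float) -> float:
    """3 + (k4(X1) + k4(X2)) / (k2(X1) + k2(X2))^2 with binomial cumulants."""
    v1, v2 = n1 * p1 * (1 - p1), n2 * p2 * (1 - p2)
    k4 = v1 * (1 - 6 * p1 * (1 - p1)) + v2 * (1 - 6 * p2 * (1 - p2))
    return 3 + k4 / (v1 + v2) ** 2


print(kurtosis_by_enumeration(8, 0.2, 6, 0.55), kurtosis_by_cumulants(8, 0.2, 6, 0.55))
```

The cumulant form makes the large-sample behavior visible: the excess over 3 shrinks like 1/\mathcal{V}(Y), which is small only once the expected counts are large.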
