## The probability that one random variable is smaller or larger than a Beta random variable

This super wonkish post will serve as a convenient basket case for all the inglorious math that will be required for a series of more Evidence Based Medicine oriented posts. A result that will be repeatedly required in these posts is an expression for the probability that one random variable is smaller (or larger) than a Beta random variable. The need for this result stems from the ability of the Beta distribution to quantify beliefs about percentages or proportions of dichotomous outcomes, having observed $\alpha$ “successes” and $\beta$ “failures”. So if one had just read about the efficacy ($E$) of an intervention in a Randomized Controlled Trial (RCT), the Beta distribution would be a readily available candidate to summarize the uncertainty about the efficacy as $B(E|p\,N,(1-p)\,N)$, where $p$ is the proportion of responders and $N$ the number of study participants. In this context, a skeptic may not only adopt the justifiable position that the real-world effectiveness of the intervention is less than the efficacy reported in the RCT setting but may also desire a numerical/probabilistic estimate of his or her skepticism. Similarly, the skeptic may want to use data about the adverse events in the aforementioned trial to give a lower bound on the adverse event rate to be expected in everyday practice, arguing that the latter rate should be larger than the one reported in the context of a controlled clinical experiment. In both cases the need arises to calculate the probability that a “real world” proportion is smaller or larger than the corresponding one seen in an RCT.
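As an aside, this Beta summary is easy to play with numerically; a minimal sketch in Python/SciPy, with made-up values for $p$ and $N$:

```python
# Summarizing uncertainty about RCT efficacy with a Beta distribution.
# The values of p and N are hypothetical, chosen only for illustration.
from scipy.stats import beta

p, N = 0.6, 100                      # say, 60 responders out of 100 participants
efficacy = beta(p * N, (1 - p) * N)  # B(E | pN, (1-p)N)

print(efficacy.mean())               # posterior mean = p = 0.6
print(efficacy.interval(0.95))       # central 95% interval for E
```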

Starting from the premise that the real world effectiveness $E_{rw}$ is smaller than $E$, and given the value of the “true” efficacy $E_{true}$, one could model one’s uncertainty about $E_{rw}$ as uniform over $[0, E_{true}]$, i.e. $p(E_{rw}|E_{true}) = \frac{1}{E_{true}}$ for $0 \le E_{rw} \le E_{true}$. In other words, when we are given the true value of efficacy we adopt the most agnostically skeptical position possible: the real world effectiveness can be anywhere from zero (the intervention does not work at all!) up to the maximum possible effectiveness, i.e. the trial efficacy. Since the true efficacy is not known (otherwise we would not have performed the RCT!), one needs to average over the uncertainty about the efficacy using the standard expressions for conditional, joint and marginal probabilities:
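Before doing the integral analytically, this two-stage model can be simulated directly; a Monte Carlo sketch, again with hypothetical values for $p$ and $N$:

```python
# Monte Carlo draws from the skeptic's two-stage model:
# E ~ Beta(pN, (1-p)N) for the trial efficacy, then E_rw | E ~ Uniform(0, E).
# p and N are made-up illustrative values.
import numpy as np

rng = np.random.default_rng(0)
p, N, draws = 0.6, 100, 100_000   # hypothetical: 60/100 responders

E = rng.beta(p * N, (1 - p) * N, size=draws)  # uncertainty about true efficacy
E_rw = rng.uniform(0.0, E)                    # real-world effectiveness below E

print(E_rw.mean())   # sample mean of the marginal draws
```

The array `E_rw` is then a sample from the marginal distribution whose density the integral below works out in closed form.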

$p(E_{rw}|p\,N,(1-p)\,N) = \int_0^{1} p(E_{rw},E|p\,N,(1-p)\,N)\,\mathrm{d}E = \int_0^{1} p(E_{rw}|E) \times p(E|p\,N,(1-p)\,N)\,\mathrm{d}E = \int_{E_{rw}}^{1} \frac{E^{p\,N-1} \times (1-E)^{(1-p)\,N-1}}{E \times B(p\,N,(1-p)\,N)}\,\mathrm{d}E$

In the aforementioned expression $B(p\,N,(1-p)\,N)$ is the (complete) Beta function (not to be confused with the distribution of the same name) and we have substituted $E$ for $E_{true}$. Note that the integration over $E$ is restricted to $[E_{rw}, 1]$, since the conditional density $p(E_{rw}|E)$ vanishes whenever $E < E_{rw}$. Carrying out the integration (and using a touch of special function magic), we arrive at the following expression:

$p(E_{rw}|p\,N,(1-p)\,N) = \frac{N-1}{p\,N-1} \times (1-I_{E_{rw}}(p\,N-1,(1-p)\,N))$, where $I_z(\alpha,\beta)$ is the Regularized Incomplete Beta function (isn’t this a mouthful?). Although this result is interesting in its own right, as it can serve as the basis for evidence synthesis of data from effectiveness and efficacy studies, its utility is limited by the need to access special function libraries for numerical evaluation. On a more practical note, if one could collapse this probability distribution into a single number, then one could directly compare that figure to the percentage (or fraction) of responders in the RCT. In the case of the latter, one would use the mean of the corresponding Beta distribution, $\frac{p\,N}{p\,N+(1-p)\,N} = p$, to summarize results, so that the average of the effectiveness distribution naturally suggests itself for the same purpose.
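The closed-form expression can be sanity-checked against brute-force numerical integration; a sketch using SciPy's special-function library (the values of $p$ and $N$ are again hypothetical):

```python
# Compare the closed-form density of E_rw with direct quadrature of the
# integral it was derived from; p and N are illustrative values only.
from scipy.integrate import quad
from scipy.special import betainc, beta as beta_fn

p, N = 0.6, 100
a, b = p * N, (1 - p) * N   # pN and (1-p)N

def density_closed_form(e_rw):
    # (N-1)/(pN-1) * (1 - I_{e_rw}(pN-1, (1-p)N))
    return (N - 1) / (a - 1) * (1.0 - betainc(a - 1, b, e_rw))

def density_by_quadrature(e_rw):
    # integral of E^(pN-1) (1-E)^((1-p)N-1) / (E * B(pN,(1-p)N)) over [e_rw, 1]
    integrand = lambda E: E ** (a - 1) * (1 - E) ** (b - 1) / (E * beta_fn(a, b))
    return quad(integrand, e_rw, 1)[0]

for e_rw in (0.1, 0.3, 0.5):
    print(density_closed_form(e_rw), density_by_quadrature(e_rw))
```

The two columns of output should agree to within quadrature error.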

To obtain this number, we compute the mean of the effectiveness distribution $p(E_{rw}|p\,N,(1-p)\,N)$ by direct integration. Using two integration formulas of the Regularized Incomplete Beta function (1, 2) and a transformation formula of the Beta function, we arrive at the (remarkably) simple result: $\mathbb{E}[E_{rw}] = \frac{p}{2}$.
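As a numerical check of the $p/2$ result, one can integrate $E_{rw}$ against the closed-form density (hypothetical $p$ and $N$, SciPy assumed):

```python
# Mean of the effectiveness distribution by direct numerical integration;
# it should reproduce p/2. The values of p and N are illustrative only.
from scipy.integrate import quad
from scipy.special import betainc

p, N = 0.6, 100
a, b = p * N, (1 - p) * N

def density(e):
    # closed-form density: (N-1)/(pN-1) * (1 - I_e(pN-1, (1-p)N))
    return (N - 1) / (a - 1) * (1.0 - betainc(a - 1, b, e))

mean, _ = quad(lambda e: e * density(e), 0, 1)
print(mean, p / 2)   # both should be 0.3 up to quadrature error
```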

Hence, the expected value of a random variable which is smaller than a Beta one is half the expected value of the Beta variable.

To compute the expected value of a random variable which is larger than a Beta one, we resort to a trick: we reverse the definitions of success and failure, in which case the previous result applies directly. The reversed variable $1-E_{rw}$ has expected value $\frac{1-p}{2}$, so the expected value of the larger variable is $1-\frac{1-p}{2} = \frac{1+p}{2}$.
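A quick Monte Carlo sketch (again with made-up values of $p$ and $N$) illustrates the reversal: drawing the real-world rate uniformly above the trial value, the sample mean lands near $(1+p)/2$.

```python
# Monte Carlo check of the "reversal" trick: a rate assumed larger than a
# Beta one, drawn uniformly on [E, 1]. p and N are hypothetical values.
import numpy as np

rng = np.random.default_rng(0)
p, N, draws = 0.6, 100, 100_000

E = rng.beta(p * N, (1 - p) * N, size=draws)  # e.g. adverse event rate in the RCT
E_rw = rng.uniform(E, 1.0)                    # real-world rate, larger than E

print(E_rw.mean())   # close to (1 + p) / 2
```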