## Extracting standard errors and effect estimates for meta-analysis: paging Rev Bayes

After a very long leave of absence I return to the issue of extracting the effect estimate ($T$) and standard error ($se$) from the reported (and rounded to a fixed number of decimal places) relative risk ($t$), limits of the 95% confidence interval ($t_L$ and $t_U$) and p-value ($p_v$) found in scientific publications. The solution is a Bayesian one, requiring nothing more than a straightforward application of Bayes' theorem to obtain the posterior distribution of the quantities $T, se$ given $t, t_L, t_U, p_v$:

$P(T,se|t, t_L, t_U, p_v) \propto P(t|T,se,t_L, t_U, p_v) \times P(T,se|t_L, t_U, p_v)$

However $P(t|T,se,t_L, t_U, p_v) = P(t|T) = \delta_{t,T_d}$, because the rounding operation only depends on the value to be rounded (thus dropping conditioning on all other variables but $T$). Since rounding to $d$ digits is a deterministic operation, admitting only one value ($T_d$) for a specific $T$, the conditional probability $P(t|T)$ is essentially a point mass function at $T_d$ given by the Kronecker $\delta$ function.
When viewed as a parametric function of $T$, this conditional probability will be non-zero in the interval $\left(-\frac{5}{10^{d+1}}+t\:,\frac{5}{10^{d+1}}+t\right)$ and zero otherwise.

Further application of the Bayes theorem yields:

$P(T,se|t_L, t_U, p_v) \propto P(p_v|T,se,t_L, t_U) \times P(T,se|t_L,t_U)$

The last factor on the right hand side of the formula is simply the (suitably transformed) joint probability density of $X,Y$ obtained from the pair $H_L, H_U$, as discussed in the previous post. More specifically, it is the density of $T,se$ obtained by setting $T=\log(X)$ and $se=\log(Y)/q_{0.975}$, conditional on the pair $H_L, H_U$.

On the other hand, p-value calculations are based on the ratio $z= T/se$, allowing us to drop the conditioning of $P(p_v|T,se,t_L, t_U)$ on $t_L,t_U$. Applying the same line of reasoning as in the case of $P(t|T)$, the conditional probability $P(p_v|T,se)$ is, mutatis mutandis, equal to one when the p-value computed from $z=T/se$ lies in the interval $\left(-\frac{5}{10^{d+1}}+p_v\:,\frac{5}{10^{d+1}}+p_v\right)$ and zero otherwise.
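As a minimal sketch of this indicator, the snippet below computes the two-sided p-value from $T/se$ under a standard normal reference distribution and checks whether it falls inside the rounding window around the reported $p_v$. The function names `two_sided_p` and `compatible_with_reported_p` are hypothetical, introduced only for illustration, and $d$ is taken to be the number of decimals the p-value was rounded to.

```python
from statistics import NormalDist


def two_sided_p(T, se):
    """Two-sided p-value 2 * (1 - CDF(|T / se|)) under a standard normal."""
    z = abs(T / se)
    return 2 * (1 - NormalDist().cdf(z))


def compatible_with_reported_p(T, se, p_v, d=2):
    """True if the p-value implied by (T, se) rounds to p_v at d decimals."""
    half_width = 5 / 10 ** (d + 1)
    return p_v - half_width < two_sided_p(T, se) < p_v + half_width


# True values from the Example section, with the reported p-value of 0.00
print(compatible_with_reported_p(0.403876, 0.11345, p_v=0.00))
```

Because a reported value of $0.00$ only says that the p-value is below $0.005$, this check is weak whenever $|T/se|$ is large.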

To compute the Bayesian estimates of $T,se$ we can resort to a rejection-type algorithm:

1. Simulate a pair of values from the joint density $p(X,Y|H_L,H_U)$
2. Transform $X,Y$ to $T,se$
3. Reject $T,se$ if $T$ is not in the interval $\left(-\frac{5}{10^{d+1}}+t\:,\frac{5}{10^{d+1}}+t\right)$
4. Reject $T,se$ if the corresponding p-value $2 \times (1-CDF(|T/se|))$ is not in the interval $\left(-\frac{5}{10^{d+1}}+p_v\:,\frac{5}{10^{d+1}}+p_v\right)$

Steps 1,2 ensure that the $T,se$ are compatible with the limits of the confidence interval, while steps 3,4 guarantee compatibility with the reported relative risk and p-value respectively. The non-rejected Monte Carlo samples can be summarized by various statistics (e.g. the mean or the median) to yield point estimates for meta-analytic applications.
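The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the results in this post. In particular, step 1 is replaced by a stand-in: the unrounded confidence limits are drawn uniformly within the rounding intervals of the reported limits, whereas the actual joint density $p(X,Y|H_L,H_U)$ is the one defined in the previous post. The sketch also assumes a Wald interval on the log scale, and it compares $\exp(T)$ (rather than $T$) with $t$, because in the example $t$ is reported on the relative-risk scale. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()
Q975 = STD_NORMAL.inv_cdf(0.975)  # q_{0.975}, roughly 1.96


def rounding_interval(reported, d):
    """Interval of values that round to `reported` at d decimals."""
    half_width = 5 / 10 ** (d + 1)
    return reported - half_width, reported + half_width


def sample_T_se(t, t_L, t_U, p_v, d=2, n=100_000, seed=1):
    """Rejection sampler returning the accepted (T, se) pairs."""
    rng = random.Random(seed)
    lo_L, hi_L = rounding_interval(t_L, d)
    lo_U, hi_U = rounding_interval(t_U, d)
    lo_t, hi_t = rounding_interval(t, d)
    lo_p, hi_p = rounding_interval(p_v, d)
    kept = []
    for _ in range(n):
        # Step 1 (ASSUMPTION): uniform draws for the unrounded limits,
        # standing in for the joint density of the previous post.
        lower = rng.uniform(lo_L, hi_L)
        upper = rng.uniform(lo_U, hi_U)
        # Step 2: transform to the log-relative risk and its standard error.
        T = 0.5 * (math.log(lower) + math.log(upper))
        se = (math.log(upper) - math.log(lower)) / (2 * Q975)
        # Step 3 (ASSUMPTION): t is on the relative-risk scale, so use exp(T).
        if not lo_t < math.exp(T) < hi_t:
            continue
        # Step 4: two-sided p-value must round to the reported p_v.
        p = 2 * (1 - STD_NORMAL.cdf(abs(T / se)))
        if not lo_p < p < hi_p:
            continue
        kept.append((T, se))
    return kept


# Reported figures from the Example section
kept = sample_T_se(t=1.50, t_L=1.20, t_U=1.87, p_v=0.00, n=20_000)
print(len(kept), mean(T for T, _ in kept), mean(se for _, se in kept))
```

Since the stand-in for step 1 differs from the density used in the post, the summaries printed here need not match the post's point estimates to all digits.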

Example

• True Values: $T=0.403876 , se=0.11345$
• Rounding (to 2 decimal places): $t=1.50$, $t_L = 1.20$, $t_U=1.87$, $p_v=0.00$
• The results of steps 1 and 2 are shown in the figure below (the blue cross corresponds to the true values of $T,se$):
• Rejecting the pairs that are not compatible with the RR point estimate (step 3) yields the following two dimensional density:
• Applying the criterion in step 4 does not reject further samples in this particular example, so the corresponding image is virtually indistinguishable from the previous one:

Point estimates for the log-relative risk and the standard error are 0.4043738 and 0.1131314 respectively. These are in close agreement with the true values of 0.403876 and 0.11345!
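As a quick sanity check on the example, the rounded figures can be regenerated from the true values of $T$ and $se$. The sketch below assumes a Wald-type 95% interval on the log scale, $\exp(T \pm q_{0.975}\, se)$, which is consistent with the transformations used above but is still an assumption about how the published limits were computed; the function name `reported_figures` is hypothetical.

```python
import math
from statistics import NormalDist


def reported_figures(T, se, d=2):
    """Relative risk and 95% limits implied by (T, se), rounded to d decimals."""
    q = NormalDist().inv_cdf(0.975)  # q_{0.975}
    t = round(math.exp(T), d)
    t_L = round(math.exp(T - q * se), d)
    t_U = round(math.exp(T + q * se), d)
    return t, t_L, t_U


# True values from the example
print(reported_figures(0.403876, 0.11345))
```

Passing the point estimates instead of the true values gives a direct check that the recovered pair is itself compatible with the published figures.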