Extracting standard errors and treatment effects from medical journal tables (powered by R)

I decided to start blogging the R code used for some of my statistical posts, so I will start with the meta-analysis posts and move on to more difficult stuff.

As stated previously (here and here) the problem is to convert the reported relative risks(RR, t), 95% confidence interval (t_L, t_U) and p-value (p_v) into estimates for the log-relative risk ratio and its associated standard error for down-stream use (usually meta-analysis). Medical journals are in the bad habit of exponentiating (and rounding) the output of statistical software so that one needs to manipulate the reported estimates in order to recover the output of the statistical software. The simplest approach is one based on algebraic manipulations and disregards the finite precision of the medical journal table, treating the reported figures as exact (i.e. 1.50 is exactly 1.50 not a number between 1.495-1.504):

  1. First we obtain an estimate for t from the reported RR: t_1 = \log(t)
  2. Secondly we transform the confidence interval to estimates for t, se :
    1. t_2 = \log(\sqrt{H_U \times H_L})
    2. se_1 = \log(\sqrt{\frac{H_U}{H_L}}) \times (q_{0.975})^{-1}
  3. Averaging over t_1, t_2 yields t=0.5 \times(t_1+t_2)
  4. A second estimate for the se is obtained from the p-value using the quantile function q_N(x) of the normal distribution: se_2 = \vert{ \frac{t}{q_N(p_v)}}\vert
  5. Averaging over se_1, se_2 yields = se = 0.5\times(se_1+se_2)

The following R function can be used to carry out these transformations:

algSTE<-function(X,L,U,p)
{
	v1<-c(log(X),log(sqrt(U*L)))
	alg.X<-mean(v1[is.finite(v1)])
	v2<-c(log(sqrt(U/L))/qnorm(0.975),abs(alg.X/qnorm(p/2)))
	alg.Y<-mean(v2[is.finite(v2)])
	return(list("TE"=alg.X,"se"=alg.Y))
}

A potentially more accurate approach is to acknowledge the finite precision inherent in the medical journal table for all quantities of interest (treatment effect, 95% confidence intervals and p-values). In a previous post I gave a Bayesian solution to this problem which obeys the d precision of the treatment effect and confidence interval and the d_p precision of the p-value. The R code for the Bayesian algorithm is the following:

</pre>
## function that extracts treatment effect (usually in log scale) and
## standard error from treatment effect (usually exponentiated log-odds ratios
## or log-relative risks), associated pci% CI and p-value rounded to
## d (TE, 95% CI) and dp (p-value) significant digits.
## This is based on a statistical simulation algorithm that draws
## N samples from the posterior distribution of these parameters
## followed by a rejection step. Due to the latter, the final number
## of samples can be <<N which may influence the accuracy of the results
## The algorithm monitors the acceptance probability and increases the
## number of points samples  (up to N*Nmax) to ensure that N points are
## actually returned for both SE and Treament Effect

##   X:  exponentiated treatment effect
## L,U: lower and upper limits of the pci (usually 95%) CI
##   p: p-value
##   d: precision of X,L,U
##  dp: precision of p value
##   N: number of Monte Carlo samples

extractTSE<-function(X,L,U,p,d,dp,pci=0.95, N=1000000,Nmax=10)
{

## Compute bounds of belief functions
 a<-5*(10^(-d-1));
 ap<-5*(10^(-dp-1));
 Xl<-X-5*(10^(-d-1));
 Xu<-X+5*(10^(-d-1));
 Ll<-L-5*(10^(-d-1)); ## cannot have negative ests for pos quant
 Lu<-L+5*(10^(-d-1));
 Ul<-U-5*(10^(-d-1));
 Uu<-U+5*(10^(-d-1));
 pn<-(p-5*(10^(-dp-1)))/2
 pp<-(p+5*(10^(-dp-1)))/2
 pn<-max(0,pn)
 pp<-min(1,pp)
 pl<-abs(qnorm(pn));
 pu<-abs(qnorm(pp));
 pci<-abs(qnorm((1-pci)/2));
 ret<-list("TE"=rep(0,N),"se"=rep(0,N),"prej"=0,"Nact"=0,"N"=0)
 Nest<-0
 Nsample<-0 ## how many samples were they obtained?
 Nget<-N
 Nmax<-Nmax*N ## maximum number of estimates to obtain
 while(Nest<N & Nsample <= Nmax & Nget>0) {

ret0<-simTSE(Nget,Xl,Xu,Ll,Ul,Lu,Uu,pl,pu,pci)
 ## take as much as needed to fill the N slots
 ntake<-min((N-Nest),ret0$Nact)
 ret$TE[(Nest+1):(Nest+ntake)]<-ret0$TE[1:ntake]
 ret$se[(Nest+1):(Nest+ntake)]<-ret0$se[1:ntake]
 Nest<-Nest+ntake
 Nsample<-Nsample+Nget
 Nget<-min(N,floor((N-Nest)/ret0$prej)) ## only obtain as many as anticipated
 }
 ret$N<-Nsample;ret$Nact<-Nest;ret$prej<-ret$Nact/ret$N
 ## now filter out the slots of N that were never sampled
 ret$TE<-head(ret$TE,ret$Nact)
 ret$se<-head(ret$se,ret$Nact)
 ret
}

simTSE<-function(N,Xl,Xu,Ll,Ul,Lu,Uu,pl,pu,pci)
{
 l<-runif(N,Ll,Lu); ## belief about lower limit of CI
 u<-runif(N,Ul,Uu); ## belief about upper limit of CI
 x1<-sqrt(u*l);
 y1<-sqrt(u/l);
 x1.t<-log(x1)
 y1.t<-log(y1)/pci;
 z<-abs(x1.t/y1.t);
 ## select only points satisfying the point estimate constraint
 sel<-I(x1>=Xl)*I(x1<=Xu)
 if(!is.finite(pl) ) { zlsel<-rep(1,length(z)) } else { zlsel<-I(z<=pl)}
 if(!is.finite(pu) ) { zusel<-rep(1,length(z)) } else { zusel<-I(z>=pu)}
 psel<-(zusel*zlsel)==1
 ## select points satisfying both criteria
 selAll<-(psel*sel)==1
 Nact<-sum(selAll,na.rm=T)
 ## the reason for the next test is to remove impossible
 ## sampled values i.e. negative values for the sampled values
 ## from the joint distro of the CI or the RR itself
 selAll<-selAll & !is.na(selAll)
 return(list("TE"=x1.t[selAll],"se"=y1.t[selAll],"prej"=Nact/N,"Nact"=Nact,"N"=N))
}
<pre>

The implementation is based around two functions with the user only needing to call the first one. Note that for some combinations of input values, a large number of random numbers (up to Nmax*N) have to be generated in order for the Monte Carlo algorithm to retain a certain level of precision (controlled by the number of samples N).

Let’s test these two approaches from within R now in a toy example:


> d<-2 ## precision TE, 95% CI (all exponentiated) to
> dp<-3 ## precision of p-value
> mn<-0.2209039 ## true treatment effect (log scale)
> se<-0.2761299 ## true SE
>
> X<-round(exp(mn),d);L<-round(exp(mn-qnorm(0.975)*se),d);U<-round(exp(mn+qnorm(0.975)*se),d)
> p<-round(2*(1-pnorm(abs(mn/se))),dp)
>
> X ## rounded/exponentiated treatment effect
[1] 1.25
> L ## lower 95% CI (rounded/exponentiated)
[1] 0.73
> U ## upper 95% CI (rounded/exponentiated)
[1] 2.14
> p
[1] 0.424
>
> moko<-extractTSE(X,L,U,p,d,dp,N=10000)
> mean(moko$TE) ## Bayesian TE
[1] 0.2204252
> mean(moko$se) ## Bayesian SE
[1] 0.2757051
>
> moko2<-algSTE(X,L,U,p)
> moko2$TE ## algebraic TE (log scale)
[1] 0.2230955
> moko2$se ## algebraic se
[1] 0.2767075

The performance of the two approaches appears similar in this particular case!

However as we show here there will be occasions which the simple(-minded) algebraic approach will fail and the (big) Bayesian guns will be required.  Posting the source code will ensure that the Bayesian procedure becomes available to everyone who needs to carry out a meta-analysis!!

 

Advertisements

Tags: , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: