I was recently confronted with the task of running a meta-analysis on a subject in which the various studies had reported (adjusted) measures of treatment efficacy on various continuous outcomes. This is one of the areas in which the data for meta-analysis come not in the usual form of number of events and number of patients (N), but as treatment effects and their associated standard errors. Not-too-uncommon examples include the effects of a given intervention on blood pressure, cholesterol levels or psychometric scales, Cox regression hazard ratios, logistic regression odds ratios etc. And then it hit me: for almost all of the studies I wanted to pool, I did not have access at all to the actual data that I had to process!! For sure, there were treatment effects (actually hazard ratios, HRs, for my project) in the papers, but the standard errors were not reported; furthermore, the information that was actually contained in the manuscripts (HRs, 95% CI and the p-value) was not the “real thing” but its approximation, rounded to one (and sometimes two) decimal places.
I’m sure that others have run into this issue previously, but I have never seen a formal discussion of how to handle this missing data problem. So here comes my take on this, delivered in a series of posts to keep the suspense factor up, before I provide some code for automating the task. But first, let’s delve a little into the two problem areas identified: the rounding error in the figures reported in the medical articles and the missing standard errors.
First, it should be noted that whenever we see an HR in print we should keep two things in mind:
- the reported HR is not the direct output of the statistical routine that did the calculations, but is the result of exponentiating the actual output
- journal editors (and research investigators) put limits on the number of digits reported
So if we take the actual output of R/SAS/SPSS/Minitab etc. as the big truth (let’s call it T for obvious reasons), what we are served is t (little truth), an exponentiated version of T rounded to d decimal places:

$$t = \mathrm{round}\left(\exp(T),\, d\right)$$
Hence, if we obtain a “big truth” of T = 0.1939207 (so that $\exp(T) \approx 1.2140$), then the little truth (rounded to two decimal places) will be reported as 1.21. On the other hand, if T = -0.1601688 (i.e. $\exp(T) \approx 0.8520$), the little truth will be given as 0.85. Now, knowing that t = 0.85 (and understanding that the chance of obtaining this figure exactly is infinitesimally small), one realizes that the exponentiated truth, $\exp(T)$ (not known except to the statistical authors of the paper), is greater than or equal to $t - 0.5\times10^{-d} = 0.845$ and smaller than $t + 0.5\times10^{-d} = 0.855$. Turning to the other example, seeing a printed (little t) HR of 1.21 we are certain that the exponentiated big truth is at least 1.205 and below 1.215.
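The two worked examples can be checked with a few lines of Python. This is only a sketch: the helper name `rounding_interval` is mine, and Python's built-in `round()` stands in for whatever rounding the authors' software or the journal applied (tie-breaking rules differ between packages, which only matters for values sitting exactly on a boundary).

```python
import math
from typing import Tuple


def rounding_interval(t: float, d: int) -> Tuple[float, float]:
    """Return (lo, hi) such that values in [lo, hi) print as t at d decimal places."""
    half_width = 0.5 * 10 ** (-d)
    return t - half_width, t + half_width


# The two "big truths" from the text and the "little truths" a paper would print.
for T in (0.1939207, -0.1601688):
    t = round(math.exp(T), 2)          # what ends up in the manuscript
    lo, hi = rounding_interval(t, 2)   # what we can say about exp(T) given t
    print(f"T = {T:+.7f}  exp(T) = {math.exp(T):.4f}  t = {t}  "
          f"exp(T) in [{lo:.3f}, {hi:.3f})")
```

On the log scale the same interval becomes `[math.log(lo), math.log(hi))`, which is all that the printed HR tells us about T itself.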
Now, this may seem like an embarrassing situation in which we seem to know the approximate truth more exactly than the big truth, but this is not a unique state of affairs. In the measurement error literature, similar situations arise in the context of Berkson models, in which the “truth” is equal to the error-corrupted measurement plus an additional error (Gaussian in the context of the classical Berkson model). However, our situation differs from the classical Berkson one because the error we are dealing with is definitely not Gaussian, being constrained by the rounding to lie in the interval $[-0.5\times10^{-d},\ 0.5\times10^{-d})$.
Nevertheless, one may adopt a Berkson-like measurement error model:

$$\exp(T) = t + \epsilon, \qquad \epsilon \sim \mathrm{Uniform}\left(-0.5\times10^{-d},\ 0.5\times10^{-d}\right)$$

to describe the errors introduced by rounding. (There exist other possibilities as well, including those using log scales to express the effects of rounding, but there is no point in raising the level of complexity any further at this point. However, these models can be used to appease the mathematically inclined who would object that the bounds of the Uniform distribution are not treated symmetrically in the previous expression.)
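To get a feel for what this rounding model implies for T, one can simulate from it. The sketch below assumes the Uniform error is added on the HR scale and then takes logs; the function name `sample_log_hr`, the seed and the number of draws are arbitrary choices of mine.

```python
import math
import random


def sample_log_hr(t, d, n, seed=0):
    """Draw n values of T = log(t + eps), with eps Uniform on +/- half a unit of the last printed digit."""
    rng = random.Random(seed)
    half_width = 0.5 * 10 ** (-d)
    return [math.log(t + rng.uniform(-half_width, half_width)) for _ in range(n)]


draws = sample_log_hr(1.21, 2, 10_000)
mean_T = sum(draws) / len(draws)
print(f"T ranges over [{min(draws):.5f}, {max(draws):.5f}], mean {mean_T:.5f}")
```

For a printed HR of 1.21 the draws stay between log(1.205) and log(1.215), so the rounding alone leaves the third decimal of T undetermined.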
Having established one measurement error statistical model for rounding, we now turn to the issue of the “missing standard error”. In the majority of the statistical models used in medical research, the 95% CI is computed according to the following Normal theory approximation formula:

$$\text{95\% CI} = \exp\left(T \pm z_{0.975} \times SE\right)$$

where $z_{0.975}$ stands for the 97.5% quantile of the standard Normal distribution (equal to 1.96 to two decimal places!!) and $SE$ is the missing standard error of $T$. So, unless the paper uses some way-out-there bootstrap/resampling/Monte Carlo procedure to establish the empirical distribution of the treatment effect (and this should be stated in the methods by the way, not hidden in an obscure sentence in the supplement!!), one can be almost certain that this formula has been used.
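Ignoring rounding for a moment, the Normal theory formula can be inverted directly: the width of the CI on the log scale is twice the quantile times the standard error. A minimal sketch follows; the function names and the 95% CI of 0.72 to 1.00 are made up for illustration and do not come from any of the papers I was pooling.

```python
import math
from statistics import NormalDist

# 97.5% quantile of the standard Normal distribution (about 1.96).
Z = NormalDist().inv_cdf(0.975)


def ci_from(T, se, z=Z):
    """95% CI on the HR scale from the log-HR T and its standard error."""
    return math.exp(T - z * se), math.exp(T + z * se)


def naive_se(lower, upper, z=Z):
    """Standard error of the log-HR recovered from CI limits, taking them at face value."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Hypothetical reported interval, treated as if it were exact.
print(f"z = {Z:.4f}, naive SE = {naive_se(0.72, 1.00):.4f}")
```

This "naive" estimate treats the printed limits as the big truths, which is exactly the assumption the rounding model says we should not make.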
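Obviously, the 95% CI itself is rounded to d decimal places, so that only the little truths and not the big truths are reported. Hence, to estimate the missing standard error from the 95% CI and the reported HR while accounting for the effects of rounding, one has to tackle the following three-variable regression model subject to measurement errors, in which $t$, $l$ and $u$ denote the reported HR and the reported lower and upper limits of its 95% CI, $z_{0.975} \approx 1.96$ and $SE$ is the unknown standard error:

$$\exp(T) = t + \epsilon_t$$

$$\exp\left(T - z_{0.975} \times SE\right) = l + \epsilon_l$$

$$\exp\left(T + z_{0.975} \times SE\right) = u + \epsilon_u$$

$$\epsilon_t,\ \epsilon_l,\ \epsilon_u \sim \mathrm{Uniform}\left(-0.5\times10^{-d},\ 0.5\times10^{-d}\right)$$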
This is a non-standard and highly non-linear regression model with measurement errors and as such is not covered by the textbooks e.g. Fuller’s classic or Carroll, Ruppert and Stefanski’s more recent book on non-linear models.
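Short of fitting this model, one can at least bracket the standard error by pushing the rounding intervals of the two CI limits through the Normal theory formula. To be clear, this is a crude interval-arithmetic sketch of mine and not the solution I have in mind for the model above: it ignores the information in the reported point estimate and says nothing about which values inside the bracket are more plausible. The CI of 0.72 to 1.00 is again a made-up example.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def se_bounds(lower, upper, d, z=Z):
    """Smallest and largest SE of the log-HR compatible with CI limits printed at d decimals."""
    h = 0.5 * 10 ** (-d)  # half a unit of the last printed digit
    if lower - h <= 0:
        raise ValueError("lower limit is too close to zero for this level of rounding")
    # Narrowest true CI: limits pulled towards each other; widest: pushed apart.
    se_min = (math.log(upper - h) - math.log(lower + h)) / (2 * z)
    se_max = (math.log(upper + h) - math.log(lower - h)) / (2 * z)
    return se_min, se_max


se_lo, se_hi = se_bounds(0.72, 1.00, 2)
naive = (math.log(1.00) - math.log(0.72)) / (2 * Z)
print(f"SE between {se_lo:.4f} and {se_hi:.4f} (face-value estimate {naive:.4f})")
```

Even with two printed decimals the bracket is several percent wide relative to the face-value estimate, which is why the rounding is worth modelling rather than ignoring.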
Nevertheless, this appears to be a trivial regression problem to handle from a Bayesian perspective, and generic code for the estimation of the standard error can easily be programmed in any of the “BUGS” languages (WinBUGS, OpenBUGS or JAGS). Pending any comments on the mathematical/statistical aspects of my approach, I will present the Bayesian solution and the code in a subsequent post in 2013, so…
HAPPY NEW YEAR BLOGGERS!!