I was recently revising a paper concerning statistical simulations of hemodialysis trials, in which I examine the effects of different technical aspects of the dialysis prescription at the population level. I had used the reported figures from a number of recent high profile papers, when I noticed that while the results were right on average, there was a substantial number of outliers, i.e. “digital patients” who would actually not be among the living if they were to be dialyzed with these parameters in the real world.

After troubleshooting it became apparent that I’d overlooked an important restriction that we use to prescribe dialysis in the real world; even though the specifics are too technical to go over here, the general simulation problem is the following:
- Simulate pairs $(X, Y)$ such that the ratio $X/Y$ lies in a narrow interval
- This is a rather straightforward problem to deal with if one is given the bivariate distribution of $(X, Y)$, because one can simply sample from that distribution directly
- We are only given the marginal distributions (or rather the mean and variance) of $X$ and $Y$, not the ratio constraint that the data obeyed
Simulating from the marginals will result in pairs $(X, Y)$ that marginally have the “correct” mean and variance, yet many of these pairs may violate the unknown ratio constraint. Had we been given the ratio constraint, we could employ a standard rejection sampling scheme, independently simulating $X$ and $Y$ and then rejecting all pairs violating the condition $a \le X/Y \le b$, where $a$ and $b$ are the bounds of the allowed interval.
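As a minimal sketch of that rejection scheme, the snippet below draws the two quantities independently and keeps only the pairs whose ratio falls inside the interval. Everything specific here is an assumption made for illustration: the normal marginals, the numerical means and standard deviations, the interval bounds, and the hypothetical function name `sample_pairs_with_ratio_constraint`; none of it comes from the dialysis papers.

```python
import random


def sample_pairs_with_ratio_constraint(n, mean_x, sd_x, mean_y, sd_y,
                                       ratio_lo, ratio_hi, seed=0,
                                       max_draws=1_000_000):
    """Return n accepted (x, y) pairs with ratio_lo <= x / y <= ratio_hi.

    The marginals are assumed (for illustration only) to be independent
    normals; any other marginal sampler could be swapped in.
    """
    rng = random.Random(seed)
    accepted = []
    draws = 0
    while len(accepted) < n:
        if draws >= max_draws:
            raise RuntimeError("acceptance rate too low; check the interval")
        draws += 1
        x = rng.gauss(mean_x, sd_x)
        y = rng.gauss(mean_y, sd_y)
        if y <= 0:
            # A non-positive denominator makes the ratio meaningless here.
            continue
        if ratio_lo <= x / y <= ratio_hi:
            accepted.append((x, y))
    return accepted, draws


# Illustrative (made-up) numbers: X ~ N(10, 2), Y ~ N(5, 1), ratio in [1.5, 2.5].
pairs, draws = sample_pairs_with_ratio_constraint(
    n=1000, mean_x=10.0, sd_x=2.0, mean_y=5.0, sd_y=1.0,
    ratio_lo=1.5, ratio_hi=2.5)
print(f"accepted {len(pairs)} pairs out of {draws} draws")
```

Note that the accepted pairs no longer have exactly the marginal means and variances that were fed in, so the narrower the interval, the more the retained sample drifts from the reported figures.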
So the problem is to come up with an estimator of the ratio $X/Y$; one such estimator could be the expected value of the ratio, $E(X/Y)$. Therefore, we need to find a way to compute this expected value using only the marginal statistics provided. Firing off the calculus cannon (a Taylor expansion of the ratio around the means), the low order approximations are:
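- $E\left(\frac{X}{Y}\right) \approx \frac{E(X)}{E(Y)}$ (first order)
- $E\left(\frac{X}{Y}\right) \approx \frac{E(X)}{E(Y)} - \frac{\mathrm{Cov}(X, Y)}{E(Y)^2} + \frac{\mathrm{Var}(Y)\,E(X)}{E(Y)^3}$ (second order)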
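A first order estimator of the variance of the ratio is given by:

$$\mathrm{Var}\left(\frac{X}{Y}\right) \approx \frac{E(X)^2}{E(Y)^2}\left[\frac{\mathrm{Var}(X)}{E(X)^2} - \frac{2\,\mathrm{Cov}(X, Y)}{E(X)\,E(Y)} + \frac{\mathrm{Var}(Y)}{E(Y)^2}\right]$$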
As clinical papers never report data about the covariance of any quantity, even the second order approximation is in general impossible to compute; nevertheless, if all quantities are positive and the covariance of $X$ and $Y$ is nil (or very close!), then the second order approximation is in fact computable and can be used to constrain the simulation.
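To make the zero-covariance case concrete, here is a small sketch that evaluates the standard Taylor (delta method) approximations for the mean and variance of a ratio from marginal means and variances, and compares them with a brute-force simulation. The gamma marginals, the numbers, and the helper names `ratio_approximations` and `gamma_draw` are all assumptions chosen for illustration (gamma draws keep both quantities positive); they are not taken from any clinical paper.

```python
import random


def ratio_approximations(mean_x, var_x, mean_y, var_y, cov_xy=0.0):
    """Taylor approximations for the mean and variance of X / Y.

    Returns (first-order mean, second-order mean, first-order variance).
    cov_xy defaults to 0, i.e. the "covariance is nil" assumption.
    """
    first = mean_x / mean_y
    second = first - cov_xy / mean_y**2 + var_y * mean_x / mean_y**3
    var_ratio = first**2 * (var_x / mean_x**2
                            - 2.0 * cov_xy / (mean_x * mean_y)
                            + var_y / mean_y**2)
    return first, second, var_ratio


def gamma_draw(rng, mean, var):
    """Draw from a gamma distribution with the requested mean and variance."""
    shape = mean**2 / var
    scale = var / mean
    return rng.gammavariate(shape, scale)


# Illustrative (made-up) marginal statistics.
mean_x, var_x = 10.0, 4.0
mean_y, var_y = 5.0, 1.0

first, second, var_ratio = ratio_approximations(mean_x, var_x, mean_y, var_y)

# Monte Carlo check with independent, strictly positive draws.
rng = random.Random(42)
n_draws = 100_000
ratios = [gamma_draw(rng, mean_x, var_x) / gamma_draw(rng, mean_y, var_y)
          for _ in range(n_draws)]
mc_mean = sum(ratios) / n_draws

print(f"first-order mean:     {first:.4f}")
print(f"second-order mean:    {second:.4f}")
print(f"first-order variance: {var_ratio:.4f}")
print(f"simulated mean:       {mc_mean:.4f}")
```

One possible use, under these assumptions, is to centre the acceptance interval for the ratio on the second order mean, with a width based on the square root of the first order variance.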
Higher order approximations (which however need higher moment data, which are never reported) can be found here.