As a follow-up of our initial investigation of the bias of random effect predictions in generalized linear mixed regression models, I undertook a limited simulation experiment. In this experiment, we varied the population average complication rate and the associated standard deviation of the individual random effect and generated a small number of replications (N=200) for each combination of population mean and standard deviation. Limited by the computational resources of (the otherwise wonderful!) tablet, I had to forego a more thorough and more precise (larger number of replicates) examination of bias, variance and coverage. The focus is on general patterns rather than precise estimates; by making the code public, these can be obtained by anyone who has access to the R computing environment.

The basic code that simulates from these logistic regression models is shown below. Although we only examine small cluster sizes (number of observations per cluster 20-100) and a moderate number of individual random effects (200) it is straightforward to modify these aspects of the simulation study. For these simulations we examined a limited number of values for the standard deviation of the random effects (0.1, 0.2 and 0.5) and overall complication rates (0.05, 0.10, 0.15, 0.20) to reflect the potential range of values compatible with the raw data in the Surgeon Scorecard.

library(lme4) library(mgcv) ## helper functions logit<-function(x) log(x/(1-x)) invlogit<-function(x) exp(x)/(1+exp(x)) ## code allows a sensitivity analysis to the order of the nAGQ ## explanation of symbols/functions appearing inside the ## simulation function ## fit: glmer fit using a X point nAGQ integration rule ## ran: random effects using a X point nAGQ integration rule ## cR: correlation between fitted and simulated random effect ## the fitted were obtained with a X point nAGQ integrator ## int: intercept obtained with a X point nAGQ integration rule ## biG: bias of the fitted v.s. the simulated random effects ## this is obtained as the intercept of a GAM regression of ## the fitted versus the simulated values ## bi0: bias of the intercept from the glmer fit ## sdR: estimated standard deviation of the random eff distro ## bs0: bias of the standard deviation of the random effects (SDR) ## edf: non-linearity of the slope of a GAM regression of the ## fitted v.s. the simulated residuals ## slQ: derivative of the non-linear slope at the Qth quantile simFit<-function(X,complications,ind,pall,SDR,Q=0:10/10) { fit<-glmer(ev~1+(1|id),data=complications,family=binomial,nAGQ=X) ran<-ranef(fit)[["id"]][,1] ## predicted (mode) of random effects cR<-cor(ran,ind) int<-fixef(fit) ## regress on the residuals to assess performance fitg<-gam(ran~s(ind,bs="ts")) ## gam fit edf<-sum(fitg$edf[-1])## edf of the non-linear "slope" biG<-fitg$coef[1] bi0<-int-logit(pall) sdR<-sqrt(as.data.frame(VarCorr(fit))$vcov) bs0<-sdR-SDR sQ<-quantile(ind,prob=Q) slQ<-(predict(fitg,newdata=data.frame(ind=sQ+0.0001),type="terms")[,1]- predict(fitg,newdata=data.frame(ind=sQ),type="terms")[,1])/0.0001 names(slQ)<-paste("slQ",Q,sep="") ret<-cbind(int=int,edf=edf,biG=biG,bi0=bi0,bs0=bs0,sdR=sdR,cR=cR,t(slQ),pall=pall,SDR=SDR) } ## function to simulate cases simcase<-function(N,p) rbinom(N,1,p) ## simulation scenarios: fixed for each simulation Nsurgeon<-200; # number of surgeons Nmin<-20; # min number of surgeries per surgeon Nmax<-100; # max number of surgeries per surgeon Nsim<-sample(Nmin:Nmax,Nsurgeon,replace=TRUE); # number of cases per surgeon ## simulate individual surgical performance ## the reality consists of different combos of pall and the sd of the ## random effect Nscenariosim<-200 ## number of simulations per scenario scenarios<-expand.grid(iter=1:Nscenariosim,pall=seq(0.05,.20,.05), sd=c(0.1,0.2,.5)) ## simulate indivindual scenarios simScenario<-function(seed,pall,sd,X,Q) { set.seed(seed) ind<-rnorm(Nsurgeon,0,sd) ; # surgical random effects logitind<-logit(pall)+ind ; # convert to logits pind<-invlogit(logitind); # convert to probabilities complications<-data.frame(ev=do.call(c,mapply(simcase,Nsim,pind,SIMPLIFY=TRUE)), id=factor(do.call(c,mapply(function(i,N) rep(i,N),1:Nsurgeon,Nsim)))) simFit(X,complications,ind,pall,sd,Q) } X<-0 Q=0:10/10 system.time(perf<-mapply(simScenario,scenarios$iter,scenarios$pall,scenarios$sd, MoreArgs=list(X=X,Q=Q),SIMPLIFY=FALSE)) perf<-do.call(rbind,perf)

The datasets are generated by the function simScenario, which when applied over all combinations of population mean(pall, treated as a fixed effect) and random effect standard deviation (sd) generates the synthetic dataset (‘complications’ variable). Once the data have been generated the function simFit receives the synthetic dataset, fits the mixed logistic regression model and generates random effect predictions.

To assess and quantify shrinkage in these regression models, we compare the predicted random effects against the simulated values. Due to the potentially large number of these effects (10s of thousands in the Surgeon Score Card), we developed a methodology that looks for patterns in these bivariate relationships, taking into account the magnitude of each simulated and predicted random effect pair. This methodology rests on flexible regression between the predicted against the simulated random effect in each of the synthetic datasets. This flexible regression, using smoothing splines, generalizes linear regression and thus avoids the assumption that the shrinkage (as assessed by the slope of the linear regression line) is the same irrespective of the value of the simulated random effect. By numerically differentiating the spline at various quantiles of the simulated random effect, we thus have an estimate of shrinkage. In particular, this estimate gives the change in value of the predicted random effect for a unit change in the simulated one. The relevant code that fits the smoothing spline and differentiates it in (user defined) grid of random effects quantiles is shown below:

</pre> <pre>fitg<-gam(ran~s(ind,bs="ts")) ## gam fit edf<-sum(fitg$edf[-1])## edf of the non-linear "slope" biG<-fitg$coef[1] bi0<-int-logit(pall) sdR<-sqrt(as.data.frame(VarCorr(fit))$vcov) bs0<-sdR-SDR sQ<-quantile(ind,prob=Q) slQ<-(predict(fitg,newdata=data.frame(ind=sQ+0.0001),type="terms")[,1]- predict(fitg,newdata=data.frame(ind=sQ),type="terms")[,1])/0.0001 names(slQ)<-paste("slQ",Q,sep="")

A representative smoothing spline analysis of a mixed logistic regression model is shown in the figure below, along with the corresponding linear fit, and the line signifying zero shrinkage: a straight line passing from the origin with a unit slope. If the latter relationship is observed, then the predicted random effects are unbiased with respect to the simulated ones.

This figure shows the potential for random effect predictions to be excessively shrunken relative to the simulated ones (note that the blue line is more horizontal relative to the blue one). One should also observe that the amount of shrinkage is not the same throughout the range of the random effects: positive values (corresponding to individuals with higher complication rates) are shrank relatively less (the green line is closer to the blue line), relative to negative random effects (complication rates less than the population average). **The net effect of this non-linear relationship is that the separation between “good” (negative RE) and “bad” (positive values of the RE) is decreased in the predicted relative to the simulated random effects. This can be a rather drastic compression in dynamic range for small cluster sizes as we will see below. **

From each mixed logistic regression model we obtain a large number of information : i) the bias in the fixed effect estimate, ii) the bias in the population standard deviation, iii) the bias in the overall relationship between predicted and simulated residuals (indirectly assessing whether the random effects are centered around the fixed effect) iv) the non-linearity of the relationship between the predicted and the simulated random effects (this is given by the estimated degrees of freedom of the spline, with an edf = 1 signifying a linear fit and higher degrees of freedom non linear relationships; edfs that are less than one signify a more complex pattern of shrinkage of a variable proportion of the random effects to a single point, i.e. the fixed effect) v) the linear slope

There was no obvious bias in the estimation of the intercept (the only fixed effect in our simulation study) for various combinations of overall event rate and random effect standard deviation (but note higher dispersion for large standard deviation values, indicating the need for a higher number of simulation replicates in order to improve precision):

Similarly there was no bias in the estimation of the population standard deviation of the random effects:

The estimated relationship between predicted and simulated residuals was in linear for approximately 50% of simulated samples for small to moderate standard deviations. However it was non-linear (implying differential shrinkage according to random effect magnitude) for larger standard deviations, typical of the standard deviation of the surgeon random effect in the Surgeon Score Card.

The magnitude of the slopes of the smoothing splines at different quartiles of the random effects distribution and for combinations of overall rate and random effect standard deviation are shown below:

There are several things to note in this figure:

- the amount of shrinkage decreases as the population standard deviation increases (going from left to right in each row of the figure the slopes increase from zero towards one)
- the amount of shrinkage decreases as the overall average increases for the same value of the standard deviation (going from top to bottom in the same column)
- the degree of shrinkage appears to be the same across the range of random effect quantiles for small to moderate values of the population standard deviation
- the discrepancy between the degree of shrinkage is maximized for larger values of the standard deviation of the random effects and small overall rates (top right subpanel). This is the situation that is more relevant for the Surgeon Score Card based on the analyses reported by this project.

What are the practical implications of these observations for the individual surgeon comparison reported in the Score Card project? These are best understood by a numerical example from one of these 200 hundred datasets shown in the top right subpanel. To understand this example one should note that the individual random effects have the interpretation of (log-) odds ratios, irrespective of whether they are predicted or (the simulated) true effects. Hence, the difference in these random effects when exponentiated yield the odds ratio of suffering a complication in the hands of a good relative to a bad surgeon. By comparing these random effects for good and bad surgeons who are equally bad (or good) relative to the mean (symmetric quantiles around the median), one can get an idea of the impact of using the predicted random effects to carry out individual comparisons.

Good |
Bad |
Quantile (Good) |
Quantile (Bad) |
True OR |
Pred OR |
Shrinkage Factor |

-0.050 | 0.050 | 48.0 | 52.0 | 0.905 | 0.959 | 1.06 |

-0.100 | 0.100 | 46.0 | 54.0 | 0.819 | 0.920 | 1.12 |

-0.150 | 0.150 | 44.0 | 56.0 | 0.741 | 0.883 | 1.19 |

-0.200 | 0.200 | 42.1 | 57.9 | 0.670 | 0.847 | 1.26 |

-0.250 | 0.250 | 40.1 | 59.9 | 0.607 | 0.813 | 1.34 |

-0.300 | 0.300 | 38.2 | 61.8 | 0.549 | 0.780 | 1.42 |

-0.350 | 0.350 | 36.3 | 63.7 | 0.497 | 0.749 | 1.51 |

-0.400 | 0.400 | 34.5 | 65.5 | 0.449 | 0.719 | 1.60 |

-0.450 | 0.450 | 32.6 | 67.4 | 0.407 | 0.690 | 1.70 |

-0.500 | 0.500 | 30.9 | 69.1 | 0.368 | 0.662 | 1.80 |

-0.550 | 0.550 | 29.1 | 70.9 | 0.333 | 0.635 | 1.91 |

-0.600 | 0.600 | 27.4 | 72.6 | 0.301 | 0.609 | 2.02 |

-0.650 | 0.650 | 25.8 | 74.2 | 0.273 | 0.583 | 2.14 |

-0.700 | 0.700 | 24.2 | 75.8 | 0.247 | 0.558 | 2.26 |

-0.750 | 0.750 | 22.7 | 77.3 | 0.223 | 0.534 | 2.39 |

-0.800 | 0.800 | 21.2 | 78.8 | 0.202 | 0.511 | 2.53 |

-0.850 | 0.850 | 19.8 | 80.2 | 0.183 | 0.489 | 2.68 |

-0.900 | 0.900 | 18.4 | 81.6 | 0.165 | 0.467 | 2.83 |

-0.950 | 0.950 | 17.1 | 82.9 | 0.150 | 0.447 | 2.99 |

-1.000 | 1.000 | 15.9 | 84.1 | 0.135 | 0.427 | 3.15 |

-1.050 | 1.050 | 14.7 | 85.3 | 0.122 | 0.408 | 3.33 |

-1.100 | 1.100 | 13.6 | 86.4 | 0.111 | 0.390 | 3.52 |

-1.150 | 1.150 | 12.5 | 87.5 | 0.100 | 0.372 | 3.71 |

-1.200 | 1.200 | 11.5 | 88.5 | 0.091 | 0.356 | 3.92 |

-1.250 | 1.250 | 10.6 | 89.4 | 0.082 | 0.340 | 4.14 |

-1.300 | 1.300 | 9.7 | 90.3 | 0.074 | 0.325 | 4.37 |

-1.350 | 1.350 | 8.9 | 91.1 | 0.067 | 0.310 | 4.62 |

-1.400 | 1.400 | 8.1 | 91.9 | 0.061 | 0.297 | 4.88 |

-1.450 | 1.450 | 7.4 | 92.6 | 0.055 | 0.283 | 5.15 |

-1.500 | 1.500 | 6.7 | 93.3 | 0.050 | 0.271 | 5.44 |

-1.550 | 1.550 | 6.1 | 93.9 | 0.045 | 0.259 | 5.74 |

-1.600 | 1.600 | 5.5 | 94.5 | 0.041 | 0.247 | 6.07 |

-1.650 | 1.650 | 4.9 | 95.1 | 0.037 | 0.236 | 6.41 |

-1.700 | 1.700 | 4.5 | 95.5 | 0.033 | 0.226 | 6.77 |

-1.750 | 1.750 | 4.0 | 96.0 | 0.030 | 0.216 | 7.14 |

-1.800 | 1.800 | 3.6 | 96.4 | 0.027 | 0.206 | 7.55 |

-1.850 | 1.850 | 3.2 | 96.8 | 0.025 | 0.197 | 7.97 |

-1.900 | 1.900 | 2.9 | 97.1 | 0.022 | 0.188 | 8.42 |

-1.950 | 1.950 | 2.6 | 97.4 | 0.020 | 0.180 | 8.89 |

-2.000 | 2.000 | 2.3 | 97.7 | 0.018 | 0.172 | 9.39 |

-2.050 | 2.050 | 2.0 | 98.0 | 0.017 | 0.164 | 9.91 |

From these table it can be seen that predicted odds ratios are always larger than the true ones. The ratio of these odds ratios (the shrinkage factor) is larger, the more extreme comparisons are contemplated.

In summary, the use of the random effects models for the small cluster sizes (number of observations per surgeon) is likely to lead to estimates (or rather predictions) of individual effects that are smaller than their true values. Even though one should expect the differences to decrease with larger cluster sizes, this is unlikely to be observedin real world datasets (how often does one come across a surgeon who has performed 1000s of operation of the same type before they retire?). Furthermore, the comparison of surgeon performance based on predicted random effects is likely to be misleading due to over-shrinkage.