Where has evidence based medicine taken us in 20 years?

January 10, 2014

This is one of the best appraisals of evidence based medicine (EBM):

http://www.kevinmd.com/blog/2013/06/evidence-based-medicine-20-years.html

It highlights one important pitfall of EBM, i.e. its failure to integrate scientific (biological) plausibility into the pyramid of evidence.

I think the best way to effect a synthesis between evidence from clinical data and biological plausibility is through a Bayesian argument:

- Decide on a standard EBM approach to synthesize the clinical data regarding a hypothesis from multiple sources, e.g. a meta-analysis
- Encode the non-clinical, background knowledge into a prior. This prior encodes the scientific plausibility of the hypothesis in question

A Bayesian analysis of the clinical data under a scientific plausibility prior provides a transparent way to leverage the evidence pyramid of EBM while providing a basic science/disease understanding context for the interpretation of clinical data.
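As a concrete (if simplified) illustration, here is a minimal sketch of such a synthesis using a conjugate normal-normal update in R; the pooled estimate, its standard error and the prior parameters below are all hypothetical:

## minimal sketch: combine a meta-analytic summary (log odds ratio and its
## standard error) with a scientific plausibility prior via a conjugate
## normal-normal update (all numbers are hypothetical)
meta_est <- log(0.80)   ## pooled log odds ratio from the meta-analysis
meta_se  <- 0.10        ## its standard error
prior_mean <- 0         ## plausibility prior centred on "no effect"
prior_sd   <- 0.35      ## sceptical of biologically implausible large effects
post_var  <- 1/(1/prior_sd^2 + 1/meta_se^2)
post_mean <- post_var*(prior_mean/prior_sd^2 + meta_est/meta_se^2)
exp(post_mean + c(-1.96, 0, 1.96)*sqrt(post_var))  ## posterior OR and 95% CrI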

Of means, standard deviations and blood pressure trials

December 26, 2013

Last week, the long-awaited Joint National Committee (JNC) released the 8th revision of its hypertension management guidelines (more than 10 years after the 7th version!). JNC 8 will join the plethora of guidelines (ASH/ISH, ESH, KDIGO and the ACC/AHA advisory that will be upgraded to a full guideline later on) that may be used (or not) to justify (or defend) treating (or not treating) patients with elevated blood pressure (BP) down to pre-approved, committee-blessed targets. Though it is not my intention to go into the guidelines themselves (since opinions are like GI tracts: everyone has one, whereas biases are like kidneys: everyone has at least one and usually two), it is interesting to seize this opportunity and revisit one of the largest blood pressure trials, which is used to defend certain aspects of the guidelines' recommendations.

The Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) is one of the largest hypertension trials and was all the rage 12 years ago when, despite failing to meet its primary endpoint (i.e. demonstrating the superiority of the newer first-line antihypertensive agents), it reported differences in various secondary outcomes (including blood pressure control) favoring the oldest (chlorthalidone) vs. the newest (lisinopril/amlodipine) medications. Since 2002 all agents used in ALLHAT have gone generic (and thus have the same direct costs), diluting the economic rationale for choosing one therapy over the other. Nevertheless the literature reverberated for years (2003, 2007 vs. 2009) with intense debates about what ALLHAT showed (or didn’t show). In particular, the blood pressure differences observed among the treatment arms were invoked as one of the explanations for the diverging secondary outcomes in ALLHAT. In its simplest form this argument says that the differences of 0.8 (amlodipine)/2 (lisinopril) mmHg of systolic BP over chlorthalidone may account for some of the differences in outcomes.

I was exposed to this argument when I was an internal medicine resident (more than a decade ago) and it did not really make a lot of sense to me: could I be raising the risk of cardiovascular disease, stroke, and heart failure by 10-19% for the patient I was about to see in my next clinic appointment by not going the extra mile to bring his blood pressure down by 2 mmHg? Was my medical license, like the 00 in 007’s designation, a license to kill (rather than heal)? Even after others explained that the figures reported in the paper referred to the difference in the average blood pressure between the treatment arms, not the difference in blood pressure within each patient, I could not get myself to understand the argument. So more than a decade after the publication of ALLHAT I revisited the paper and tried to infer what the actual blood pressures may have been (I had no access to individual blood pressure data) during the five years of the study. The relevant information may be found in Figure 3 (with a hidden bonus in Figure 1) of the JAMA paper, which are reproduced in a slightly different form below (for copyright reasons):

[Figure: systolic (SBP) and diastolic (DBP) blood pressure, mean ± standard deviation, by treatment arm during ALLHAT follow-up]

The figure shows the systolic (SBP) and diastolic (DBP) pressures of the participants who continued to be followed up during the ALLHAT study; unsurprisingly for a randomized study there were almost no differences before the study (T=0). Differences did emerge during follow-up, and in fact the mean/average (the “dot” in the graph) was lower for chlorthalidone than for both amlodipine and lisinopril during the study. However, the standard deviation (the dispersion of the individuals receiving the study drugs around the mean, shown as the height of the vertical line) was also larger (by about 17%) for lisinopril, suggesting that there were more people with both higher and lower blood pressures compared to chlorthalidone.

This pattern is also evident if one bothers to take a look at the flowchart of the study (Figure 1), which lists the reasons for discontinuation during the 5th year. Among the many reasons, “Blood Pressure Elevation” and “Blood Pressure Too Low” are listed:

                                     Chlorthalidone   Amlodipine   Lisinopril
N (total)                                      8083         4821         5004
BP High                                          84           38          131
BP Too Low                                       91           51           76
Other reasons for discontinuation              1698          963         1192

A garden-variety logistic regression for the odds of discontinuation shows there is no difference between amlodipine and chlorthalidone due to high (p=0.16) or low (p=0.74) BP. On the other hand, the comparison between lisinopril and chlorthalidone is more interesting:

  • Odds ratio of discontinuation due to low BP: 1.44 (95% CI: 1.00-2.06, p=0.044)
  • Odds ratio of discontinuation due to high BP: 3.38 (95% CI: 2.35-4.87, p<0.001)
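For readers who want to reproduce this kind of calculation, here is a minimal sketch of an aggregated (binomial) logistic regression on the counts tabulated above; the denominators are assumed to be the tabulated totals, which the post does not spell out, so the output need not match the quoted odds ratios exactly:

## minimal sketch: aggregated logistic regression on the Figure 1 counts
## (lisinopril vs chlorthalidone; denominators assumed to be the tabulated N)
n_total <- c(8083, 5004)   ## chlorthalidone, lisinopril
bp_low  <- c(91, 76)       ## discontinued: blood pressure too low
bp_high <- c(84, 131)      ## discontinued: blood pressure elevation
arm <- factor(c("chlorthalidone", "lisinopril"))
fit_low  <- glm(cbind(bp_low,  n_total - bp_low)  ~ arm, family = binomial)
fit_high <- glm(cbind(bp_high, n_total - bp_high) ~ arm, family = binomial)
## odds ratios and Wald 95% CIs for lisinopril vs chlorthalidone
exp(cbind(OR = coef(fit_low),  confint.default(fit_low)))["armlisinopril", ]
exp(cbind(OR = coef(fit_high), confint.default(fit_high)))["armlisinopril", ]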

So despite the higher mean, the higher standard deviation implies that there were more patients with low (and too low) BP among the recipients of lisinopril. In other words, blood pressure control was more variable with lisinopril compared to the other two drugs: this variability can be quantified by using the reported means/standard deviations to look at the cumulative percentage of patients with BP below a given cutoff (for simplicity we base the calculations on the year 3 data):

[Figure: cumulative percentage of patients below a given blood pressure cutoff, by treatment arm (year 3 data)]
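For those who want to see how such cumulative curves can be constructed from published summaries, here is a minimal sketch assuming normally distributed BP readings; the year-3 means and standard deviations below are illustrative placeholders, not the actual ALLHAT figures:

## minimal sketch: cumulative percentage of patients below an SBP cutoff,
## computed from reported means/SDs (the values below are illustrative only)
arms <- c("chlorthalidone", "amlodipine", "lisinopril")
m <- c(134, 135, 136)   ## assumed year-3 mean SBP (mmHg)
s <- c(15, 15, 17.5)    ## assumed SDs; lisinopril ~17% larger
cutoffs <- seq(100, 180, by = 5)
pct_below <- sapply(seq_along(arms), function(i) 100*pnorm(cutoffs, m[i], s[i]))
colnames(pct_below) <- arms
matplot(cutoffs, pct_below, type = "l", lty = 1,
        xlab = "SBP cutoff (mmHg)", ylab = "% of patients below cutoff")
legend("topleft", legend = arms, lty = 1, col = 1:3)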

So it appears that lisinopril (at least as used in ALLHAT) was able to control more patients to < 120/70 (which are low levels based on the current guidelines), but fewer patients at the higher end of the BP spectrum. The clinical translation of this observation is that there will be patients who are ideal candidates for lisinopril (maybe to the point where dose reduction is necessary) and others who will fail to respond, so that individualization of therapy, rather than one size fits all, is warranted. Such individualization may be achieved on the basis of shared physician/patient decision making, n-of-1 trials, biomarker levels (e.g. home blood pressure measurements) or demographic profiling (as is done in JNC 8 for African American patients).

Notwithstanding these comments, one is left scratching one’s head with the following questions:

  • who were the patients with an exaggerated or dampened response to lisinopril in ALLHAT?
  • could the variability in BP control provide a much better explanation for the variability in secondary outcomes in ALLHAT? (the investigators did apply what is known as a time-updated analysis using the observed BPs during the trial, but this is not the statistically proper way to analyze this effect in the presence of loss to follow-up and possibly informative censoring)
  • what are the clinical implications of lowering BP to a given level when this is done with different classes of agents? This question is related to the previous one, and neither is answerable with time-updated models of endogenous variables (such as BP readings)

At a more abstract level, should we scrutinize paper tables for the means as well as the standard deviations of response variables, looking for hidden patterns that may not be evident at first look? In the clinic one is impressed by the variability of patients' responses to interventions, yet this variability is passed over when analyzing, reporting and discussing trial results, in which we only look at the means. To me this seems a rather deep inconsistency between our behaviours and discourse with our Clinician vs. our Evidence Based Medicine hats on, which may even decrease the chances of finding efficacious and effective treatments. Last but certainly not least, how can we begin to acknowledge variability in trial design, execution, analysis and reporting so as to better reflect what actually happens in the physical world, rather than the ivory towers of our statistical simulators?

 

Even with non-informative priors, the Bayesian approach is preferable to frequentist linear regression

December 2, 2013

A unique aspect of the Bayesian approach is that it allows one to integrate previous knowledge (“prior beliefs”) with current data (the “likelihood”). Yet even in those cases in which non-informative priors are used (as in here and here), the Bayesian approach is preferable due to its conservative nature.

The little non-informative prior that could (be informative)

November 26, 2013

Christian Robert reviewed online a paper that was critical of non-informative priors. Among the points discussed by him and other contributors (e.g. Keith O’Rourke) was the issue of induced priors, i.e. priors which arise from a transformation of the original parameters, or of the observables. I found this exchange interesting because I did something similar when revisiting an old research project that had been collecting digital dust in my hard disk. The specific problem had to do with the analysis of a biomarker that was measured with a qualitative technique yielding a binary classification of measurements as present or absent, in two experimental conditions (call them A and B). Ignoring some technical aspects of the study design, the goal was to calculate the odds ratio of the biomarker being expressed in condition B vs. A (the reference state signifying absence of disease).

When writing the programs for the analysis, I defaulted to the N(0.0,1.0E-6) prior that epitomizes non-informativeness in BUGS. However, one of my co-authors asked the “What the @#$%& does this prior mean?” question. And then we stopped … and reflected on what we were about to do. You see, before the experiment started we had absolutely no prior information about the behaviour of the biomarker in either experimental state, so we did not want to commit one way or another. In other words, Laplace’s original uniform (or Beta(1,1)) prior would have been reasonable if the expression data for A and B were to be analyzed separately. However, we wanted to analyze the data with a logistic regression model, so was the ubiquitous N(0.0,1.0E-6) the prior we were after?

The answer is a loud NO! According to Wikipedia, the mother of all knowledge, the logit transform of a uniform variate follows the logistic distribution with location zero and scale one. Hence, the prior on the intercept of the logistic regression (interpretable as the log-odds of the biomarker being expressed in state A) had to be a Logistic(0,1).

[Figure: histogram of the logit of a uniform variate, with the moment-matched Normal and the Logistic(0,1) densities overlaid]

Surprisingly, the odds ratio of B vs. A was found (after trial and error and method-of-moments considerations) to be very well approximated by a 1:1 mixture of a logistic and a Gaussian, which clearly departs from the N(0.0,1.0E-6) prior we (almost) used:

[Figure: histogram of the induced prior on the log odds ratio of B vs. A, with the logistic, normal and 1:1 mixture approximations overlaid]

Bottom line: Even informative (in the BUGS sense!) priors can be pretty non-informative in some intuitively appropriate parameterization. Conversely, one could start with a non-informative prior in a parameterization that is easier to reason about and look for an induced prior (using analytic considerations or even simulations) to convert it to a parameterization that is more appropriate to the analytic plan at hand.

(R code for the plots and simulations is given below)

## approximating uniforms
logit<-function(x) log(x/(1-x))
set.seed(1234)
N<-10000000
s<-runif(N,0,1);
s2<-runif(N,0,1);
y<-logit(s)
y2<-logit(s2)
m<-mean(y)
s<-sd(y)
x<-seq(-10,10,.1)
## logistic is logit of a uniform
hist(y,prob=TRUE,breaks=50,main="intercept",
     xlab="logit(A)")
lines(x,dnorm(x,m,s),col="red")    ## moment-matched normal
lines(x,dlogis(x,0,1),col="blue")  ## exact induced prior: Logistic(0,1)
legend(-15,0.20,legend=c("Normal (moment matched)",
      "Logistic(0,1)"),lty=1,col=c("red","blue"))

## approximating the difference of two uniforms
hist(y-y2,prob=TRUE,ylim=c(0,.25),breaks=200,
     xlim=c(-10,10),main="OR between two U(0,1)",
     xlab="logit(B)-logit(A)")
## logistic approximation
lines(x,dlogis(x,0,sqrt(2)),col="blue",lwd=2)
## normal
lines(x,dnorm(x,0,(pi)*sqrt(2/3)),col="red",lwd=2)
## mixture of a logistic and a normal approximation
lines(x,0.5*(dlogis(x,0,sqrt(2))+
     dnorm(x,0,(pi)*sqrt(2/3))),col="green",lwd=2)
## legends
NL<-expression(paste("Normal(0,",pi*sqrt(2/3),")"))
LL<-expression(paste("Logistic(0,",sqrt(2),")"))
ML<-expression(paste("0.5 Normal(0,",pi*sqrt(2/3),")+0.5 Logistic(0,",sqrt(2),")"))
legend(-6.5,0.25,legend=c(NL,LL,ML),
       lty=1,col=c("red","blue","green") )

## does it extend to more general cases?
m1<--2;m2<-2;s1<-1;s2<-2.5;
l1<-rlogis(N,m1,s1)
l2<-rlogis(N,m2,s2)
d<-l1-l2
hist(d,prob=TRUE,ylim=c(0,0.25),breaks=200)
plot(density(d))   ## kernel density estimate of the difference
## moment-matched logistic approximation
lines(x,dlogis(x,m1-m2,sqrt(s1^2+s2^2)),col="green",lwd=2)
## moment-matched normal approximation
lines(x,dnorm(x,m1-m2,pi*sqrt((s1^2+s2^2)/3)),col="red",lwd=2)
## 1:1 mixture of the normal and logistic approximations
lines(x,0.5*(dnorm(x,m1-m2,pi*sqrt((s1^2+s2^2)/3))+
             dlogis(x,m1-m2,sqrt(s1^2+s2^2))),col="blue",lwd=2)


Edit (29/11/2013):
Updated the first image due to an accidental reversal of the distribution labels

Bayesian Linear Regression Analysis (with non-informative priors but without Monte Carlo) in R

November 24, 2013

Continuing the previous post concerning linear regression analysis with non-informative priors in R, I will show how to derive numerical summaries for the regression parameters without Monte Carlo integration. The theoretical background for this post is contained in Chapter 14 of Bayesian Data Analysis which should be consulted for more information.

The Residual Standard Deviation

The residual standard deviation \sigma is just the square root of the residual variance \sigma^2, which has a scaled inverse chi-square distribution given the data and the covariates: \sigma^2 \sim Scale-inv-\chi^2(\nu,s^2), where \nu=N-p and s^2 are the residual degrees of freedom and the residual variance reported by the (frequentist) least squares fit (R function lm). Possible point estimates for the residual standard deviation are the posterior mean, mode and median, which can be obtained through (numerical) integration of the probability density function (PDF), maximization of the PDF, and the quantile function respectively. The posterior standard deviation may also be obtained via univariate numerical integration of the PDF. To obtain the latter, we apply a standard change of variables transformation to the scaled inverse chi-square PDF to obtain:

p(\sigma|y) = \frac{2\,(\nu s^2/2)^{\nu/2}}{\Gamma(\nu/2)}\times \exp\left(-\frac{\nu s^2}{2 \sigma^2}\right)\times \sigma^{-(\nu+1)}

where \Gamma(x) is the Gamma function. Comparing expressions, the cumulative distribution function (CDF) of \sigma|y can be seen to equal the survival function of a gamma-distributed random variable: 1-F_{\Gamma}(\sigma^{-2};\frac{\nu}{2},\frac{2}{\nu s^2}), where F_{\Gamma}(x;k,\theta) is the value, at x, of the CDF of the gamma variable with shape and scale parameters k, \theta respectively. The median (or any other quantile q) can be obtained by solving the equation:

1-F_{\Gamma}(\sigma^{-2};\frac{\nu}{2},\frac{2}{\nu s^2}) = q

No closed form appears to exist for the quantiles (median, lower and upper limits of a symmetric credible interval, CrI), so the equation above needs to be solved numerically. Since the marginal \sigma|y is a skewed distribution, the mean bounds (from above) both the median and the lower limit of the credible (not to be confused with a confidence!) interval. Similarly, the upper limit of the CrI is bounded by the corresponding limit of a Gaussian with mean and standard deviation equal to the values we previously obtained through numerical integration. These observations are used to restrict the range over which the numerical root-finding is carried out, to avoid false convergence or outright failures.

The mode can be obtained by direct differentiation of the PDF and is given by the closed form expression: s\sqrt{\frac{\nu}{\nu+1}}

The Regression Coefficients

The marginal distribution of the entire coefficient vector \beta|y is a multivariate t distribution with location vector \hat{\beta} (obtained with lm) and scale matrix s^2 V_{\beta}. Both of these quantities may be obtained from the values returned by lm. To calculate numerical summaries for each component of this vector, the other components need to be integrated out (marginalization). It is a basic property of the multivariate t distribution that its marginal distributions are univariate t, so that the following relation holds for each regression parameter:

\frac{\beta_i - \hat{\beta_i}}{s \sqrt{V_{\beta}[i,i]}} \sim t_{\nu}

Since this is a symmetric unimodal distribution, the mean, mode and median coincide and are all equal to the maximum likelihood estimates, while the standard deviation of the standardized quantity above is available in closed form: \sqrt{\frac{\nu}{\nu-2}}. Since the quantiles of the univariate t are already available in R, there is no need to write special code to obtain any of the numerical summaries of the regression coefficients: one simply calculates the relevant quantities from the univariate t with \nu=N-p degrees of freedom once, and translates/scales them using the appropriate elements of the location vector and scale matrix.
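As a minimal sketch of this translate-and-scale step (the full function appears in the next section), the 95% CrI of a single coefficient can be computed directly from the pieces stored in the lm fit; the helper below is my own illustration, not part of the post's original code:

## minimal sketch: 95% CrI of the i-th coefficient from its univariate t marginal
cri.coef <- function(lmfit, i, level = 0.95){
	nu <- lmfit$df.residual
	Vb <- chol2inv(qr.R(lmfit$qr))         ## unscaled covariance (X'X)^-1
	s  <- sqrt(sum(lmfit$residuals^2)/nu)  ## residual standard deviation
	sc <- s*sqrt(Vb[i, i])
	coef(lmfit)[i] + qt(c((1 - level)/2, (1 + level)/2), df = nu)*sc
}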

The Gory Details

Two R functions are used to carry out the aforementioned tasks: the first provides the integrand needed for the (posterior) mean of the residual standard deviation, i.e. the square root of the scaled inverse chi-square variable:

## integrand for the posterior mean of sigma: x times the density of the
## square root of a scaled inverse chi-square with df=n, scale=t2
mn.scIX2.sqrt<-function(x,n,t2)
{
	s2<-x^2
	n.2<-n/2.0
	lx<-log(x)
	2.0*x*x*exp(-n*t2/(2.0*s2)-lgamma(n.2)+
	n.2*(log(t2)+log(n.2))-(n+2)*lx)
}

and a second (much longer!) function that receives the fitted lm object and calculates the numerical summaries for the regression coefficients and the residual standard deviation:

bayesfitAnal<-function(lmfit){
	## extract coefficients, dfs and variance
	## matrix from lm
    	QR<-lmfit$qr
    	df<-lmfit$df
    	R<-qr.R(QR) ## R component
    	coef<-lmfit$coef
    	Vb<-chol2inv(R) ## variance(unscaled)
    	s2<-(t(lmfit$residuals)%*%lmfit$residuals)
    	s2<-s2[1,1]/df
    	scale<-sqrt(s2*diag(Vb))

	## standard errors of the univariate t
	se=scale*ifelse(df>2,
	sqrt(df/(df-2)),
	ifelse(df<1,NA,Inf))
	## dataframe for returned values
    	ret<-data.frame(coef=coef,se=se,t=coef/se,
		mode=coef,median=coef,
		"CrI.2.5%"=qt(0.025,df=df),
		"CrI.97.5%"=qt(0.975,df=df))
	## CrI limits for t-distributed quantities
	ret[,6:7]<-ret[,6:7]*se+coef

	## make extra space for sigma
	ret<-rbind(ret,rep(0,7))
	rownames(ret)[3]<-"sigma"
	## mean of scale
	M1<-integrate(mn.scIX2.sqrt,n=df,
		t2=s2,lower=0,upper=Inf)$val
	## expectation of sigma^2 (i.e. the mean of the
	## scaled inverse chi-square distribution:
	## df*s2/(df-2) for df>2)
	S1<-ifelse(df<=2,NA,
		df*s2/(df-2))
	ret[3,1]<-M1	## mean
	ret[3,2]<-sqrt(S1-M1^2)	## sd
	ret[3,3]<-ret[3,1]/ret[3,2]	## t
	ret[3,4]<-sqrt(s2*df/(df+1)) ## mode
	## calculate quantiles - these take
	## place in the scale of the precision
	## median
	ret[3,5]<-uniroot(function(x) pgamma(x,
		shape=df/2,scale=2/(s2*df),
		lower.tail=FALSE)-0.5,
		lower=0,upper=1/s2)$root
	## lower limit of 95% CrI
	ret[3,6]<-uniroot(function(x) pgamma(x,
		shape=df/2,scale=2/(s2*df),
		lower.tail=FALSE)-0.025,lower=0,
		upper=1/(M1-3*ret[3,2])^2)$root
	## upper limit of 95% CrI
	ret[3,7]<-uniroot(function(x) pgamma(x,
		shape=df/2,scale=2/(s2*df),
		lower.tail=FALSE)-0.975,lower=0,
		upper=1/s2)$root
	## raise to -1/2 to change from
	## precision to standard deviations
	ret[3,5:7]<-sqrt(1/ret[3,5:7])

	ret
}

To see some action I will revisit the linear regression example introduced in the first post and compare frequentist (lm) and Bayesian estimates obtained with Monte Carlo and non-Monte Carlo approaches:
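(For reproducibility, the fit below assumes the plant-weight example from R's ?lm help page, which appears to be the data set used in the earlier post; treat this setup as my reconstruction rather than the original code.)

## assumed setup: the plant-weight example from ?lm (control vs treatment)
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group  <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lmfit  <- lm(weight ~ group)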

> summary(lmfit) ## Frequentist

Call:
lm(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
groupTrt     -0.3710     0.3114  -1.191    0.249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared:  0.07308,   Adjusted R-squared:  0.02158
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

> t(apply(bf,2,Bayes.sum)) ## Bayesian (MC)
                  mean        se         t     median   CrI.2.5% CrI.97.5%
(Intercept)  5.0321144 0.2338693 21.516785  5.0326123  4.5709622 5.4947022
groupTrt    -0.3687563 0.3305785 -1.115488 -0.3706862 -1.0210569 0.2862806
sigma        0.7270983 0.1290384  5.634745  0.7095935  0.5270569 1.0299920
> bayesfitAnal(lmfit) ## Bayesian (non-MC)
                  coef        se         t       mode     median   CrI.2.5. CrI.97.5.
(Intercept)  5.0320000 0.2335761 21.543296  5.0320000  5.0320000  4.5412747 5.5227253
groupTrt    -0.3710000 0.3303265 -1.123131 -0.3710000 -0.3710000 -1.0649903 0.3229903
sigma        0.7271885 0.1295182  5.614566  0.6778158  0.7095623  0.5262021 1.0298379

The conservative, skeptical nature of the Bayesian inference (wider credible intervals, due to proper acknowledgement of uncertainty) is evident no matter which numerical approach (Monte Carlo vs. numerical integration) one uses for the inference. Although the numerical estimates of the regression coefficients agree (up to three decimal points) among the three approaches, the residual variance estimates don't for this relatively small data set.

Monte Carlo estimates are almost as precise as numerical integration for quantities like the mean and standard error of the estimated parameters, yet they lose precision for extreme quantiles. Monte Carlo is also slower, which may become an issue when fitting larger models.

That was all folks! In subsequent posts I will explore features of the Bayesian solution that will (hopefully) convince you that it is a much better alternative (especially when it doesn't bring tears to your eyes!) relative to the frequentist one. If you use any of the code in these posts, I'd appreciate it if you dropped me a line :)

Kicking ass with Bayesian Statistics in R

November 22, 2013

Some excellent R posts regarding Bayesian statistics:

1. How to program the Laplace approximation in R:

http://www.r-bloggers.com/easy-laplace-approximation-of-bayesian-models-in-r/

Though Bayesian computation is heavily dominated by Monte Carlo methods, the Laplace approximation is a nice tool to deploy in cases where your MCMC fails to converge. Plus it makes one appreciate Laplace's genius.
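For a flavour of the idea, here is a minimal sketch (not the linked post's code; the binomial example and its numbers are made up): maximize the log posterior with optim and use the inverse of the negative Hessian at the mode as the posterior covariance:

## minimal sketch of the Laplace approximation: Gaussian centred at the posterior
## mode, with covariance given by the inverse of the negative Hessian at the mode
## (illustrative example: binomial likelihood, vague normal prior on the logit)
log_post <- function(theta, y = 7, n = 10)
	dbinom(y, n, plogis(theta), log = TRUE) + dnorm(theta, 0, 10, log = TRUE)
fit <- optim(0, log_post, method = "BFGS", hessian = TRUE,
             control = list(fnscale = -1))   ## fnscale = -1 turns optim into a maximizer
post_mean <- fit$par
post_sd   <- sqrt(diag(solve(-fit$hessian)))
c(mean = post_mean, sd = post_sd)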

2. A bird’s eye view of R’s Bayesian analysis facilities:

http://blog.revolutionanalytics.com/2013/11/r-and-bayesian-statistics.html

Watch this blog for a series of posts about Bayesian survival analysis with R, BUGS and stan.

Bayesian linear regression analysis without tears (R)

November 17, 2013

Bayesian methods are sure to get some publicity after Valen Johnson's PNAS paper regarding the use of Bayesian approaches to recalibrate p-value cutoffs from 0.05 to 0.005. Though the paper itself is bound to get some heat (see the discussion in Andrew Gelman's blog and Matt Briggs's fun-to-read deconstruction), the controversy might stimulate people to explore Bayesianism and (hopefully!) to move away from frequentist analyses.

Extracting standard errors and treatment effects from medical journal tables (powered by R)

November 10, 2013

I decided to start blogging the R code used for some of my statistical posts, so I will start with the meta-analysis posts and move on to more difficult stuff.

As stated previously (here and here), the problem is to convert the reported relative risk (RR, t), 95% confidence interval (t_L, t_U) and p-value (p_v) into estimates of the log relative risk and its associated standard error for downstream use (usually meta-analysis). Medical journals are in the bad habit of exponentiating (and rounding) the output of statistical software, so one needs to manipulate the reported estimates in order to recover that output.
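As a preview (a minimal sketch with made-up numbers, not the post's actual code), both the confidence interval and the p-value can be back-transformed into a standard error for the log relative risk:

## minimal sketch: recover the log-RR and its standard error from reported
## summaries (the numbers below are made up for illustration)
rr   <- 1.57             ## reported relative risk
ci   <- c(1.19, 2.07)    ## reported 95% confidence interval
pval <- 0.0016           ## reported two-sided p-value (assumed to come from a Wald z-test)
log_rr <- log(rr)
se_from_ci <- (log(ci[2]) - log(ci[1]))/(2*qnorm(0.975))
se_from_p  <- abs(log_rr)/qnorm(1 - pval/2)
c(log_rr = log_rr, se_from_ci = se_from_ci, se_from_p = se_from_p)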

Page Rev Bayes – we found statistical irregularities in a randomized controlled trial

November 9, 2013

The Bayesian counterpart to the frequentist analysis of the randomized controlled trial is in many respects more straightforward than the frequentist one. One starts with a prior distribution over the probabilities of a patient being assigned to each of the three arms and combines it with the (multinomial) likelihood of observing a given assignment pattern among the 240 patients enrolled in the study. Bayes' theorem gives the posterior distribution quantifying our belief about the magnitudes of the unknown assignment probabilities. Note that testing the strict equality of these probabilities is bound to lead us straight into the arms of the Lindley paradox, so a different approach is likely to be more fruitful. Specifically, we specify a maximum tolerable threshold for the difference between the maximum and the minimum probability of being assigned to the trial arms (let's say 1-5%) and we directly calculate the posterior probability that this difference stays within the tolerance; its complement is the "probability of foul play".

In the absence of prior evidence for (or against) foul play, we use a non-informative prior in which all possible values of the assignment probabilities are equally plausible. This (Dirichlet) prior corresponds to a prior state of knowledge in which three individuals were randomized and each ended up in a different treatment arm. Under this prior, the posterior distribution is itself a Dirichlet distribution with parameters equal to the number of individuals actually assigned to each arm plus one. The following R code may then be used to calculate the probability of foul play, as previously defined:

library(gtools)                     ## provides rdirichlet (MCMCpack has one too)
event <- c(105, 70, 65)             ## observed arm sizes
set.seed(1234)
r <- rdirichlet(10000, event + 1)   ## posterior draws under the Dirichlet(1,1,1) prior
## posterior probability that the assignment probabilities differ by <= tol
res0 <- mean(apply(r, 1, function(x, tol) (max(x) - min(x)) <= tol, tol = 0.01))
res0 * 100

This probability comes down to 0.4%, which is numerically close to the frequentist answer, yet with a more intuitive interpretation: based on the observed arm sizes and a numerical tolerance for the maximum tolerable difference in assignment probabilities, the odds for "foul play" are 249:1.
Increasing the tolerance will obviously decrease these odds, but in such a case we would be willing to tolerate larger differences in assignment probabilities. Although these results are mathematically trivial (and non-controversial), the plot becomes more convoluted when one proceeds to use them to make a declaration of "foul play". For in that case a decision needs to be made, and it has to consider not only the probabilities of the uncertain events ("foul play" vs. "not foul play") but also the consequences for the journal, the study investigators and the scientific community at large. At that level one would need to decide whether odds of 249:1 are high enough for subsequent action to be taken. But this consideration would take us into the realm of decision theory (and it is already 11 pm).

The utility of frequentist statistics (in a single picture)

November 8, 2013

Ted Bunn nailed it

[image from Ted Bunn's post]

Time to move beyond Laplace and what prior he would have used, had he been alive today.

