Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement. The development of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it! ref
The following code and tests in this chapter are taken from the package author’s webpages, namely:
# To install:
## install.packages("devtools")
# library(devtools)
# install_github("rasmusab/bayesian_first_aid")
library(BayesianFirstAid)
We have \(n\) trials with \(x\) sucesses and \(n-x\) failures. We assume that the tests are iid.
The likelihood, the distribution of the data is considered a Binomial with parameter \(\theta\) (\(n\) is considered fixed):
\[p(x|n, \theta) = {n \choose x} \theta^x (1-\theta)^{(n-x)}\]
The prior distribution of parameter \(\theta\) used in the flat prior which is described by a Beta(1,1):
\[\theta \sim \text{Beta}(1,1)\]
The Bayesian estimation:
# x: number of successes
# n: number of trials
# comp.theta: a fixed relative frequency of success to compare with the estimated relative frequency of success
# cred.mass: the amount of probability mass that will be contained in reported credible intervals
# n.iter: The number of iterations to run the MCMC sampling (default: 15000, max: 1e6-1)
model <- bayes.binom.test(x=9, n=11, comp.theta=0.7, cred.mass = 0.95)
model
##
## Bayesian First Aid binomial test
##
## data: 9 and 11
## number of successes = 9, number of trials = 11
## Estimated relative frequency of success:
## 0.78
## 95% credible interval:
## 0.55 0.96
## The relative frequency of success is more than 0.7 by a probability of 0.742
## and less than 0.7 by a probability of 0.258
Versus the classical binomial test:
binom.test(x=9, n=11, p=0.7, conf.level=0.95)
##
## Exact binomial test
##
## data: 9 and 11
## number of successes = 9, number of trials = 11, p-value = 0.523
## alternative hypothesis: true probability of success is not equal to 0.7
## 95 percent confidence interval:
## 0.4822 0.9772
## sample estimates:
## probability of success
## 0.8182
To plot and summarize:
summary(model)
## Data
## number of successes = 9, number of trials = 11
##
## Model parameters and generated quantities
## theta: the relative frequency of success
## x_pred: predicted number of successes in a replication
##
## Measures
## mean sd HDIlo HDIup %<comp %>comp
## theta 0.767 0.113 0.545 0.96 0.258 0.742
## x_pred 8.434 1.847 5.000 11.00 0.000 1.000
##
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.7.
##
## Quantiles
## q2.5% q25% median q75% q97.5%
## theta 0.512 0.696 0.781 0.853 0.944
## x_pred 4.000 7.000 9.000 10.000 11.000
plot(model)
The posterior predictive distribution is the distribution of unobserved observation \(y\) (a prediction) conditional on the observed data \(x\)
\[p(y|x,n) = \int_{\theta} p(y|\theta) p(\theta|x,n) d\theta = E_{\theta|x,n} [ p(y|\theta) ]\]
diagnostics
plots MCMC diagnostics based on the code
package:
diagnostics(model)
##
## Iterations = 1:5000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## Diagnostic measures
## mean sd mcmc_se n_eff Rhat
## theta 0.767 0.113 0.001 14883 1
## x_pred 8.434 1.847 0.015 14749 1
##
## mcmc_se: the estimated standard error of the MCMC approximation of the mean.
## n_eff: a crude measure of effective MCMC sample size.
## Rhat: the potential scale reduction factor (at convergence, Rhat=1).
##
## Model parameters and generated quantities
## theta: The relative frequency of success
## x_pred: Predicted number of successes in a replication
model_code
print out R and JAGS code that runs the model (copy-paste friendly):
model.code(model)
## ### Model code for the Bayesian First Aid alternative to the binomial test ###
## require(rjags)
##
## # Setting up the data
## x <- 9
## n <- 11
##
## # The model string written in the JAGS language
## model_string <- "model {
## x ~ dbinom(theta, n)
## theta ~ dbeta(1, 1)
## x_pred ~ dbinom(theta, n)
## }"
##
## # Running the model
## model <- jags.model(textConnection(model_string), data = list(x = x, n = n),
## n.chains = 3, n.adapt=1000)
## samples <- coda.samples(model, c("theta", "x_pred"), n.iter=5000)
##
## # Inspecting the posterior
## plot(samples)
## summary(samples)
Eg, to change the prior just replace, in the model_string, the distribution of \(\theta\) for something else (liek Jeffrey’s prior dbeta(0.5,0.5)
) and run the code yourself.
To replace the classical t-test, this test uses the BEST methodology which assumes the data \(x=(x_1,x_2,\ldots,x_n)\) are distributed as a t distribution according to the following model:
\[x_i \sim t(\mu,\sigma,\nu\]
\[\mu \sim \mathcal{N}(M_{\mu}, S_{\mu})\]
\[\sigma \sim U(L_{\sigma}, H_{\sigma})\]
\[\nu \sim \text{ShiftedExp}(\frac{1}{29}, \text{shift}=1)\]
\(\nu\) is the degrees of freedom of the t distribution (lower values mean heavier tails). The value \(\frac{1}{29}\) is used to try to balance nearly normal distributions (\(\nu\gt 30\)) with heavy tailed distributions (\(\nu\lt 30\)).
For paired samples the test takes the difference between each paired sample and model just the paired differences using the one sample procedure.
As an eg let’s load a dataset with coffee yields and compare these yields in two different time periords (1960-80 amd 1980-2001) and see if the test recognize some statistical significant change:
d <- read.csv("roubik_2002_coffe_yield.csv")
head(d)
## world country yield_61_to_80 yield_81_to_01
## 1 new Costa_Rica 9139 14620
## 2 new Bolivia 7686 8767
## 3 new El_Salvador 9996 8729
## 4 new Guatemala 5488 8231
## 5 new Colombia 5920 7740
## 6 new Honduras 4096 7264
new.yield_61 <- d[,3][d$world=="new"]
new.yield_61
## [1] 9139 7686 9996 5488 5920 4096 4566 4965 5487 5227 2347 3089 1938
new.yield_81 <- d[,4][d$world=="new"]
new.yield_81
## [1] 14620 8767 8729 8231 7740 7264 6408 6283 5740 5116 4124
## [12] 3240 2789
model <- bayes.t.test(new.yield_61, new.yield_81, paired=TRUE)
model
##
## Bayesian estimation supersedes the t test (BEST) - paired samples
##
## data: new.yield_61 and new.yield_81, n = 13
##
## Estimates [95% credible interval]
## mean paired difference: -1399 [-2462, -377]
## sd of the paired differences: 1690 [915, 2727]
##
## The mean difference is more than 0 by a probability of 0.006
## and less than 0 by a probability of 0.994
The \(95\)% credible interval does not include zero, so the test recognizes a statistical difference between the two samples.
summary(model)
## Data
## new.yield_61, n = 13
## new.yield_81, n = 13
##
## Model parameters and generated quantities
## mu_diff: the mean pairwise difference between new.yield_61 and new.yield_81
## sigma_diff: the scale of the pairwise difference, a consistent
## estimate of SD when nu is large.
## nu: the degrees-of-freedom for the t distribution fitted to the pairwise difference
## eff_size: the effect size calculated as (mu_diff - 0) / sigma_diff
## diff_pred: predicted distribution for a new datapoint generated
## as the pairwise difference between new.yield_61 and new.yield_81
##
## Measures
## mean sd HDIlo HDIup %<comp %>comp
## mu_diff -1401.85 525.00 -2461.806 -376.69 0.994 0.006
## sigma_diff 1757.19 487.99 915.427 2726.86 0.000 1.000
## nu 29.21 27.57 1.011 84.65 0.000 1.000
## eff_size -0.85 0.36 -1.576 -0.18 0.994 0.006
## diff_pred -1420.79 2165.88 -5583.972 2803.76 0.781 0.219
##
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.
##
## Quantiles
## q2.5% q25% median q75% q97.5%
## mu_diff -2463.431 -1729.394 -1399.476 -1063.340 -377.569
## sigma_diff 1002.421 1431.532 1689.791 2007.287 2868.776
## nu 2.314 9.441 20.572 40.006 102.353
## eff_size -1.580 -1.079 -0.838 -0.608 -0.183
## diff_pred -5610.141 -2645.939 -1410.718 -183.086 2782.547
plot(model)
The posterior predictive box presents a histogram with a smattering of t-distributions drawn from the posterior. If there is a large discrepancy between the model fit and the data then we need to think twice before proceeding. Herein it’s ok.
diagnostics(model)
##
## Iterations = 601:10600
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 10000
##
## Diagnostic measures
## mean sd mcmc_se n_eff Rhat
## mu_diff -1401.85 525.00 4.291 15032 1.002
## sigma_diff 1757.19 487.99 6.084 7160 1.014
## nu 29.21 27.57 0.384 5203 1.001
## eff_size -0.85 0.36 0.003 14951 1.000
## diff_pred -1420.79 2165.88 12.971 27889 1.001
##
## mcmc_se: the estimated standard error of the MCMC approximation of the mean.
## n_eff: a crude measure of effective MCMC sample size.
## Rhat: the potential scale reduction factor (at convergence, Rhat=1).
##
## Model parameters and generated quantities
## mu_diff: the mean pairwise difference between new.yield_61 and new.yield_81
## sigma_diff: the scale of the pairwise difference, a consistent
## estimate of SD when nu is large.
## nu: the degrees-of-freedom for the t distribution fitted to the pairwise difference
## eff_size: the effect size calculated as (mu_diff - 0) / sigma_diff
## diff_pred: predicted distribution for a new datapoint generated
## as the pairwise difference between new.yield_61 and new.yield_81
model.code(model)
## ## Model code for Bayesian estimation supersedes the t test - paired samples ##
## require(rjags)
##
## # Setting up the data
## x <- new.yield_61
## y <- new.yield_81
## pair_diff <- x - y
## comp_mu <- 0
##
## # The model string written in the JAGS language
## model_string <- "model {
## for(i in 1:length(pair_diff)) {
## pair_diff[i] ~ dt( mu_diff , tau_diff , nu )
## }
## diff_pred ~ dt( mu_diff , tau_diff , nu )
## eff_size <- (mu_diff - comp_mu) / sigma_diff
##
## mu_diff ~ dnorm( mean_mu , precision_mu )
## tau_diff <- 1/pow( sigma_diff , 2 )
## sigma_diff ~ dunif( sigma_low , sigma_high )
## # A trick to get an exponentially distributed prior on nu that starts at 1.
## nu <- nuMinusOne + 1
## nuMinusOne ~ dexp(1/29)
## }"
##
## # Setting parameters for the priors that in practice will result
## # in flat priors on mu and sigma.
## mean_mu = mean(pair_diff, trim=0.2)
## precision_mu = 1 / (mad(pair_diff)^2 * 1000000)
## sigma_low = mad(pair_diff) / 1000
## sigma_high = mad(pair_diff) * 1000
##
## # Initializing parameters to sensible starting values helps the convergence
## # of the MCMC sampling. Here using robust estimates of the mean (trimmed)
## # and standard deviation (MAD).
## inits_list <- list(
## mu_diff = mean(pair_diff, trim=0.2),
## sigma_diff = mad(pair_diff),
## nuMinusOne = 4)
##
## data_list <- list(
## pair_diff = pair_diff,
## comp_mu = comp_mu,
## mean_mu = mean_mu,
## precision_mu = precision_mu,
## sigma_low = sigma_low,
## sigma_high = sigma_high)
##
## # The parameters to monitor.
## params <- c("mu_diff", "sigma_diff", "nu", "eff_size", "diff_pred")
##
## # Running the model
## model <- jags.model(textConnection(model_string), data = data_list,
## inits = inits_list, n.chains = 3, n.adapt=1000)
## update(model, 500) # Burning some samples to the MCMC gods....
## samples <- coda.samples(model, params, n.iter=10000)
##
## # Inspecting the posterior
## plot(samples)
## summary(samples)
This is a measure of the linear correlation (dependence) between two variables \(X_1\) and \(X_2\), giving a value \(\rho\) between +1 and -1 inclusive, where 1 is total positive correlation, 0 is no correlation, and -1 is total negative correlation
\[\rho = \frac{\text{cov}(X_1,X_2)}{\sigma_{X_1} \sigma_{X_2}\]
Classical statistical inference based on Pearson’s correlation coefficient aims at test the null hypothesis that \(\rho=0\) and finding the p% confidence interval around \(r\) (the sample correlation) that contains \(\rho\).
The Pearson’s correlation coefficient is which assumes bivariate normality.
The model that a classical Pearson’s correlation test assumes is that between two paired variables \(X_{i1}\) and \(X_{i2}\) it follows a bivariate normal distribution, ie, \(X_i1 \sim \mathcal{N}(\mu_1,\sigma_1), X_i2 \sim \mathcal{N}(\mu_2,\sigma_2)\) and for each pair \((x_i,y_i)\) there is a linear dependency which is quantified by \(\rho\). Here’s an eg for \(\rho=0.3\):
The classical model pressuposes that \(X=(X_1,X_2)^T\) has a multivariate normal distribution:
\[X \sim \mathcal{N}(\mu,\Sigma)\]
where \(\mu=(\mu_1,\mu_2)^T\) and
\[\Sigma = \left() \begin{array}{cc} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{array} \right)\]
This test is implemented by R function cor.test
.
The following Bayesian model replaces the normal for a bivariate t distribution which is more robust to outliers (it could also be a normal, of course, the Bayesian framework is much more flexible in this sense).
The Bayesian model becomes
\[X \sim t(\mu,\Sigma,nu)\]
\[\rho \sim U(-1,1)\]
\[\mu_x,\mu_y \sim \mathcal{N}(M_{\mu}, \S_{\mu})\]
\[\sigma_{x_1}, \sigma_{x_2}\sim U(L_{\sigma}, U_{\sigma})\]
\[\nu \sim \text{ShiftedExp}(\frac{1}{29}, \text{shift}=1)\]
Let’s load some data:
d <- read.csv("2d4d_hone_2012.csv")
head(d)
## ratio_2d4d grip_kg sex
## 1 0.9427 54.19 male
## 2 0.9336 52.16 male
## 3 0.9460 51.15 male
## 4 0.9527 51.15 male
## 5 0.9178 50.14 male
## 6 0.9386 50.16 male
plot(d$ratio_2d4d, d$grip_kg, col=unclass(d$sex), pch=18)
legend("topright",c("female","male"), col=1:2, pch=18)
There is a visual difference between the two variables concerning gender. Let’s try to find some pattern with this test
model.male <- bayes.cor.test( ~ ratio_2d4d + grip_kg, data=d[d$sex=="male",], n.iter = 5000)
model.male
##
## Bayesian First Aid Pearson's Correlation Coefficient Test
##
## data: ratio_2d4d and grip_kg (n = 97)
## Estimated correlation:
## -0.34
## 95% credible interval:
## -0.51 -0.15
## The correlation is more than 0 by a probability of <0.001
## and less than 0 by a probability of >0.999
summary(model.male)
## Data
## ratio_2d4d and grip_kg, n = 97
##
## Model parameters
## rho: the correlation between ratio_2d4d and grip_kg
## mu[1]: the mean of ratio_2d4d
## sigma[1]: the scale of ratio_2d4d , a consistent
## estimate of SD when nu is large.
## mu[2]: the mean of grip_kg
## sigma[2]: the scale of grip_kg
## nu: the degrees-of-freedom for the bivariate t distribution
##
## Measures
## mean sd HDIlo HDIup %<comp %>comp
## rho -0.339 0.092 -0.511 -0.154 0.999 0.001
## mu[1] 0.958 0.003 0.952 0.965 0.000 1.000
## mu[2] 37.252 0.923 35.488 39.124 0.000 1.000
## sigma[1] 0.032 0.003 0.028 0.037 0.000 1.000
## sigma[2] 8.701 0.684 7.490 10.130 0.000 1.000
## nu 47.783 31.402 6.589 110.502 0.000 1.000
##
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.
##
## Quantiles
## q2.5% q25% median q75% q97.5%
## rho -0.506 -0.402 -0.345 -0.283 -0.143
## mu[1] 0.951 0.956 0.958 0.960 0.965
## mu[2] 35.442 36.646 37.243 37.870 39.086
## sigma[1] 0.028 0.031 0.032 0.034 0.038
## sigma[2] 7.458 8.228 8.667 9.140 10.112
## nu 10.569 25.617 40.041 60.820 129.822
model.female <- bayes.cor.test( ~ ratio_2d4d + grip_kg, data=d[d$sex=="female",], n.iter = 5000)
model.female
##
## Bayesian First Aid Pearson's Correlation Coefficient Test
##
## data: ratio_2d4d and grip_kg (n = 119)
## Estimated correlation:
## -0.14
## 95% credible interval:
## -0.31 0.052
## The correlation is more than 0 by a probability of 0.064
## and less than 0 by a probability of 0.936
plot(model.male)
plot(model.female)
Both estimates indicate a slight negative correlation (the males more than the females). The two ellipses show the 50% (darker blue) and 95% (lighter blue) highest density regions. The red histograms show the marginal distributions of the data with a smatter of marginal densities drawn from the posterior. Looking at this plot we see that the model fits quite well, however, we could be concerned with the right skewness of the ratio_2d4d
marginal which is not captured by the model (the t-distribution is symmetric).
To take a look at the posterior difference in correlation between the male and the female group, we first extract the MCMC samples from the Bayesian First Aid fit object using the as.data.frame
function:
female.mcmc <- as.data.frame(model.female)
male.mcmc <- as.data.frame(model.male)
hist(male.mcmc$rho-female.mcmc$rho, breaks=50, xlim=c(-1,1), main=bquote(paste("Density for ", rho, " male minus ", rho, " female")), yaxt="n", ylab="")
And the entire code:
model.code(model.female)
## ## Model code for the Bayesian First Aid alternative to Pearson's correlation test. ##
## require(rjags)
##
## # Setting up the data
## x <- d[d$sex == "female", ][ , "ratio_2d4d"]
## y <- d[d$sex == "female", ][ , "grip_kg"]
## xy <- cbind(x, y)
##
## # The model string written in the JAGS language
## model_string <- "model {
## for(i in 1:n) {
## xy[i,1:2] ~ dmt(mu[], prec[ , ], nu)
## }
##
## # JAGS parameterizes the multivariate t using precision (inverse of variance)
## # rather than variance, therefore here inverting the covariance matrix.
## prec[1:2,1:2] <- inverse(cov[,])
##
## # Constructing the covariance matrix
## cov[1,1] <- sigma[1] * sigma[1]
## cov[1,2] <- sigma[1] * sigma[2] * rho
## cov[2,1] <- sigma[1] * sigma[2] * rho
## cov[2,2] <- sigma[2] * sigma[2]
##
## # Priors
## rho ~ dunif(-1, 1)
## sigma[1] ~ dunif(sigmaLow, sigmaHigh)
## sigma[2] ~ dunif(sigmaLow, sigmaHigh)
## mu[1] ~ dnorm(mean_mu, precision_mu)
## mu[2] ~ dnorm(mean_mu, precision_mu)
## nu <- nuMinusOne+1
## nuMinusOne ~ dexp(1/29)
## }"
##
## # Initializing the data list and setting parameters for the priors
## # that in practice will result in flat priors on mu and sigma.
## data_list = list(
## xy = xy,
## n = length(x),
## mean_mu = mean(c(x, y), trim=0.2) ,
## precision_mu = 1 / (max(mad(x), mad(y))^2 * 1000000),
## sigmaLow = min(mad(x), mad(y)) / 1000 ,
## sigmaHigh = max(mad(x), mad(y)) * 1000)
##
## # Initializing parameters to sensible starting values helps the convergence
## # of the MCMC sampling. Here using robust estimates of the mean (trimmed)
## # and standard deviation (MAD).
## inits_list = list(mu=c(mean(x, trim=0.2), mean(y, trim=0.2)), rho=cor(x, y, method="spearman"),
## sigma = c(mad(x), mad(y)), nuMinusOne = 5)
##
## # The parameters to monitor.
## params <- c("rho", "mu", "sigma", "nu")
##
## # Running the model
## model <- jags.model(textConnection(model_string), data = data_list,
## inits = inits_list, n.chains = 3, n.adapt=1000)
## update(model, 500) # Burning some samples to the MCMC gods....
## samples <- coda.samples(model, params, n.iter=5000)
##
## # Inspecting the posterior
## plot(samples)
## summary(samples)
For instance, if you wanted to replace \(\rho \sim U(-1,1)\) with a Beta, such that \(\rho \sim \text{Beta}(-1,1)\) we would replace
rho ~ dunif(-1,1)
with
rho_half_width ~ dbeta(1,1)
rho ~ 2*rho_half_width - 1 # shift and strech from [0,1] to[-1,1]
and replace the inits_lit variable:
# from
rho=cor(x, y, method="spearman")
# to
rho_half_width=( cor(x,y,method="spearman") + 1 ) / 2
By changing the parameter values of dbeta
we could define several priors, placing more probability mass at that value of rho
we know is more probable.
Goto part2