Bayesian First Aid

Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement. The development of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I'm grateful for any suggestions on how to improve it!

The following code and tests in this chapter are taken from the package author's web pages:

# To install:
# install.packages("devtools")
# library(devtools)
# install_github("rasmusab/bayesian_first_aid")

library(BayesianFirstAid)

Binomial Test

We have \(n\) trials with \(x\) successes and \(n-x\) failures. We assume that the trials are iid.

The likelihood, i.e. the distribution of the data, is a binomial with parameter \(\theta\) (\(n\) is considered fixed):

\[p(x|n, \theta) = {n \choose x} \theta^x (1-\theta)^{(n-x)}\]

The prior distribution of the parameter \(\theta\) is a flat prior, described by a Beta(1,1):

\[\theta \sim \text{Beta}(1,1)\]
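Since the Beta prior is conjugate to the binomial likelihood, the posterior is available in closed form: \(\theta|x,n \sim \text{Beta}(x+1, n-x+1)\). A minimal sketch (not part of the package) to check the MCMC output below against the exact posterior:

x <- 9; n <- 11
(x + 1) / (n + 2)                         # exact posterior mean, 10/13 = 0.769
qbeta(c(0.025, 0.975), x + 1, n - x + 1)  # exact 95% equal-tailed interval
pbeta(0.7, x + 1, n - x + 1)              # exact P(theta < 0.7), about 0.26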

The Bayesian estimation:

# x: number of successes
# n: number of trials
# comp.theta: a fixed relative frequency of success to compare with the estimated relative frequency of success
# cred.mass: the amount of probability mass that will be contained in reported credible intervals
# n.iter: The number of iterations to run the MCMC sampling (default: 15000, max: 1e6-1)

model <- bayes.binom.test(x=9, n=11, comp.theta=0.7, cred.mass = 0.95)
model
## 
##  Bayesian First Aid binomial test
## 
## data: 9 and 11
## number of successes = 9, number of trials = 11
## Estimated relative frequency of success:
##   0.78 
## 95% credible interval:
##   0.55 0.96 
## The relative frequency of success is more than 0.7 by a probability of 0.742 
## and less than 0.7 by a probability of 0.258

Versus the classical binomial test:

binom.test(x=9, n=11, p=0.7, conf.level=0.95)
## 
##  Exact binomial test
## 
## data:  9 and 11
## number of successes = 9, number of trials = 11, p-value = 0.523
## alternative hypothesis: true probability of success is not equal to 0.7
## 95 percent confidence interval:
##  0.4822 0.9772
## sample estimates:
## probability of success 
##                 0.8182

To plot and summarize:

summary(model)
##   Data
## number of successes = 9, number of trials = 11
## 
##   Model parameters and generated quantities
## theta: the relative frequency of success
## x_pred: predicted number of successes in a replication
## 
##   Measures
##         mean    sd HDIlo HDIup %<comp %>comp
## theta  0.767 0.113 0.545  0.96  0.258  0.742
## x_pred 8.434 1.847 5.000 11.00  0.000  1.000
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.7.
## 
##   Quantiles
##        q2.5%  q25% median   q75% q97.5%
## theta  0.512 0.696  0.781  0.853  0.944
## x_pred 4.000 7.000  9.000 10.000 11.000
plot(model)


The posterior predictive distribution is the distribution of an unobserved observation \(y\) (a prediction) conditional on the observed data \(x\):

\[p(y|x,n) = \int_{\theta} p(y|\theta) p(\theta|x,n) d\theta = E_{\theta|x,n} [ p(y|\theta) ]\]
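In MCMC this integral is approximated by simulation: draw \(\theta\) from the posterior, then draw a replicated count from the binomial likelihood, which is what the x_pred node in the JAGS model below does. A minimal sketch using the closed-form Beta posterior from above:

theta_draws <- rbeta(15000, 9 + 1, 11 - 9 + 1)          # posterior draws of theta
x_pred <- rbinom(15000, size = 11, prob = theta_draws)  # posterior predictive draws
mean(x_pred)  # should be close to the 8.43 reported by summary(model)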

The diagnostics function plots MCMC diagnostics based on the coda package:

diagnostics(model)
## 
## Iterations = 1:5000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
##   Diagnostic measures
##         mean    sd mcmc_se n_eff Rhat
## theta  0.767 0.113   0.001 14883    1
## x_pred 8.434 1.847   0.015 14749    1
## 
## mcmc_se: the estimated standard error of the MCMC approximation of the mean.
## n_eff: a crude measure of effective MCMC sample size.
## Rhat: the potential scale reduction factor (at convergence, Rhat=1).
## 
##   Model parameters and generated quantities
## theta: The relative frequency of success
## x_pred: Predicted number of successes in a replication


model.code prints out R and JAGS code that runs the model (copy-paste friendly):

model.code(model)
## ### Model code for the Bayesian First Aid alternative to the binomial test ###
## require(rjags)
## 
## # Setting up the data
## x <- 9 
## n <- 11 
## 
## # The model string written in the JAGS language
## model_string <- "model {
##   x ~ dbinom(theta, n)
##   theta ~ dbeta(1, 1)
##   x_pred ~ dbinom(theta, n)
## }"
## 
## # Running the model
## model <- jags.model(textConnection(model_string), data = list(x = x, n = n), 
##                     n.chains = 3, n.adapt=1000)
## samples <- coda.samples(model, c("theta", "x_pred"), n.iter=5000)
## 
## # Inspecting the posterior
## plot(samples)
## summary(samples)

For example, to change the prior just replace, in the model_string, the distribution of \(\theta\) with something else (like Jeffreys prior, dbeta(0.5, 0.5)) and run the code yourself.
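A sketch of the same model string with Jeffreys prior swapped in:

model_string <- "model {
  x ~ dbinom(theta, n)
  theta ~ dbeta(0.5, 0.5)  # Jeffreys prior instead of the flat Beta(1, 1)
  x_pred ~ dbinom(theta, n)
}"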

One Sample and Paired Samples t-test

To replace the classical t-test, this test uses the BEST methodology which assumes the data \(x=(x_1,x_2,\ldots,x_n)\) are distributed as a t distribution according to the following model:

\[x_i \sim t(\mu,\sigma,\nu)\]

\[\mu \sim \mathcal{N}(M_{\mu}, S_{\mu})\]

\[\sigma \sim U(L_{\sigma}, H_{\sigma})\]

\[\nu \sim \text{ShiftedExp}(\frac{1}{29}, \text{shift}=1)\]

\(\nu\) is the degrees of freedom of the t distribution (lower values mean heavier tails). The value \(\frac{1}{29}\) is used to try to balance nearly normal distributions (\(\nu > 30\)) with heavy-tailed distributions (\(\nu < 30\)).
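A quick sketch (not from the package) of what this prior implies: the shifted exponential has mean \(1 + 29 = 30\), and roughly a third of its mass lies in the nearly normal region:

1 - pexp(30 - 1, rate = 1/29)  # P(nu > 30) = exp(-1), about 0.37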

For paired samples the test takes the difference between each pair of observations and models just the paired differences using the one-sample procedure.
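As a sketch of this equivalence (using the new.yield_61 and new.yield_81 vectors defined below), a one-sample test on the differences should give essentially the same fit as the paired call:

diffs <- new.yield_61 - new.yield_81  # the pairwise differences
bayes.t.test(diffs)                   # equivalent to the paired test below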

As an example, let's load a dataset with coffee yields, compare these yields in two different time periods (1961-80 and 1981-2001), and see if the test recognizes a statistically significant change:

d <- read.csv("roubik_2002_coffe_yield.csv")
head(d)
##   world     country yield_61_to_80 yield_81_to_01
## 1   new  Costa_Rica           9139          14620
## 2   new     Bolivia           7686           8767
## 3   new El_Salvador           9996           8729
## 4   new   Guatemala           5488           8231
## 5   new    Colombia           5920           7740
## 6   new    Honduras           4096           7264
new.yield_61 <- d[,3][d$world=="new"]
new.yield_61
##  [1] 9139 7686 9996 5488 5920 4096 4566 4965 5487 5227 2347 3089 1938
new.yield_81 <- d[,4][d$world=="new"]
new.yield_81
##  [1] 14620  8767  8729  8231  7740  7264  6408  6283  5740  5116  4124
## [12]  3240  2789
model <- bayes.t.test(new.yield_61, new.yield_81, paired=TRUE)
model
## 
##  Bayesian estimation supersedes the t test (BEST) - paired samples
## 
## data: new.yield_61 and new.yield_81, n = 13
## 
##   Estimates [95% credible interval]
## mean paired difference: -1399 [-2462, -377]
## sd of the paired differences: 1690 [915, 2727]
## 
## The mean difference is more than 0 by a probability of 0.006 
## and less than 0 by a probability of 0.994

The \(95\)% credible interval does not include zero, so the test indicates a credible difference between the two samples.

summary(model)
##   Data
## new.yield_61, n = 13
## new.yield_81, n = 13
## 
##   Model parameters and generated quantities
## mu_diff: the mean pairwise difference between new.yield_61 and new.yield_81 
## sigma_diff: the scale of the pairwise difference, a consistent
##   estimate of SD when nu is large.
## nu: the degrees-of-freedom for the t distribution fitted to the pairwise difference
## eff_size: the effect size calculated as (mu_diff - 0) / sigma_diff
## diff_pred: predicted distribution for a new datapoint generated
##   as the pairwise difference between new.yield_61 and new.yield_81 
## 
##   Measures
##                mean      sd     HDIlo   HDIup %<comp %>comp
## mu_diff    -1401.85  525.00 -2461.806 -376.69  0.994  0.006
## sigma_diff  1757.19  487.99   915.427 2726.86  0.000  1.000
## nu            29.21   27.57     1.011   84.65  0.000  1.000
## eff_size      -0.85    0.36    -1.576   -0.18  0.994  0.006
## diff_pred  -1420.79 2165.88 -5583.972 2803.76  0.781  0.219
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.
## 
##   Quantiles
##                q2.5%      q25%    median      q75%   q97.5%
## mu_diff    -2463.431 -1729.394 -1399.476 -1063.340 -377.569
## sigma_diff  1002.421  1431.532  1689.791  2007.287 2868.776
## nu             2.314     9.441    20.572    40.006  102.353
## eff_size      -1.580    -1.079    -0.838    -0.608   -0.183
## diff_pred  -5610.141 -2645.939 -1410.718  -183.086 2782.547
plot(model)


The posterior predictive box presents a histogram of the data with a smattering of t distributions drawn from the posterior. If there is a large discrepancy between the model fit and the data then we need to think twice before proceeding. Here the fit looks fine.

diagnostics(model)
## 
## Iterations = 601:10600
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 10000 
## 
##   Diagnostic measures
##                mean      sd mcmc_se n_eff  Rhat
## mu_diff    -1401.85  525.00   4.291 15032 1.002
## sigma_diff  1757.19  487.99   6.084  7160 1.014
## nu            29.21   27.57   0.384  5203 1.001
## eff_size      -0.85    0.36   0.003 14951 1.000
## diff_pred  -1420.79 2165.88  12.971 27889 1.001
## 
## mcmc_se: the estimated standard error of the MCMC approximation of the mean.
## n_eff: a crude measure of effective MCMC sample size.
## Rhat: the potential scale reduction factor (at convergence, Rhat=1).
## 
##   Model parameters and generated quantities
## mu_diff: the mean pairwise difference between new.yield_61 and new.yield_81 
## sigma_diff: the scale of the pairwise difference, a consistent
##   estimate of SD when nu is large.
## nu: the degrees-of-freedom for the t distribution fitted to the pairwise difference
## eff_size: the effect size calculated as (mu_diff - 0) / sigma_diff
## diff_pred: predicted distribution for a new datapoint generated
##   as the pairwise difference between new.yield_61 and new.yield_81


model.code(model)
## ## Model code for Bayesian estimation supersedes the t test - paired samples ##
## require(rjags)
## 
## # Setting up the data
## x <- new.yield_61 
## y <- new.yield_81 
## pair_diff <- x - y
## comp_mu <-  0 
## 
## # The model string written in the JAGS language
## model_string <- "model {
##   for(i in 1:length(pair_diff)) {
##     pair_diff[i] ~ dt( mu_diff , tau_diff , nu )
##   }
##   diff_pred ~ dt( mu_diff , tau_diff , nu )
##   eff_size <- (mu_diff - comp_mu) / sigma_diff
##   
##   mu_diff ~ dnorm( mean_mu , precision_mu )
##   tau_diff <- 1/pow( sigma_diff , 2 )
##   sigma_diff ~ dunif( sigma_low , sigma_high )
##   # A trick to get an exponentially distributed prior on nu that starts at 1.
##   nu <- nuMinusOne + 1 
##   nuMinusOne ~ dexp(1/29)
## }"
## 
## # Setting parameters for the priors that in practice will result
## # in flat priors on mu and sigma.
## mean_mu = mean(pair_diff, trim=0.2)
## precision_mu = 1 / (mad(pair_diff)^2 * 1000000)
## sigma_low = mad(pair_diff) / 1000 
## sigma_high = mad(pair_diff) * 1000
## 
## # Initializing parameters to sensible starting values helps the convergence
## # of the MCMC sampling. Here using robust estimates of the mean (trimmed)
## # and standard deviation (MAD).
## inits_list <- list(
##   mu_diff = mean(pair_diff, trim=0.2),
##   sigma_diff = mad(pair_diff),
##   nuMinusOne = 4)
## 
## data_list <- list(
##   pair_diff = pair_diff,
##   comp_mu = comp_mu,
##   mean_mu = mean_mu,
##   precision_mu = precision_mu,
##   sigma_low = sigma_low,
##   sigma_high = sigma_high)
## 
## # The parameters to monitor.
## params <- c("mu_diff", "sigma_diff", "nu", "eff_size", "diff_pred")
## 
## # Running the model
## model <- jags.model(textConnection(model_string), data = data_list,
##                     inits = inits_list, n.chains = 3, n.adapt=1000)
## update(model, 500) # Burning some samples to the MCMC gods....
## samples <- coda.samples(model, params, n.iter=10000)
## 
## # Inspecting the posterior
## plot(samples)
## summary(samples)

Pearson Correlation Test

This is a measure of the linear correlation (dependence) between two variables \(X_1\) and \(X_2\), giving a value \(\rho\) between +1 and -1 inclusive, where 1 is total positive correlation, 0 is no correlation, and -1 is total negative correlation:

\[\rho = \frac{\text{cov}(X_1,X_2)}{\sigma_{X_1} \sigma_{X_2}}\]
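As a minimal numerical check of this identity (simulated data, not from the chapter's dataset):

set.seed(1)
x1 <- rnorm(100)
x2 <- 0.3 * x1 + rnorm(100)      # two correlated variables
cov(x1, x2) / (sd(x1) * sd(x2))  # identical to cor(x1, x2)
cor(x1, x2)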

Classical statistical inference based on Pearson's correlation coefficient aims at testing the null hypothesis that \(\rho=0\) and at finding the p% confidence interval around \(r\) (the sample correlation) that contains \(\rho\).

The classical Pearson's correlation test assumes bivariate normality: the paired variables \(X_{i1}\) and \(X_{i2}\) follow a bivariate normal distribution, i.e., \(X_{i1} \sim \mathcal{N}(\mu_1,\sigma_1)\) and \(X_{i2} \sim \mathcal{N}(\mu_2,\sigma_2)\), and for each pair \((x_{i1},x_{i2})\) the linear dependency is quantified by \(\rho\).

The classical model presupposes that \(X=(X_1,X_2)^T\) has a multivariate normal distribution:

\[X \sim \mathcal{N}(\mu,\Sigma)\]

where \(\mu=(\mu_1,\mu_2)^T\) and

\[\Sigma = \left( \begin{array}{cc} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{array} \right)\]

This test is implemented by the R function cor.test.
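For later comparison with the Bayesian fits, a sketch of the classical test on the male subset of the data loaded below:

cor.test(~ ratio_2d4d + grip_kg, data = d[d$sex == "male", ])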

The following Bayesian model replaces the normal with a bivariate t distribution, which is more robust to outliers (it could also be a normal, of course; the Bayesian framework is much more flexible in this sense).

The Bayesian model becomes

\[X \sim t(\mu,\Sigma,\nu)\]

\[\rho \sim U(-1,1)\]

\[\mu_{x_1},\mu_{x_2} \sim \mathcal{N}(M_{\mu}, S_{\mu})\]

\[\sigma_{x_1}, \sigma_{x_2} \sim U(L_{\sigma}, H_{\sigma})\]

\[\nu \sim \text{ShiftedExp}(\frac{1}{29}, \text{shift}=1)\]

Let’s load some data:

d <- read.csv("2d4d_hone_2012.csv")
head(d)
##   ratio_2d4d grip_kg  sex
## 1     0.9427   54.19 male
## 2     0.9336   52.16 male
## 3     0.9460   51.15 male
## 4     0.9527   51.15 male
## 5     0.9178   50.14 male
## 6     0.9386   50.16 male
plot(d$ratio_2d4d, d$grip_kg, col=unclass(d$sex), pch=18)
legend("topright",c("female","male"), col=1:2, pch=18)


There is a visible difference between the two genders in the scatterplot. Let's try to find some pattern with this test:

model.male <- bayes.cor.test( ~ ratio_2d4d + grip_kg, data=d[d$sex=="male",], n.iter = 5000)
model.male
## 
##  Bayesian First Aid Pearson's Correlation Coefficient Test
## 
## data: ratio_2d4d and grip_kg (n = 97)
## Estimated correlation:
##   -0.34 
## 95% credible interval:
##   -0.51 -0.15 
## The correlation is more than 0 by a probability of <0.001 
## and less than 0 by a probability of >0.999
summary(model.male)
##   Data
## ratio_2d4d and grip_kg, n = 97
## 
##   Model parameters
## rho: the correlation between ratio_2d4d and grip_kg 
## mu[1]: the mean of ratio_2d4d 
## sigma[1]: the scale of ratio_2d4d , a consistent
##   estimate of SD when nu is large.
## mu[2]: the mean of grip_kg 
## sigma[2]: the scale of grip_kg 
## nu: the degrees-of-freedom for the bivariate t distribution
## 
##   Measures
##            mean     sd  HDIlo   HDIup %<comp %>comp
## rho      -0.339  0.092 -0.511  -0.154  0.999  0.001
## mu[1]     0.958  0.003  0.952   0.965  0.000  1.000
## mu[2]    37.252  0.923 35.488  39.124  0.000  1.000
## sigma[1]  0.032  0.003  0.028   0.037  0.000  1.000
## sigma[2]  8.701  0.684  7.490  10.130  0.000  1.000
## nu       47.783 31.402  6.589 110.502  0.000  1.000
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.
## 
##   Quantiles
##           q2.5%   q25% median   q75%  q97.5%
## rho      -0.506 -0.402 -0.345 -0.283  -0.143
## mu[1]     0.951  0.956  0.958  0.960   0.965
## mu[2]    35.442 36.646 37.243 37.870  39.086
## sigma[1]  0.028  0.031  0.032  0.034   0.038
## sigma[2]  7.458  8.228  8.667  9.140  10.112
## nu       10.569 25.617 40.041 60.820 129.822
model.female <- bayes.cor.test( ~ ratio_2d4d + grip_kg, data=d[d$sex=="female",], n.iter = 5000)
model.female
## 
##  Bayesian First Aid Pearson's Correlation Coefficient Test
## 
## data: ratio_2d4d and grip_kg (n = 119)
## Estimated correlation:
##   -0.14 
## 95% credible interval:
##   -0.31 0.052 
## The correlation is more than 0 by a probability of 0.064 
## and less than 0 by a probability of 0.936
plot(model.male)


plot(model.female)


Both estimates indicate a slight negative correlation (the males more than the females). The two ellipses show the 50% (darker blue) and 95% (lighter blue) highest density regions. The red histograms show the marginal distributions of the data with a smattering of marginal densities drawn from the posterior. Looking at this plot we see that the model fits quite well; however, we could be concerned with the right skewness of the ratio_2d4d marginal, which is not captured by the model (the t distribution is symmetric).

To take a look at the posterior difference in correlation between the male and the female group, we first extract the MCMC samples from the Bayesian First Aid fit object using the as.data.frame function:

female.mcmc <- as.data.frame(model.female)
  male.mcmc <- as.data.frame(model.male)

hist(male.mcmc$rho-female.mcmc$rho, breaks=50, xlim=c(-1,1), main=bquote(paste("Density for ", rho, " male minus ", rho, " female")), yaxt="n", ylab="")
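The same posterior samples also give a direct probability statement about the difference; a minimal sketch:

# Posterior probability that the male correlation is more negative
# than the female one:
mean(male.mcmc$rho < female.mcmc$rho)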


And the entire code:

model.code(model.female)
## ## Model code for the Bayesian First Aid alternative to Pearson's correlation test. ##
## require(rjags)
## 
## # Setting up the data
## x <- d[d$sex == "female", ][ , "ratio_2d4d"] 
## y <- d[d$sex == "female", ][ , "grip_kg"] 
## xy <- cbind(x, y)
## 
## # The model string written in the JAGS language
## model_string <- "model {
##   for(i in 1:n) {
##     xy[i,1:2] ~ dmt(mu[], prec[ , ], nu) 
##   }
## 
##   # JAGS parameterizes the multivariate t using precision (inverse of variance) 
##   # rather than variance, therefore here inverting the covariance matrix.
##   prec[1:2,1:2] <- inverse(cov[,])
## 
##   # Constructing the covariance matrix
##   cov[1,1] <- sigma[1] * sigma[1]
##   cov[1,2] <- sigma[1] * sigma[2] * rho
##   cov[2,1] <- sigma[1] * sigma[2] * rho
##   cov[2,2] <- sigma[2] * sigma[2]
## 
##   # Priors  
##   rho ~ dunif(-1, 1)
##   sigma[1] ~ dunif(sigmaLow, sigmaHigh) 
##   sigma[2] ~ dunif(sigmaLow, sigmaHigh)
##   mu[1] ~ dnorm(mean_mu, precision_mu)
##   mu[2] ~ dnorm(mean_mu, precision_mu)
##   nu <- nuMinusOne+1
##   nuMinusOne ~ dexp(1/29)
## }"
## 
## # Initializing the data list and setting parameters for the priors
## # that in practice will result in flat priors on mu and sigma.
## data_list = list(
##   xy = xy, 
##   n = length(x),
##   mean_mu = mean(c(x, y), trim=0.2) ,
##   precision_mu = 1 / (max(mad(x), mad(y))^2 * 1000000),
##   sigmaLow = min(mad(x), mad(y)) / 1000 ,
##   sigmaHigh = max(mad(x), mad(y)) * 1000)
## 
## # Initializing parameters to sensible starting values helps the convergence
## # of the MCMC sampling. Here using robust estimates of the mean (trimmed)
## # and standard deviation (MAD).
## inits_list = list(mu=c(mean(x, trim=0.2), mean(y, trim=0.2)), rho=cor(x, y, method="spearman"), 
##                   sigma = c(mad(x), mad(y)), nuMinusOne = 5)
## 
## # The parameters to monitor.
## params <- c("rho", "mu", "sigma", "nu")
##   
## # Running the model
## model <- jags.model(textConnection(model_string), data = data_list,
##                     inits = inits_list, n.chains = 3, n.adapt=1000)
## update(model, 500) # Burning some samples to the MCMC gods....
## samples <- coda.samples(model, params, n.iter=5000)
## 
## # Inspecting the posterior
## plot(samples)
## summary(samples)

For instance, if you wanted to replace \(\rho \sim U(-1,1)\) with a Beta prior stretched to the interval \([-1,1]\), we would replace

rho ~ dunif(-1,1)

with

rho_half_width ~ dbeta(1, 1)
rho <- 2*rho_half_width - 1  # shift and stretch from [0,1] to [-1,1]

and replace the corresponding entry in the inits_list variable:

# from 
rho=cor(x, y, method="spearman")
# to
rho_half_width=( cor(x,y,method="spearman") + 1 ) / 2

By changing the parameter values of dbeta we can define several different priors, placing more probability mass on the values of rho we believe to be more probable.
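As a quick sketch of this (the division by 2 is the Jacobian of the shift-and-stretch transformation above), here is how two such priors look on the \([-1,1]\) scale:

# Stretched Beta(2, 2): more prior mass near rho = 0.
curve(dbeta((x + 1) / 2, 2, 2) / 2, from = -1, to = 1,
      xlab = "rho", ylab = "prior density")
# Stretched Beta(0.5, 0.5): mass pushed towards -1 and 1.
curve(dbeta((x + 1) / 2, 0.5, 0.5) / 2, add = TRUE, lty = 2)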

Go to part 2.