library(rstan)
library(rethinking)
library(latex2exp)  # use latex expressions

Consider the following problem:

Compute \(p(X>12~|~X_1,\ldots,X_n)\), where \(X, X_1, \ldots, X_n\) are i.i.d. \(\mathcal{N}(\mu, \sigma)\) with \(\mu,\sigma\) unknown

We will use the following data,

set.seed(101)

mu   <- 10 # unknown values
sigma <- 2

n <- 1000
xs <- rnorm(n, mu, sigma) # only this data is known

Using MLE

For the Gaussian, the maximum likelihood estimator of \((\mu, \sigma^2)\) given data \(D = \{X_1,\ldots,X_n\}\) is

\[(\hat\mu,\hat\sigma^2) = \left( \overline{D},\ \frac{1}{n}\sum_{i=1}^n \left(X_i - \overline{D}\right)^2\right)\] (R's sd() divides by \(n-1\) rather than \(n\), but for \(n=1000\) the difference is negligible, so we use it below as a stand-in for \(\hat\sigma\).) By the invariance property of maximum likelihood, the MLE of a function \(f\) of the parameters is \(f\) applied to the MLEs of the parameters. So, given \(f\),

\[f(\mu,\sigma) = \Pr(X > 12~|~\mu,\sigma) = 1 - \Phi\left(\frac{12-\mu}{\sigma}\right)\]

f <- function(threshold, mu, sigma) {
  pnorm((threshold - mu) / sigma, lower.tail = FALSE)
}
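As a quick sanity check of f, with the true parameters (known here only because the data are synthetic) the exceedance probability is \(1-\Phi(1)\):

```r
# P(X > t) = 1 - Phi((t - mu)/sigma) for a Gaussian
f <- function(threshold, mu, sigma) {
  pnorm((threshold - mu) / sigma, lower.tail = FALSE)
}

# With mu = 10, sigma = 2 and threshold 12 this is 1 - pnorm(1)
f(12, 10, 2)
## [1] 0.1586553
```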

The MLE of \(f\) is,

\[\hat f = 1 - \Phi\left(\frac{12-\hat\mu}{\hat\sigma}\right)\]

For the data defined above, the MLE is:

threshold <- 12

f(threshold, mean(xs), sd(xs))
## [1] 0.1402983

Let’s repeat this computation over many simulated data sets to see how the estimates are spread,

set.seed(17)

n.sim <- 1e4
n <- 240
x <- matrix(rnorm(n.sim*n, mu, sigma), n.sim)

# Compute the MLEs for each simulated data set (each row of x).
mu.hat     <- rowMeans(x)
# mu.hat has length nrow(x), so x - mu.hat subtracts each row's mean from that row
sigma2.hat <- rowMeans((x - mu.hat)^2)  # n-divisor MLE of sigma^2
p.hat      <- f(threshold, mu.hat, sqrt(sigma2.hat))

hist(p.hat, freq=FALSE, col="dodgerblue", breaks=50,
     xlab=TeX("$\\hat{f}$"), cex.lab=1.25, main="")
abline(v = f(threshold, mu, sigma), lwd=2, col="red")
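The spread can also be summarized numerically. Here is a small self-contained version of the simulation above, reporting an 89% interval of the estimator (the 5.5%/94.5% endpoints chosen to match precis's default):

```r
set.seed(17)

f <- function(threshold, mu, sigma) pnorm((threshold - mu) / sigma, lower.tail = FALSE)

n.sim <- 1e4
n     <- 240
x     <- matrix(rnorm(n.sim * n, 10, 2), n.sim)  # one simulated data set per row

mu.hat     <- rowMeans(x)                 # per-row MLE of mu
sigma2.hat <- rowMeans((x - mu.hat)^2)    # per-row MLE of sigma^2 (n divisor)
p.hat      <- f(12, mu.hat, sqrt(sigma2.hat))

# spread of the estimator around the true value 0.1587
quantile(p.hat, c(0.055, 0.5, 0.945))
```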

Using Bayesian Inference

Given \(D=\{X_1, \ldots, X_n\}\) we need the joint posterior \(p(\mu,\sigma~|~D)\). With posterior samples of \((\mu,\sigma)\) we can compute the posterior of \(p(X>12~|~D)\), since any function of random variables is itself a random variable.

The respective Stan model is,

data { 
  int<lower=1> N;
  real X[N]; 
} 
parameters {
  real mu; 
  real<lower=0> sigma;
} 
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  
  X ~ normal(mu, sigma);
}
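The call below uses model1, whose construction isn’t shown; a minimal sketch of the missing compilation step (storing the program in a string named model_code is my naming assumption):

```r
# Compile the Stan program above into the 'model1' object used by sampling().
model_code <- "
data {
  int<lower=1> N;
  real X[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);

  X ~ normal(mu, sigma);
}
"
model1 <- rstan::stan_model(model_code = model_code)
```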

Fitting the model with the available data,

fit1 <- sampling( model1,   
                  data    = list(X=xs, N=length(xs)), 
                  iter    = 50000, 
                  chains  = 2, 
                  refresh = 0
                )
precis(fit1)
##           mean         sd     5.5%     94.5%    n_eff     Rhat4
## mu    9.930269 0.06094363 9.833117 10.027941 50135.70 0.9999947
## sigma 1.920236 0.04303259 1.852817  1.990586 49368.88 1.0000179

Let’s extract the samples and show the posterior \(p(X>12~|~D)\),

samples <- rstan::extract(fit1)

# push the posterior samples of (mu, sigma) through f
# to get a posterior sample of f itself
fs <- f(threshold, samples$mu, samples$sigma)

hist(fs, freq=FALSE, col="dodgerblue", breaks=50,
     xlab=TeX("$\\hat{f}$"), cex.lab=1.25,
     main=TeX("$p(X>12|D)$"))
abline(v = f(threshold, mu, sigma), lwd=2, col="red")

This histogram and the MLE one summarize very different experiments. The MLE histogram summarizes estimates from thousands of different simulated data sets, while this one shows MCMC samples conditioned on the single original data set (and so inherits that sample’s bias). For that original data set, the MLE is just one number.

Using Resampling

I’m using two “bucket” factory functions to define and run the bootstrap:

library(magrittr)

# returns a sampler ("urn") over the given universe;
# withReplacement=FALSE samples without replacement
make.bucket1 <- function(universe, withReplacement=TRUE) {
  function(n.sample) {
    sample(universe, n.sample, rep=withReplacement)
  }
}

# uses the bucket1 urn to generate a sample of size 'size.sample'
# and applies the given statistic function 
make.bucket2 <- function(bucket1, size.sample, statistic) {
  function(n) {
    replicate(n, bucket1(size.sample) %>% statistic) %>% 
    as.vector 
  }
}

The statistic returns the pair (empirical mean, empirical standard deviation) of the current bootstrap sample,

stat <- function(sample) {
  c(mu=mean(sample), sigma=sd(sample)) 
}

bucket1 <- make.bucket1(xs, TRUE)
bucket2 <- make.bucket2(bucket1, length(xs), stat)
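To see why the results are later reshaped with matrix(ncol=2, byrow=TRUE): each call to the statistic yields a named pair, replicate stacks these pairs as columns of a 2 × n matrix, and as.vector flattens that matrix column by column. A small self-contained check on a toy universe (rewritten with plain function calls so it runs without magrittr):

```r
make.bucket1 <- function(universe, withReplacement=TRUE) {
  function(n.sample) sample(universe, n.sample, replace=withReplacement)
}

make.bucket2 <- function(bucket1, size.sample, statistic) {
  # replicate() returns a 2 x n matrix here; as.vector flattens it
  # column-major, giving mu1, sigma1, mu2, sigma2, ...
  function(n) as.vector(replicate(n, statistic(bucket1(size.sample))))
}

stat <- function(s) c(mu=mean(s), sigma=sd(s))

toy        <- rnorm(100)
toy.bucket <- make.bucket2(make.bucket1(toy), length(toy), stat)

out <- toy.bucket(3)             # 3 bootstrap replicates -> 6 numbers
matrix(out, ncol=2, byrow=TRUE)  # one (mu, sigma) row per replicate
```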

Let’s sample and plot the results:

# the flattened output alternates mu, sigma, mu, sigma, ...
# so byrow=TRUE recovers one (mu, sigma) pair per row
bootstrap <- bucket2(5e4) %>% matrix(ncol=2, byrow=TRUE)

fs <- f(threshold, bootstrap[,1], bootstrap[,2])

hist(fs, freq=FALSE, col="dodgerblue", breaks=50,
     xlab=TeX("$\\hat{f}$"), cex.lab=1.25,
     main=TeX("$p(X>12|D)$"))
abline(v = f(threshold, mu, sigma), lwd=2, col="red")

Results Comparison

Comparing the true probability, the MLE, and the plug-in estimates of \(f\) at the posterior means and at the bootstrap means:

f(threshold, mu, sigma)          # True value
## [1] 0.1586553
f(threshold, mean(xs), sd(xs))   # MLE
## [1] 0.1402983
f(threshold, mean(samples$mu), mean(samples$sigma))    # f at the posterior means (plug-in)
## [1] 0.1405498
f(threshold, mean(bootstrap[,1]), mean(bootstrap[,2])) # f at the bootstrap means (plug-in)
## [1] 0.1401176