Current version of OpenBUGS needed.

Refs:

• Lunn et al. The BUGS Book website

• Kruschke Doing Bayesian Data Analysis

BUGS allow us to do statistical inference using a graphical model approach:

1. Start with the model describing assumptions concerning the relationships of the problem
2. Feed the model with the available data
3. Use an engine to obtain approximated inferences about the unobserved variables

The model is described in a declarative way. It has two kinds of connections:

• logical dependence which is represented by <-, there is not randomness here, just a simple assignment. At the right there is an expression

• stochastic dependence which is represented by ~. At the right there is a statistical distribution

BUGS supports many distributions. Some of them:

• Normal, dnorm(mean,precision), precision = 1/variance

• Bernoulli, dbern(p), p is the success probability

• Binomial, dbin(p,n), p is the success probability and n is the number of trials

• Categorical, dcat(p[]), p is a vector with a discrete distribution

• Uniform, dunif(a,b), uniform between [a,b]

• Beta, dbeta(a,b)

• Gamma, dgamma(a,b)

• Logistic, dlogis(mu, tau)

• Exponential, dexp(theta)

• Student’s t, dt(mean, tau, k), where $$k$$ is the degrees of freedom

• Pareto, dpar(a,b), $$p(x|a,b)=ab^ax^{-(a+1)}$$

• Poisson, dpois(theta)

• Geometrix, dgeom(theta)

• Beta-binomial, dbetabin(a,b,n)

• Multinomial, x[1:R] ~ dmulti(theta[], n), $$n$$ events each with R mutually exclusive outcomes with probabilities $$\theta_1\ldots \theta_R$$

Vector’s use:

• v[i], the ith position

• v[a:b], (v[a], v[a+1], …, v[b])

• v[i,], the ith row

• v[,j], the jth column

## First Example

Let’s try a simple example: if we flip a fair coin 8 times, what is the probability of seeing 2 or less heads?

First, we can solve this analitically without BUGS. Given $$Y \sim Binomial(0.5,8)$$, we wish to know $$Pr(Y \le 2)$$. That is,

$Pr(Y \le 2) = \sum_{y=0}^2 p(y|\pi=0.5, n=8) = \sum_{y=0}^2 {8 \choose y} (0.5)^y (1-0.5)^{8-y} = 0.1445$

Or in R:

pbinom(2, size=8, prob=0.5)
## [1] 0.1445313

Now, let’s show how to simulate it using BUGS.

Our example is modelled with two assigments (step is BUGS step function, ie, returns 1 if the input is non-negative):

  Y  ~ dbin(0.5, 8)
P2 <- step(2.5-Y) # i.e., is Y <= 2?

In diagram:

Let’s show how to define and run this model using R to communicate with BUGS:

library(BRugs)
## Welcome to BRugs connected to OpenBUGS version 3.2.3
# Function to draw histograms of sample results
draw.hist <- function(data, text="") {
tmp <- hist(data, breaks=0:(max(data)+1), xaxt="n", right=FALSE, freq=TRUE, main=text)
axis(1, at=tmp$mids, labels=0:max(data)) } #------------------------------------------------------------------------------ # THE MODEL # Specify the model in BUGS language, but save it as a string in R: modelString = " # BUGS model specification begins ... model { Y ~ dbin(0.5, 8) P2 <- step(2.5-Y) # i.e., is Y <= 2? } # ... BUGS model specification ends. " # close quote to end modelString # Write the modelString to a file, using R commands: writeLines(modelString,con="model.txt") # Use BRugs to send the model.txt file to BUGS, which checks the model syntax: modelCheck( "model.txt" ) ## model is syntactically correct #------------------------------------------------------------------------------ # INTIALIZE THE CHAIN. modelCompile() # BRugs command tells BUGS to compile the model. ## model compiled modelGenInits() # BRugs command tells BUGS to randomly initialize a chain. ## initial values generated, model initialized #------------------------------------------------------------------------------ # RUN THE CHAINS. # BRugs tells BUGS to keep a record of the sampled values: samplesSet( c("Y","P2") ) ## monitor set for variable 'Y' ## monitor set for variable 'P2' # R command defines a new variable that specifies an arbitrary chain length: chainLength = 20000 # BRugs tells BUGS to generate a MCMC chain: modelUpdate( chainLength ) ## 20000 updates took 0 s #------------------------------------------------------------------------------ # EXAMINE THE RESULTS. YSample <- samplesSample( "Y" ) # BRugs asks BUGS for the sample values. # The simulated results of flips of heads head(YSample, n=100)  ## [1] 4 8 4 5 4 5 4 2 5 3 6 4 4 3 5 3 5 3 4 2 3 7 2 5 3 3 4 6 5 3 1 6 2 4 4 ## [36] 1 6 6 3 6 2 5 5 4 5 5 6 3 4 6 3 3 4 3 2 4 5 3 4 5 5 4 2 4 2 6 5 3 4 4 ## [71] 3 4 7 2 3 5 4 4 6 7 5 6 3 6 6 6 5 1 7 3 2 5 3 4 6 4 5 4 4 5 draw.hist(YSample, "num. of heads") P2Sample <- samplesSample( "P2" ) # But it's P2 that matter to us # we can see them (there are 10k samples, 0 if there were more than two heads, 1 otherwise) head(P2Sample, n=50)  ## [1] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 ## [36] 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 draw.hist(P2Sample, "less or equal than 2 heads?") samplesStats( "Y" ) # BRugs asks BUGS for summary statistics. ## mean sd MC_error val2.5pc median val97.5pc start sample ## Y 4.006 1.424 0.008681 1 4 7 1 20000 P2Summary <- samplesStats( "P2" ) P2Summary ## mean sd MC_error val2.5pc median val97.5pc start sample ## P2 0.1464 0.3536 0.00228 0 0 1 1 20000 We see that the mean is 0.1464, quite close to the true answer. To avoid repeating the initial steps, let’s make a function for them: run.model <- function(model, samples, data=list(), chainLength=10000, burnin=0.10, init.func, n.chains=1, thin=1) { writeLines(model, con="model.txt") # Write the modelString to a file modelCheck( "model.txt" ) # Send the model to BUGS, which checks the model syntax if (length(data)>0) # If there's any data available... modelData(bugsData(data)) # ... BRugs puts it into a file and ships it to BUGS modelCompile(n.chains) # BRugs command tells BUGS to compile the model if (missing(init.func)) { modelGenInits() # BRugs command tells BUGS to randomly initialize a chain } else { for (chain in 1:n.chains) { # otherwise use user's init data modelInits(bugsInits(init.func)) } } modelUpdate(chainLength*burnin) # Burn-in period to be discarded samplesSet(samples) # BRugs tells BUGS to keep a record of the sampled values samplesSetThin(thin) # Set thinning modelUpdate(chainLength) # BRugs command tells BUGS to randomly initialize a chain } ## Coin eg A defective coin minting machine produces coins whose probability of Heads is a random variable Q with pdf $f_Q(q) = 3q^2, q \in [0,1]$ A coin produced by this machine is tossed repeatedly, with successive tosses assumed to be independent. Let A be the event that the first toss of this coin results in Heads, and let B be the event that the second toss of this coin results in Heads. The probability $$P(A)$$ is the expected value of tossing Heads: $E[q] = \int_0^1 3q^2\times q~dq = 0.75$ It’s possible to simulate this in Bugs. However, we need to create a random generator of this pdf. For that we use the inverse transformation technique: Compute the cdf: $F_Q(q) = q^3$ and its inverse $F_Q^{-1}(u)=u^{1/3}$ modelString = " model { u ~ dunif(0,1) fq <- pow(u,1/3) Toss ~ dbern(fq) } " run.model(modelString, samples=c("fq"), chainLength=50000) ## model is syntactically correct ## model compiled ## initial values generated, model initialized ## 5000 updates took 0 s ## monitor set for variable 'fq' ## 50000 updates took 0 s samplesStats("fq")$mean
## [1] 0.7489

Say we wish to compute the posterior pdf $$f_{Q|A}$$ after event $$A$$.

$f_{Q|A}(q|A) = \frac{f_Q(q) p(A|q)}{\int_0^1 f_Q(q) p(A|q)~dq}$

The prior $$f_Q(q)$$ is the uniform, so $$f_Q(q)=1$$.

The likelihood $$p(A|q)$$ is given by $$P_Q(X \le q)$$ which is the cdf $$F_Q(q) = q^3$$, so:

$f_{Q|A}(q|A) = \frac{1 \times q^3}{\int_0^1 1 \times q^3~dq} = \frac{q^3}{1/4} = 4q^3$

What is the expected value of tossing Heads after event $$A$$?

$E[q|A] = \int_0^1 4q^3\times q~dq = 0.8$

Let Bugs it:

data.list = list(
Toss = 1  # event A
)

run.model(modelString, samples=c("fq"), data=data.list, chainLength=50000)
## model is syntactically correct
## data loaded
## model compiled
## initial values generated, model initialized
## 5000 updates took 0 s
## monitor set for variable 'fq'
## 50000 updates took 0 s
samplesStats("fq")$mean ## [1] 0.8003 # comparing the empirical data with the analytical solution: hist(samplesSample( "fq" ), breaks=50, prob=T) curve(4*x^3, col="red", lwd=2, add=T) ## Random Variable Transformation The next eg asks questions after a transformation of the random variable: We have $$Z \sim N(0,1)$$. From $$Z$$ we obtain $$Y=(2Z+1)^3$$. What is its distribution, expected value and $$Pr(Y>10)$$? This problem has a non-trivial analytical solution but can be easily simulated: modelString = " model { Z ~ dnorm(0, 1) # note: 1 is precision = 1/variance Y <- pow(2*Z+1, 3) P10 <- step(Y-10) } " run.model(modelString, samples=c("Y","P10"), chainLength=50000) ## model is syntactically correct ## model compiled ## initial values generated, model initialized ## 5000 updates took 0 s ## monitor set for variable 'Y' ## monitor set for variable 'P10' ## 50000 updates took 0 s Now we can answer the questions: # Its distribution YSample <- samplesSample( "Y" ) hist(YSample, prob=TRUE) lines(density(YSample, adjust=5), col="red", lwd=2) # Its expected value samplesStats("Y")$mean
## [1] 12.95
# Pr(Y>10)
samplesStats("P10")$mean ## [1] 0.2832 Analytically the expected value is $$13$$ and $$Pr(Y>10) \approx 0.2820$$. The simulated results are very close. ## An otherwise intractable problem We have $$20$$ equal items to repair. The unit repair cost is given by a gamma with mean $$100$$ euros and sd $$50$$ euros. We have $$1000$$ euros for repairs. How many items, on average, will we be able to repair? A gamma(a,b) distribution has mean $$\frac{a}{b}$$ and variance $$\frac{a}{b^2}$$. So the gamma for this exercise must be gamma(4,0.04). To use BUGS we will add 20 Y random variables, where $$Y_i$$ is the cost of item i. We’ll compute a cumulative vector of costs, cum.cost, where cum.cost[i] is the sum of the first i items. Then we’ll make a cum.step vector that consists of $$1,2,\ldots,M,0,0,\ldots$$ where $$M$$ is the last item we fix before running out of money. Then we just need to find that $$M$$ which is the maximum value of cum.step. The maximum will be found using the ranked function. modelString = " model { for(i in 1:20) { Y[i] ~ dgamma(4, 0.04) } cum.cost[1] <- Y[1] for(i in 2:20) { cum.cost[i] <- cum.cost[i-1] + Y[i] } for(i in 1:20) { cum.step[i] <- i * step(1000 - cum.cost[i]) # 1,2,...,M,0,0,... } answer <- ranked(cum.step[],20) } " run.model(modelString, samples=c("answer", "Y[1]"), chainLength=50000) ## model is syntactically correct ## model compiled ## initial values generated, model initialized ## 5000 updates took 0 s ## monitor set for variable 'answer' ## monitor set for variable 'Y[1]' ## 50000 updates took 0 s samplesStats("Y[1]") # just to check if the mean and sd are ok (must be close to 100 and 50) ## mean sd MC_error val2.5pc median val97.5pc start sample ## Y[1] 100.2 49.95 0.2212 27.29 92.21 219.2 5001 50000 samplesStats("answer")$mean  # what we want to know!
## [1] 9.62

## Prediction eg

A hospital will begin a new high-risk treatment. Other hospitals that do it have a risk of death $$\theta$$ around $$10\%$$ and it will be quite surprisingly if it was less than $$3\%$$ and more than $$20\%$$ (say that this interval occupies $$90\%$$ of the probability mass). This is our prior information, which will be modelled by a convenient beta distribution:

library(rriskDistributions) # for finding distribution parameters

prior.dist <- get.beta.par(p=c(.05,.5,.95),q=c(.03,.1,.2), plot=FALSE, show.output=FALSE)
prior.shape1 <- prior.dist[["shape1"]]
prior.shape1
## [1] 3.420687
prior.shape2 <- prior.dist[["shape2"]]
prior.shape2
## [1] 28.46136
xs <- seq(0,1,len=100)
plot(xs, dbeta(xs,prior.shape1,prior.shape2), type="l", col="red", main="prior distribution")

Now suppose the hospital is expected to perform $$20$$ operations. How many deaths $$Y$$ should we expect? What is the probability of having at least $$6$$ deaths, ie, $$Pr(Y \ge 6)$$?

Our death rate is $$\theta \sim \text{Beta}(3.42,28.46)$$. The number of deaths are modeled by $$Y|\theta \sim \text{Binomial}(\theta,20)$$. Analytically, prediction of $$Y$$ is given by a $$\text{Beta-Binomial}(3.42,28.46,20)$$ which mean is $$\frac{20\times 3.42}{3.42+28.46} \approx 2.14$$. This is the expected number of deaths.

To compute $$Pr(Y \ge 6)$$:

library(VGAM)
## Loading required package: stats4
## Loading required package: splines
1-pbetabinom.ab(5,size=20,shape1=3.42,shape2=28.46) # Pr(Y >= 6) = 1 - Pr(Y <= 5)
## [1] 0.04663492

Now the respective simulation:

modelString = "
model {
theta ~ dbeta(3.42, 28.46)
Y ~ dbin(theta, 20)
P6 <- step(Y - 5.5)
}
"
run.model(modelString, samples=c("Y", "P6"), chainLength=20000)
## model is syntactically correct
## model compiled
## initial values generated, model initialized
## 2000 updates took 0 s
## monitor set for variable 'Y'
## monitor set for variable 'P6'
## 20000 updates took 0 s
samplesStats("Y") # mean should be close to 2.14
##    mean    sd MC_error val2.5pc median val97.5pc start sample
## Y 2.148 1.735  0.01206        0      2         6  2001  20000
samplesStats("P6")$mean # Pr(Y >= 6) ## [1] 0.0459 ## Conjugate inference The previous hospital executed $$n=10$$ high risk treatments and observed $$y=0$$ deaths. Using the prior for the risk of death $$\theta \sim \text{Beta}(3.42,28.46)$$ from the previous exercise, what will be the posterior distribution given this data? Also, what is the probability of the next patient dying? What is the probability of having 2 or more deaths in the next 20 operations? Using the conjugate functions, we know that for a prior $$\theta \sim \text{Beta}(a,b)$$ and a sampling distribution $$y|\theta,n \sim\text{Binomial}(\theta,n)$$ the posterior distribution is given by $$\theta|y,n \sim \text{Beta}(a+y,b+n-y)$$. In this case, the posterior distribution is $$\text{Beta}(3.42+0,28.46+10-0)=\text{Beta}(3.42,38.46)$$. Plotting the prior and posterior distributions together we see that the amount of uncertainty as diminished (we know more data now): xs <- seq(0,1,len=50) plot(xs,dbeta(xs,3.42,28.46), type="l", ylim=c(0,11)) points(xs, dbeta(xs,3.42,38.46), type="l", col="red") legend("topright", c("prior", "posterior"), col = c("black","red"), lwd=1)  With the posterior we can answer the next question: # Q: What is the probability of the next patient dying? # A: The mean of the posterior, in this case, X ~ Beta(a,b) => E[X] = a/(a+b) 3.43/(3.42+38.46) ## [1] 0.08190067 And we use the preditive function $$\text{Beta-Binomial}(3.42,38.46,20)$$ to answer the final question: # Q: What is the probability of having 2 or more deaths in the next 20 operations, Pr(Y>=2)? 1-pbetabinom.ab(1,size=20,shape1=3.42,shape2=38.46) # Pr(Y >= 2) = 1 - Pr(Y <= 1) ## [1] 0.4580711 Before this data, this same question would evaluate to $$58\%$$. But because the excellent first results at the hospital, the estimation is revised to just $$46\%$$ Now (surprise!) let’s do this using BUGS: modelString = " model { theta ~ dbeta(3.42, 28.46) # prior y ~ dbin(theta, n) # likelihood Y.pred ~ dbin(theta, 20) # predictive distribution Q3 <- step(Y.pred - 1.5) # = 1 if Y.pred >= 2, = 0 otherwise } " # We list the data we know, in this case, there were 0 deaths in 10 operations data.list = list( y = 0, n = 10 ) run.model(modelString, samples=c("theta", "Q3"), data=data.list, chainLength=50000) ## model is syntactically correct ## data loaded ## model compiled ## initial values generated, model initialized ## 5000 updates took 0 s ## monitor set for variable 'theta' ## monitor set for variable 'Q3' ## 50000 updates took 0 s samplesStats("theta")$mean  # the posterior risk of death in 20 trials
## [1] 0.08134
samplesStats("Q3")$mean # Pr(Y>=2) ## [1] 0.4554 BUGS adds the data from the data list into the model and propagate that information for the other nodes, including the nodes we care about (in this eg, theta and Q3). Note: if we need to input multidimensional data, use like this: list( X=structure(.Data=c(1,2,3,4,5,6), .Dim=c(2,3)) ) which defines a matrix with 2 rows and 3 columns (values are placed by row) Just to compare, let’s say we still do not have the previous data. Then the same answers would be computed by the following model (notice that there is no data list): modelString = " model { theta ~ dbeta(3.42, 28.46) # prior Y.pred ~ dbin(theta, 20) # predictive distribution Q3 <- step(Y.pred - 1.5) # = 1 if Y.pred >= 2, = 0 otherwise } " run.model(modelString, samples=c("theta", "Q3"), chainLength=50000) ## model is syntactically correct ## model compiled ## initial values generated, model initialized ## 5000 updates took 0 s ## monitor set for variable 'theta' ## monitor set for variable 'Q3' ## 50000 updates took 0 s samplesStats("theta")$mean  # the posterior risk of death in 20 trials
## [1] 0.107
samplesStats("Q3")$mean # Pr(Y>=2) ## [1] 0.5808 We got the expected $$10\%$$ risk of death (the prior information we had) and the $$58\%$$ that we already knew. ## Discrete parameters We have three coins in my pocket. One $$c_1$$ is biased 3:1 to tails, one $$c_2$$ if fair, and another $$c_3$$ is biased 3:1 to heads. I pick one coin at random and toss it observing heads. What the posterior distribution of the next toss is heads again? Each coin has a probability of being pick of $$1/3$$, i.e., $$p(c_i)=1/3, i=1,2,3$$. Let’s call $$\theta_i$$ the probability of tossing heads for coin $$c_i$$. The likelihood of tossing heads ($$y=1$$) or tails ($$y=0$$) is $$p(y|\theta_i) = \theta_i^y (1-\theta_i)^{1-y}$$. We can get the posterior $$p(\theta|y=1)$$ using Bayes’ theorem. So, $\begin{array}{cccc} \theta_i & p(\theta_i) & p(y=1|\theta_i) & p(\theta_i|y=1) \\ 0.25 & 0.33 & 0.25 & 0.167 \\ 0.50 & 0.33 & 0.50 & 0.333 \\ 0.75 & 0.33 & 0.75 & 0.5 \end{array}$ After observing a heads, the distribution changed accordingly. Eg, now there’s more chance that the coin we have in the hand in the head biased coin ($$50\%$$ chance). To compute the probability of the next toss $$y_1$$ is heads: $p(y_1=1|y=1) = \sum_i p(y_1=1|\theta_i) p(\theta_i|y=1) = 7/12 \approx 0.583$ With BUGS: modelString = " model { coin ~ dcat(p[]) # coin is modeled by a categorical distribution # ie, it outputs 1,2..n where the given vector p[] has the respective probabilities # in this eg it will output k=1,2 or 3 which corresponds to coin c_k y ~ dbern(true.theta) # the toss depends on the true coin we picked true.theta <- theta[coin] # this can be coin 1,2,3 each with its theta for(i in 1:3) { p[i] <- 1/3 # creating vector p = {1/3,1/3,1/3} theta[i] <- 0.25*i # creating vector theta = {0.25,0.50,0.75} coin.prob[i] <- equals(coin,i) # data for the posterior distribution } y1 ~ dbern(true.theta) # the outcome of the next toss } " data.list = list( y = 1 ) run.model(modelString, samples=c("coin.prob","y1"), data=data.list, chainLength=50000) ## model is syntactically correct ## data loaded ## model compiled ## initial values generated, model initialized ## 5000 updates took 0 s ## monitor set for variable 'coin.prob' ## monitor set for variable 'y1' ## 50000 updates took 0 s samplesStats("y1")$mean   # the probability that next coin is heads
## [1] 0.583
samplesStats("coin.prob") # the posterior probability
##                mean     sd MC_error val2.5pc median val97.5pc start sample
## coin.prob[1] 0.1679 0.3738 0.002207        0      0         1  5001  50000
## coin.prob[2] 0.3269 0.4691 0.002375        0      0         1  5001  50000
## coin.prob[3] 0.5051 0.5000 0.002896        0      1         1  5001  50000

Let’s say we had four tosses, with 3 heads and 1 tail, $$y$$ is now a vector of four positions:

modelString = "
model {
coin ~ dcat(p[]) # coin is modeled by a categorical distribution
# ie, it outputs 1,2..n where the given vector p[] has the respective probabilities
# in this eg it will output k=1,2 or 3 which corresponds to coin c_k

for (i in 1:nTosses) {
y[i] ~ dbern(true.theta)
}
true.theta <- theta[coin]         # this can be coin 1,2,3 each with its theta

for(i in 1:3) {
p[i] <- 1/3         # creating vector p = {1/3,1/3,1/3}
theta[i] <- 0.25*i  # creating vector theta = {0.25,0.50,0.75}
coin.prob[i] <- equals(coin,i)  # data for the posterior distribution
}

y1 ~ dbern(true.theta)  # the outcome of the next toss
}
"

data.list = list(
y = c(1,1,1,0),
nTosses = 4
)

run.model(modelString, samples=c("coin.prob","y1"), data=data.list, chainLength=50000)
## model is syntactically correct
## data loaded
## model compiled
## initial values generated, model initialized
## 5000 updates took 0 s
## monitor set for variable 'coin.prob'
## monitor set for variable 'y1'
## 50000 updates took 0 s
samplesStats("y1")$mean # the probability that next coin is heads ## [1] 0.6301 samplesStats("coin.prob") # the posterior probability ## mean sd MC_error val2.5pc median val97.5pc start ## coin.prob[1] 0.06468 0.2460 0.001521 0 0 1 5001 ## coin.prob[2] 0.34610 0.4757 0.002559 0 0 1 5001 ## coin.prob[3] 0.58920 0.4920 0.002967 0 1 1 5001 ## sample ## coin.prob[1] 50000 ## coin.prob[2] 50000 ## coin.prob[3] 50000 Notice that the order of heads and tails given in the data list is considered irrelevant because of the exchangeability assumption. A sequence of random variables that are iid conditional on some underlying distributional form is exchangeable. ## Non-conjugate eg In the hospital eg, we which to infer the risk of death $$\theta$$ having $$y$$ deaths after $$n$$ operations. The likelihood, up to a constant, is $$p(y|\theta) \propto \theta^y (1-\theta)^{n-y}$$ and we choose a non-conjugate normal prior given by the logistic transform of $$\theta$$: $\text{logit} \theta = \log \frac{\theta}{1-\theta} =\sim Normal(0,2.71)$ With this prior there’s no conjugate to find a well-behaved closed form of the posterior $$p(\theta|y,n)$$. BUGS to the rescue: modelString = " model { y ~ dbin(theta, n) # likelihood logit(theta) <- logit.theta logit.theta ~ dnorm(0, 0.368) # precision: 1/2.71 = 0.368 } " data.list = list( y = 10, # our data: 10 deaths after 100 operations n = 100 ) run.model(modelString, samples=c("theta"), data=data.list, chainLength=50000) ## model is syntactically correct ## data loaded ## model compiled ## initial values generated, model initialized ## 5000 updates took 0 s ## monitor set for variable 'theta' ## 50000 updates took 0 s samplesStats("theta") ## mean sd MC_error val2.5pc median val97.5pc start sample ## theta 0.108 0.03017 0.0001324 0.05636 0.1056 0.1739 5001 50000 thetaSample <- samplesSample("theta") samplesDensity("theta", mfrow = c(1, 1)) Let’s say we wish to fit a beta to this data: library(MASS) beta.fit <- fitdistr(thetaSample,"beta",list(shape1=1,shape2=1)) beta.fit ## shape1 shape2 ## 11.32464529 93.53647889 ## ( 0.07060012) ( 0.59461703) # let's draw & compare both densities samplesDensity("theta", mfrow = c(1, 1)) xs <- seq(0,1,len=100); points(xs, dbeta(xs,beta.fit$estimate[1],beta.fit$estimate[2]), type="l", add=TRUE) Sidenote: logit is the inverse of the sigmoid function; for a probability $$p$$, logit(p) means the log-odds of $$p$$, ie, log(p/(1-p)). Logit is the quantile function of the logistic distribution: logit <- function (x) log(x/(1-x)) logit(0.34) ## [1] -0.6632942 qlogis(0.34) ## [1] -0.6632942 ## Multi-parameter Model Assume the following model (described below in BUGS) and we wish to make a marginal inference about the unknown number of degrees of freedom: $p(d|y) = \int_\mu \int_r p(\mu,r,d|y) dr d\mu$ modelString = " model { ### Likelihood for (i in 1:n) { y[i] ~ dt(mu,r,d) } } " # first generate 100 dummy observations given some initial parameter values data.list = list( n=100, mu=0, r=1, d=4 ) run.model(modelString, samples=c("y"), data=data.list, chainLength=10000) ## model is syntactically correct ## data loaded ## model compiled ## initial values generated, model initialized ## 1000 updates took 0 s ## monitor set for variable 'y' ## 10000 updates took 0 s n <- 100 ys <- rep(NA,n) for(i in 1:n) { ys[i] <- samplesStats(paste0("y[",i,"]"))$mean
}
head(ys, n=12)
##  [1]  0.0136300 -0.0185600  0.0263600 -0.0029770 -0.0007505 -0.0066320
##  [7] -0.0282100  0.0113100 -0.0335200 -0.0200000 -0.0143100 -0.0166100

With this dummy dataset, let’s feed the model with some ‘vague’ priors and execute it again

modelString = "
model {
### Likelihood
for (i in 1:n) {
y[i] ~ dt(mu,r,d)
}
### Priors
mu ~ dnorm(gamma, precision)
r  ~ dgamma(alpha, beta)
d  ~ dcat(p[])
p[1] <- 0
for (i in 2:30) {p[i] <- 1/29}
### Values for 'vague' priors
alpha <- 0.001
beta  <- 0.001
gamma <- 0
precision <- 0.0001
}
"

data.list = list(
n=100, y=ys
)

# sometimes it is necessary to tell BUGS where to start, especially in cases
# of distributions with large variances, where BUGS might create inappropriate
# starting values or simply it's unable to produce a starting value.
# JAGS chooses a 'central' value which avoids this problem.

n.chains <- 3
mus <- c(0,1,2)  # more chains => these vectors should have more values
rs  <- c(1,1.2,2)
ds  <- c(10,15,20)

# Herein, it's used a closure with a local var 'i' that determines which chain
# is being initialized

genInitFactory <- function()  {
i <- 0
function() {
i <<- i + 1
list(
mu = mus[i],
r  = rs[i],
d  = ds[i]
)
}
}

run.model(modelString, samples=c("d"), data=data.list, chainLength=25000, init.func=genInitFactory(), n.chains=n.chains)
## model is syntactically correct
## data loaded
## model compiled
## Initializing chain 1:
## initial values loaded and chain initialized but another chain contain uninitialized variables
## Initializing chain 2:
## initial values loaded and chain initialized but another chain contain uninitialized variables
## Initializing chain 3:
## model is initialized
## 2500 updates took 1 s
## monitor set for variable 'd'
## 25000 updates took 10 s
samplesStats("d")
##    mean    sd MC_error val2.5pc median val97.5pc start sample
## d 19.49 6.858   0.1139        7     20        30  2501  75000
samplesDensity("d", mfrow = c(1, 1))

This eg was adapted from Probabilistic Programming and Bayesian Methods for Hackers which is an introduction of MCMC for Python using the library PyMC.

You are given a series of daily text-message counts from a user of your system. The data, plotted over time, appears in the chart below. You are curious to know if the user’s text-messaging habits have changed over time, either gradually or suddenly. How can you model this?

This is the data:

msg.data <- read.csv("txtdata.csv")[,1]
length(msg.data)
## [1] 73
head(msg.data)
## [1] 24  8 24  7 35 14
space <- 0.2
width <- 1
barplot(msg.data, names.arg=1:73, space=space, width=width, xlab="days", ylab="#messages")

We start modelling the number of messages, at a given day $$i$$, $$C_i$$, with a Poisson. We will assume that there will be one moment when this parameter will change. So, at a day $$\tau$$ the Poisson parameter will change from, say, $$\lambda_1$$ to $$\lambda_2$$. So,

$C_i \sim Poisson(\lambda_1), i \leq \tau$ $C_i \sim Poisson(\lambda_2), i > \tau$

If the data does not provide evidence for this assumption, we will expect that $$\lambda_1 = \lambda_2$$.

Since we don’t know the $$\lambda_i$$ parameters, we will need to include them in model as random variables. In this case, we use an exponential to model our uncertainty (we know that they must be positive), ie,

$\lambda_i \sim Exp(\alpha)$

where $$\alpha$$ is a hyperparameter (parameter of a parameter). Of course, we do not know $$\alpha$$ either, so we could also model it, but since its effect is smaller, we decide to make it equal to the inverse of the average of the data because

$\frac{1}{N} \sum_{i=1}^N C_i \approx E[\lambda | \alpha] = \frac{1}{\alpha}$

We still need to model $$\tau$$. Since we don’t have any extra information, let’s give equal probability for every day:

$\tau \sim DiscreteUniform(1,73)$

So, let’s write this all, BUGS way:

modelString = "
model {
tau ~ dcat(day[])       # the day that the Poisson parameter changes

lambda1 ~ dexp(alpha)   # the Poisson parameter before the change
lambda2 ~ dexp(alpha)   # the Poisson parameter after  the change

for(i in 1:N) {
day[i] <- 1/N         # each day has an a priori equal probability
lambdas[i] <- step(tau-i) * lambda1 + step(i-tau-1) * lambda2
msgs[i] ~ dpois(lambdas[i])
}

}
"

data.list = list(
alpha = 1 / mean(msg.data),
N = length(msg.data),
msgs  = msg.data
)

run.model(modelString, samples=c("tau", "lambda1", "lambda2"), data=data.list, chainLength=1e5)
## model is syntactically correct
## data loaded
## model compiled
## initial values generated, model initialized
## 10000 updates took 5 s
## monitor set for variable 'tau'
## monitor set for variable 'lambda1'
## monitor set for variable 'lambda2'
## 100000 updates took 49 s
samplesStats("tau")      # stats of the posterior for the phase change
##      mean    sd MC_error val2.5pc median val97.5pc start sample
## tau 43.26 2.356  0.02795       41     43        44 10001 100000
samplesStats("lambda1")  # stats of the posterior for the 1st interval
##          mean     sd MC_error val2.5pc median val97.5pc start sample
## lambda1 17.87 0.6681 0.003383    16.63  17.86     19.19 10001 100000
samplesStats("lambda2")  # stats of the posterior for the 2nd interval
##          mean    sd MC_error val2.5pc median val97.5pc start sample
## lambda2 22.69 1.016 0.009435    20.93  22.69     24.48 10001 100000
hist(samplesSample("tau"), prob=TRUE, breaks=c(0,40.5,41.5,42.5,43.5,44.5,75), xlim=c(39,46), main="tau")

hist(samplesSample("lambda1"), prob=TRUE, xlim=c(15,21), breaks=100, main="lambda1")

hist(samplesSample("lambda2"), prob=TRUE, xlim=c(19,26), breaks=100, main="lambda2")

So the model, after processing the data, proposes the following scenario:

barplot(msg.data, names.arg=1:73, space=space, width=width, xlab="days", ylab="#messages")
lines(((1:73)-.5)*(width+space),c(rep(17.9,44),rep(22.7,29)), lwd=2, col="red")

## Mixture of Priors

We have a coin that might the biased or fair. We assume a prior probability of 0.9 of being fair. We observe 15 heads out of $$n=20$$ tosses. What is the posterior probability of being biased?

For two prior distributions, $$p_1, p_2$$, the overall distribution is a mixture:

$p(\theta) = q \times p_1(\theta) + (1-q) \times p_2(\theta)$

where $$q$$ is the assessed probability.

After observing data $$y$$:

$p(\theta|y) = q' \times p_1(\theta|y) + (1-q') \times p_2(\theta|y)$

where:

$p_i(\theta|y) \propto p(y|\theta) p_i(\theta)\\ q' = \frac{q~p_1(y)}{q~p_1(y) + (1-q)~p_2(y)}\\ p_i(y) = \int p(y|\theta) p_i(\theta) d\theta$

modelString = "
model {
heads ~ dbin(p, n) # Likelihood

p <- theta[pick] # p might be biased (pick=2) or unbiased (pick=1)

pick ~ dcat(q[]) # pick=i => prior assumption i was selected
q[1] <- 0.9      # pick is modeled by a categorical distribution
q[2] <- 0.1      # ... with these values

theta[1] <- 0.5       # theta for the fair coin
theta[2] ~ dunif(0,1) # vague prior for the biased coin

biased <- pick-1  # 1 if biased, 0 if unbiased
}
"

data.list = list(
run.model(modelString, samples=c("biased","theta[2]"), data=data.list, chainLength=50000)
## model is syntactically correct
## data loaded
## model compiled
## initial values generated, model initialized
## 5000 updates took 0 s
## monitor set for variable 'biased'
## monitor set for variable 'theta[2]'
## 50000 updates took 0 s
biasedSample <- samplesSample( "biased" )
draw.hist(biasedSample, "biased coin?")