This page contains source code relating to Chapter 14 of Bishop’s Pattern Recognition and Machine Learning (2009).

This chapter is about combining models.

Bagging (Section 14.2)

A committee is a combination of models. The simplest way to construct a committee is to average the predictions of a set of different models. Ideally, we would have \(M\) datasets and \(M\) models to train, but usually we just have one dataset.

One approach to this problem is to bootstrap the dataset, i.e., resample the original dataset into the required \(M\) datasets.

d <- iris
head(d)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
M <- 12
bootstrap_samples <- replicate(M,sample(1:nrow(d), nrow(d), replace=TRUE))
head(bootstrap_samples)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]   51   36    8   75   95   31  137   30   62    27   133    57
## [2,]   47   29  113   76   53   37   83  101   77   106    70   131
## [3,]    7   19   16  108   32   92   86  131  108    52    18    36
## [4,]  123   35   34  127   94   76   52   57  107    63    21     8
## [5,]   84  119   50   18   45  128  139  109  111   142    77   103
## [6,]   12   98   80  113   81  142   32  139  121   105   122   131

The columns of bootstrap_samples are resamples of the original dataset (in this case, each column keeps the indices that refer to rows of the original dataset). They are going to be used to train \(M\) models, \(y_1, \ldots, y_M\) (herein, we will use linear regression as an example):

# models is a list with M linear regression models
models <- apply(bootstrap_samples, 2, 
                function(indices) lm(Petal.Length ~ Sepal.Length, data=d[indices,]))

The committee prediction is given by

\[y_{com}(x) = \frac{1}{M} \sum_{m=1}^M y_m(x)\]

This technique is called bagging:

bagging <- function(models, vals) {
  M     <- length(models)
  # each column (one per value in vals) will have M predictions
  preds <- t(sapply(1:M, function(m) predict(models[[m]], vals)))
  apply(preds, 2, mean)
}

Let’s compute and plot the bagged predictions:

xs <- seq(4,8,len=50)
ys <- bagging(models, data.frame(Sepal.Length=xs))

plot(d$Sepal.Length, d$Petal.Length, pch=19)
points(xs, ys, type="l", col="red", lwd=2)
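For comparison, we could also overlay the fit of a single linear regression trained on the whole dataset; a minimal sketch (reusing d and xs from the code above, names chosen here only for illustration):

single_lm <- lm(Petal.Length ~ Sepal.Length, data=d)       # one model, full dataset
ys_single <- predict(single_lm, data.frame(Sepal.Length=xs))
points(xs, ys_single, type="l", col="blue", lwd=2, lty=2)  # add to the previous plot
legend("topleft", c("bagged committee", "single lm"), col=c("red","blue"), lwd=2, lty=c(1,2))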

Boosting (Section 14.3)

**Boosting** is a more advanced technique to form a committee. Here the \(M\) models are trained in sequence: each model depends on the results of the previous one, and datapoints that were wrongly classified get more weight in the next model's training. Once all \(M\) models are trained, their predictions are combined in a weighted sum rather than a simple average.

The version implemented next is AdaBoost (for adaptive boosting).
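At each round \(m\), AdaBoost computes the weighted error \(\epsilon_m\) of the current model \(y_m\), a coefficient \(\alpha_m\), and updated datapoint weights; these are the standard updates that the training loop below follows:

\[\epsilon_m = \frac{\sum_{n=1}^N w_n^{(m)} \, I(y_m(x_n) \neq t_n)}{\sum_{n=1}^N w_n^{(m)}}, \qquad \alpha_m = \ln\left(\frac{1-\epsilon_m}{\epsilon_m}\right), \qquad w_n^{(m+1)} = w_n^{(m)} \exp\left\{\alpha_m \, I(y_m(x_n) \neq t_n)\right\}\]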

# AdaBoost using logistic regression (for classification with 2 classes named 0 and 1)
# This just serves as an example. For an efficient AdaBoost implementation check library 'adabag'
lg_adaboost <- function(dataset, ab_formula, M) {
  N <- nrow(dataset)
  w <- rep(1/N, N)  # weights
  models  <- list()
  alpha_m <- rep(NA,M)
  
  for(m in 1:M) {
    models[[m]] <- 
      eval(substitute(
        glm(ab_formula, family = binomial("logit"), data=dataset, weights=w)
      ), list(w=w)) # weights are evaluated like variables in formula [glm's help]
    
    # get current classification
    pred <- predict(models[[m]], dataset[,-ncol(dataset)], type="response")
    pred <- (pred > 0.5)+0  # round to 0 and 1
    
    # updating weights (cf Bishop's pg. 658)
    indicators <- 0+(pred!=dataset[,ncol(dataset)])  # I(y_m(x_n) != t_n)
    epsilon_m  <- sum(w * indicators) / sum(w)
    alpha_m[m] <- log((1-epsilon_m)/epsilon_m)

    w <- w * exp(alpha_m[m] * indicators)
  }
  
  list(models=models, alpha_m=alpha_m)
}

The function lg_adaboost returns the models \(y_m\) and a vector of weighting coefficients \(\alpha_m\), which are used to define the committee prediction:

\[y_{com}(x) = \text{sign}\left( \sum_{m=1}^M \alpha_m y_m(x) \right)\]

# prediction of dataset using a committee produced by function 'lg_adaboost'
pred_adaboost <- function(committee, dataset) {
  N <- nrow(dataset)
  M <- length(committee$models)
  
  committee_vote <- function(row) {
    # predict() on a glm returns the linear predictor (log-odds) by default,
    # which is positive for class 1 and negative for class 0
    preds <- sapply(1:M, function(m) committee$alpha_m[m] * 
                                     predict(committee$models[[m]], row))
    # threshold the weighted sum of log-odds, approximating sign( sum(...) )
    (sum(preds)>0.5)+0
  }
  
  sapply(1:N, function(n) committee_vote(dataset[n,]))
}

Let’s check the performance with a two-class dataset:

dataset <- iris[51:150,]                    # keep just two species from the iris dataset
dataset[,5] <- (dataset[,5]=="virginica")+0 # convert classes to 0 and 1

committee <- lg_adaboost(dataset, ab_formula=as.formula(Species ~ .), M)

# compare true classification with prediction by the committee:
table(dataset[,5], pred_adaboost(committee, dataset))
##    
##      0  1
##   0 48  2
##   1  3 47
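As a sanity check, one might compare the committee with a single (unboosted) logistic regression fitted to the same data; a minimal sketch (names chosen here only for illustration, output not shown):

# single logistic regression, thresholded at 0.5, for comparison with the committee
single_lg   <- glm(Species ~ ., family=binomial("logit"), data=dataset)
pred_single <- (predict(single_lg, dataset, type="response") > 0.5)+0
table(dataset[,5], pred_single)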