Refs:

Independent Component Analysis (ICA) is a signal processing technique that tries to unmix two different signals that were collected together. One instance is the cocktail problem: two different sound sources, say a music background and a conversation were recorded by two different microphones placed on distinct locations. We wish to extract the music and the conversation into different tracks so that we can listen to both. These type of problems where we wish to separtate mixed sources is usually called blind source separation (BSS). ICA is a tool to solve BSS problems where the signal was produced by a linear mixture.

ICA assumes the each received signals \(X = (X_1, X_2, \ldots)\) are an unknown linear mixture \(A\) of the unknown sources \(S\):

\[X = A.S\]

being \(A\) a full rank matrix, ie, \(A\) as an inverse.

ICA also assumes that the signals are mutually independent, \(p(S_i, S_j) = p(S_i)p(S_j)\), and that the signals \(S_i\) do not come from a Gaussian source, which is necessary because ICA requires that the kurtosis must be different than zero (which does not happen in the Gaussian distribution).

The goal of ICA is to find an approximation \(W\) of \(A^{-1}\) such that:

\[\hat{S} = WX \approx S\]

ICA is a generative model since the model describes how \(X\) could be generated from \(A\) and \(X\).

ICA tries to find \(A\) by estimating the matrices of its SVD decomposition:

\[A = U \Sigma V^T\]

so \(W\) should be, ideally,

\[W = A^{-1} = V \Sigma^{-1} U^T\]

since the inverse of a rotation matrix is its transpose (\(V^{-1}=V^T\)), and, since \(A\) is invertible by assumption, the inverse of \(\Sigma\) exists.

ICA goes in three steps:

  1. Examine the covariance of the data to estimate \(U^T\)

  2. Also using the covariance, it estimates \(\Sigma^{-1}\)

  3. Finally, it uses the independence and kurtosis assumptions to estimate \(V\)

Let’s see how to compute these steps assuming two 2D signals \(X_1\) and \(X_2\) both with mean zero.

The first step computes \(U\), which is a rotation matrix by an angle \(\theta\), in order to maximize the variance

\[Var(\theta) = \sum_{j=1}^N (X_1(j) \cos \theta + X_2(j) \sin \theta)^2\]

By making its derivative go zero, ie, \(dVar/d\theta = 0\) and solving it for \(\theta\), we get a maximum at:

\[\theta_0 = \frac{1}{2} \tan^{-1} \Big[ \frac{-2 \sum_j X_1(j) X_2(j)}{\sum_j X_2^2(j) - X_1^2(j)} \Big]\]

So, matrix \(U^T\) is

\[U^T = \left[ \begin{array}{rr} \cos \theta_0 & \sin \theta_0\\ -\sin \theta_0 & \cos \theta_0 \end{array} \right]\]

So the principal component axes are \(\theta_0\) and \(\theta_0+\pi/2\) (\(X_i\) are 2D) and their variances are:

\[\sigma_1 = \sum_{j=1}^N \big(X_1(j) \cos \theta_0 + X_2(j) \sin \theta_0\big)^2\]

\[\sigma_2 = \sum_{j=1}^N \big(X_1(j) \cos (\theta_0+\frac{\pi}{2}) + X_2(j) \sin(\theta_0+\frac{\pi}{2}\big)^2\]

This makes step 2:

\[\Sigma^{-1} = \left[ \begin{array}{cc} 1/\sqrt{\sigma_1} & 0 \\ 0 & 1/\sqrt{\sigma_2} \end{array} \right]\]

For step 3 we need to take this expression of normalized kurtosis,

\[K(\theta) = \frac{1}{\overline{X}_1^2+\overline{X}_2^2} \sum_{j=1}^N (X_1(j) \cos \theta + X_2(j) \sin \theta)^4\]

where \(\overline{X}_i^2\) are the data after the transformation of \(U\) and \(\Sigma^{-1}\).

We want to make it as small as possible to try to approximate the independence assumptions of the data. The angle \(\phi_0\) that minimizes \(K(\cdot)\) is given below in the code example (equation too large for my current laziness to write it in latex).

The rotation matrix \(V\) is then given by:

\[V = \left[ \begin{array}{rr} \cos \phi_0 & \sin \phi_0\\ -\sin \phi_0 & \cos \phi_0 \end{array} \right]\]

Putting all together, the reconstructed signal, \(\hat{S}\) is

\[\hat{S} = \left[ \begin{array}{rr} \cos \phi_0 & \sin \phi_0\\ -\sin \phi_0 & \cos \phi_0 \end{array} \right] \left[ \begin{array}{cc} 1/\sqrt{\sigma_1} & 0 \\ 0 & 1/\sqrt{\sigma_2} \end{array} \right] \left[ \begin{array}{rr} \cos \theta_0 & \sin \theta_0\\ -\sin \theta_0 & \cos \theta_0 \end{array} \right] \]

Eg: unmixing images

Let’s see an example where \(X_1\) and \(X_2\) are two images

# to install: source("http://bioconductor.org/biocLite.R"); biocLite("EBImage")
library("EBImage")

# plot a picture
show_pic <- function(pic) {
  dims <- dim(pic)
  plot(c(0, dims[1]), c(0, dims[2]), type='n', xlab="", ylab="")
  rasterImage(pic,0,0,dims[1],dims[2])        # present image
}

# plot two pictures side by side
show_both_pics <- function(pic1, pic2) {
  par(mfrow=c(1,2))
  show_pic(pic1)
  show_pic(pic2)
}

S1 <- readImage("sicily1.jpeg")
S2 <- readImage("sicily2.jpeg")
show_both_pics(S1, S2)

Let’s mix both images

A <- matrix(c(0.80, 0.20,
              0.5, 0.67), ncol=2, byrow=TRUE)

X1 <- A[1,1]*S1 + A[1,2]*S2
X2 <- A[2,1]*S1 + A[2,2]*S2
show_both_pics(X1, X2)

Step 1 is to find the angle with maximal variance:

x1v <- scale(as.vector(X1), center=TRUE, scale=FALSE)   # make it a vector and center data around mean
x2v <- scale(as.vector(X2), center=TRUE, scale=FALSE)

theta0 <- 0.5*atan( -2*sum(x1v*x2v) / sum(x1v^2-x2v^2) ); # compute 1st principal direction
theta0 # in radians
## [1] 0.7081695
Us <- matrix(c( cos(theta0), sin(theta0), 
               -sin(theta0), cos(theta0)), ncol=2,byrow=TRUE)

Step 2 is finding the scaling of the principal components:

sigma1 <- sum( (x1v*cos(theta0)     +x2v*sin(theta0))^2 )
sigma2 <- sum( (x1v*cos(theta0-pi/2)+x2v*sin(theta0-pi/2))^2 )
Sigma_inv <- matrix(c(1/sqrt(sigma1),       0, 
                          0,          1/sqrt(sigma2)), ncol=2, byrow=TRUE)

Step 3 is the rotation to separability:

X1bar <- Sigma_inv[1,1]*(Us[1,1]*X1+Us[1,2]*X2);
X2bar <- Sigma_inv[2,2]*(Us[2,1]*X1+Us[2,2]*X2);

x1vbar <- as.vector(X1bar)
x2vbar <- as.vector(X2bar)

phi0 <- 0.25*atan( -sum(2*(x1vbar^3)*x2vbar-2*x1vbar*(x2vbar^3)) / 
                    sum(3*(x1vbar^2)*(x2vbar^2)-0.5*(x1vbar^4)-0.5*(x2vbar^4)))

V <- matrix(c( cos(phi0), sin(phi0), 
              -sin(phi0), cos(phi0)), ncol=2,byrow=TRUE)

S1_hat <- normalize(V[1,1]*X1bar+V[1,2]*X2bar) # normalization to recover initial contrast
S2_hat <- normalize(V[2,1]*X1bar+V[2,2]*X2bar)

show_both_pics(S1_hat, S2_hat)

The second picture was retrieved quite nicely. The upper part of the first picture however didn’t. There is a possible problem: we are finding zeros of the derivative without checking if they are really minimums and maximums. This is especially problematic for the kurtosis being a maximization of a quartic formula, which most probably returns a local minima or inflexion point. We would need to find all zeros and select the one with minimum kurtosis. However, in this case, we just apply R’s optim.

Step 1 with optim:

var_theta <- function(theta) { sum( (x1v*cos(theta)+x2v*sin(theta))^2 ) }

theta_optim <- optim(pi/4, var_theta, control=list(fnscale=-1), method="CG") # maximize

var_theta(theta0)
## [1] 46730.37
var_theta(theta_optim$par) # it's larger!
## [1] 47726.29
theta0 <- theta_optim$par
Us <- matrix(c( cos(theta0), sin(theta0), 
               -sin(theta0), cos(theta0)), ncol=2,byrow=TRUE)

Step 2:

sigma1 <- sum( (x1v*cos(theta0)     +x2v*sin(theta0))^2 )
sigma2 <- sum( (x1v*cos(theta0-pi/2)+x2v*sin(theta0-pi/2))^2 )
Sigma_inv <- matrix(c(1/sqrt(sigma1),       0, 
                          0,          1/sqrt(sigma2)), ncol=2, byrow=TRUE)

Step 3:

X1bar <- Sigma_inv[1,1]*(Us[1,1]*X1+Us[1,2]*X2);
X2bar <- Sigma_inv[2,2]*(Us[2,1]*X1+Us[2,2]*X2);

x1vbar <- as.vector(X1bar)
x2vbar <- as.vector(X2bar)

kurtosis_theta <- function(theta) { sum( (x1v*cos(theta)+x2v*sin(theta))^4 ) }

kurtosis_optim <- optim(pi/4, kurtosis_theta, method="CG") # minimize (pi/4 just an initial value)

kurtosis_theta(phi0)
## [1] 785.1369
kurtosis_theta(kurtosis_optim$par) # it's smaller!
## [1] 141.2477
phi0 <- kurtosis_optim$par

V <- matrix(c( cos(phi0), sin(phi0), 
              -sin(phi0), cos(phi0)), ncol=2,byrow=TRUE)

S1_hat <- normalize(V[1,1]*X1bar+V[1,2]*X2bar) # normalization to recover initial contrast
S2_hat <- normalize(V[2,1]*X1bar+V[2,2]*X2bar)

show_both_pics(S1_hat, S2_hat)

And the results are much better!

Using fastICA package

library("fastICA")

data <- cbind(X1, X2)
model <- fastICA(data, n.comp=2)

S1_hat <- model$S[,1]
dim(S1_hat) <- dim(X1)
S2_hat <- model$S[,2]
dim(S2_hat) <- dim(X2)

par(mfrow=c(1,2))
show_pic(normalize(transpose(S1_hat)))
show_pic(normalize(transpose(S2_hat)))

This is so much worse that the fault is probably mine for not understand how to use it…