Saturday, November 3, 2012

Guest post: Functional Programming in R with an HMM example

Below is a guest post by Chris Glazner, who works on his Ph.D. in Statistics with Elizabeth Thompson here at UW. I especially recommend checking out Chris' slick example of using Reduce for HMM forward-backward algorithm.

Chris writes:

I spent some time over the summer looking at ways of using functional programming to implement the statistical methods in my dissertation research. Functional programming is a style which treats functions as the basic building block of computations. The flow of a program is expressed as the composition of functions, rather than a sequence of steps which manipulate variables. Programming this way tends to feel a little more like doing math, so I've found it to be a good fit for statistical programming. If you can describe your program as a function between sets (say, from a data vector to a confidence interval) you've already figured out the overall structure of a functional program to implement it. 
R is a functional language, meaning that functions can be given names, created anonymously, and passed to other functions. Functions whose inputs or outputs are themselves functions are called "higher order functions". A number of higher order functions are included in the R base library, including the popular apply family: apply, sapply, lapply, and mapply. These functions take a function and a list or vector and apply the function to every element of the list or vector. They replace for loops that look like this:  
for (i in 1:length(x)) f(x[i])  
Created by Pretty R at inside-R.org
where f is any suitable function. Replacing a loop like this with a function call simplifies code and makes it easier to understand.
Other higher order functions express other common patterns. Reduce captures the behavior of an accumulating loop like: 
y <- startValue
for (i in 1:length(x)) y <- g(y,x[i]) 
Created by Pretty R at inside-R.org 
This pattern may not seem as general as apply, but it encompasses many common calculations. If g is `+` and startValue is 0, then the call to Reduce is equivalent to sum(x). If g is `*` and startValue is 1, we have product(x). Reduce is known by many names in other languages and is a powerful abstraction for expressing computations. 
A nice application of Reduce arises in the Hidden Markov Models (HMMs). An HMM is an unobserved Markov chain along with data which are generated independently according to the hidden Markov state at each step. The goal is usually to infer something about the hidden process from the data. The standard method for calculating the marginal probabilities of the hidden states at each step is known as the forward-backward algorithm, which is explained in a nice tutorial by LR Rabiner. We can use Reduce to express this algorithm concisely. 
As an example we will implement the simple Occasionally Dishonest Casino model described in a classic Biologial Sequence Analysis book by Durbin, Eddy, Krogh, and Mitchison. This casino has a fair die and a biased die which rolls three 50% of the time. We observe repeated rolls of a single die, but we never know which one. The casino switches the dice according to a two-state Markov chain. We want to infer which die is in use at a given time. 
The model is fully specified by the Markov transition matrix, the emission distribution of the data conditional on the hidden state, and the initial distribution. Since this is R, we will specify these using vectors and matrices, but we could use functions if we wanted.
tMat <- t(matrix(c(0.95, 0.05 # fair -> biased
                 , 0.10, 0.90 # biased -> fair
                 ),2,2))
 
eMat <- cbind( rep(1/6,6)            # fair
              ,c(.1,.1,.5,.1,.1,.1)) # biased
initDist <- c(0.5,0.5)
Created by Pretty R at inside-R.org 
Now we need two functions to execute the forward and backward passes. These can be concisely expressed in matrix notation: 

fwd <- function(distr,obs) normalize(( t( tMat ) %*% distr ) * eMat[obs,] )
backwd <- function(obs,distr) normalize( tMat %*% ( distr * eMat[obs,] ))
normalize <- function(x) x/sum(x)
Created by Pretty R at inside-R.org 
The fwd function takes two inputs, a distribution on the hidden state space and the observation at the current node, and returns another distribution. Reduce uses the output of one call as the distr argument to the next call, with the obs argument taken from the list of data observations. We set the accum option to Reduce so that it collects the intermediate results from each call. Assuming the data are in a vector called dieRolls, we write 
dieRolls <- c(3,3,3,4,4,6,2,3,4)
forwardProbs <- Reduce(fwd,dieRolls,init=initDist,acc=T)
backwardsProbs <- Reduce(backwd,dieRolls,init=normalize(c(1,1)),acc=T,right=T)
marginalProbs <- mapply(`*`,forwardProbs,backwardsProbs)
Created by Pretty R at inside-R.org 
Since the backwards algorithm starts at the end of the data vector, we use the right argument (and switch the order of the arguments to backwd). The initial values for both passes are as described in the Rabiner tutorial. The higher order function mapply is used to combine the resulting lists with pointwise multiplication. 
The Rabiner tutorial presents this algorithm in terms of loops over arrays. This style is easy to translate into computer code, but it doesn't provide any intuition as to what is happening; I have a hard time connecting the abstract statistical ideas to the implementation. In contrast, Reduce expresses the recursive nature of the dynamic programming problem. Coding the algorithm becomes part of the research process, rather than a chore. Of course, some people can grasp the loops and arrays just fine, but if you are like me and have spent more time taking math and statistics classes than programming, you might want to check out functional programming.