**Problem formulation**: given a rooted binary phylogenetic tree with branch length and a continuous-time Markov chain model of discrete character evolution (e.g. DNA, amino acids, morphological traits), compute the probability of observing certain characters at the tips of the tree. I am going to stop short of actually computing this probability. Instead, I will compute what's called partial likelihoods for each node in the tree. The actual likelihood is a weighted sum of the partial likelihoods at the root of the phylogeny.

**C++ code with inline compilation**:

require(inline) require(RcppArmadillo) src <- ' /* Arguments: tE: edge matrix of the ape object phylo nIN: number of internal nodes (not necessary, but convinient to get it from phylo) tS: integer vector of tip states (-1=missing value) nS: state space size (e.g. 2 for a binary trait) pM: array of probability matrices for each edge of the tree Two important assumptions: 1. edges in the edge matrix and probability matrices are in the "pruningwise" order;

` see ?reorder.phylo for more details`

2. tip state vector is ordered according to the tip numbering in the edge matrix */ using namespace Rcpp; IntegerMatrix treeEdges(tE); // get the number of edges int numEdges = treeEdges.nrow(); int numIntNodes = as<int>(nIN); IntegerVector tipStates(tS); int numStates = as<int>(nS); NumericVector vecProbMat(pM); arma::cube cubeProbMat(vecProbMat.begin(), numStates, numStates, numEdges, false); // get the number of tips in the tree int numTips = tipStates.size(); // prepare a matrix for storing regular (backward) partial likelihoods arma::mat partialLike = arma::zeros<arma::mat>(numTips + numIntNodes, numStates); for (int i=0; i < numTips; i++){ if (tipStates[i] == -1){// -1 denotes a missing value partialLike.row(i) = arma::ones<arma::rowvec>(numStates); }else{ partialLike(i, tipStates[i]) = 1.0; } } // compute regular partial likelihoods for all internal nodes for (int i=0; i < numEdges; i+=2){ // parent=treeEdges(i,0) or treeEdges(i+1,0); treeEdges indices should be shifted by one partialLike.row(treeEdges(i,0)-1) = (partialLike.row(treeEdges(i,1)-1)*cubeProbMat.slice(i).t())%(partialLike.row(treeEdges(i+1,1)-1)*cubeProbMat.slice(i+1).t()); } return wrap(partialLike); ' partLike = cxxfunction(signature(tE="integer", nIN="integer", tS="integer", nS="integer", pM="numeric"),body=src, plugin="RcppArmadillo")

```
```

I am going to test the code with a simple two state model, for which transition probabilities are available in closed form:

two.state.trans.prob = function(forward.rate, backward.rate, elapsed.time){ total.rate = forward.rate + backward.rate return((matrix(c(rep(backward.rate,2),rep(forward.rate,2)),2,2) + matrix(c(forward.rate, -backward.rate, -forward.rate, backward.rate),2,2)* exp(-total.rate*elapsed.time))/total.rate) }

**R wrapper function that takes care of the prerequisites**:

two.state.part.like = function(my.tree, my.data, forward.rate, backward.rate){ ## reorder the edges in the "pruningwise" order my.tree = reorder(my.tree, order = "pr") if (!("phylo" %in% class(my.tree))) stop("Error: object \"my.tree\" is not of class \"phylo\"") if (is.null(my.tree$edge.length)) stop("Error: tree \" my.tree\" must have branch lengths.") ## reorder data on tips to match the order of the my.tree phylo object if (!is.null(names(my.data))) { if(!any(is.na(match(names(my.data), my.tree$tip.label)))){ my.data = my.data[my.tree$tip.label] }else{ warning('the names of argument "my.data" and the names of the tip labels did not match: the former were ignored in the analysis.') } } ## prepare transition probability matrices (this of course can and should be done in C++ as well) prob.array = array(0, dim=c(2,2,length(my.tree$edge.length))) for (i in 1:length(my.tree$edge.length)){ prob.array[,,i] = two.state.trans.prob(forward.rate, backward.rate, my.tree$edge.length[i]) } return(partLike(my.tree$edge, my.tree$Nnode, my.data, 2, prob.array)) }

```
```

**Testing:**

First, let's simulate a tree with tip states. This can be done by several packages in R. I am going to use diversitree package:

This produces the following tree:

The above agreement is encouraging.

## No comments:

## Post a Comment