Stochastic Models, Statistics, and Computation: October 2013

Monday, October 14, 2013

Passing Armadillo matrices by value and by reference

Rcpp and RcppArmadillo make it relatively easy to integrate R code and C++ code. In fact the resulting C++ code looks so much as R code that it is easy to forget some basic rules by which a C++ coder must play. This is especially true for someone like me, who wrote relatively little C++ code in the past.

Here is an example of a typical calculation that may get one in trouble. Suppose you have a function that takes two matrices from R, converts them into Armadillo matrices and passes these functions to other functions to do some repetitive calculations. For example, imagine passing transition and emission probability matrices to execute the forward-backward algorithm on multiple realizations of the same hidden Markov model.

Below you see my C++ code, saved in the file "matrix_code.cpp" that has two sets of functions: a) doStuff1(), twoMatrices1(); b) doStuff2(), twoMatrices2(). The functions twoMatrices1() and twoMatrices2() are exported to R via Rcpp attributes. The twoMatrices1() calls doStuff1() 100 times, while twoMatrices2() calls doStuff2() 100 times. The only difference between the two sets of functions is in that doStuff1() passes two Armadillo matrices by value, which requires copying these matrices. In contrast, doStuff2() passes Armadillo matrices by reference without calling copy-constructor under the hood.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>

using namespace Rcpp;

// First passing Armadillo matrices by value
void doStuff1(arma::mat c, arma::mat d){
double x = c(1,1) + d(1,1);
}

// [[Rcpp::export]]
void twoMatrices1(NumericMatrix A, NumericMatrix B) {
arma::mat M1(A.begin(), A.nrow(), A.ncol(), false);
arma::mat M2(B.begin(), B.nrow(), B.ncol(), false);

for (int i = 0; i < 100; i++){
doStuff1(M1, M2);
}
}

// Now passign Armadillo matrices by reference
void doStuff2(const arma::mat& c, const arma::mat& d){
double x = c(1,1) + d(1,1);
}

// [[Rcpp::export]]
void twoMatrices2(NumericMatrix A, NumericMatrix B) {
arma::mat M1(A.begin(), A.nrow(), A.ncol(), false);
arma::mat M2(B.begin(), B.nrow(), B.ncol(), false);

for (int i = 0; i < 100; i++){
doStuff2(M1, M2);
}
}

Now I am going to call twoMatrices1() and twoMatrices2() from R, using rbenchmark package to compare their speed. I am going to use fairly large matrices (100x100), so the cost of moving them around is significant.

library(RcppArmadillo)
library(rbenchmark)
 
sourceCpp("matrix_code.cpp")
 
x = matrix(1,100,100)
y = matrix(1,100,100)
 
res <- benchmark(twoMatrices1(x, y), twoMatrices2(x, y), 
                 columns = c("test", "replications", "elapsed", "relative"), 
                 order="relative", replications=1000)
print(res)
 
                test replications elapsed relative
2 twoMatrices2(x, y)         1000   0.006    1.000
1 twoMatrices1(x, y)         1000   0.829  138.167

Created by Pretty R at inside-R.org

As you can see, passing two 100x100 Armadillo matrices by value is more than 100 times slower than passing these matrices by reference. Perhaps not a very surprising outcome, but an important one to keep in mind when porting R code to C++.

Friday, October 11, 2013

Dissertation Acknowledgments

Jonathan Eisen posted his dissertation acknowledgments. I think this is a great idea. Maybe it is just me, but when I read someone's dissertation, not necessarily as a committee member, I really enjoy reading the acknowledgements. For me, these personal comments place the work in the context of "real life" and personal relationships.

Here are my acknowledgments:

First and foremost, I would like to thank my mom and dad for creating many opportunities for my intellectual growth. Feeling my parents' constant love and support makes my scientific journey so much more meaningful! I am very thankful to my middle and high school mathematics teachers, Maya Radionovna Matihina and Svetlana Efimovna Gaydamak, for introducing me to the beautiful world of mathematical reasoning and abstract thinking. I am very grateful to my undergraduate advisor, Galina Evgenievna Samkova, for patiently listening to my naive ideas and taking seriously my modest attempts to produce new mathematical results. Special thanks goes to Stephen Krone, who hooked me onto mathematical biology by regularly taking me to the bioinformatics seminars at the University of Idaho. My first scientific collaborators, Zaid Abdo and Paul Joyce, became my very good friends and influenced my scientific career more than they probably realize.

UCLA Department of Biomathematics has been my second home for the last four years. I would like to thank the departmental staff, David Tomita, Martha Riemer, Wendy Hagar, and Fred Hughes, for taking care of administrative hurdles so well. During my first years at UCLA, I received a lot of advice and guidance from my fellow Biomathematics students. I am very grateful to Robert Rovetti for being a wonderful colleague and a good friend. Robert thoroughly read and constructively criticized Chapters 3 and 4 of this dissertation. Intellectual discussions with Benjamin Redelings, John O'Brien, and Alex Alekseyenko were certainly worth inhaling cigarette smoke and drinking really bad coffee from vending machines.

All members of my dissertation committee played important roles in my graduate school development. I enjoyed collaborating with Christina Kitchen, who is very passionate about both statistics and biology. I am grateful to Janet Sinsheimer for giving me many opportunities to assist her in graduate and undergraduate level teaching. Elliot Landaw provided me with excellent academic guidance at the beginning of my studies and was extremely supportive throughout all these years. Ken Lange has frequently given me always timely and wise pieces of advice. Chapter 6 would not be possible without
his enthusiasm about Markov chain induced counting processes and his encouragement to pursue this line of research.

The chair of my dissertation committee, Marc Suchard, is the greatest advisor one could wish for! I am very grateful to Marc for introducing me to interesting problems and for not revealing solutions to these problems, giving me an opportunity to become an independent researcher. It is very difficult to summarize everything I have learned from Marc in one paragraph. Perhaps the most important lesson that Marc taught me is how not to be afraid: not to be afraid to attack challenging problems, not to be afraid to seek rigorous solutions for these problems, and not to be afraid to go somewhere no one has gone before.

This dissertation would never be finished without support of my wife, Yuliya, who patiently tolerated my "writing weekends'' and many other strange habits of a finishing Ph.D. student.