Banging my head on the keyboard: MCMC in c++ part 1

If you read and followed through with the example of my last blog post, you should have a folder named “pleasework” with an src folder inside containing a c++ file called rcpp_hello_world.cpp containing 2 (or 3 functions if you didn’t edit the function main() directly) functions called rcpp_hello_world() and adding(). I’m going to show how to do a basic MCMC in c++ using for loops that can be called into R. First let’s set up a function that will just print the number of iterations to get the loop set up, which we’ll do with the following function:

pic1

Note: We also have to include the package time.h, which allows us to specify a variable as a time with the following line

#include <time.h>

Let’s talk about the code above, but first save your file, source it (remember your path is DEFINITELY different from mine) and run the function. It will iterate and print every 10,000 iterations along with the time that the for loop reached that iteration.

pic2

How cool right! This will keep track of what iteration we’re on, but there’s no need for this. Ok now let’s talk about the coding line for line, but first I want to note that the lines that start with // are comments, much like using % in R. Here it is again below:

pic1

void mcmc(int B){ } tells c++ that we are creating a new function called mcmc that takes a user value B that is an integer. The void term in front of the function mcmc tells c++ that the function mcmc will NOT RETURN ANYTHING.

time_t now; defines a variable now that is set equal to the current system time

int M; initializes an integer variable called M that we will use to execute the for loop.

for(M = 0; M < B; M++) { } sets up the for loop, with the { } containing what to do at each iteration. We set M=0 and the statement M<B; says that the for loop will terminate when M=B. Finally M++ indicates that after each loop, M will be set to M+1. This is how the for loop keeps track of what iteration we’re on. Even though I won’t use it here (since it doesn’t make as much sense), we could have done this for loop from B to 0 using the command for(M = B; M >0; M–) { } will start at M=B and go backwards to 0 because M—indicates that at each iteration in the loop, c++ will set M=M-1.

if( ( (M+1) % 10000 ) == 0) { }  is our first example of an if statement, which as any experienced MCMC-er knows is used in any metropolis or metropolis hastings steps. Here we are telling c++ that if (M+1)/10000 is a whole number, we will do something in { }.  Here explicitly ( (M+1) % 10000 ) is M+1 modulus 10000, or the remainder after dividing M+1 by 10000. [This has been one of my favorite math operators since seeing the Chinese remainder theorem in my elementary number theory class at LSU]. So the statement is saying that if the remainder is 0 of (M+1)/10000 then do { }.

Before jumping into { }, I want to explain why we’re dividing (M+1) and not M. In R, entries of vectors of length B are numbered 1,2,…,B. In c++, entries of vectors of length B are 0,1,2,…,B-1. So this is why we’re looking at M+1 instead because the first iteration is stored as iteration 0 in c++ and this is necessary when we tell c++ to store certain iterations of our sampler.

Finally we have the { } in the if statement,

time(&now);

Rprintf(“iteration: %d: %s\n”, M+1, ctime(&now));

Which tells c++ to erase the previous value of now and replace it with the current time. Rprintf is used to print what you see on your console. Within the quotations, iteration tells c++ to print iteration in R and %d: %s\n tells c++ what is going to come next that needs to be printed. %d tells c++ that M+1 is a double value (real number), : tells c++ to look for another specification for the final argument ctime(&now), which prints out the time. %s\n tells c++ that the final argument in the printing function is a string. Try taking out %d:%s\n and the only thing that will print is iteration because c++ is not being told what type of variables additional arguments that are being printed are. Likewise, if we delete :%s\n, it will print iteration and M+1 only.

Now that we have our loop working we can do some MCMC! But first we need to talk about storing only the iterations we want to keep which will save time because we are allocating less memory. It’s pretty standard for an MCMC to have some burnin, for example 25% burnin, so why should we store the first 25% of samples and waste memory? Let’s just do a simple example before ending this post, by storing the iterations in a vector after burnin. (Make sure that burn*B is an integer here, but we could add an if() statement that would quit the function if a user inputs a burn such that burn*B is not an integer). To do this we need to add a few lines outside the for loop and an if statement in the for loop that will tell c++ to save this iteration only if it’s after the burnin period.

First add this to the function BEFORE invoking the for loop.

int nstore= B-(B * burn); defines an integer nstore which is the size of the post burnin sample

NumericVector X(nstore); defines a numeric vector X of size nstore

int StorageEntry; defines an integer StorageEntry that we will use in the loop.

Additionally, you need to add NumericVector to the front of the function to tell c++ to return a numeric vector, i.e.

NumericVector mcmc1(int B, double burn){  }

Now within the for loop, we need to tell c++ how to map the for loop values to the entries of X. For example, if we just do 10 iterations with a 50% burnin, we need to map iteration 6 to X[1] and iteration 10 to X[5] (or X[0] and X[4] in c++). We do this in the following way:

if((M+1)> B * burn){

      StorageEntry = (M+1) – (B * burn);

      X[StorageEntry]=M;

}

Recall that c++ stores n-vectors as having entries 0,1,…,n-1 so we need to only keep samples when (M+1) > B*burn in the sample. In the previous example, the first iteration kept will be iteration 6 where M=5 in c++ since B*burn=5. We iteratively change the entry for storage in

StorageEntry = (M+1) – (B * burn); which keeps track of where to store the iteration in X. If M=5 in our example we have M+1-(B*burn)=6-5=1. If we were to do this in R, we would have it set to M-B*burn.

  X[StorageEntry]=M; Sets element StorageEntry of X to M. Obviously we could have set it to a posterior sample or something, but I wanted to just get it up and running.

Finally we need to add

return X;

after the for loop to make sure the function returns what we want. Here’s the code in it’s entirety:

pic3

Which produces the following output when sourced for B=20,000 and burn=.99.

pic4

So you can see it still prints every 10000 iterations and it returns a vector of the post burnin iteration numbers.

Note: Also I’ve confirmed, it ALWAYS takes running source(‘code’) twice to get it to work so make sure to do this.

Next post I’ll do a simple example of a metropolis hastings sampler in c++.

Happy Coding!

-Andrew G. Chapple

Advertisements

3 thoughts on “Banging my head on the keyboard: MCMC in c++ part 1

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s