I’ve been avoiding this moment ever since I learned to program in R. Ironically, I dreaded the same thing for R after I had learned SAS: learning a new language. Several of my classmates have suggested Matlab, Python, and C++, and I’ve avoided them thus far because my computations never took longer than 12 hours or so to complete. A new project has finally prodded me to move on to greener coding pastures. This project (details after it’s published) entails running two dispersed Markov chains (run long enough to appear convergent) over 30,000 times. The fastest I could get the chains to run is about 20-40 seconds each in R (depending on the running dimension in the reversible jump MCMC), which puts the full computation at about 3-5 days. The problem is that this project is meant to be used in real time during clinical trials, to decide whether a trial should stop immediately due to superiority of one treatment over another, or due to futility if it appears the trial will end in failure. Clinicians cannot use this software if it doesn’t produce a decision in a reasonable time, say overnight. That way clinicians can give their patient data (treatment labels, survival times, and censoring indicators) to the program in the afternoon and have an answer on whether or not to continue the trial by morning. I’ve heard claims of C++ speed increases over R ranging from 60x to 1,000x. Even at the lower end of that spectrum, a 60x increase in sampling speed would bring each chain under 1 second, which makes the problem computationally feasible.
Before begrudgingly attempting to learn C++, I tried my best to get R to meet this time constraint, using the Rprof profiler (which I learned about from Dr. Hadley Wickham, of my own Rice University) to find any bottlenecks in my code. At some point I’ll write a post on my experiences with profiling because it is extremely useful. After I got the code as fast as I could, I attempted the parallelization route using the parLapply function in library(snow), hoping that distributing the chains across all 8 of my cores would bring the runtime down to something reasonable. Sadly, this didn’t accomplish the desired goal, but I plan to use Rcpp, and specifically RcppParallel, to call C++ functions from R and distribute them across all my cores. I’m optimistic that the speedup from C++ alone will be great enough to skip parallelization entirely, but unless the sampler improves to about 1 second per chain (about 8 hours to run everything sequentially), parallelization will be necessary.
So I’ve decided to take the plunge; where to start? The first step, for me at least, is picking an Integrated Development Environment (IDE) and finding someone with good introductory videos using that IDE; that preference is partly due to my reliance on, and happiness with, the RStudio IDE over using R alone for computations. On a first search, Dev-C++ was the top Google result, but looking deeper into other options, the consensus is that Dev-C++ is several years out of date. The YouTube user Programming Tutorials (who has many coding videos) introduces C++ using Eclipse, but I had some trouble getting this to work on Windows: Eclipse doesn’t come with a default compiler, so you have to install one (such as MinGW) yourself and point Eclipse to it. Instead, Windows users seem to be sold on the C++ features in Microsoft Visual Studio (also free), which comes with its own compiler. Two quick clicks on the Microsoft website and I’m beginning my journey to becoming C++ proficient.
Click the download link under Visual Studio and you’re off to the races (it takes a little while to download, since the program supports all sorts of languages)! I plan to write a few posts about my experiences learning C++ (and the helpful resources I used), especially the constructs used in MCMC, like for loops and random number generation. I also plan to show R and C++ translations of one Gibbs sampler within my MCMC to illustrate what a translation of your own chain might look like.