I’m currently working on a research project where within a function I want to generate K posterior samples with disperse starting values via MCMC, assess convergence via scale reduction factors, and combine the latter half of these samples for future inference if convergence is met. It’s easy to run each chain one by one and store results from each as a list to be used to assess convergence but this has the drawback that if each chain takes t seconds, running all the chains will take K*t seconds.

Instead of doing each chain separately, we can use the R package Parallel to fit multiple chains at the same time while only sacrificing a slight amount of time (much less than K*t seconds). The basic idea is related to the hardware in the computer you use- each computer contains C=1,2,… core central processing units connected to the main system’s memory. Each of these cores can perform different computational tasks. Typically, R performs it’s computation on one of these central units and allows the others to be used for different tasks. You may have noticed that during lengthy computations (hours-days-weeks) you can still log into your favorite website to browse without any slowdown – this is because R is using one core while your internet program is using a different core. One complaint with this software is that if you use all your available core processing units for computation, your computer won’t let you do hardly anything else. So if you do have the desire to use all your cores, you need to leave your computer to run.

The parallel package takes advantage of this computational structure and assigns multiple jobs to it’s own core. This allows us to run K<=C chains in a reasonable amount of time. Most computers built for statistical computation have C>1 cores, so if you are one of the unfortunate souls living in the Stone Age, this package won’t work because it cannot divide the work load between workers. It is also worth noting here that not all computers have as many cores as your computer so if you are writing a package for submission to CRAN or GITHUB, you should specify as few cores as possible. For example, my unit contains 8 cores- but I wrote my coding with only 2 to avoid this issue.

The nitty gritty:

*detectCores() :* Returns the number of cores available for R to use (if 1 then you’re out of luck!)

Now once you know we have C cores, we can assign up to C to compute tasks. Personally, I like many other users suggest using C-1 cores to avoid slowdown of other computer programs. Now we specify the number of cores to use, say C-1, in R with

*clustername <- makeCluster(C-1)*

Now we have told R that if we submit a job to perform on clustername, it will do the same task on all C-1 clusters. Here’s where it gets a little tricky, since we are sending off jobs to different processors, we need to share the same unique objects of the function. For a simple example, say we want to compute the function f(x)=x+b on each cluster. We have to tell R via the parallel package what b is to be used in each cluster in the following manner:

b=2; clusterExport(clustername, “b”)

Where the second argument is replaced with a vector with entries surrounded by “ ” if we have more than one object to share between cores. The above function has an option envir= which specifies where to grab the variables from. The default is the global environment so if you want to use clusters within some function, you need to specify envir=environment() to tell R to look inside the function that has started for the object “b”.

Likewise, we need to share the functions that we share between cores using a separate function. An example is the following: say I need to call dmvnorm from the mvtnorm package to evaluate a density and a user made function called LK that evaluates the likelihood given the other parameters, then the following code will suffice:

*clusterEvalQ(clustername,c(library(mvtnorm), “LK”))*

as long as the function LK and the package mvtnorm exist in THE GLOBAL ENVIRONMENT. Now we want to run our function called fun with argument x as in above (i.e. f(x)=x+b). We run it with the following command:

*rbind(clusterCall(clustername,fun,x=3))*

*stopcluster(clustername)*

where the last part is the argument used to evaluate the function on each cluster. If we want to run multiple different chains, we could make a function called CHAIN (for example) that is a function of only the number of iterations (=B) to perform and returns a list containing the posterior samples. If we include a part in this function to randomly choose the starting values with some high variance, we effectively run multiple chains at disperse starting values. Say additionally that we have hyperparameters (a,b,c,d) used in CHAIN along with usage of the “LK” and mvtnorm functions, then the following coding will run these chains on 2 clusters with 100k iterations each:

*a=7,b=4,c=3,d=10 *(for example)**

*cluster=makeCluster(2)*

*clusterExport(cluster, c(“a”,”b”,”c”,”d”))*

*X=rbind(clusterCall(cluster,CHAIN,B=100,000))*

*stopcluster(cluster)*

*Which produces something like this which can be indexed via list arguments. *(for example)**

*> X*

*[,1] [,2]*

*[1,] List,6 List,6*

Each of these two lists contains 6 different posterior distributions (can contain vectors and matrices).

In conclusion, I want to discuss a few other uses for this. For example, consider a hierarchical model where we want to know the true mean test score for different counties j=1,…,J (parishes in Louisiana). Maybe we assume that Y_ji \sim N(\mu_j,\sigma) and that \mu_j \sim N(\mu , \sigma_mu) with additional prior distributions on \sigma, \mu and \simga_mu. If we were to use prior distributions that require the use of the Metropolis-Hastings algorithm for the posterior of \mu_j, we can use parallel processing to fit up to C of these \mu_j at a time!!! This ONLY works because the gibbs samplers for any \mu_j do not depend on any other \mu_k for k \ne j.

**When is it appropriate in general? I leave you with the following two examples:**

**You have an assembly line to create a car. Because workers who put the final paint detail on the car can’t do it until the car structurally is assembled- Parallelization is NOT APPROPRIATE****You have 6 orders of fried eggs for the 6 customers in your restaurant. Do you:****Tell one of your 6 employees to make the eggs-one at a time. OR****BE PARALLEL: Tell your 6 employees to each make one egg!**