2011/02/07

Simple example: How to use the foreach and doSNOW packages for parallel computation

update

************************************************************************************************
I have only checked that this example runs correctly on Windows XP (32-bit).
************************************************************************************************



For the R language, the people at Revolution R provide the foreach and doSNOW packages, which let us run computations in parallel. We start by installing these packages.
install.packages("foreach")
install.packages("doSNOW")
Created by Pretty R at inside-R.org


With the foreach package, you can write code that runs not only in parallel but also sequentially. Sequential examples using %do% look like this.
library(foreach)
# By default, the result is returned as a list
foreach(i = 1:3) %do% {sqrt(i)}
# With the .combine = "c" option, the result is returned as a vector
foreach(i = 1:3, .combine = "c") %do% {sqrt(i)}
# If each result is a vector, .combine = "cbind" binds the results as the columns of a matrix
foreach(i = 1:3, .combine = "cbind") %do% {letters[1:4]}
# A function you define yourself can also be used as the .combine option.
# MyFunc here returns the same result as specifying .combine = "c"
MyFunc <- function(x, y) c(x, y)
foreach(i = 1:3, .combine = "MyFunc") %do% {
  sqrt(i)
}
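
As a quick sanity check of the .combine behaviour described above, the following sketch compares the foreach result with base R's sapply and inspects the shape of the cbind result. The variable names (roots_foreach, roots_sapply, m) are my own and only for illustration.

```r
library(foreach)

# .combine = "c" should give the same numeric vector as sapply
roots_foreach <- foreach(i = 1:3, .combine = "c") %do% { sqrt(i) }
roots_sapply <- sapply(1:3, sqrt)
print(all.equal(roots_foreach, roots_sapply))

# .combine = "cbind" binds three length-4 vectors into a 4 x 3 matrix
m <- foreach(i = 1:3, .combine = "cbind") %do% { letters[1:4] }
print(dim(m))
```

If the two vectors match and the matrix has 4 rows and 3 columns, the options are working as the comments in the example say.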


Next, we create a cluster with the doSNOW package for parallel computation.
Because I have a dual-core machine, I specify two as the number of workers.
> library(doSNOW)
> getDoParWorkers()
[1] 1
> getDoParName()
NULL
> registerDoSNOW(makeCluster(2, type = "SOCK"))
> getDoParWorkers()
[1] 2
> getDoParName()
[1] "doSNOW"
> getDoParVersion()
[1] "1.0.3"
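
The session above passes makeCluster() directly to registerDoSNOW(), so there is no handle left for shutting the workers down. A minimal sketch of one alternative, assuming you want to stop the workers when you are done, is to keep the cluster in a variable (cl is just a name I chose) and call stopCluster() at the end. registerDoSEQ() from foreach then switches %dopar% back to sequential execution.

```r
library(doSNOW)  # also attaches foreach and snow

# Keep the cluster object so it can be stopped later
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

workers <- getDoParWorkers()
squares <- foreach(i = 1:4, .combine = "c") %dopar% { i^2 }

# Shut the worker processes down and fall back to sequential execution
stopCluster(cl)
registerDoSEQ()

print(workers)
print(squares)
```

Stopping the cluster matters mostly in long sessions, where leftover worker processes would otherwise keep running in the background.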


Now we are ready to compute in parallel, which the foreach package makes easy: you only have to change "%do%" to "%dopar%". I compared the performance of parallel computation with sequential computation as follows.
> N <- 10^4
> system.time(foreach(i = 1:N,.combine = "cbind") %do% {
+   sum(rnorm(N))
+ })
   user  system elapsed
  57.52    0.48   59.60
> system.time(foreach(i = 1:N,.combine = "cbind") %dopar% {
+   sum(rnorm(N))
+ })
   user  system elapsed
  18.61    0.58   37.74

Comparing the elapsed times (59.60 seconds versus 37.74 seconds), the parallel computation is about 1.6 times faster than the sequential one.
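
To put a number on the comparison, the speedup can be worked out from the elapsed (wall-clock) times printed above. The sketch below simply copies those two measurements into variables (the names are mine) and divides them. Elapsed time is probably the fairest basis here, because the user time reported by system.time() most likely counts only the master R process and not the work done in the worker processes.

```r
# Elapsed times (seconds) copied from the two system.time() results above
seq_elapsed <- 59.60  # %do%
par_elapsed <- 37.74  # %dopar% with two workers

speedup <- seq_elapsed / par_elapsed
print(round(speedup, 2))  # 1.58
```

With two workers the ideal speedup would be 2, so part of the gain is lost to overhead such as sending tasks to the workers and combining 10^4 small results.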


References (including PDFs)
-http://cran.r-project.org/web/packages/foreach/foreach.pdf
-http://cran.r-project.org/web/packages/foreach/vignettes/foreach.pdf
-http://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf