R for modellers - Vignette 07

parLapply and friends

Julien Arino

Department of Mathematics

University of Manitoba*

* The University of Manitoba campuses are located on original lands of Anishinaabeg, Cree, Oji-Cree, Dakota and Dene peoples, and on the homeland of the Métis Nation.

Embarrassingly parallel problems

Some of the computational tasks we perform are embarrassingly parallel: they can be split into independent sub-tasks that can be run in parallel

For example, independent simulations of a stochastic process or evaluation of the output of a function at different parameter values

These problems are easy to parallelise

Benefits of parallelisation

If your task is simple enough, you get a speed-up that is close to the number of cores you have available
- If you have 8 threads and are evaluating a function at 8 different parameter values, then the 8 calls can happen simultaneously
If your task is more complex, you get a speed-up, but it will be less than the number of cores you have available
- There is some overhead associated with parallelisation
- Some tasks are already parallelised (e.g. matrix multiplication), which means that you do not get a speed-up if you parallelise them again

Amdahl’s law

\[ S_{\text{latency}}(s)= \frac {1}{(1-p)+p/s} \]

where - \(S_{\text{latency}}\) is the theoretical speedup of the execution of the whole task - \(s\) is the speedup of the part of the task that benefits from improved system resources - \(p\) is the proportion of execution time that the part benefiting from improved resources originally occupied

Setting up a cluster on a single machine

Example 1 - Sensitivity of \(\mathcal{R}_0\)

When on a single machine, a cluster is a subset of the compute threads on the machine
R uses the word cores even when referring to threads
For example, if your laptop has 4 cores/8 threads, you can create a cluster with 2-8 “cores”
Each core is a worker

Workers need information to work

This is probably the most confusing part about parallel computing (until you think about it)
Workers are separated R instances which receive their work orders from the main R instance (the dispatcher), so they must be given all the information needed to carry out the work
This needs to be done prior to the computation, as workers receive minimal information from the dispatcher at runtime…

Needs package parallel
Detect number of cores using detectCores()
Initiate cluster with number of cores
Make workers aware of info they need to work

Example 1 - Sensitivity of \(\mathcal{R}_0\)

Sensitivity of \(\mathcal{R}_0\) to changes in parameters

We take the value of \(\mathcal{R}_0\) in an SIR model with demography and proportional (standard) incidence \[ \mathcal{R}_0 = \frac{\beta}{d+\gamma} \] We want to know how the value of \(\mathcal{R}_0\) changes as a function of changes in the parameters \(\beta,\gamma,d\)

This is sensitivity analysis and will be covered in detail in another Vignette

We define the function whose value we want to compute

R0 = function(p) {
  return(p$beta/(p$gamma+p$d))
}

Create a wrapper function

We create a “wrapper” of the function R0

one_run_R0 = function(p) {
  return(R0(p))
}

In more complex problems, this wrapper function is really important and can be much more evolved.. It can also take other parameters held constant

Set up variations of the parameters

We use the sensitivity library to generate sample values of the parameters. The function parameterSets takes a list with minimum and maximum values for each parameter and a number of samples to generate, so we first make the list

pars.list = list(
  beta = c(0.001, 0.5),
  gamma = c(1/10, 1/2),
  d = c(1/(100*365.25), 1/(20*365.25)))

Generate the parameter values sample

pars.sobol = parameterSets(par.ranges = pars.list, 
                           samples = nb_sims, 
                           method = "sobol")
pars.sobol = as.data.frame(pars.sobol)
colnames(pars.sobol) = names(pars.list)
# For parLapply, we need to make a list of lists
pars.sobol = split(pars.sobol, seq(nrow(pars.sobol)))

Non-parallel version

tictoc::tic("sequential")
result = lapply(X = pars.sobol,
                FUN = one_run_R0)
tictoc::toc()

Parallel version

# Detect number of cores, use all but 1
no_cores <- parallel::detectCores() - 1
# Initiate cluster
tictoc::tic("whole parallel phase")
cl <- parallel::makeCluster(no_cores)
# Export needed variables
parallel::clusterExport(cl,
                        c("R0",
                        "one_run_R0"))
# Run computation
tictoc::tic("parLapply")
result = parallel::parLapply(cl = cl, X = pars.sobol,
                             fun =  one_run_R0)
tictoc::toc()
# Stop cluster
parallel::stopCluster(cl)
tictoc::toc()

Remember Amdahl’s law ?

nb_sims = 1000
sequential: 0.015 sec elapsed
parLapply: 0.096 sec elapsed
cluster: 1.125 sec elapsed

nb_sims = 10000
sequential: 0.093 sec elapsed
parLapply: 0.114 sec elapsed
cluster: 1.034 sec elapsed

nb_sims = 1e+05
sequential: 0.878 sec elapsed
parLapply: 0.506 sec elapsed
cluster: 1.416 sec elapsed

nb_sims = 1e+06
sequential: 15.807 sec elapsed
parLapply: 14.833 sec elapsed
cluster: 16.881 sec elapsed