Asynchronous CPA • jlmerclusterperm

This vignette showcases using future to run CPA asynchronously in a background process.

See more tutorials and vignettes on the Articles page.

Setup

library(jlmerclusterperm)
jlmerclusterperm_setup(verbose = FALSE)

Minimal example data and CPA specification:

chickweights <- ChickWeight
chickweights$Time <- as.integer(factor(chickweights$Time))
chickweights_spec <- make_jlmer_spec(
  formula = weight ~ 1 + Diet,
  data = chickweights,
  subject = "Chick", time = "Time"
)

Example CPA (1000 simulations):

set_rng_state(123L)
CPA <- clusterpermute(
  chickweights_spec,
  threshold = 2.5,
  nsim = 1000,
  progress = FALSE
)

The details of the actual CPA is skipped because that’s not the focus of this vignette, but this is the same example from the readme just with more nsim.

Basic idea

The basic idea behind this asynchronous CPA strategy is to start a background R process whose sole job is to send the instructions for CPA over to Julia. It’s asynchronous because running the CPA this way will not block your interactive R session. Since most of the work is being done in Julia anyways, this parallelization has virtually no impact on the performance of evaluating R code.

The future package allows such asynchronous evaluation of R code. It’s a big package implementing a complex topic. You can read more about the project on the futureverse but we just show the bare minimum here.

For more advanced users, note that we’re use parallelization only for its asynchronous properties (non-blocking evaluation of R code in a background process). It is not recommended to start multiple R processes with each running a CPA because the Julia session is shared and already multithreaded (if you can spare more cores, set that in options("jlmerclusterperm.nthreads") before calling jlmerclusterperm_setup()).

Later updates to jlmerclusterperm may wrap this workflow in a more principled way, but for now this vignette serves as a minimally working example for asynchronously running a CPA.

Walkthrough

Setup for async CPA

Three things are in order for this workflow:

Load the future package
Initialize the multisession future
Grab options from the jlmerclusterperm package environment

library(future)
plan(multisession)
pkgopts <- as.list(jlmerclusterperm:::.jlmerclusterperm)

Please treat the .jlmerclusterperm internal variable as a read-only object - it’s unexported and not meant to be manipulated.

Creating the future object

We start by creating a special object of class <Future> using future::future(). We can simply pass the clusterpermute() code to the first argument of future(), but with a slight modification to ensure that the background process connects to the same Julia session.

The template is as follows. First, you make the pkgopts object (defined above) available for the future (via the globals argument). Then, inside the future expression ensure that the background process shares the same Julia session (via list2env(...)).

# Not run
future(
  {
    list2env(pkgopts, jlmerclusterperm:::.jlmerclusterperm)
    ## Your CPA code below ##
  },
  globals = structure(TRUE, add = "pkgopts")
)

A future object replicating the 1000-simulation CPA from above will look like the following:

f <- future(
  {
    list2env(as.list(pkgopts), jlmerclusterperm:::.jlmerclusterperm)
    ## Your CPA code below ##
    set_rng_state(123L)
    clusterpermute(
      chickweights_spec,
      threshold = 2.5,
      nsim = 1000
    )
  },
  globals = structure(TRUE, add = "pkgopts")
)

Once the future is created, it immediately starts executing the code in a background process. It is important that you do not evaluate the future object (f) directly. Instead, you should query it with future::resolved() - this simply tells you whether the background evaluation has completed or not:

resolved(f)
#> [1] FALSE

Here, we use a while loop to show that the background CPA is non-blocking. We are evaluating R code in the interactive session simultaneously as the CPA is running:

i <- 0
while (!resolved(f)) {
  Sys.sleep(0.5)
  i <<- i + 1
  cat(sprintf("Elapsed: %.01fs", i / 2), "\n")
}
#> Elapsed: 0.5s 
#> Elapsed: 1.0s 
#> Elapsed: 1.5s

We reach the end of the while loop at 1.5 seconds, which is approximately how long the CPA took to complete (we confirm this in the next section). At this point the future has completed and its result is available for collection.

resolved(f)
#> [1] TRUE

Although the R code here was just a crude progress alert using Sys.sleep(), you can freely evaluate any R code except other functions that call Julia. You can make plots, clean data, write up analyses, etc., but just don’t run another CPA while one is already running because the Julia process is shared.

Collecting the results

We can collect the output of the background process with future::value(). But first, we double check to make sure that the background process has indeed finished evaluating:

resolved(f)
#> [1] TRUE

Once that’s confirmed, we can use value() to collect the results from the future and assign it to a variable for further inspection:

CPA_async <- value(f)
#> Connecting to Julia TCP server at localhost:11984 ...
#> ℹ Detecting empirical clusters and calculating cluster-mass statistics.
#> ✔ Detecting empirical clusters and calculating cluster-mass statistics. [56ms]
#> 
#> ℹ Sampling cluster-mass statistics from a bootstrapped null distribution.
#> ✔ Sampling cluster-mass statistics from a bootstrapped null distribution. [985m…
#> 
#> ℹ Calculating the probability of the observed cluster-mass statistics.
#> ✔ Calculating the probability of the observed cluster-mass statistics. [18ms]
#>

Note how value() also prints the messages encountered while executing the CPA code. We see this because we did not specify progress = FALSE in the clusterpermute() call passed to the future. For our purposes, just note that adding up the times from the messages is similar to what we saw from the while loop (1.5 seconds).

The output of the asynchronous CPA (CPA_async) is identical to the initial CPA that we ran at the beginning (CPA) because both were ran with the default seed of 1 and the RNG counter value of 123L:

identical(CPA_async, CPA)
#> [1] TRUE

A separate vignette covers the Julia RNG.

In practice, you may want to use a random seed (via set_rng_seed()) for the background CPA.