This vignette showcases using future to run CPA asynchronously in a background process.
See more tutorials and vignettes on the Articles page.
Setup
library(jlmerclusterperm)
jlmerclusterperm_setup(verbose = FALSE)
Minimal example data and CPA specification:
chickweights <- ChickWeight
chickweights$Time <- as.integer(factor(chickweights$Time))
chickweights_spec <- make_jlmer_spec(
  formula = weight ~ 1 + Diet,
  data = chickweights,
  subject = "Chick", time = "Time"
)
Example CPA (1000 simulations):
set_rng_state(123L)
CPA <- clusterpermute(
  chickweights_spec,
  threshold = 2.5,
  nsim = 1000,
  progress = FALSE
)
The details of the actual CPA are skipped because they are not the focus
of this vignette, but this is the same example from the readme,
just with more nsim.
Basic idea
The basic idea behind this asynchronous CPA strategy is to start a background R process whose sole job is to send the instructions for CPA over to Julia. It’s asynchronous because running the CPA this way will not block your interactive R session. Since most of the work is done in Julia anyway, this parallelization has virtually no impact on the performance of evaluating R code in the interactive session.
The future package allows such asynchronous evaluation of R code. It’s a big package implementing a complex topic. You can read more about the project on the futureverse, but we show only the bare minimum here.
For more advanced users, note that we’re using parallelization
only for its asynchronous properties (non-blocking evaluation
of R code in a background process). It is not recommended to start
multiple R processes with each running a CPA, because the Julia
session is shared and already multithreaded (if you can spare more
cores, set that in options("jlmerclusterperm.nthreads")
before calling jlmerclusterperm_setup()).
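As a rough illustration of the note above, the thread count could be raised like this in a fresh R session. The value 4 is an arbitrary assumption for illustration; the option name and the requirement to set it before jlmerclusterperm_setup() come from the note itself.

```r
# Assumption: 4 is an arbitrary example value; use what your machine can spare.
options("jlmerclusterperm.nthreads" = 4)

library(jlmerclusterperm)
# The option is set before this call, as the note above requires.
jlmerclusterperm_setup(verbose = FALSE)
```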
Later updates to jlmerclusterperm may wrap this workflow
in a more principled way, but for now this vignette serves as a
minimal working example for asynchronously running a CPA.
Walkthrough
Setup for async CPA
Three things are in order for this workflow:
- Load the future package
- Initialize the multisession future
- Grab options from the jlmerclusterperm package environment
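The three steps above might look like the following sketch. The name pkgopts is taken from the code later in this vignette; copying the internal environment with as.list() is an assumption about how that object is built, chosen because list2env() expects a list.

```r
library(future)

# Run futures in separate background R sessions
plan(multisession)

# Assumption: snapshot the package's internal options as a plain list.
# `.jlmerclusterperm` is unexported, hence the `:::`.
pkgopts <- as.list(jlmerclusterperm:::.jlmerclusterperm)
```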
Please treat the .jlmerclusterperm internal variable as
a read-only object: it’s unexported and not meant to be
manipulated.
Creating the future object
We start by creating a special object of class <Future>
using future::future(). We can simply pass the clusterpermute()
code to the first argument of future(), but with a slight
modification to ensure that the background process connects to
the same Julia session.
The template is as follows. First, you make the pkgopts
object (defined above) available for the future (via the
globals argument). Then, inside the future expression,
ensure that the background process shares the same Julia session (via
list2env(...)).
# Not run
future(
  {
    list2env(pkgopts, jlmerclusterperm:::.jlmerclusterperm)
    ## Your CPA code below ##
  },
  globals = structure(TRUE, add = "pkgopts")
)
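To see the globals and list2env() mechanics in isolation, here is a toy sketch that needs only the future package and no Julia. The names opts and target_env are hypothetical stand-ins for pkgopts and the package's internal environment; they are not part of jlmerclusterperm.

```r
library(future)
plan(multisession)

# Hypothetical stand-in for `pkgopts`: a plain named list of settings
opts <- list(port = 11984L, verbose = FALSE)

f_demo <- future(
  {
    # Hypothetical stand-in for the package's internal environment
    target_env <- new.env()
    # Copy every element of the exported list into that environment
    list2env(opts, target_env)
    ls(target_env)
  },
  # Export `opts` to the background process in addition to
  # whatever globals future detects automatically
  globals = structure(TRUE, add = "opts")
)

value(f_demo)
#> [1] "port"    "verbose"
```

The background process ends up with the same settings as the main session, which is the role list2env() plays in the template above.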
A future object replicating the 1000-simulation CPA from above will look like the following:
f <- future(
  {
    list2env(as.list(pkgopts), jlmerclusterperm:::.jlmerclusterperm)
    ## Your CPA code below ##
    set_rng_state(123L)
    clusterpermute(
      chickweights_spec,
      threshold = 2.5,
      nsim = 1000
    )
  },
  globals = structure(TRUE, add = "pkgopts")
)
Once the future is created, it immediately starts
executing the code in a background process. It is important that you
do not evaluate the future object (f) directly. Instead, you should
query it with future::resolved(), which simply tells you whether the
background evaluation has completed or not:
resolved(f)
#> [1] FALSE
Here, we use a while loop to show that the background
CPA is non-blocking. We are evaluating R code in the interactive session
while the CPA is running:
i <- 0
while (!resolved(f)) {
  Sys.sleep(0.5)
  i <- i + 1
  cat(sprintf("Elapsed: %.01fs", i / 2), "\n")
}
#> Elapsed: 0.5s
#> Elapsed: 1.0s
#> Elapsed: 1.5s
We reach the end of the while loop at 1.5 seconds, which is approximately how long the CPA took to complete (we confirm this in the next section). At this point the future has completed and its result is available for collection.
resolved(f)
#> [1] TRUE
Although the R code here was just a crude progress alert using
Sys.sleep(), you can freely evaluate any R code
except other functions that call Julia. You can make
plots, clean data, write up analyses, etc., but just don’t run another
CPA while one is already running, because the Julia process is
shared.
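The same non-blocking pattern can be tried without Julia. In this sketch, Sys.sleep(2) is a hypothetical stand-in for a long-running CPA, and the result is collected with value(), which is covered in the next section. Note that value() waits for the future to finish if it has not resolved yet.

```r
library(future)
plan(multisession)

# Stand-in for a long-running CPA (assumption: 2 seconds of "work")
f_slow <- future({
  Sys.sleep(2)
  "done"
})

# The interactive session stays free in the meantime
squares <- (1:5)^2

# value() blocks until the future has resolved, then returns its result
result <- value(f_slow)
```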
Collecting the results
We can collect the output of the background process with
future::value(). But first, we double-check
that the background process has indeed finished evaluating:
resolved(f)
#> [1] TRUE
Once that’s confirmed, we can use value() to collect the
result from the future and assign it to a variable for further
inspection:
CPA_async <- value(f)
#> Connecting to Julia TCP server at localhost:11984 ...
#> ℹ Detecting empirical clusters and calculating cluster-mass statistics.
#> ✔ Detecting empirical clusters and calculating cluster-mass statistics. [56ms]
#>
#> ℹ Sampling cluster-mass statistics from a bootstrapped null distribution.
#> ✔ Sampling cluster-mass statistics from a bootstrapped null distribution. [985m…
#>
#> ℹ Calculating the probability of the observed cluster-mass statistics.
#> ✔ Calculating the probability of the observed cluster-mass statistics. [18ms]
#>
Note how value() also prints the messages encountered
while executing the CPA code. We see this because we did not specify
progress = FALSE in the clusterpermute() call
passed to the future. For our purposes, just note that the sum of the
times in the messages is similar to what we saw from the while loop
(1.5 seconds).
The output of the asynchronous CPA (CPA_async) is
identical to the initial CPA that we ran at the beginning
(CPA) because both were run with the default seed of 1 and
the RNG counter value of 123L:
identical(CPA_async, CPA)
#> [1] TRUE
A separate vignette covers the Julia RNG.
In practice, you may want to use a random seed (via
set_rng_seed()) for the background CPA.