Geller et al. 2020

This tutorial is a walk through of the choice of t vs. chisq statistic in a (mixed-effects) CPA.

See more tutorials and vignettes on the Articles page.

Background

The data comes from a lexical decision task using pupilometry (specifically, measure of pupil dilation) to study differences in processing difficulty between print vs. cursive script. The data is available as part of the gazer package by Geller, Winn, Mahr, & Mirman 2020.

We will follow Jason Geller’s tutorial which used the clusterperm package to conduct a cluster-based permutation analysis (CPA) of differences in pupil size between the print and cursive conditions.

The data from the package (cursive_agg_data) looks slightly different from that used in the tutorial, so to ensure full reproducibility we use the exact data that the tutorial used:

library(dplyr)
cursive_agg_data <- as_tibble(read.csv("https://raw.githubusercontent.com/jgeller112/drjasongeller/a612056bd50e7f1e9280880cbb81016e2ca11511/blog/posts/2020-07-10-CBPT/blog_data.csv"))
cursive_agg_data
#> # A tibble: 2,184 × 5
#>        X subject script  timebins aggbaseline
#>    <int> <chr>   <chr>      <int>       <dbl>
#>  1     1 10b.edf cursive        0      36.9  
#>  2     2 10b.edf cursive      100      27.5  
#>  3     3 10b.edf cursive      200      19.6  
#>  4     4 10b.edf cursive      300      12.6  
#>  5     5 10b.edf cursive      400       5.60 
#>  6     6 10b.edf cursive      500       0.968
#>  7     7 10b.edf cursive      600      -5.98 
#>  8     8 10b.edf cursive      700      -5.96 
#>  9     9 10b.edf cursive      800       3.01 
#> 10    10 10b.edf cursive      900      12.9  
#> # ℹ 2,174 more rows

The data comes prepared for this analysis out of the box. The following columns are relevant for our analysis:

subject: Unique identifier for subjects
script: A within-participant factor variable ("cursive", "print")
time: A continuous measure of time from 0-2500ms at 100ms intervals
aggbaseline: The response variable representing normalized (“baseline-corrected”) pupil size

As the name of the cusrive_agg_data variable suggests, the data has been aggregated within subject, collapsing across trials.

The following reproduces the figure from the tutorial:

library(ggplot2)
script_fig <- ggplot(cursive_agg_data, aes(timebins, aggbaseline)) +
  stat_summary(aes(linetype = script), geom = "line", linewidth = 1)
script_fig

For within-participant predictors like script, the permutation algorithm will randomly swap the labels for condition between the trials within each subject. This preserves the temporal structure of the trial-level data (no swapping below the trial-level grouping) as well as the subject-level grouping structure (no swapping of trials across participants).

Outline

This case study vignette showcases five features of doing a CPA with jlmerclusterperm:

Prepping data for CPA using make_jlmer_spec()
clusterpermute() with default statistic = "t"
A comparison between “t” and “chisq”
Specifying random effects
Contrast coding

Load the package and start the Julia instance with jlmerclusterperm_setup() before proceeding.

library(jlmerclusterperm)
jlmerclusterperm_setup(verbose = FALSE)

A) Prepping a specification object

We start with a simple specification object to model aggbaseline using script as a predictor for the script_fig data:

simple_spec <- make_jlmer_spec(
  formula = aggbaseline ~ 1 + script,
  data = cursive_agg_data
)
simple_spec
#> ── jlmer specification ───────────────────────────────────────── <jlmer_spec> ──
#> Formula: aggbaseline ~ 1 + scriptprint
#> Predictors:
#>   script: scriptprint
#> Data:
#> # A tibble: 2,184 × 2
#>   aggbaseline scriptprint
#>         <dbl>       <dbl>
#> 1        36.9           0
#> 2        27.5           0
#> 3        19.6           0
#> # ℹ 2,181 more rows
#> ────────────────────────────────────────────────────────────────────────────────

For a sanity check, we fit a global model to the data…

jlmer(simple_spec)
#> <Julia object of type StatsModels.TableRegressionModel>
#> ────────────────────────────────────────────────────────────────────────
#>                 Coef.  Std. Error      z  Pr(>|z|)  Lower 95%  Upper 95%
#> ────────────────────────────────────────────────────────────────────────
#> (Intercept)   46.7221     2.79622  16.71    <1e-61    41.2417    52.2026
#> scriptprint  -10.2582     3.95445  -2.59    0.0095   -18.0088    -2.5076
#> ────────────────────────────────────────────────────────────────────────

…and check that it’s comparable to what we expect from lm():

summary(lm(formula = aggbaseline ~ 1 + script, data = cursive_agg_data))$coefficients
#>              Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept)  46.72214   2.796221 16.709029 4.512338e-59
#> scriptprint -10.25818   3.954454 -2.594083 9.547816e-03

The full specification object for a CPA must also declare the grouping structures present in the data. The rule of thumb is that every observation (row) in the data must be uniquely identified by a combination of columns for subject, trial, and time. Because CPA is over a time series data time must always be specified, and subject must also always be specified for the permutation algorithm to respect subject-level grouping of the data.

The cursive_agg_data data collapses across trials within subject, so there is no column for trial. However, we cannot leave the trial argument unspecified because observations are not uniquely identified by subject and time alone. There are instead 2 rows per subject-time combination, one for each script condition:

cursive_agg_data %>%
  count(subject, timebins)
#> # A tibble: 1,092 × 3
#>    subject timebins     n
#>    <chr>      <int> <int>
#>  1 10b.edf        0     2
#>  2 10b.edf      100     2
#>  3 10b.edf      200     2
#>  4 10b.edf      300     2
#>  5 10b.edf      400     2
#>  6 10b.edf      500     2
#>  7 10b.edf      600     2
#>  8 10b.edf      700     2
#>  9 10b.edf      800     2
#> 10 10b.edf      900     2
#> # ℹ 1,082 more rows

Therefore we need a “dummy” column for trial to distinctly mark “cursive” vs. “script” trials. We save this new data as cursive_agg:

cursive_agg <- cursive_agg_data %>%
  mutate(trial_type = paste0(script, "_agg"))
cursive_agg
#> # A tibble: 2,184 × 6
#>        X subject script  timebins aggbaseline trial_type 
#>    <int> <chr>   <chr>      <int>       <dbl> <chr>      
#>  1     1 10b.edf cursive        0      36.9   cursive_agg
#>  2     2 10b.edf cursive      100      27.5   cursive_agg
#>  3     3 10b.edf cursive      200      19.6   cursive_agg
#>  4     4 10b.edf cursive      300      12.6   cursive_agg
#>  5     5 10b.edf cursive      400       5.60  cursive_agg
#>  6     6 10b.edf cursive      500       0.968 cursive_agg
#>  7     7 10b.edf cursive      600      -5.98  cursive_agg
#>  8     8 10b.edf cursive      700      -5.96  cursive_agg
#>  9     9 10b.edf cursive      800       3.01  cursive_agg
#> 10    10 10b.edf cursive      900      12.9   cursive_agg
#> # ℹ 2,174 more rows

This makes all observations uniquely identified by the columns for subject, trial, and time:

cursive_agg %>%
  count(subject, timebins, trial_type) %>%
  distinct(n)
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1     1

The final specification object looks like the following:

cursive_agg_spec <- make_jlmer_spec(
  formula = aggbaseline ~ 1 + script,
  data = cursive_agg,
  subject = "subject", trial = "trial_type", time = "timebins"
)
cursive_agg_spec
#> ── jlmer specification ───────────────────────────────────────── <jlmer_spec> ──
#> Formula: aggbaseline ~ 1 + scriptprint
#> Predictors:
#>   script: scriptprint
#> Groupings:
#>   Subject: subject
#>   Trial: trial_type
#>   Time: timebins
#> Data:
#> # A tibble: 2,184 × 5
#>   aggbaseline scriptprint subject trial_type  timebins
#>         <dbl>       <dbl> <chr>   <chr>          <int>
#> 1        36.9           0 10b.edf cursive_agg        0
#> 2        27.5           0 10b.edf cursive_agg      100
#> 3        19.6           0 10b.edf cursive_agg      200
#> # ℹ 2,181 more rows
#> ────────────────────────────────────────────────────────────────────────────────

B) CPA with default `statistic = "t"`

The CPA output from the original tutorial (using 100 simulations) is copied below for comparison:

#> #> effect   b0     b1  sign        cms           p
#> #> script    0    100     1  10.121215   0.1584158
#> #> script  900   1000    -1  8.756152    0.1782178
#> #> script 1900   2500     1  82.198279   0.0099010

Using a threshold of 2 with the default statistic = "t", clusterpermute() returns the following:

set_rng_state(123L)
clusterpermute(
  cursive_agg_spec,
  statistic = "t", # Default value spelled out
  threshold = 2,
  nsim = 100,
  progress = FALSE
)
#> $null_cluster_dists
#> ── Null cluster-mass distribution (t > 2) ────────────── <null_cluster_dists> ──
#> scriptprint (n = 100)
#>   Mean (SD): 0.000 (0.00)
#>   Coverage intervals: 95% [0.000, 0.000]
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> $empirical_clusters
#> ── Empirical clusters (t > 2) ────────────────────────── <empirical_clusters> ──
#> scriptprint
#>   [2000, 2300]: -8.748 (p=0.0099)
#> ────────────────────────────────────────────────────────────────────────────────

The results look very different from the original. This is due to the following:

The cluster-mass statistic (cms) is lower. This is because we ran clusterpermute() with the default statistic = "t".
Different clusters are identified. This is because our simple model does not account for subject-level variation, whereas the original did this with the error term in the formula aggbaseline ~ script + Error(subject).
The sign on the cluster is reversed. This is because the defaults of aov() used in the original tutorial are different from the default contrast coding of our regression model.

We now address these issue in turn, building up to a CPA that closely replicate the results of the tutorial.

C) Comparing “t” vs. “chisq”

The original tutorial used clusterperm::cluster_nhds() to conduct a CPA, which fits ANOVA models by time. There, the timewise statistic used is chi-squared and the threshold is determined from the p-value of the chi-squared statistic.

Using chi-squared statistics with p-value threshold is also supported in clusterpermute() using statistic = "chisq" (instead of the default "t"):

set_rng_state(123L)
clusterpermute(
  cursive_agg_spec,
  statistic = "chisq",
  threshold = 0.05, # Threshold is now the p-value of the chi-squared statistic
  nsim = 100,
  progress = FALSE
)
#> $null_cluster_dists
#> ── Null cluster-mass distribution (chisq p < 0.05) ───── <null_cluster_dists> ──
#> script (n = 100, df = 1)
#>   Mean (SD): 0.000 (0.00)
#>   Coverage intervals: 95% [0.000, 0.000]
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> $empirical_clusters
#> ── Empirical clusters (chisq p < 0.05) ───────────────── <empirical_clusters> ──
#> script (df = 1)
#>   [2000, 2300]: -19.085 (p=0.0099)
#> ────────────────────────────────────────────────────────────────────────────────

This returns the same cluster but now with a numerically larger cluster-mass statistic, as expected (chi-squared is asymptotic to t^2).

Below, we compare the shape of the timewise statistics between “t” with threshold of 2 and “chisq” with threshold of p=0.05:

timewise_ts <- compute_timewise_statistics(cursive_agg_spec, statistic = "t")
timewise_chisqs <- compute_timewise_statistics(cursive_agg_spec, statistic = "chisq")
library(ggplot2)
timewise_fig <- ggplot(mapping = aes(x = time, y = statistic)) +
  geom_line(
    aes(color = "fixed-t"),
    linewidth = 1.5,
    data = tidy(timewise_ts)
  ) +
  geom_line(
    aes(color = "fixed-chisq"),
    linewidth = 1.5,
    data = tidy(timewise_chisqs)
  ) +
  geom_hline(
    aes(yintercept = c(-2, 2, qchisq(.95, df = 1), -qchisq(.95, df = 1))),
    color = rep(c("#00BFC4", "#F8766D"), each = 2), linetype = 2
  ) +
  scale_color_manual(
    values = setNames(
      c("#E69F00", "#56B4E9", "#009E73", "#F0E442"),
      c("fixed-t", "fixed-chisq", "re-intercept-chisq", "re-max-chisq")
    )
  )
timewise_fig

We find the same clusters identified between 2000ms-2300ms for both “t” and “chisq”, with the peaks more pronounced for “chisq”. The differences between the two are inconsequential for this example, but may produce different results in other cases.

The chi-squared statistic (which jlmerclusterperm computes via a likelihood ratio test) is often preferred for testing single parameters because it makes less assumptions and tend to be more robust (glmmFAQ). But because goodness-of-fit tests are at the level of a predictor in a model formula, “chisq” is less interpretable for multi-level predictors (k-1 > 1). For teasing apart the contribution of the individual levels of a multi-level predictor, using statistic = "t" is more appropriate.

D) Specifying random effects

The biggest missing component at this point is the subject random effects, which the original tutorial captures via the error term Error(subject). There isn’t a strict equivalent to this in regression, but specifying random intercepts by subject with (1 | subject) gets us very close:

cursive_agg_spec_re <- make_jlmer_spec(
  formula = aggbaseline ~ 1 + script + (1 | subject),
  data = cursive_agg,
  subject = "subject", time = "timebins", trial = "trial_type"
)
set_rng_state(123L)
system.time({
  re_CPA <- clusterpermute(
    cursive_agg_spec_re,
    statistic = "chisq",
    threshold = 0.05,
    nsim = 100,
    progress = FALSE
  )
})
#>    user  system elapsed 
#>    0.03    0.08    7.33

re_CPA
#> $null_cluster_dists
#> ── Null cluster-mass distribution (chisq p < 0.05) ───── <null_cluster_dists> ──
#> script (n = 100, df = 1)
#>   Mean (SD): 1.132 (12.58)
#>   Coverage intervals: 95% [-21.081, 32.649]
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> $empirical_clusters
#> ── Empirical clusters (chisq p < 0.05) ───────────────── <empirical_clusters> ──
#> script (df = 1)
#>   [0, 100]: -9.770 (p=0.1881)
#>   [900, 1000]: 8.521 (p=0.2475)
#>   [1900, 2500]: -73.440 (p=0.0099)
#> ────────────────────────────────────────────────────────────────────────────────

We repeat the results from the original tutorial below for comparison:

#> #> effect   b0     b1  sign        cms           p
#> #> script    0    100     1  10.121215   0.1584158
#> #> script  900   1000    -1  8.756152    0.1782178
#> #> script 1900   2500     1  82.198279   0.0099010

We now plot the timewise statistics from the random intercept models:

timewise_chisqs_re <- compute_timewise_statistics(cursive_agg_spec_re, statistic = "chisq")
timewise_fig_re <- timewise_fig +
  geom_line(
    aes(color = "re-intercept-chisq"),
    linewidth = 1.5,
    data = tidy(timewise_chisqs_re)
  )
timewise_fig_re

At this point we may wonder whether the results change much if we used a maximal model with the random effects structure (1 + script | subject). We first create another specification object with a maximal formula:

cursive_agg_spec_re_max <- make_jlmer_spec(
  formula = aggbaseline ~ 1 + script + (1 + script | subject),
  data = cursive_agg,
  subject = "subject", time = "timebins", trial = "trial_type"
)

Then compute the timewise statistics:

timewise_chisqs_re_max <- compute_timewise_statistics(cursive_agg_spec_re_max, statistic = "chisq")
#> ℹ 1 singular fit (3.85%).

We find that the chisq statistics from the maximal model is virtually identical to that from the more parsimonious, intercept-only model (the two lines overlap):

timewise_fig_re +
  geom_line(
    aes(color = "re-max-chisq"),
    linewidth = 1.5,
    data = tidy(timewise_chisqs_re_max)
  )

Parsimony is incredibly important for simulation - while jlmerclusterperm is fast, model complexity is still a major bottleneck. When adding the extra random effect terms (random slope and correlation) has negligible effects on the statistic, removing them is likely to be inconsequential for the CPA itself.

We show this in the following CPA run, which uses the maximal model. Notice how the results are identical to the intercept-only re_CPA but substantially slower:

set_rng_state(123L)
system.time({
  re_max_CPA <- clusterpermute(
    cursive_agg_spec_re_max,
    statistic = "chisq",
    threshold = 0.05,
    nsim = 100,
    progress = FALSE
  )
})
#>    user  system elapsed 
#>    0.01    0.07   11.48

re_max_CPA
#> $null_cluster_dists
#> ── Null cluster-mass distribution (chisq p < 0.05) ───── <null_cluster_dists> ──
#> script (n = 100, df = 1)
#>   Mean (SD): 1.132 (12.58)
#>   Coverage intervals: 95% [-21.081, 32.649]
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> $empirical_clusters
#> ── Empirical clusters (chisq p < 0.05) ───────────────── <empirical_clusters> ──
#> script (df = 1)
#>   [0, 100]: -9.770 (p=0.1881)
#>   [900, 1000]: 8.521 (p=0.2475)
#>   [1900, 2500]: -73.440 (p=0.0099)
#> ────────────────────────────────────────────────────────────────────────────────

We run the above maximal mixed model CPA purely for demonstration purposes. The random effects structure is actually unidentifiable given the aggregated data, where at any point in time there are only 2 observations from each subject (the mean of each condition).

Lastly, back in our figure of the data, we annotate the clusters detected with the chisq p-value threshold of 0.05 from the random intercept CPA:

empirical_clusters_df <- tidy(extract_empirical_clusters(timewise_chisqs_re, threshold = 0.05))
script_fig +
  geom_segment(
    aes(
      x = start, xend = end, y = -Inf, yend = -Inf,
      color = factor(sign(sum_statistic))
    ),
    linewidth = 10,
    inherit.aes = FALSE,
    data = empirical_clusters_df
  ) +
  geom_text(
    aes(
      y = -Inf, x = start + (end - start) / 2,
      label = paste("Cluster", id)
    ),
    vjust = -2,
    inherit.aes = FALSE,
    data = empirical_clusters_df
  ) +
  labs(color = "Sign of cluster")

E) Contrast coding

The last piece of the puzzle is the flipped sign of the effect. Whereas we detect a negative cluster when the line for cursive is over the line for print, the original tutorial reports a positive cluster (output repeated below):

#> #> effect   b0     b1  sign        cms           p
#> #> script    0    100     1  10.121215   0.1584158
#> #> script  900   1000    -1  8.756152    0.1782178
#> #> script 1900   2500     1  82.198279   0.0099010

Fixing this is trivial - it just takes a different choice of contrast.

For example, we can flip the levels of the factor to make “print” the reference level:

rev_contrast_df <- cursive_agg
rev_contrast_df$script <- factor(rev_contrast_df$script, levels = c("print", "cursive"))
contrasts(rev_contrast_df$script)
#>         cursive
#> print         0
#> cursive       1

reverse_contrast_spec <- make_jlmer_spec(
  formula = aggbaseline ~ 1 + script + (1 | subject),
  data = rev_contrast_df,
  subject = "subject", time = "timebins", trial = "trial_type"
)
set_rng_state(123L)
clusterpermute(
  reverse_contrast_spec,
  statistic = "chisq",
  threshold = 0.05,
  nsim = 100,
  progress = FALSE
)
#> $null_cluster_dists
#> ── Null cluster-mass distribution (chisq p < 0.05) ───── <null_cluster_dists> ──
#> script (n = 100, df = 1)
#>   Mean (SD): -1.132 (12.58)
#>   Coverage intervals: 95% [-32.649, 21.081]
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> $empirical_clusters
#> ── Empirical clusters (chisq p < 0.05) ───────────────── <empirical_clusters> ──
#> script (df = 1)
#>   [0, 100]: 9.770 (p=0.1881)
#>   [900, 1000]: -8.521 (p=0.2475)
#>   [1900, 2500]: 73.440 (p=0.0099)
#> ────────────────────────────────────────────────────────────────────────────────

Or we could also sum-code and set “print” to -1 and “cursive” to 1:

sum_contrast_df <- cursive_agg
sum_contrast_df$script <- factor(sum_contrast_df$script)
contrasts(sum_contrast_df$script) <- contr.sum(2)
contrasts(sum_contrast_df$script)
#>         [,1]
#> cursive    1
#> print     -1

sum_contrast_spec <- make_jlmer_spec(
  formula = aggbaseline ~ 1 + script + (1 | subject),
  data = sum_contrast_df,
  subject = "subject", time = "timebins", trial = "trial_type"
)
set_rng_state(123L)
clusterpermute(
  sum_contrast_spec,
  statistic = "chisq",
  threshold = 0.05,
  nsim = 100,
  progress = FALSE
)
#> $null_cluster_dists
#> ── Null cluster-mass distribution (chisq p < 0.05) ───── <null_cluster_dists> ──
#> script (n = 100, df = 1)
#>   Mean (SD): -1.132 (12.58)
#>   Coverage intervals: 95% [-32.649, 21.081]
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> $empirical_clusters
#> ── Empirical clusters (chisq p < 0.05) ───────────────── <empirical_clusters> ──
#> script (df = 1)
#>   [0, 100]: 9.770 (p=0.1881)
#>   [900, 1000]: -8.521 (p=0.2475)
#>   [1900, 2500]: 73.440 (p=0.0099)
#> ────────────────────────────────────────────────────────────────────────────────

Because CPA operates over the domain of the timewise statistics (and not the effect size), whether the difference between “print” and “cursive” is expressed as a unit of 1 (in treatment coding) or 2 (in contrast coding) has no bearing on the CPA.

To conclude, we re-run and time a 1000-simulation CPA using the intercept-only model with the new treatment coding of script where “print” is the reference level.

set_rng_state(123L)
system.time({
  final_CPA <- clusterpermute(
    reverse_contrast_spec,
    statistic = "chisq",
    threshold = 0.05,
    nsim = 1000,
    progress = FALSE
  )
})
#>    user  system elapsed 
#>    0.03    0.07   10.19

final_CPA
#> $null_cluster_dists
#> ── Null cluster-mass distribution (chisq p < 0.05) ───── <null_cluster_dists> ──
#> script (n = 1000, df = 1)
#>   Mean (SD): -0.107 (16.50)
#>   Coverage intervals: 95% [-36.966, 41.530]
#> ────────────────────────────────────────────────────────────────────────────────
#> 
#> $empirical_clusters
#> ── Empirical clusters (chisq p < 0.05) ───────────────── <empirical_clusters> ──
#> script (df = 1)
#>   [0, 100]: 9.770 (p=0.2108)
#>   [900, 1000]: -8.521 (p=0.2358)
#>   [1900, 2500]: 73.440 (p=0.0090)
#> ────────────────────────────────────────────────────────────────────────────────

Finally, annotating just the significant cluster on the figure:

signif_clusters_df <- tidy(final_CPA$empirical_clusters) %>%
  filter(pvalue < 0.05)
signif_clusters_df
#> # A tibble: 1 × 7
#>   predictor id    start   end length sum_statistic  pvalue
#>   <chr>     <fct> <dbl> <dbl>  <dbl>         <dbl>   <dbl>
#> 1 script    3      1900  2500      7          73.4 0.00899

script_fig +
  geom_segment(
    aes(x = start, xend = end, y = -Inf, yend = -Inf),
    color = "steelblue", linewidth = 10,
    inherit.aes = FALSE,
    data = signif_clusters_df
  ) +
  geom_text(
    aes(
      y = -Inf, x = start + (end - start) / 2,
      label = paste("p =", signif(pvalue, 3))
    ),
    vjust = -2,
    inherit.aes = FALSE,
    data = signif_clusters_df
  )

Background

Outline

A) Prepping a specification object

B) CPA with default statistic = "t"

C) Comparing “t” vs. “chisq”

D) Specifying random effects

E) Contrast coding

B) CPA with default `statistic = "t"`