Notes from attending and speaking at my first R conference
I’d had my eyes on useR! 2022 ever since it was announced - by the time the call for abstracts went out, I’d been developing my package {ggtrace} for about half a year, and I thought it would make a good submission for my “debut” into the “academic” R world. I’d built up a bit of resistance to online conferences over the covid years, but I’d never been to an R conference before and had heard good things about the previous virtual useR! 2021, so I was actually very excited about the prospect of attending.
In short, I’m very glad I did - it was educational and a great first R conference for me!
Also, if you’re wondering why I’m writing this over a month after the conference ended, it’s because I had to immediately switch gears to rstudio::conf prep (sorry!).
The submission process was incredibly simple,1 and that low barrier to entry was part of what convinced me to apply. The submission form consisted of a 250-word abstract typed into a textbox input field, along with relevant links about my project.2
I also applied for the diversity scholarship, which was a separate form sent to me after my talk was accepted. I’d never applied for these kinds of things before, but when it came to R, I felt like I could really use it to advocate for myself and for others who share my background (students in traditionally “humanities” and “soft science” departments), to convince their programs of the benefits of students being involved in the R community. I’m fortunate enough to be in a program that has recently been making a lot of effort to incorporate R/programming into doing science,3 but I, selfishly, wanted even more R in my life. The conference generously offered me the diversity scholarship, and I am incredibly honored! It covered my registration fee and allowed me to attend two workshops, which I’ll talk about below.
Here are my notes from the workshops and some of the talks I attended, in no particular order. Keep in mind that I was attending from home in Korea, so I missed a few live talks and didn’t take great notes for others (I’ll be catching up through the recordings).
Recordings are available on the conference YouTube channel.
Isabella Bicalho Frazeto Introduction to dimensional reduction in R: First off, big respect to Isabella for leading a 3.5-hour workshop all by herself. Content-wise, the workshop was a balanced, not-too-overwhelming overview of dimensional reduction,4 starting with a mini lecture on dimensional reduction methods in Part 1, followed by a hands-on walkthrough of the R packages and functions implementing these methods in Parts 2 and 3. I knew very little about dimensional reduction going in, and I came away with at least a conceptual understanding of PCA and ICA, as well as the important differences between the two in theory and practice. Getting an answer to “what the heck is PCA/ICA?” was my main goal going into the workshop, so I’m very satisfied with what I learned. Isabella was also nice enough to answer my question about the difference between PCA and fPCA during break time - she had never heard of fPCA before,5 but she did some quick research on it live, read a whole wikipedia article about it, and then dumbed it down for me, all in the span of like 3 minutes.
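For my own future reference, here’s a minimal sketch of what PCA boils down to in practice, in base R (my example, not from the workshop materials):

```r
# Reduce the four iris measurements down to principal components
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

summary(pca)  # proportion of variance explained by each component
head(pca$x)   # the data projected onto the new component axes

# Keeping just PC1 and PC2 turns 4 dimensions into 2
plot(pca$x[, 1:2], col = iris$Species)
```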
Danielle Navarro, Jonathan Keane, & Stephanie Hazlitt Larger-than-memory data workflows with Apache Arrow: I don’t personally work with big data much, but a lot of people around me do, and when they do, they often run into issues trying to analyze their data in-memory. So I figured I’d learn Arrow and “spread the word,” so to speak. The workshop was AWESOME and used a ton of carefully crafted learning materials, which are also available online. To be honest, the live session fell in the middle of the night for me in Korea time, so I wasn’t that engaged with the live exercises, but the online materials are so well organized that between them and the workshop recording, I feel like I didn’t miss much. Beyond the actual content of the workshop exercises, I especially liked that a good chunk of time was spent going through the design of Arrow, where it fits in the data-analysis pipeline, and how it compares to alternatives in the data storage ecosystem. Also, {arrow} + {duckdb} looks very cool - I’m now an official convert!
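The gist of the {arrow} + {duckdb} combo as I understood it (a sketch of mine, not the workshop’s code - the parquet directory and column names are hypothetical):

```r
library(arrow)
library(dplyr)

# Scan a directory of parquet files without reading them into memory
ds <- open_dataset("data/flights_parquet")  # hypothetical path

# dplyr verbs are translated and evaluated lazily by Arrow;
# only the small summarized result lands in memory at collect()
ds |>
  filter(year == 2019) |>
  group_by(carrier) |>
  summarize(mean_delay = mean(dep_delay, na.rm = TRUE)) |>
  collect()

# Hand the same data to duckdb for things Arrow can't do (e.g., window functions)
ds |>
  to_duckdb() |>
  mutate(delay_rank = min_rank(desc(dep_delay))) |>
  collect()
```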
Paula Moraga: This was so interesting even for me, knowing nothing about public health or spatial statistics going in, and I learned a lot from the talk. It was very information-packed and really showed how you could use R in literally every aspect of research. I was especially impressed by the idea that you could analyze your uncertainty about a measure (e.g., % vaccinated in a region) as data itself, and use that to inform future data collection decisions. It was refreshing to hear uncertainty talked about in that way, because I usually only hear people talk about proactive solutions for minimizing uncertainty, or about improving precision in quantifying it. Paula’s work is like a special kind of applied statistics that explicitly leverages properties of the uncertainty in current/prior data to drive future data collection practices.
Amanda Cox: Words cannot express how good this talk was. Amanda Cox is a force of nature. I was really looking forward to this one given my interests in data viz and data journalism, and I was not disappointed. There were so many “mind = blown” moments, but to keep things short, here are my two favorites:
You Draw It: How Family Income Predicts Children’s College Chances: Everyone knows that income and education are highly linked, but just how tight is the connection? It turns out that when the NYT team investigated this question, they ended up with data showing a trend so suspiciously linear that it appeared almost uninteresting and sterile. But they turned this mundaneness into a surprise factor by testing it against people’s expectations, letting readers draw their own trend line to compare against the data. Not only did that get people engaged, they also got creative with what they drew,6 making the piece a very rich interactive experience. The team correctly determined that the data was uninteresting to plot on its own, and then turned the situation around into something magnificent.
One Race, Every Medalist Ever: Amanda introduced this piece as an example of how R could be used anywhere, even in places where it didn’t need to be used.7 She made us watch a whopping 2:30-minute video about olympic sprinters, only to reveal that R was involved in something seemingly trivial.8 I love that she did this build-up in the talk, though - I wouldn’t have fully appreciated the point if she’d just said it outright.
Amelia McNamara Teaching modeling in introductory statistics: A comparison of formula and tidyverse syntaxes: I’ve been hearing about this project on twitter for a while, and this was my first time listening to a talk about it. I think it’s super neat - this kind of “meta” research on R use, done with R, is something I’ve been interested in as well.9 A lot of thought went into the design of the teaching materials, and I admired the robustness of the experimental design. It’s also just always interesting to hear what things are like for novices - as an experienced user, you can’t get that from introspection anymore.
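For anyone unfamiliar with the two syntaxes being compared, here’s a toy contrast (my own sketch, not from the talk; the formula version assumes {mosaic} is loaded):

```r
# Formula syntax, as popularized for teaching by {mosaic}
library(mosaic)
mean(wage ~ sex, data = mosaicData::CPS85)

# Tidyverse syntax for the same grouped summary
library(dplyr)
mosaicData::CPS85 |>
  group_by(sex) |>
  summarize(mean_wage = mean(wage))
```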
Jonathan Love jamovi: An R-based statistical spreadsheet for the masses: I’d actually never seen a WYSIWYG-style statistical program in action. I was fortunate enough to go through school during the R revolution in university intro stats courses,10 but I always appreciated the people who worked on these GUI-based tools for beginners, because, boy, was learning R hard at first! I was really surprised that jamovi can produce full reports, and that it has like 40-50 community-contributed modules. Also, TIL that jamovi’s sister program jasp stands for “Jonathan’s Awesome Statistical Program.”
Carsten Lange A better way to teach histograms, using the TeachHist package: I love it when a package aims to do just one thing and does that one thing well. {TeachHist} is that package - it runs a shiny app that teaches intro stats students the motivation and intuition behind histograms as a data viz tool, and how to read statistical properties of the data off of histograms. Purists will not like the design of the functions in the package (e.g., a few big functions with lots of arguments, versus composing a histogram from modular pieces like in {ggplot2}), but I really liked how it was designed with non-programmer students in mind.11 I also really liked that the talk live-demoed the package/app, since that also demos the real-world case of when you’d be using it (live, in front of students).
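If I remember the API right (hedging here - I’m going off my reading of the CRAN docs, not the talk), the entry point is as simple as:

```r
# Draw an annotated teaching histogram for a simulated normal sample;
# Mean and Sd parameterize the simulated data
library(TeachHist)
TeachHistDens(Mean = 70, Sd = 3)
```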
Kirsten B. Gorman et al. The untold story of palmerpenguins: This talk actually prompted a lot of self-reflection for me, because to be honest, when {palmerpenguins} first came out and there was big hype around it on twitter, I didn’t really understand why. I thought, “it’s just a data package,” and found the dataset underwhelming (as in, it’s not exactly a big dataset). Of course, I’ve done a complete 180 since then and I adore this package now - the data just works for many different kinds of simple reprexes that I make for teaching and answering questions.12
But there’s also a side of the “story of palmerpenguins” that’s not at all about the data itself, which was the focus of this talk - the side about the people and the community around this project, and all the effort they put in to make palmerpenguins dethrone the iris dataset and get representation in other big-name places like tensorflow.
Because, you know what? If someone asked me to make a data package go viral, I’d have no idea where to even start, and having that thought race past my head while listening to this talk was really humbling. I guess that’s what a lot of making a good data package for pedagogical purposes is, right?13 It’s partly about the actual data, but it’s also about the framing, the “marketing”, the cuteness factor of penguins, all the hand-made art, the awesome pkgdown website,14 and more. The fact that you can get people excited about a data package is insane, and I really liked hearing about all the small-to-big issues the {palmerpenguins} team had to think through.15 Like, did you know about palmerpenguins::path_to_file() for teaching students how to read in files from a path?
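In case that’s new to you too (it was to me): the package ships the raw CSVs, so students can practice file-reading without downloading anything.

```r
library(palmerpenguins)

path_to_file()                # list the CSV files bundled with the package
path_to_file("penguins.csv")  # full path to one of them

# Students get the real "read a file in from a path" experience
penguins_from_csv <- read.csv(path_to_file("penguins.csv"))
head(penguins_from_csv)
```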
There was also a talk on using Rust to power R packages (e.g., {gifski}, {sinab}, {string2path}). As expected, all the technical details went over my head, but the talk was well-motivated, and I was made aware of the fact that there are competing frameworks for R-Rust interoperability (rextendr and cargo). I wasn’t planning on picking up Rust anytime soon, so I guess I’ll watch out for a CRAN task view or something.
Patrick Weiss et al. Tidy Finance with R: You know those projects that you stumble upon for the first time, only to find that they have years of work behind them, and you’re like “how did I not know about this before”? Well, this is one of those, and the Tidy Finance with R project deserves more love! I’m not in this field, but it’s always exciting when people write intro R/data science books, because there are always hidden gems in there, even for an experienced R user from another field. For example, there’s a lot of cool timeseries/{lubridate} stuff in the book, so I’ll probably be using it as a reference for those at least.
Bryan Shalloway Five ways to do “tidy” pairwise operations: I’ve actually been following this project for a while as Bryan has been tweeting about it, and I think it’s really neat. We don’t often do operations over pairs of vectors, but this kind of workflow is such a well-defined class of problems that it’s definitely worth spending time figuring out the right design for implementing it (kind of like how {slider} does that for the class of sliding-window functions). The talk was a great overview of something I knew close to nothing about, like the fact that {corrr} has the colpair_map() functional for arbitrary pairwise operations.
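To make that last point concrete, colpair_map() takes any function of two vectors and applies it to every pair of columns (a quick sketch of mine, not from the talk):

```r
library(corrr)

# Pairwise correlations (equivalent in spirit to correlate())
colpair_map(mtcars[, 1:4], cor)

# But .f can be any two-vector function, e.g. the p-value of a correlation test
cor_pval <- function(x, y) cor.test(x, y)$p.value
colpair_map(mtcars[, 1:4], cor_pval)
```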
ggplot2
Jonathan Carroll ggeasy: Easy access to ggplot2 commands: I’ve been hearing about this package a lot, and I think it’s a one-of-a-kind package that addresses a very common need among ggplot users. It works surprisingly well since theme() is modular to begin with,17 so you can layer the shorthand ggeasy::easy_*() functions as if you’re adding layers - no extra overhead for learners/users! I asked a question about dealing with vocabulary differences (e.g., “panel” = “facet” = “small multiple”), and Jonathan answered that the aim is to support all variants, so go open that PR if your dialect/idiolect isn’t represented!
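The “layering” point in practice (my sketch, using two easy_*() helpers I know of):

```r
library(ggplot2)
library(ggeasy)

ggplot(mpg, aes(class, hwy, fill = class)) +
  geom_boxplot() +
  easy_rotate_x_labels(angle = 45) +  # shorthand for theme(axis.text.x = element_text(angle = 45, ...))
  easy_remove_legend()                # shorthand for theme(legend.position = "none")
```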
Nicola Rennie Learning ggplot2 with generative art: I LOVED this talk! It started with a great point about how it’s kind of intimidating and hard for people to get into generative art, because you don’t just start by copying code for generative art - art is about the process, so it doesn’t lend itself to how people normally learn things in programming, which is by copying code just to get something working first and going from there. I totally related to that, because that was my case too. I feel more confident about it now, after seeing examples from the talk of how code can inform art and vice versa. For example, there was a really handy trick for “adding arbitrary layers” to a plot: just stack whole ggplots with transparent backgrounds on top of each other using {patchwork}. I’m definitely stealing this idea, for art or not.
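Here’s my reconstruction of that trick (not Nicola’s actual code) - overlaying one complete ggplot on another with {patchwork}’s inset_element():

```r
library(ggplot2)
library(patchwork)

base <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 3)

# A second, fully independent plot with a see-through background
overlay <- ggplot(mtcars, aes(wt, mpg)) +
  geom_density_2d(color = "red") +
  theme_void() +
  theme(plot.background = element_rect(fill = "transparent", color = NA))

# Pin the overlay across the full area of the base plot
base + inset_element(overlay, left = 0, bottom = 0, right = 1, top = 1)
```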
James Otto (& David Kahle) ggdensity: Improved bivariate density visualization in R: {ggdensity} is an example of my favorite kind of package. It offers a drop-in replacement for a standard function that improves upon its defaults (from ggplot2::geom_density_2d_filled() to ggdensity::geom_hdr()), while providing additional capabilities that satisfy folks who want or need to do more complicated things. Like, I’d never thought for more than 30 seconds about the design choices behind a density plot - “It’s just a density plot, what more do I need?” And now I’m like, “okay, I want all of those cool new features.” I’m gonna take this moment to put it on record that I actually used {ggdensity} the DAY its CRAN release was announced, to make a plot in a conference proceedings paper which (fingers crossed) will get accepted soon. We got reviews back recently and the reviewers really liked the plot, so this talk was extra special.
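The drop-in swap, sketched on a built-in dataset (mine, not from the talk):

```r
library(ggplot2)
library(ggdensity)

p <- ggplot(faithful, aes(eruptions, waiting))

p + geom_density_2d_filled()             # ggplot2's default: arbitrary density bins
p + geom_hdr(probs = c(0.9, 0.5, 0.25))  # highest density regions holding 90/50/25% of the data
```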
Me Stepping into ggplot2 internals with ggtrace: I spoke in this session, and here’s the repo with the talk materials if you want to check it out. I don’t know what else to say about my own talk, but I’ll mention that I prepped it with the rstudio::conf talk in mind, so the two talks touch on different aspects of {ggtrace}/ggplot internals. For this one, I focused more on giving a walkthrough of how the Stat and Geom ggprotos work under the hood, exposing their functional-programming nature. I’ll probably have more to say about {ggtrace} when I write my rstudio::conf reflection, so stay tuned!
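If “Stats are basically functions” sounds abstract, this bare-bones custom Stat (the standard ggplot2-extension pattern, not code from my talk) shows the idea - compute_group() is just a function from data frame to data frame:

```r
library(ggplot2)

# A Stat whose only job is to reduce each group to its mean point
StatMean <- ggproto("StatMean", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  layer(
    stat = StatMean, geom = "point", position = "identity",
    params = list(size = 5)
  )
```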
Nothing like the 1-2 page, properly formatted abstracts of academic conferences.↩︎
I linked to the GitHub repository and package website.↩︎
For example, our department offered its first data science course for undergrads last year, which I was the head TA for.↩︎
Actually, I still haven’t figured out the proper(?) term between dimension/dimension-al/dimension-ality reduction.↩︎
Which makes sense - turns out that fPCA is kinda domain-specific and designed for the analysis of acoustic data specifically.↩︎
A move that the team also pre-empted with semi-customized feedback.↩︎
Big mood - me too.↩︎
It was used for the last few seconds of the video to create the bell sound effect to “show” what the finish times would “sound” like, to emphasize the point that all sprinters are similarly fast in the grand scheme of things.↩︎
When I TA-ed for an undergrad data science course last year, we used Google Colab, and I wondered about the pros and cons of that vs. RStudio Cloud.↩︎
When I was a junior, my alma mater Northwestern University started a data science program with a handful of teaching professor hires to focus on educating undergrads.↩︎
Like, do you know what all the possible arguments to geom_histogram() are? You probably don’t, and they’re actually pretty hard to find, but you wouldn’t have that problem if you had one function with a bunch of arguments that you could find on just the function’s help page, with no redirects.↩︎
And it has a lot of “data lessons” built in - for example, you can use it to demonstrate Simpson’s paradox and k-means clustering.↩︎
In fact, one of the really insightful parts of the talk was Alison walking through the slide “Why has it been so popular?”.↩︎
Which I embarrassingly never visited before this talk.↩︎
E.g., Adelie is always capitalized because it’s named after a person, but capitalization is optional for chinstraps and gentoos. The data itself capitalizes all of them.↩︎
I didn’t even know Twitter Space was a thing before this, despite spending a lot of time on Twitter.↩︎
You can chain + theme()’s.↩︎