Notes from attending and speaking at my first R conference
I’d had my eyes on useR! 2022 ever since it was announced - by the time the call for abstracts went out, I’d been developing my package {ggtrace} for about half a year, and I thought it would make a good submission for my “debut” into the “academic” R world. I’d built up a bit of resistance to online conferences over the covid years, but I’d never been to an R conference before and had heard good things about the previous virtual useR! 2021, so I was actually very excited about the prospect of attending.
In short, I’m very glad I did - it was educational and a great first R conference for me!
Also, if you’re wondering why I’m writing this over a month after the conference ended, it’s because I had to immediately switch gears to rstudio::conf prep (sorry!).
The submission process was incredibly simple,1 and that low barrier to entry was part of what convinced me to apply. The submission form consisted of a 250-word abstract typed into a textbox input field, along with relevant links about my project.2
I also applied for the diversity scholarship, which was a separate form sent to me after my talk was accepted. I’d never applied for these kinds of things before, but when it came to R, I felt like I could really use it to advocate for myself and for others who share my background (students in traditionally “humanities” and “soft science” departments), to convince their programs of the benefits of students being involved in the R community. I’m fortunate enough to be in a program that has recently been making a lot of effort to incorporate R/programming into doing science,3 but I, selfishly, wanted even more R in my life. The conference generously offered me the diversity scholarship, and I am incredibly honored! It covered my registration fee and allowed me to attend two workshops, which I’ll talk about below.
Here are my notes from the workshops and some of the talks I attended, in no particular order. Keep in mind that I was attending from home in Korea, so I missed a few live talks and didn’t take great notes for others (I’ll be catching up through the recordings).
Recordings are available on the conference YouTube channel.
Isabella Bicalho Frazeto Introduction to dimensional reduction in R: First off, big respect to Isabella for leading a 3.5-hour workshop all by herself. Content-wise, the workshop was a balanced, not-too-overwhelming overview of dimensional reduction,4 starting with a mini lecture on dimensional reduction methods in Part 1, followed by a hands-on walkthrough of the R packages and functions implementing these methods in Parts 2 and 3. I knew very little about dimensional reduction going in, and I came away with at least a conceptual understanding of PCA and ICA, as well as the important differences between the two in theory and practice. Getting an answer to “what the heck is PCA/ICA?” was my main goal going into the workshop, so I’m very satisfied with what I learned. Isabella was also nice enough to answer my question about the difference between PCA and fPCA during break time - she had never heard of fPCA before,5 but she did some quick research on it live, read a whole wikipedia article about it, and then dumbed it down for me, all in the span of like 3 minutes.
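For my own future reference, here’s a minimal sketch of what PCA boils down to in practice, in base R (my example, not from the workshop materials):

```r
# Reduce the four iris measurements down to principal components
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

summary(pca)  # proportion of variance explained by each component
head(pca$x)   # the data projected onto the new component axes

# Keeping just PC1 and PC2 turns 4 dimensions into 2
plot(pca$x[, 1:2], col = iris$Species)
```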
Danielle Navarro, Jonathan Keane, & Stephanie Hazlitt Larger-than-memory data workflows with Apache Arrow: I don’t personally work with big data much, but a lot of people around me do, and when they do, they often run into issues trying to analyze their data in-memory. So I figured I’d learn Arrow and “spread the word,” so to speak. The workshop was AWESOME and used a ton of carefully crafted learning materials, which are also available online. To be honest, the live session fell in the middle of the night for me in Korea time, so I wasn’t that engaged with the live exercises, but the online materials are so well organized that between them and the workshop recording, I feel like I didn’t miss much. Beyond the actual content of the workshop exercises, I especially liked that a good chunk of time was spent going through the design of Arrow, where it fits in the data-analysis pipeline, and how it compares to alternatives in the data storage ecosystem. Also, {arrow} + {duckdb} looks very cool - I’m now an official convert!
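The gist of the {arrow} + {duckdb} combo as I understood it (a sketch of mine, not the workshop’s code - the parquet directory and column names are hypothetical):

```r
library(arrow)
library(dplyr)

# Scan a directory of parquet files without reading them into memory
ds <- open_dataset("data/flights_parquet")  # hypothetical path

# dplyr verbs are translated and evaluated lazily by Arrow;
# only the small summarized result lands in memory at collect()
ds |>
  filter(year == 2019) |>
  group_by(carrier) |>
  summarize(mean_delay = mean(dep_delay, na.rm = TRUE)) |>
  collect()

# Hand the same data to duckdb for things Arrow can't do (e.g., window functions)
ds |>
  to_duckdb() |>
  mutate(delay_rank = min_rank(desc(dep_delay))) |>
  collect()
```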
Paula Moraga: This was so interesting even for me, knowing nothing about public health or spatial statistics going in, and I learned a lot from the talk. It was very information-packed and really showed how you could use R in literally every aspect of research. I was especially impressed by the idea that you could analyze your uncertainty about a measure (e.g., % vaccinated in a region) as data itself, and use that to inform future data collection decisions. It was refreshing to hear uncertainty talked about in that way, because I usually only hear people talk about proactive solutions for minimizing uncertainty, or about improving precision in quantifying it. Paula’s work is like a special kind of applied statistics that explicitly leverages properties of the uncertainty in current/prior data to drive future data collection practices.
Amanda Cox: Words cannot express how good this talk was. Amanda Cox is a force of nature. I was really looking forward to this one given my interests in data viz and data journalism, and I was not disappointed. There were so many “mind = blown” moments, but to keep things short, here are my two favorites:
You Draw It: How Family Income Predicts Children’s College Chances: Everyone knows that income and education are highly linked, but just how tight is the connection? It turns out that when the NYT team investigated this question, they ended up with data showing a trend so suspiciously linear that it appeared almost uninteresting and sterile. But they turned this mundaneness into a surprise factor by testing it against people’s expectations, letting readers draw their own trend line to compare against the data. Not only did that get people engaged, they also got creative with what they drew,6 making the piece a very rich interactive experience. The team correctly determined that the data was uninteresting to plot on its own, and then turned the situation around into something magnificent.
One Race, Every Medalist Ever: Amanda introduced this piece as an example of how R could be used anywhere, even in places where it didn’t need to be used.7 She made us watch a whopping 2:30-minute video about olympic sprinters, only to reveal that R was involved in something seemingly trivial.8 I love that she did this build-up in the talk, though - I wouldn’t have fully appreciated the point if she’d just said it outright.
Amelia McNamara Teaching modeling in introductory statistics: A comparison of formula and tidyverse syntaxes: I’ve been hearing about this project on twitter for a while, and this was my first time listening to a talk about it. I think it’s super neat - this kind of “meta” research on R use, done with R, is something I’ve been interested in as well.9 A lot of thought went into the design of the teaching materials, and I admired the robustness of the experimental design. It’s also just always interesting to hear what things are like for novices - as an experienced user, you can’t get that from introspection anymore.
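For anyone unfamiliar with the two syntaxes being compared, here’s a toy contrast (my own sketch, not from the talk; the formula version assumes {mosaic} is loaded):

```r
# Formula syntax, as popularized for teaching by {mosaic}
library(mosaic)
mean(wage ~ sex, data = mosaicData::CPS85)

# Tidyverse syntax for the same grouped summary
library(dplyr)
mosaicData::CPS85 |>
  group_by(sex) |>
  summarize(mean_wage = mean(wage))
```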
Jonathan Love jamovi: An R-based statistical spreadsheet for the masses: I’d actually never seen a WYSIWYG-style statistical program in action. I was fortunate enough to go through school during the R revolution in university intro stats courses,10 but I always appreciated the people who worked on these GUI-based tools for beginners, because, boy, was learning R hard at first! I was really surprised that jamovi can produce full reports, and that it has like 40-50 community-contributed modules. Also, TIL that jamovi’s sister program jasp stands for “Jonathan’s Awesome Statistical Program.”
Carsten Lange A better way to teach histograms, using the TeachHist package: I love it when a package aims to do just one thing and does that one thing well. {TeachHist} is that package - it runs a shiny app that teaches intro stats students the motivation and intuition behind histograms as a data viz tool, and how to read statistical properties of the data off of histograms. Purists will not like the design of the functions in the package (e.g., a few big functions with lots of arguments, versus composing a histogram from modular pieces like in {ggplot2}), but I really liked how it was designed with non-programmer students in mind.11 I also really liked that the talk live-demoed the package/app, since that also demos the real-world case of when you’d be using it (live, in front of students).
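If I remember the API right (hedging here - I’m going off my reading of the CRAN docs, not the talk), the entry point is as simple as:

```r
# Draw an annotated teaching histogram for a simulated normal sample;
# Mean and Sd parameterize the simulated data
library(TeachHist)
TeachHistDens(Mean = 70, Sd = 3)
```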
Kirsten B. Gorman et al. The untold story of palmerpenguins: This talk actually prompted a lot of self-reflection for me, because to be honest, when {palmerpenguins} first came out and there was big hype around it on twitter, I didn’t really understand why. I thought, “it’s just a data package,” and found the dataset underwhelming (as in, it’s not exactly a big dataset). Of course, I’ve done a complete 180 since then and I adore this package now - the data just works for many different kinds of simple reprexes that I make for teaching and answering questions.12
But there’s also a side of the “story of palmerpenguins” that’s not at all about the data itself, which was the focus of this talk - the side about the people and the community around this project, and all the effort they put in to make palmerpenguins dethrone the iris dataset and get representation in other big-name places like tensorflow.
Because, you know what? If someone asked me to make a data package go viral, I’d have no idea where to even start, and having that thought race past my head while listening to this talk was really humbling. I guess that’s what a lot of making a good data package for pedagogical purposes is, right?13 It’s partly about the actual data, but it’s also about the framing, the “marketing”, the cuteness factor of penguins, all the hand-made art, the awesome pkgdown website,14 and more. The fact that you can get people excited about a data package is insane, and I really liked hearing about all the small-to-big issues the {palmerpenguins} team had to think through.15 Like, did you know about palmerpenguins::path_to_file() for teaching students how to read in files from a path?
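In case that’s new to you too (it was to me): the package ships the raw CSVs, so students can practice file-reading without downloading anything.

```r
library(palmerpenguins)

path_to_file()                # list the CSV files bundled with the package
path_to_file("penguins.csv")  # full path to one of them

# Students get the real "read a file in from a path" experience
penguins_from_csv <- read.csv(path_to_file("penguins.csv"))
head(penguins_from_csv)
```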
There was also a talk on using Rust to power R packages (e.g., {gifski}, {sinab}, {string2path}). As expected, all the technical details went over my head, but the talk was well-motivated, and I was made aware of the fact that there are competing frameworks for R-Rust interoperability (rextendr and cargo). I wasn’t planning on picking up Rust anytime soon, so I guess I’ll watch out for a CRAN task view or something.
Patrick Weiss et al. Tidy Finance with R: You know those projects that you stumble upon for the first time, only to find that they have years of work behind them, and you’re like “how did I not know about this before”? Well, this is one of those, and the Tidy Finance with R project deserves more love! I’m not in this field, but it’s always exciting when people write intro R/data science books, because there are always hidden gems in there, even for an experienced R user from another field. For example, there’s a lot of cool timeseries/{lubridate} stuff in the book, so I’ll probably be using it as a reference for those at least.
Bryan Shalloway Five ways to do “tidy” pairwise operations: I’ve actually been following this project for a while as Bryan has been tweeting about it, and I think it’s really neat. We don’t often do operations over pairs of vectors, but this kind of workflow is such a well-defined class of problems that it’s definitely worth spending time figuring out the right design for implementing it (kind of like how {slider} does that for the class of sliding-window functions). The talk was a great overview of something I knew close to nothing about, like the fact that {corrr} has the colpair_map() functional for arbitrary pairwise operations.
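To make that last point concrete, colpair_map() takes any function of two vectors and applies it to every pair of columns (a quick sketch of mine, not from the talk):

```r
library(corrr)

# Pairwise correlations (equivalent in spirit to correlate())
colpair_map(mtcars[, 1:4], cor)

# But .f can be any two-vector function, e.g. the p-value of a correlation test
cor_pval <- function(x, y) cor.test(x, y)$p.value
colpair_map(mtcars[, 1:4], cor_pval)
```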
ggplot2
Jonathan Carroll ggeasy: Easy access to ggplot2 commands: I’ve been hearing about this package a lot, and I think it’s a one-of-a-kind package that addresses a very common need among ggplot users. It works surprisingly well since theme() is modular to begin with,17 so you can layer the shorthand ggeasy::easy_*() functions as if you’re adding layers - no extra overhead for learners/users! I asked a question about dealing with vocabulary differences (e.g., “panel” = “facet” = “small multiple”), and Jonathan answered that the aim is to support all variants, so go open that PR if your dialect/idiolect isn’t represented!
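The “layering” point in practice (my sketch, using two easy_*() helpers I know of):

```r
library(ggplot2)
library(ggeasy)

ggplot(mpg, aes(class, hwy, fill = class)) +
  geom_boxplot() +
  easy_rotate_x_labels(angle = 45) +  # shorthand for theme(axis.text.x = element_text(angle = 45, ...))
  easy_remove_legend()                # shorthand for theme(legend.position = "none")
```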
Nicola Rennie Learning ggplot2 with generative art: I LOVED this talk! It started with a great point about how it’s kind of intimidating and hard for people to get into generative art, because you don’t just start by copying code for generative art - art is about the process, so it doesn’t lend itself to how people normally learn things in programming, which is by copying code just to get something working first and going from there. I totally related to that, because that was my case too. I feel more confident about it now, after seeing examples from the talk of how code can inform art and vice versa. For example, there was a really handy trick for “adding arbitrary layers” to a plot: just stack whole ggplots with transparent backgrounds on top of each other using {patchwork}. I’m definitely stealing this idea, for art or not.
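Here’s my reconstruction of that trick (not Nicola’s actual code) - overlaying one complete ggplot on another with {patchwork}’s inset_element():

```r
library(ggplot2)
library(patchwork)

base <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 3)

# A second, fully independent plot with a see-through background
overlay <- ggplot(mtcars, aes(wt, mpg)) +
  geom_density_2d(color = "red") +
  theme_void() +
  theme(plot.background = element_rect(fill = "transparent", color = NA))

# Pin the overlay across the full area of the base plot
base + inset_element(overlay, left = 0, bottom = 0, right = 1, top = 1)
```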
James Otto (& David Kahle) ggdensity: Improved bivariate density visualization in R: {ggdensity} is an example of my favorite kind of package. It offers a drop-in replacement for a standard function that improves upon its defaults (from ggplot2::geom_density_2d_filled() to ggdensity::geom_hdr()), while providing additional capabilities that satisfy folks who want or need to do more complicated things. Like, I’d never thought for more than 30 seconds about the design choices behind a density plot - “It’s just a density plot, what more do I need?” And now I’m like, “okay, I want all of those cool new features.” I’m gonna take this moment to put it on record that I actually used {ggdensity} the DAY its CRAN release was announced, to make a plot in a conference proceedings paper which (fingers crossed) will get accepted soon. We got reviews back recently and the reviewers really liked the plot, so this talk was extra special.
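The drop-in swap, sketched on a built-in dataset (mine, not from the talk):

```r
library(ggplot2)
library(ggdensity)

p <- ggplot(faithful, aes(eruptions, waiting))

p + geom_density_2d_filled()             # ggplot2's default: arbitrary density bins
p + geom_hdr(probs = c(0.9, 0.5, 0.25))  # highest density regions holding 90/50/25% of the data
```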
Me Stepping into ggplot2 internals with ggtrace: I spoke in this session, and here’s the repo with the talk materials if you want to check it out. I don’t know what else to say about my own talk, but I’ll mention that I prepped it with the rstudio::conf talk in mind, so the two talks touch on different aspects of {ggtrace}/ggplot internals. For this one, I focused more on giving a walkthrough of how the Stat and Geom ggprotos work under the hood, exposing their functional-programming nature. I’ll probably have more to say about {ggtrace} when I write my rstudio::conf reflection, so stay tuned!
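If “Stats are basically functions” sounds abstract, this bare-bones custom Stat (the standard ggplot2-extension pattern, not code from my talk) shows the idea - compute_group() is just a function from data frame to data frame:

```r
library(ggplot2)

# A Stat whose only job is to reduce each group to its mean point
StatMean <- ggproto("StatMean", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  layer(
    stat = StatMean, geom = "point", position = "identity",
    params = list(size = 5)
  )
```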
Nothing like the 1-2 page, properly formatted abstracts of academic conferences.↩︎
I linked to the GitHub repository and package website.↩︎
For example, our department offered its first data science course for undergrads last year, which I was the head TA for.↩︎
Actually, I still haven’t figured out the proper(?) term between dimension/dimension-al/dimension-ality reduction.↩︎
Which makes sense - turns out that fPCA is kinda domain-specific and designed for the analysis of acoustic data specifically.↩︎
A move that the team also pre-empted with semi-customized feedback.↩︎
Big mood - me too.↩︎
It was used for the last few seconds of the video to create the bell sound effect to “show” what the finish times would “sound” like, to emphasize the point that all sprinters are similarly fast in the grand scheme of things.↩︎
When I TA-ed for an undergrad data science course last year, we used Google Colab, and I wondered about the pros and cons of that vs. RStudio Cloud.↩︎
When I was a junior, my alma mater Northwestern University started a data science program with a handful of teaching professor hires to focus on educating undergrads.↩︎
Like, do you know what all the possible arguments to geom_histogram() are? You probably don’t, and they’re actually pretty hard to find, but you wouldn’t have that problem if you had one function with a bunch of arguments that you could find on just the function’s help page, with no redirects.↩︎
And it has a lot of “data lessons” built in - for example, you can use it to demonstrate Simpson’s paradox and k-means clustering.↩︎
In fact, one of the really insightful parts of the talk was Alison walking through the slide “Why has it been so popular?”.↩︎
Which I embarrassingly never visited before this talk.↩︎
E.g., Adelie is always capitalized because it’s named after a person, but capitalization is optional for chinstraps and gentoos. The data itself capitalizes all of them.↩︎
I didn’t even know Twitter Space was a thing before this, despite spending a lot of time on Twitter.↩︎
You can chain + theme()’s.↩︎