2023 Year in Review

reflections

Reflections and updates on what I’ve been up to in 2023

June Choe (University of Pennsylvania Linguistics)https://live-sas-www-ling.pantheon.sas.upenn.edu/
2023-12-31
New year's eve celebration fireworks at Long Beach, CA.

Figure 1: New year’s eve celebration fireworks at Long Beach, CA.

Intro

I’ve been seeing a couple folks on Mastodon sharing their “year in review” blog posts, and I thought that was really cool, so I decided to write my own too! I’m mostly documenting for myself but hopefully this also serves as an update of a sort for my friends over the internet since I’ve been pretty silent online this year.

Research

Being the Good Grad Student™ I am, I’m forefronting my academia happenings first. In numbers, I published one paper, gave two talks, and presented three posters. I’m not super proud of those numbers: I think they’re a lot less than what people might expect from a 4th year PhD student. But a lot of effort went into each1 and 2023 overall has been a great year for refining and narrowing down on my dissertation topic.2 I did a ton of readings and I hope it pays off for next year when I actually get started on writing the thing.

I already document my research happenings elsewhere and I know that the primarily audience of my blog isn’t linguists, so I won’t expand on that more here.

Blogging

2023 was the year when it became painfully obvious to me that I don’t have much in terms of a portfolio in the sense of the buzzword-y “data science portfolio” that industry recruiters purportedly look for. This ironically coincided with another realization I had, which is that I’m increasingly becoming “the department tech/stats guy” where I take on many small tasks and favors from faculty and other students here and there; I truly do enjoy doing this work, but it’s completely invisible to my CV/resume. I’m still navigating this weird position I’m in, but I’ve found some nice tips3 and at least I still have another year until I’m on the job market to fully figure this out.

The reason why I put the above rant under the “Blogging” section is because my blog is the closest thing I have a portfolio - there’s not much here, but it’s a public-facing space I own where I get to show people what I know and how I think. So in 2023 I was more conscious about what I blog about and how. The change was subtle - my blog persona is still my usual self, but I’ve tried to diversify the style of my blogs. Whereas I mostly wrote long-form, tutorial-style blog posts in the past, I only wrote one such post (on dplyr::slice()) this year. My other blog posts were one reflecting on how to better answer other people’s questions, and another where I nerd out on the internals of {tidyselect} with little regard for its practicality.4.

All in all, I wrote three blog posts this year (not including this one). This is the usual rate of publishing blog posts for me, but I hope to write more frequently next year (and write shorter posts overall, and in less formal tone).

R stuff

I didn’t think I’d have much to say about the R stuff I did this year until I sat down to write this blog. Even though this year was the busiest I’ve ever been with research, it turns out that I still ended up doing quite a bit of R stuff in my free time. I’ll cover this chronologically.

At the beginning of the year, I was really lucky to receive the student paper award from the Statistical Computing and Graphics section of the ASA, writing about {ggtrace}.5 In the paper, I focused on {ggtrace} as a pedagogical tool for aspiring {ggplot2} extension developers. In the process, I rediscovered the power of reframing ggplot internals as data wrangling and went back to {ggtrace} to add a couple convenience functions for interactive use-cases. After over two years since its inception, {ggtrace} now feels pretty complete in terms of its core features (but suggestions and requests are always welcome!).

In Spring, I began writing {jlmerclusterperm}, a statistical package implementing the cluster-based permutation test for time series data, using mixed-effects models. This was a new challenge for me for two reasons. First, I wrote much of the package in Julia - this was my first time writing Julia code for “production” and within an R package.6 Second, I wrote this package for a seminar on eye movements that I was taking that Spring in the psychology department. I wrote {jlmerclusterperm} in an intense burst - most of it was complete by the end of May and I turned in the package as my final.7 I also gave a school-internal talk on it in April; my first time talking about R in front of an entirely academic audience.

In Summer, I continued polishing {jlmerclusterperm} with another ambitious goal of getting it to CRAN, at the suggestion of a couple researchers who said they’d like to use it for their own research. The already-hard task of getting through my first CRAN submission was compounded by the fact that the package contained Julia code - it took nine resubmissions in the span of two months to finally get {jlmerclusterperm} stably on CRAN.8

Group photo taken at SMLP2023.

Figure 2: Group photo taken at SMLP2023.

At the beginning of Fall, I attended the Advanced Frequentist stream of the SMLP2023 workshop, taught by Phillip Alday, Reinhold Kliegl and Douglas Bates. The topic was mixed-effects regression models in Julia, one that I became very excited about especially after working on {jlmerclusterperm}. It was an absolute blast and I wish that everyone in linguistics/psychology research appreciated good stats/data analysis as much as the folks I met there. The workshop was far away in Germany (my first time ever in Europe!) and I’m really thankful to MindCORE for giving me a grant to help with travel expenses.

For most of Fall, I didn’t do much R stuff, especially with the start of the Fall semester and a big conference looming on the horizon. But the little time I did spend on it, I worked on maintenance and upkeep for {openalexR}, one of my few collaborative projects. It’s also one of the few packages for which I’m an author of that I actually frequently use myself. I used {openalexR} a lot during the Fall semester for conducting literature reviews in preparation for my dissertation proposal, so I had a few opportunities to catch bugs and work on other improvements. I also spent a lot of my time in the Fall TA-ing for an undergraduate data science class that we recently started offering in our department. This was actually my third year in a row TA-ing it, so it went pretty smoothly. I even learned some new quirky R behaviors from my students along the way.

In October, I virtually attended the R/Pharma conference and joined a workshop on data validation using the {pointblank} package by Rich Iannone. I had used {pointblank} a little before, but I didn’t explore its features much because I thought it had some odd behaviors that I couldn’t comprehend. The workshop cleared up some of the confusion for me, and Rich made it clear in the workshop that he welcomed contributions to improve the package. So I made a PR addressing the biggest pain point I personally had with using {pointblank}. This turned out to be a pretty big undertaking which took over a month to complete. In the process, I become a co-author of {pointblank}, and I merged a series of PRs that improved the consistency of function designs, among other things.

The last R thing I did this year was actually secretly Julia - in December I gave a school-internal workshop on fitting mixed effects in Julia, geared towards an academic audience with prior experience in R. I advocated for a middle-ground approach where you can keep doing everything in R and RStudio, except move just the modelling workflow into Julia. I live-coded some Julia code and ran it from RStudio, which I think wasn’t too difficult to grasp.9 I have a half-baked package of addins to make R-Julia interoperability smoother in RStudio; I hope to wrap it up and share it some day.

That brings me to the present moment, where I’m currently taking a break from FOSS to focus on my research, as my dissertation proposal defense is coming up soon. I will continue to be responsive with maintaining {jlmerclusterperm} during this time (since there’s an active user-base of researchers who find it useful) but my other projects will become low priority. I also don’t think I’ll be starting a new project any time soon, but in the near future I hope I come up with something cool that lets me test-drive {S7}!

Personal

This year, I tried to be less of a workaholic. I think I did an okay job at it, and it mostly came in the form of diversifying my hobbies (R used to be my only hobby since starting grad school). I got back into ice skating10 and, briefly, swimming,11 and I’m fortunate that both are available literally two blocks away from my department. My girlfriend and I got really into escape rooms this year, mostly playing online ones due to budget constraints.12 I also got back into playing Steam games13 and racked up over 300 hours on Slay the Spire, mostly from the ~2 weeks recovering from covid in September.14

And of course, I have many people to thank for making this a wonderful year.15 Happy new year to all!


  1. I was the first author for all research that I presented, as is often the case in linguistics.↩︎

  2. Broadly, how kids learn words with overlapping meanings like “dalmatian”<“dog”<“animal” from the language input.↩︎

  3. Like this blog post.↩︎

  4. A style heavily inspired by some of my favorite R bloggers like Matt Dray and Jonathan Carroll↩︎

  5. Coincidentally, my girlfriend also won a student award this year from another ASA - the Acoustical Society of America.↩︎

  6. I can’t recommend {JuliaConnectoR} enough for this.↩︎

  7. I’m actually quite proud of myself for pulling this off - writing an R package for the final was unprecedented for the class.↩︎

  8. In the process, I received the elusive CRAN Note for exceeding 6 updates in under a month (CRAN recommends one update every 1-2 months).↩︎

  9. Using some tricks described in the workshop materials.↩︎

  10. I used to play ice hockey competitively as a kid.↩︎

  11. Turns out that swimming does not play well with my preexisting ear conditions.↩︎

  12. Most recently we played Hallows Hill which I think is the best one we’ve played so far↩︎

  13. I’m very into roguelike genres but haven’t really played video games since high school.↩︎

  14. For the fellow nerds, I reached A20 on Ironclad, Defect, and Watcher. I’m working my way up for Silent.↩︎

  15. I’m feeling shy so this goes in the footnotes. In roughly chronological order, I’m firstly indebted to Sam Tyner-Monroe who encouraged me write up {ggtrace} for the ASA paper award after my rstudio::conf talk on it last year. I’m grateful to Gina Reynolds and Teun van den Brand (and others in the ggplot extension club) for engaging in many insightful data viz/ggplot internals discussions with me. I’m also grateful to my FOSS collaborators, especially Trang Le, from whom I’ve learned a lot about code review and package design principles while working on {openalexR} together. Last but not least, I owe a lot to Daniel Sjoberg and Shannon Pileggi for a recent development that I’m not ready to publicly share yet 🤫.↩︎