Short version

Are you an experienced user wanting to fully master the grammar of ggplot? Start with my rstudio::conf(2022) talk on {ggtrace}!

Are you an aspiring developer wanting to extend ggplot? Start with my useR! 2022 talk on {ggtrace}!

Long version

There’s a lot that can be said about ggplot internals, and what you can get out of using ggtrace depends on how comfortable you are with ggplot and ggplot internals. Whether you’re here because you want to become a better ggplot user or because you’re an aspiring extension developer, ggtrace has you covered for all stages of your ggplot journey!

Traditionally, the ggplot community has been thought to split between users (people who use ggplot to make plots) and developers (people who write extension packages and contribute to ggplot2). I believe that this binary distinction is outdated for many reasons, a recent one being that the capacity of the user is ever-expanding and encroaching on the “internals” territory. While the distinction between user-facing code and ggplot internals is clear, that doesn’t map neatly onto the user-developer dichotomy.

Here is one attempt at trying to address that issue. The following outlines the five “stages” of the ggplot2 journey, and where ggtrace fits in.

Stage 1 (experienced user)

You are someone who is comfortable with using ggplot but has not heard about ggplot internals before. If you are wondering why you’d even bother learning about ggplot internals, see part 1 of my blog post series on delayed aesthetic evaluation, a case study of a set of somewhat niche {ggplot2} functions that lie at the intersection of user-facing code and ggplot internals. You might also want to start with another blog post of mine on stat_*() layers, for extra scaffolding. These will give you a practical background on ggplot internals, just to get started. Hopefully they will get you excited about the internals too.

Learning objectives:

  • Every layer has a stat and a geom. The stat_*() and geom_*() layer functions are two sides of the same coin.

  • The job of users ends with the ggplot code we write. The job of the internals is to spell out the assumptions behind the concise code that we write as users to make the figure.

  • Each layer has an underlying dataframe representation that contains only the kind of information that’s relevant for drawing that layer.

  • We can use ggtrace to reference internals snapshots of a layer’s data for declaring more complex aesthetic mappings and using unconventional stat-geom pairs in a layer.
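To make the first and third objectives concrete, here is a minimal sketch that uses only ggplot2 (no ggtrace), with the built-in `mpg` dataset chosen purely for illustration. It builds the same bar layer from the geom side and from the stat side, peeks at each layer's underlying data frame with `layer_data()`, and then tries one unconventional stat-geom pair.

```r
library(ggplot2)

# Two spellings of the same layer: geom_bar() defaults to stat = "count",
# and stat_count() defaults to geom = "bar".
p_geom <- ggplot(mpg, aes(x = class)) + geom_bar()
p_stat <- ggplot(mpg, aes(x = class)) + stat_count()

# layer_data() returns the data frame that a layer is drawn from.
d_geom <- layer_data(p_geom, 1)
d_stat <- layer_data(p_stat, 1)

# One row per bar, with the computed count alongside drawing details.
print(d_geom[, c("x", "count", "ymin", "ymax")])

# Both spellings carry the same computed counts.
print(identical(d_geom$count, d_stat$count))

# An unconventional pair: the count stat drawn with points instead of bars.
p_point <- ggplot(mpg, aes(x = class)) + geom_point(stat = "count")
d_point <- layer_data(p_point, 1)
print(d_point[, c("x", "y", "count")])
```

Notice that the layer data has one row per bar rather than one row per car: it holds only what is needed to draw that layer.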

Stage 2 (curious user)

You are someone who recognizes the division of labor between the user and the internals, and feels empowered by this knowledge. To continue to get a better sense of what the internals do, I recommend watching my rstudio::conf(2022) talk on {ggtrace} and ggplot internals. If that felt a little too fast/dense, you can watch a slower (hour-long), broken-down version of me covering the same content. This is where I showcase the inspect family of workflow functions in ggtrace, and it is also where you will be introduced to ggproto. The talks cover the content from Chapter 20.2 of the ggplot2 book, but with emphasis on practicality for the user. I’d actually recommend reading that entire book chapter anyway (or reading it alongside a recording of me going over Part 1 and Part 2 of the chapter), as it covers many fundamental concepts that take some time to digest. As you read through it, I also highly recommend referencing Emi Tanaka’s awesome slides on ggplot2 internals and Bob Rudis’s short chapter on demystifying ggplot2 as companion guides.

Learning objectives:

  • A ggplot object is not the ggplot figure itself. A ggplot object merely contains the instructions for plotting the figure. The figure is what you get as a result of executing those instructions in the internals, by print()/plot()-ing the ggplot object.

  • The process of making a figure in the internals happens in steps, by first making each layer’s data drawing-ready (ggplot_build()) and then drawing the plot (ggplot_gtable()).

  • The internals are implemented in the ggproto object-oriented system, which is difficult to grok, but a lot of it is just data wrangling. We can get pretty far in understanding the internals by using ggtrace to focus on just the part of the internals that takes the user-supplied data and turns it into “drawing-ready” data.
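The build-then-draw sequence described above can be walked through by hand. This is a rough sketch using only exported ggplot2 and grid functions, with `mpg` as an arbitrary example dataset; it approximates what printing a plot does for you rather than reproducing it exactly.

```r
library(ggplot2)

# Creating the object draws nothing; it only stores the instructions.
p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Step 1: make each layer's data drawing-ready.
built <- ggplot_build(p)

# The drawing-ready data for the first layer (layer_data() runs the
# build step internally and returns that layer's data frame).
print(head(layer_data(p, 1)))

# Step 2: turn the built plot into a table of graphical objects (a gtable).
gt <- ggplot_gtable(built)
print(class(gt))

# Step 3: hand the gtable to grid for drawing.
grid::grid.newpage()
grid::grid.draw(gt)
```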

Stage 3 (aspiring developer)

You are someone who is aware of the existence of ggproto, and you are interested in knowing more about the implementation details of the internals. You can gear up for a deep dive into the internals by first watching Thomas Lin Pedersen’s rstudio::conf(2020) talk on extending ggplot, which gives a nice overview of what kinds of ggproto objects and methods exist, which ones are most relevant, and what they do. In case you want more comprehensive documentation on ggproto (though it is not necessary to read it at this stage), check out the package vignette on ggproto and Brodie Gaslam’s even more comprehensive unofficial reference for ggplot internals.

You can follow up on Thomas’s talk by watching my useR! 2022 talk on {ggtrace} and ggplot internals, which showcases all workflows in ggtrace (inspect, capture, highjack) for interacting with ggproto methods. Part 2 of my blog post series on delayed aesthetic evaluation picks up where I left off in that talk to expand on the possible extension points to different Stat methods.

That leads nicely into the case study chapter of the ggplot2 book. It’s a huge chapter, so if you want some diversity you can also reference Emi’s slides on writing ggplot2 extensions and read the package vignette on extending ggplot side-by-side, which touches on the same topics but with different examples. In the process, you’ll inevitably encounter {grid}, which is itself a scary beast. There are a lot of resources on {grid}, most notably the R Graphics (3rd edition) book, but there are also some resources written with ggplot in mind, like yet another one of Emi’s slides on {grid}, and functions for interacting with ggplot’s gtable graphical objects from {lemon} and {gridExtra}.

Learning objectives:

  • ggproto methods are called step-by-step in the internals to execute the instructions for plotting.

  • The ggproto objects Stat and Geom do a lot of the work, and offer powerful extension points.

  • ggprotos are mostly stateless and ggproto methods are essentially functions, though they defy common expectations about what a function should look like and how it should behave. ggtrace allows you to interact with these ggproto methods as if they were stand-alone functions, so you can learn their behavior through trial and error.
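To get a feel for "ggproto methods are essentially functions" without any extra tooling, you can pull a method off an exported ggproto object and call it by hand. The sketch below uses `StatUnique`, one of the simplest Stats exported by ggplot2. It assumes that its `compute_panel` method takes `data` and `scales` and ignores `scales`, which matches the ggplot2 source as I know it but is an internal detail that could change between versions. The `toy` data frame is made up for illustration.

```r
library(ggplot2)

# ggproto objects carry their inheritance chain in their class vector.
print(class(StatUnique))

# A made-up data frame with one duplicated row.
toy <- data.frame(x = c(1, 1, 2), y = c(5, 5, 7))

# Call the method directly, as if it were an ordinary function.
# `scales` is passed as NULL on the assumption that this method ignores it.
deduplicated <- StatUnique$compute_panel(toy, scales = NULL)
print(deduplicated)
```

Inside a real plot the internals supply these arguments for you, which is why it is usually easier to capture a method call from an actual build than to construct the inputs yourself.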

Stage 4 (developer)

You understand the role that ggproto objects and methods play in the internals and you are excited about writing your own extensions. At this point you are now a developer - your training wheels are off and you’re in the territory of figuring things out for yourself.

Being a developer requires a new skill - debugging. A few people have written on the topic of debugging ggplot internals, including Hiroaki Yutani’s blog post on using browser() for debugging ggproto methods and Dewey Dunnington’s {ggdebug} package which gives you freakishly powerful control over the internals. People have also written packages for less “intrusive” ways of debugging and interacting with the internals, including {gginnards} by Pedro J. Aphalo and {gggrid} by Paul Murrell.

Standing on the shoulders of these giants, ggtrace aims to offer the best of both worlds for developers, with high-level workflow functions in the form of ggtrace_{action}_{value}(), the low-level functions ggtrace() and with_ggtrace(), and the interactive debugging functions ggedit() and ggdebugonce(). There’s not a whole lot of new stuff that ggtrace offers in this space (the package isn’t even that much code), but it embodies a transformative reframing of ggplot internals as functional programming, the kind that we’re familiar with as R users. By treating ggproto methods like functions, we can leverage our existing debugging skills for understanding and extending ggplot internals.
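As a small taste of the ggtrace_{action}_{value}() naming pattern, here is a hedged sketch of an "inspect" + "return" call. It assumes that ggtrace is installed (hence the `requireNamespace()` guard) and that `ggtrace_inspect_return()` accepts a plot followed by a ggproto method, returning the value from the first time that method is called while the plot is built. Check the package reference for the exact arguments in your installed version.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class)) + geom_bar()

# Only run the ggtrace call when the package is available.
if (requireNamespace("ggtrace", quietly = TRUE)) {
  # Assumed behavior: returns what StatCount's compute_group method
  # returned on its first call during the build of `p`.
  first_group <- ggtrace::ggtrace_inspect_return(p, StatCount$compute_group)
  print(first_group)
}
```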

Learning objectives:

  • Only a small subset of ggprotos are exported by ggplot2 and available for subclassing, and only a handful of ggproto methods are productive extension points. A big part of developing an extension is locating the appropriate extension point.

  • ggtrace allows developers to work backwards: get the desired output to work first, then identify the implementation details that need to be changed to produce that output. In other words, you can find a hack that works first through trial and error, then develop a principled way of implementing the solution by following best practices for extending ggplot.

  • In the process of writing new ggproto objects and ggproto methods, developers need to debug frequently. ggtrace functions implement different strategies of debugging, spanning the spectrum of interactive to programmatic.
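For a sense of what using an extension point looks like in practice, here is a minimal sketch of a Stat extension that follows the general pattern from the ggplot2 extension vignette. `StatMeanPoint` and `stat_mean_point()` are hypothetical names invented for this example. The extension point used is `compute_group`, and `mtcars` is just a convenient dataset.

```r
library(ggplot2)

# Hypothetical Stat: collapses each group to a single point at its mean.
StatMeanPoint <- ggproto("StatMeanPoint", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

# Hypothetical layer function pairing the new Stat with a point geom.
stat_mean_point <- function(mapping = NULL, data = NULL, geom = "point",
                            position = "identity", ..., na.rm = FALSE,
                            show.legend = NA, inherit.aes = TRUE) {
  layer(
    stat = StatMeanPoint, geom = geom, data = data, mapping = mapping,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.4) +
  stat_mean_point(colour = "red", size = 4)

# The second layer's data should hold a single row: the mean point.
mean_layer <- layer_data(p, 2)
print(mean_layer[, c("x", "y")])
```

Checking the layer data like this is a quick, programmatic way to confirm that the new method did what you intended before worrying about how the layer looks.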

Stage 5 (contributor)

Lastly, ggplot2 is always evolving, and you can take part in the process! It might not be obvious as users, but the internals are undergoing constant change. So at some point you might also want to start keeping an eye on the GitHub issues. It’s also a nice place to be because you can eavesdrop on the thoughts and insights from the core developers (e.g., should users be able to specify pieces of a scale_*() modularly?, how should the guides system be converted to ggproto?, should layers get “state”?), which can help you understand the motivation behind how and why things in the internals are designed the way they are. If reading those discussions inspires new ideas or strong feelings, you can submit an issue or PR to make your voice heard.