Analysis of @everycolorbot’s tweets

data visualization ggplot2 rtweet colors

And why you should avoid neon colors

June Choe (University of Pennsylvania Linguistics)https://live-sas-www-ling.pantheon.sas.upenn.edu/
10-22-2020

Introduction

I, along with nearly two-hundred thousand other people, follow @everycolorbot on twitter. @everycolorbot is a twitter bot that tweets an image of a random color every hour (more details on github). It’s 70% a source of inspiration for new color schemes, and 30% a comforting source of constant in my otherwise hectic life.

What I’ve been noticing about @everycolorbot’s tweets is that bright, highly saturated neon colors (yellow~green) tend to get less likes compared to cool blue colors and warm pastel colors. You can get a feel of this difference in the number of likes between the two tweets below, tweeted an hour apart:

This is actually not a big surprise. Bright pure colors are very harsh and straining to the eye, especially on a white background.1 For this reason bright colors are almost never used in professional web design, and are also discouraged in data visualization.

So here’s a mini experiment testing that claim: I’ll use @everycolorbot’s tweets (more specifically, the likes on the tweets) as a proxy for likeability/readability/comfortableness/etc. It’ll be a good exercise for getting more familiar with different colors! I’m also going to try a simple descriptive analysis using the HSV color representation, which is a psychologically-motivated mental model of color that I like a lot (and am trying to get a better feel for).

HSV cylinder

Figure 1: HSV cylinder

Setup

Using {rtweet} requires authentication from twitter. The steps to do so are very well documented on the package website so I wouldn’t expect too much trouble setting it up if it’s your first time using it. But just for illustration, here’s what my setup looks like:

api_key <- 'XXXXXXXXXXXXXXXXXXXXX'
api_secret_key <- 'XXXXXXXXXXXXXXXXXXXXX'
access_token <- "XXXXXXXXXXXXXXXXXXXXX"
access_token_secret <- "XXXXXXXXXXXXXXXXXXXXX"

token <- create_token(
  app = "XXXXXXXXXXXXXXXXXXXXX",
  consumer_key = api_key,
  consumer_secret = api_secret_key,
  access_token = access_token,
  access_secret = access_token_secret
)

After authorizing, I queried the last 10,000 tweets made by @everycolorbot. It ended up only returning about a 1/3 of that because the twitter API only allows you to go back so far in time, but that’s plenty for my purposes here.

colortweets <- rtweet::get_timeline("everycolorbot", 10000)

dim(colortweets)
  [1] 3238   90

As you see above, I also got back 90 variables (columns). I only care about the time of the tweet, the number of likes it got, and the color it tweeted, so those are what I’m going to grab. I also want to clean things up a bit for plotting, so I’m going to grab just the hour from the time and just the hex code from the text.

colortweets_df <- colortweets %>% 
  select(created_at, text, favorite_count) %>%
  mutate(
    created_at = lubridate::hour(created_at),
    text = paste0("#", str_extract(text, "(?<=0x).*(?= )"))
  ) %>% 
  rename(
    likes = favorite_count,
    hour = created_at,
    hex = text
  )

And here’s what we end up with:

likes hour hex
36 15 #65f84e
65 14 #32fc27
89 13 #997e13
140 12 #ccbf09
303 11 #665f84
75 10 #b32fc2

Here is the link to this data if you’d like to replicate or extend this analysis yourself.

Analysis

Below is a bar plot of colors where the height corresponds to the number of likes. It looks cooler than your usual bar plot because I transformed the x dimension into polar coordinates. My intent in doing this was to control for the hour of day in my analysis and visualize it like a clock (turned out better than expected!)

colortweets_df %>% 
  arrange(-likes) %>% 
  ggplot(aes(hour, likes, color = hex)) +
  geom_col(
    aes(size = likes),
    position = "dodge",
    show.legend = FALSE
  ) +
  scale_color_identity() +
  theme_void() +
  theme(
    plot.background = element_rect(fill = "#222222", color = NA),
  ) +
  coord_polar()

I notice at least two interesting contrasts in this visualization:

So maybe we can say that there are four clusters here:

  1. Least liked: Bright neon colors + highly saturated dark colors

  2. Lesser liked: Bright pure/near-pure colors

  3. More liked: Darker pastel RGB

  4. Most liked: Lighter pastel mixed colors

Now’s let’s try to quantitatively describe each cluster.

First, as a sanity check, I’m just gonna eyeball the range of likes for each cluster using an un-transformed version of the above plot with units. I think we can roughly divide up the clusters at 100 likes, 200 likes, and 400 likes.

colortweets_df %>% 
  arrange(-likes) %>% 
  ggplot(aes(hour, likes, color = hex)) +
  geom_col(
    aes(size = likes),
    position = "dodge",
    show.legend = FALSE
  ) +
  geom_hline(
    yintercept = c(100, 200, 400), 
    color = "white", 
    linetype = 2, 
    size = 2
  ) +
  scale_y_continuous(breaks = scales::pretty_breaks(10)) +
  scale_color_identity() +
  theme_void() +
  theme(
    plot.background = element_rect(fill = "#222222", color = NA),
    axis.line.y = element_line(color = "white"),
    axis.text.y = element_text(
      size = 14,
      color = "white",
      margin = margin(l = 3, r = 3, unit = "mm")
    )
  )

If our initial hypothesis about the four clusters are true, we should see these clusters having distinct profiles. Here, I’m going to use the HSV representation to quantitatively test this. To convert our hex values into HSV, I use the as.hsv() function from the {chroma} package - an R wrapper for the javascript library of the same name.

colortweets_df_hsv <- colortweets_df %>% 
  mutate(hsv = map(hex, ~as_tibble(chroma::as.hsv(.x)))) %>% 
  unnest(hsv)

And now we have the HSV values (hue, saturation, value)!

likes hour hex h s v
36 15 #65f84e 111.88235 0.6854839 0.9725490
65 14 #32fc27 116.90141 0.8452381 0.9882353
89 13 #997e13 47.91045 0.8758170 0.6000000
140 12 #ccbf09 56.00000 0.9558824 0.8000000
303 11 #665f84 251.35135 0.2803030 0.5176471
75 10 #b32fc2 293.87755 0.7577320 0.7607843

What do we get if we average across the dimensions of HSV for each cluster?

colortweets_df_hsv <- colortweets_df_hsv %>% 
  mutate(
    cluster = case_when(
      likes < 100 ~ "Center",
      between(likes, 100, 200) ~ "Inner-Mid",
      between(likes, 201, 400) ~ "Outer-Mid",
      likes > 400 ~ "Edge"
    ),
    cluster = fct_reorder(cluster, likes)
  )

colortweets_df_hsv %>% 
  group_by(cluster) %>% 
  summarize(across(h:v, mean), .groups = 'drop')
  # A tibble: 4 x 4
    cluster       h     s     v
  * <fct>     <dbl> <dbl> <dbl>
  1 Center     121. 0.770 0.710
  2 Inner-Mid  191. 0.734 0.737
  3 Outer-Mid  197. 0.589 0.710
  4 Edge       249. 0.304 0.832

This actually matches up pretty nicely with our initial analysis! We find a general dislike for green colors (h value close to 120) over blue colors (h value close to 240), as well as a dislike for highly saturated colors (intense, bright) over those with low saturation (which is what gives off the “pastel” look). To help make the hue values more interpretable, here’s a color wheel with angles that correspond to the hue values in HSV.2

Hue color wheel

Figure 2: Hue color wheel

But we also expect to find within-cluster variation along HSV. In particular, hue is kind of uninterpretable on a scale so it probably doesn’t make a whole lot of sense to take a mean of that. So back to the drawing plotting board!

Since saturation and value do make more sense on a continuous scale, let’s draw a scatterplot for each cluster with saturation on the x-axis and value on the y-axis. I’m also going to map hue to the color of each point, but since hue is abstract on its own, I’m actually just going to replace it with the hex values (i.e., the actual color).

colortweets_df_hsv %>% 
  ggplot(aes(s, v, color = hex)) +
  geom_point() +
  scale_color_identity() +
  lemon::facet_rep_wrap(~cluster) +
  theme_void(base_size = 16, base_family = "Montserrat Medium") +
  theme(
    plot.margin = margin(3, 5, 5, 5, "mm"),
    strip.text = element_text(margin = margin(b = 3, unit = "mm")),
    panel.border = element_rect(color = "black", fill = NA),
    panel.background = element_rect(fill = "grey75", color = NA)
  )

Here’s the mappings spelled out again:


Our plot above reinforce what we’ve found before. Colors are more likeable (literally) the more they…

  1. Move away from green: Neon-green dominates the least-liked cluster, and that’s a blatant fact. Some forest-greens survive to the lesser-liked cluster, but is practically absent in the more-liked cluster and most-liked cluster. It looks like the only way for green to be redeemable is to either mix in with blue to become cyan and turquoise, which dominates the more-liked cluster, or severly drop in saturation to join the ranks of other pastel colors in the most-liked cluster.

  2. Increase in value and decrease in saturation: It’s clear that the top-left corner is dominated by the more-liked and the most-liked cluster. That region is, again, where pastel colors live. They’re calmer than the bright neon colors that plague the least-liked cluster, and are more liked than highly-saturated and intense colors like those in the top right of the Outer-Mid panel. So perhaps this is a lesson that being “colorful” can only get you so far.

Conclusion

Obviously, all of this should be taken with a grain of salt. We don’t know the people behind the likes - their tastes, whether they see color differently, what medium they saw the tweet through, their experiences, etc.

And of course, we need to remind ourselves that we rarely see a color just by itself in the world. It contrasts and harmonizes with other colors in the environment in very complex ways.

But that’s what kinda makes our analysis cool - despite all these complexities, we see evidence for many things that experts working with color emphasize: avoid pure neon, mix colors, etc. This dataset also opens us up to many more types of analyses (like an actual cluster analysis) that might be worth looking into.

Good stuff.


  1. Dark mode ftw!↩︎

  2. While all color wheels look the same, they aren’t all oriented the same. When using HSV, make sure to reference the color wheel where the red is at 0, green is as 120, and blue is at 240.↩︎