Instead of the more idiomatic substr() approach, you can also use gsub() with the {n} quantifier to extract the first n characters.
ipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
nchar(ipsum)
#> [1] 123
Besides substr(), there is also substring(), which exists for “compatibility with S”.
To cut the text down to 30 characters, you can use substr():
substr(ipsum, 1, 30)
#> [1] "Lorem ipsum dolor sit amet, co"
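The gsub() route from the start of this post produces the same prefix. In this sketch, the {n} quantifier is built from a variable with sprintf(). The value of n is only an example, and the captured group is kept without adding any ellipsis:

```r
ipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."

# Illustrative cut-off length (an integer, so "%d" formats it directly)
n <- 30L
# Builds "^(.{30}).*$": capture the first n characters, match the rest
pattern <- sprintf("^(.{%d}).*$", n)

# Replace the whole string with just the captured prefix
prefix <- gsub(pattern, "\\1", ipsum)
prefix
#> [1] "Lorem ipsum dolor sit amet, co"

# Agrees with the substr() call
identical(prefix, substr(ipsum, 1, n))
#> [1] TRUE
```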
If you want to add a trailing ellipsis, you can paste0() it onto that output:
paste0(substr(ipsum, 1, 30), "...")
#> [1] "Lorem ipsum dolor sit amet, co..."
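Keep in mind that paste0() appends the "..." unconditionally. On a string that is already short enough, it adds an ellipsis even though nothing was cut. The following minimal sketch runs over a small vector, where the second string, "short text", is made up for illustration. It also includes the explicit length check this approach then needs:

```r
ipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
texts <- c(ipsum, "short text")

# substr() and paste0() are vectorized, and the ellipsis is added to every
# element, so the second element becomes "short text..."
naive <- paste0(substr(texts, 1, 30), "...")

# Appending only when the text is longer than 30 characters needs a conditional
guarded <- ifelse(nchar(texts) > 30, paste0(substr(texts, 1, 30), "..."), texts)
guarded[2]
#> [1] "short text"
```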
Oddly enough, the replacement-function variant `substr<-`(ipsum, 31, nchar(ipsum), "...") fails to achieve this effect. Because substr<- never changes the length of the string, the "..." instead overwrites a span of the same length, characters 31 to 33 of the input string, and everything after that is left in place.
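The sketch below shows that behavior using the usual replacement syntax, substr(x, start, stop) <- value, which calls `substr<-` under the hood. The copy in x only keeps ipsum untouched:

```r
ipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
x <- ipsum

# Ask to replace characters 31 through the end with "...": the replacement is
# only 3 characters long, so only positions 31 to 33 are overwritten
substr(x, 31, nchar(x)) <- "..."
x
#> [1] "Lorem ipsum dolor sit amet, co...ctetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."

# The length is unchanged
nchar(x)
#> [1] 123
```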
But if you use gsub() instead, you can express all of that in a single call:
gsub("^(.{30}).*$", "\\1...", ipsum)
#> [1] "Lorem ipsum dolor sit amet, co..."
A nice feature of gsub() is that if the pattern isn’t matched, it skips the replacement and returns the string unchanged. This takes care of the often annoying conditional where you’d only want the trailing "..." if there’s actually more content past the snippet.
gsub("^(.{30}).*$", "\\1...", "short text")
#> [1] "short text"
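Because gsub() is vectorized, the same call handles a mix of long and short strings without any conditional. One edge case is worth checking, though. A string of exactly 30 characters still matches, because .* is allowed to match nothing, so it gains a "..." even though nothing was cut. The sketch below uses a made-up 30-character string. It also shows one possible tweak that is not part of the pattern above: .+ in place of .* requires at least one more character.

```r
ipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
# Hypothetical test string of exactly 30 characters: 26 letters plus 4 digits
exactly30 <- paste0(paste(letters, collapse = ""), "1234")
texts <- c(ipsum, "short text", exactly30)

# Original pattern: .* can match the empty string, so exactly30 gets "..." too
with_star <- gsub("^(.{30}).*$", "\\1...", texts)
with_star[3]
#> [1] "abcdefghijklmnopqrstuvwxyz1234..."

# .+ requires at least one character beyond the first 30,
# so only strings that are actually cut get the ellipsis
with_plus <- gsub("^(.{30}).+$", "\\1...", texts)
with_plus[3]
#> [1] "abcdefghijklmnopqrstuvwxyz1234"
```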
Another often desired effect is to avoid truncating mid-word. This is simple to express in regex: extend the capture group with \\S*, which greedily matches the run of non-whitespace characters that follows the first 30, i.e. the rest of the word the cut landed in.
gsub("^(.{30}\\S*).*$", "\\1...", ipsum)
#> [1] "Lorem ipsum dolor sit amet, consectetur..."
At that point you can also change the replacement string to put a space between the last word and the ellipsis, emphasizing the word boundary:
gsub("^(.{30}\\S*).*$", "\\1 ...", ipsum)
#> [1] "Lorem ipsum dolor sit amet, consectetur ..."
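To reuse this, you can wrap the pattern in a small function. truncate_words() below is a hypothetical helper, not part of base R or any package. It is a sketch that only parameterizes the cut-off and the ellipsis of the pattern above, so it inherits that pattern's behavior, including the exactly-n-characters edge case. Because the ellipsis is pasted into the replacement string, it should not contain backslashes.

```r
# Hypothetical helper: truncate after n characters, finishing the current word
truncate_words <- function(x, n = 30L, ellipsis = " ...") {
  # Capture n characters plus any non-whitespace run that directly follows them
  pattern <- sprintf("^(.{%d}\\S*).*$", as.integer(n))
  # "\\1" re-inserts the captured prefix; the ellipsis text follows it
  gsub(pattern, paste0("\\1", ellipsis), x)
}

ipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
truncate_words(ipsum)
#> [1] "Lorem ipsum dolor sit amet, consectetur ..."
truncate_words(ipsum, n = 10)
#> [1] "Lorem ipsum ..."
truncate_words("short text")
#> [1] "short text"
```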
Needless to say, you can also keep both ends and cut out the middle:
gsub("^(.{30}\\S*).*(\\S*.{30})$", "\\1 ... \\2", ipsum)
#> [1] "Lorem ipsum dolor sit amet, consectetur ... labore et dolore magna aliqua."
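One caveat applies to that last pattern, at least with perl = TRUE. The greedy .* in the middle takes as much as it can, so the \\S* in the second group ends up matching nothing and never pulls the tail back to the start of a word. In the example above, the last 30 characters happen to begin exactly at "labore", so the output looks right anyway. The sketch below makes the effect visible with a 25-character tail. It uses perl = TRUE so the matching follows standard PCRE semantics. The lazy .*? in the second call is a swapped-in tweak, not part of the original pattern:

```r
ipsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."

# Greedy .*: it takes as much as it can, leaving \S* in the tail group empty,
# so the 25-character tail starts mid-word (the final "e" of "labore")
greedy <- gsub("^(.{30}\\S*).*(\\S*.{25})$", "\\1 ... \\2", ipsum, perl = TRUE)
greedy
#> [1] "Lorem ipsum dolor sit amet, consectetur ... e et dolore magna aliqua."

# Lazy .*?: it gives up as much as possible, so \S* extends the tail back
# to the start of the word the cut landed in
lazy <- gsub("^(.{30}\\S*).*?(\\S*.{25})$", "\\1 ... \\2", ipsum, perl = TRUE)
lazy
#> [1] "Lorem ipsum dolor sit amet, consectetur ... labore et dolore magna aliqua."
```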
sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.4.1 litedown_0.7.1 tools_4.4.1 codetools_0.2-20
#> [5] xfun_0.52 commonmark_1.9.5