# Regular Expression Syntax in R
This document introduces the basics of regular expressions as used in R. For more information about R's regular expression syntax, see `?regex`. For a comprehensive list of regular expression operators, see the ICU guide on regular expressions.
Use `grep` to find a string in a character vector. It returns the indices of the elements that contain a match.

    # General syntax:
    # grep(<pattern>, <character vector>)
    mystring <- c('The number 5', 'The number 8', '1 is the loneliest number',
                  'Company, 3 is', 'Git SSH tag is firstname.lastname@example.org',
                  'My personal site is www.personal.org', 'path/to/my/file')
    grep('5', mystring)
    # [1] 1
    grep('@', mystring)
    # [1] 5
    grep('number', mystring)
    # [1] 1 2 3

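
Indices are not always the most convenient result. As a minimal sketch, assuming a small hypothetical vector `fruits` that is not part of the examples above, base R can also return the matching elements themselves (`value = TRUE`) or one logical value per element (`grepl`):

```r
# Hypothetical example vector, used only for illustration
fruits <- c('apple', 'banana', 'cherry')

idx     <- grep('an', fruits)                # positions of matching elements
matches <- grep('an', fruits, value = TRUE)  # the matching elements themselves
flags   <- grepl('an', fruits)               # TRUE/FALSE for every element

print(idx)      # [1] 2
print(matches)  # [1] "banana"
print(flags)    # [1] FALSE  TRUE FALSE
```

`grepl` is usually the handier form for subsetting, since its result has the same length as the input vector.
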
`x|y` means look for "x" or "y".

    grep('5|8', mystring)
    # [1] 1 2
    grep('com|org', mystring)
    # [1] 5 6

`.` is a special character in regular expressions. It means "match any single character".

    grep('The number .', mystring)
    # [1] 1 2

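
To make the behavior of the dot concrete, here is a small sketch using a hypothetical vector `words` (not from the examples above). The dot stands for exactly one character, so it neither matches zero characters nor two:

```r
# Hypothetical example vector, used only for illustration
words <- c('bat', 'bit', 'bt', 'boot')

# 'b.t' needs a 'b', then exactly one character of any kind, then a 't'
dot_flags <- grepl('b.t', words)
print(dot_flags)  # [1]  TRUE  TRUE FALSE FALSE
```
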
Be careful when trying to match dots! Because `.` matches any character, `.org` also matches the "borg" in "cyborg".

    tricky <- c('www.personal.org', 'My friend is a cyborg')
    grep('.org', tricky)
    # [1] 1 2

To match a special character literally, you have to escape it with a backslash (`\`). However, R also looks for escape sequences when it creates strings, so you actually need to escape the backslash itself (i.e. you need to double escape regular expression characters).

    grep('\.org', tricky)
    # Error: '\.' is an unrecognized escape in character string starting "'\."
    grep('\\.org', tricky)
    # [1] 1

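
It can help to see what string R actually stores after the double escape. The sketch below uses hypothetical names `pattern` and `sites` (the vector repeats the two strings from the example above so the block runs on its own). It also shows `fixed = TRUE`, a base R option that treats the whole pattern as literal text, which is one way to avoid escaping altogether when no regular expression features are needed:

```r
# Hypothetical names, used only for illustration
pattern <- '\\.org'
sites   <- c('www.personal.org', 'My friend is a cyborg')

n <- nchar(pattern)  # 5 characters: one backslash, a dot, and 'org'
cat(pattern, '\n')   # prints \.org (a single backslash reaches the regex engine)

escaped <- grep(pattern, sites)               # regex with an escaped dot
literal <- grep('.org', sites, fixed = TRUE)  # no regex, '.' is just a dot

print(escaped)  # [1] 1
print(literal)  # [1] 1
```
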
If you want to match one of several characters, you can wrap those characters in brackets (`[]`).

    grep('[13]', mystring)
    # [1] 3 4
    grep('[@/]', mystring)
    # [1] 5 7

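
A bracket expression matches exactly one character from the set, which makes it a compact alternative to `|` when the options differ by a single character. A minimal sketch, assuming a hypothetical vector `spellings`:

```r
# Hypothetical example vector, used only for illustration
spellings <- c('gray', 'grey', 'groy')

with_brackets    <- grepl('gr[ae]y', spellings)   # 'a' or 'e' in third position
with_alternation <- grepl('gray|grey', spellings) # same idea spelled out with |

print(with_brackets)     # [1]  TRUE  TRUE FALSE
print(with_alternation)  # [1]  TRUE  TRUE FALSE
```
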
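It may be useful to indicate character ranges. For example, `[0-4]` will match 0, 1, 2, 3, or 4; `[A-Z]` will match any uppercase letter; `[A-Za-z]` will match any uppercase or lowercase letter; and `[A-Za-z0-9]` will match any letter or number (i.e. all alphanumeric characters). Avoid writing `[A-z]` for letters: in ASCII order that range also covers the characters `[`, `\`, `]`, `^`, `_`, and the backtick, which sit between `Z` and `a`.

    grep('[0-4]', mystring)
    # [1] 3 4
    grep('[A-Z]', mystring)
    # [1] 1 2 4 5 6
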
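
A range such as `[A-z]` is wider than it looks, because several punctuation characters sit between `Z` and `a` in ASCII order. The sketch below uses a hypothetical vector `chars` to compare it with the explicit `[A-Za-z]`. How ranges are interpreted can depend on the locale, so no output is asserted for the `[A-z]` line; in a typical ASCII or UTF-8 session it is expected to report a match for `'_'` and `'^'` as well.

```r
# Hypothetical example vector, used only for illustration
chars <- c('a', 'Z', '_', '^', '5')

wide_range    <- grepl('[A-z]', chars)     # may also match '_' and '^'
letters_only  <- grepl('[A-Za-z]', chars)  # two explicit ranges, letters only

print(wide_range)
print(letters_only)  # [1]  TRUE  TRUE FALSE FALSE FALSE
```
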
R also has several shortcut classes that can be used in brackets. For instance, `[:lower:]` is roughly short for `a-z`, `[:upper:]` for `A-Z`, `[:alpha:]` for `A-Za-z`, `[:digit:]` for `0-9`, and `[:alnum:]` for `A-Za-z0-9`. Note that these whole expressions must be used inside brackets; for instance, to match a single digit, you can use `[[:digit:]]` (note the double brackets). As another example, `[@[:digit:]/]` will match the characters `@` and `/` as well as any digit.

    grep('[[:digit:]]', mystring)
    # [1] 1 2 3 4
    grep('[@[:digit:]/]', mystring)
    # [1] 1 2 3 4 5 7

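
The shortcut classes can be compared side by side, and more than one can share a single pair of outer brackets. This is a small sketch on a hypothetical vector `samples`, not part of the examples above:

```r
# Hypothetical example vector, used only for illustration
samples <- c('abc', 'ABC', '123', 'a1B')

has_lower <- grepl('[[:lower:]]', samples)
has_upper <- grepl('[[:upper:]]', samples)
has_digit <- grepl('[[:digit:]]', samples)
# Two classes inside one bracket expression: an uppercase letter OR a digit
has_upper_or_digit <- grepl('[[:upper:][:digit:]]', samples)

print(has_lower)           # [1]  TRUE FALSE FALSE  TRUE
print(has_upper)           # [1] FALSE  TRUE FALSE  TRUE
print(has_digit)           # [1] FALSE FALSE  TRUE  TRUE
print(has_upper_or_digit)  # [1] FALSE  TRUE  TRUE  TRUE
```
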
Brackets can also be used to negate a match with a caret (`^`) placed directly after the opening bracket. For instance, `[^5]` will match any character other than "5".

    grep('The number [^5]', mystring)
    # [1] 2

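
Two details of negated brackets are easy to miss: `[^5]` still has to match some character, so it fails when nothing follows, and the caret only negates when it comes first inside the brackets. A minimal sketch with hypothetical vectors `phrases` and `symbols`:

```r
# Hypothetical example vectors, used only for illustration
phrases <- c('number 5', 'number 8', 'number ')
symbols <- c('2^3', '15', 'abc')

# Needs 'number', a space, and then one character that is not '5'.
# The third element ends after the space, so there is nothing for [^5] to match.
not_five <- grepl('number [^5]', phrases)

# Here the caret is not first, so it is an ordinary character:
# the set matches a literal '5' or a literal '^'.
five_or_caret <- grepl('[5^]', symbols)

print(not_five)       # [1] FALSE  TRUE FALSE
print(five_or_caret)  # [1]  TRUE  TRUE FALSE
```
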