17.2 Regular expressions to find more flexible patterns

Special characters used for pattern recognition:

Character Description
$ Find pattern at the end of the string
^ Find pattern at the beginning of the string
{n} The previous pattern should be found exactly n times
{n,m} The previous pattern should be found between n and m times
+ The previous pattern should be found at least 1 time
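* The previous pattern should be found 0 or more times (i.e. it is optional, but may repeat)
? The previous pattern should be found 0 or 1 time (i.e. it is optional)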
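A quick way to check the table above is to run grep() on a small vector. The sequences in seqs are made up purely for illustration. Each call keeps only the elements where the pattern is found, and the comment under each call lists the elements returned.

```r
# Made-up sequences, used only to illustrate each special character
seqs <- c("ATG", "AATG", "AAATG", "TG", "CATG", "ATGC")

# ^ : A at the beginning of the string
grep(pattern = "^A", x = seqs, value = TRUE)
# "ATG" "AATG" "AAATG" "ATGC"

# $ : G at the end of the string
grep(pattern = "G$", x = seqs, value = TRUE)
# "ATG" "AATG" "AAATG" "TG" "CATG"

# {n} : exactly two A's at the start, then T
grep(pattern = "^A{2}T", x = seqs, value = TRUE)
# "AATG"

# {n,m} : one or two A's at the start, then T
grep(pattern = "^A{1,2}T", x = seqs, value = TRUE)
# "ATG" "AATG" "ATGC"

# + : at least one A at the start, then T
grep(pattern = "^A+T", x = seqs, value = TRUE)
# "ATG" "AATG" "AAATG" "ATGC"

# * : zero or more A's at the start, then T (so "TG" also matches)
grep(pattern = "^A*T", x = seqs, value = TRUE)
# "ATG" "AATG" "AAATG" "TG" "ATGC"

# ? : an optional C at the start, then AT
grep(pattern = "^C?AT", x = seqs, value = TRUE)
# "ATG" "CATG" "ATGC"
```

The ^ anchor matters for the {n} example: without it, "A{2}T" also matches "AAATG", because its last two A's followed by T satisfy the pattern.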

Match any one of your own set of characters by listing them inside []

Pattern Description
[abc] matches a, b, or c
^[abc] matches a, b or c at the beginning of the element
^A[abc]+ matches A as the first character of the element, then one or more of a, b or c
^A[abc]* matches A as the first character of the element, then zero or more of a, b or c (so A alone also matches)
^A[abc]{1}_ matches A as the first character of the element, then either a, b or c (one time!) followed by an underscore
[a-z] matches any lowercase letter between a and z
[A-Z] matches any uppercase letter between A and Z
[0-9] matches any digit between 0 and 9
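The bracket patterns can be tried the same way. The identifiers in ids are hypothetical, chosen so that each pattern keeps a different subset; the comment under each call lists the elements returned.

```r
# Hypothetical identifiers, chosen so each pattern keeps a different subset
ids <- c("Aa_1", "Abb", "A_2", "ba", "Ax", "c9")

# [abc] : contains a, b or c anywhere
grep(pattern = "[abc]", x = ids, value = TRUE)
# "Aa_1" "Abb" "ba" "c9"

# ^[abc] : starts with a, b or c
grep(pattern = "^[abc]", x = ids, value = TRUE)
# "ba" "c9"

# ^A[abc]+ : starts with A, followed by at least one of a, b or c
grep(pattern = "^A[abc]+", x = ids, value = TRUE)
# "Aa_1" "Abb"

# ^A[abc]* : starts with A; the a/b/c part is optional
grep(pattern = "^A[abc]*", x = ids, value = TRUE)
# "Aa_1" "Abb" "A_2" "Ax"

# ^A[abc]{1}_ : A, exactly one of a, b or c, then an underscore
grep(pattern = "^A[abc]{1}_", x = ids, value = TRUE)
# "Aa_1"

# [0-9] : contains at least one digit
grep(pattern = "[0-9]", x = ids, value = TRUE)
# "Aa_1" "A_2" "c9"
```

Note that "^A[abc]*" keeps exactly the same elements as "^A": because [abc]* may match nothing, it adds no constraint unless something follows it in the pattern.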
  • Keep the elements that contain at least one of the characters between the brackets (here g or t):
grep(pattern="[gt]", 
    x=c("genomics", "proteomics", "transcriptomics"), 
    value=TRUE)
  • Keep only the elements that start with one of the characters between the brackets:
grep(pattern="^[gt]",
    x=c("genomics", "proteomics", "transcriptomics"),
    value=TRUE)
  • Create a vector of email addresses:
vec_ad <- c("marie.curie@yahoo.es", "albert.einstein01@hotmail.com", 
    "charles.darwin1809@gmail.com", "rosalind.franklin@aol.it")
  • Keep only the email addresses ending in “es”:
grep(pattern="es$",
    x=vec_ad,
    value=TRUE)
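Brackets and quantifiers can be combined into more selective filters on the same vector. The patterns below are illustrative additions, not part of the original exercise; vec_ad is redefined so that the snippet runs on its own.

```r
vec_ad <- c("marie.curie@yahoo.es", "albert.einstein01@hotmail.com",
            "charles.darwin1809@gmail.com", "rosalind.franklin@aol.it")

# At least one digit immediately before the @
grep(pattern = "[0-9]+@", x = vec_ad, value = TRUE)
# "albert.einstein01@hotmail.com" "charles.darwin1809@gmail.com"

# Four digits in a row (e.g. a year) immediately before the @
grep(pattern = "[0-9]{4}@", x = vec_ad, value = TRUE)
# "charles.darwin1809@gmail.com"

# Addresses starting with m or r
grep(pattern = "^[mr]", x = vec_ad, value = TRUE)
# "marie.curie@yahoo.es" "rosalind.franklin@aol.it"
```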
