Regular expressions to find more flexible patterns
Special characters used for pattern recognition:
$     | Match the pattern at the end of the string
^     | Match the pattern at the beginning of the string
{n}   | The previous character (or bracketed set) must be found exactly n times
{n,m} | The previous character (or bracketed set) must be found between n and m times
+     | The previous character (or bracketed set) must be found at least once
*     | The previous character (or bracketed set) may be found zero or more times (optional)
?     | The previous character (or bracketed set) may be found zero or one time (optional)
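The quantifiers are easiest to compare on one small vector. This is a minimal sketch using `grepl()`, which returns TRUE or FALSE for each element; the vector `words` is hypothetical, chosen only so that each pattern gives a different result. Anchoring with `^` and `$` forces the whole element to match. Without anchors, `b{2}` is also found inside a longer run such as "bbb".

```r
# Hypothetical test vector: "a", then zero to three "b", then "c"
words <- c("ac", "abc", "abbc", "abbbc")

grepl("^ab", words)          # FALSE TRUE TRUE TRUE   (starts with "ab")
grepl("bc$", words)          # FALSE TRUE TRUE TRUE   (ends with "bc")
grepl("^ab{2}c$", words)     # FALSE FALSE TRUE FALSE (exactly two b)
grepl("^ab{1,2}c$", words)   # FALSE TRUE TRUE FALSE  (one or two b)
grepl("^ab+c$", words)       # FALSE TRUE TRUE TRUE   (at least one b)
grepl("^ab*c$", words)       # TRUE TRUE TRUE TRUE    (zero or more b)
grepl("^ab?c$", words)       # TRUE TRUE FALSE FALSE  (zero or one b)

# Without anchors, b{2} is found inside any run of two or more b
grepl("b{2}", words)         # FALSE FALSE TRUE TRUE
```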
Match your own set of characters inside []:
[abc]       | matches a, b or c
^[abc]      | matches a, b or c at the beginning of the element
^A[abc]+    | matches A as the first character of the element, followed by one or more of a, b or c
^A[abc]*    | matches A as the first character of the element, optionally followed by any number of a, b or c
^A[abc]{1}_ | matches A as the first character of the element, then exactly one of a, b or c, followed by an underscore
[a-z]       | matches any lowercase letter from a to z
[A-Z]       | matches any uppercase letter from A to Z
[0-9]       | matches any digit from 0 to 9
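The bracket patterns can be tried the same way. In this sketch the identifiers in `ids` are invented for illustration. Matching is case-sensitive, so `^[abc]` does not match elements that start with an uppercase A.

```r
# Hypothetical identifiers, invented to exercise each bracket pattern
ids <- c("Aa_1", "Ab", "Ac_x", "A_2", "Aab_3", "ba", "Zz9")

grep("^[abc]", ids, value = TRUE)       # "ba"
grep("^A[abc]+", ids, value = TRUE)     # "Aa_1" "Ab" "Ac_x" "Aab_3"
grep("^A[abc]*", ids, value = TRUE)     # every element starting with "A"
grep("^A[abc]{1}_", ids, value = TRUE)  # "Aa_1" "Ac_x"
grep("[0-9]", ids, value = TRUE)        # "Aa_1" "A_2" "Aab_3" "Zz9"
```

`^A[abc]*` also keeps "A_2", because `*` allows zero letters from the set; on this vector it behaves like `^A` alone.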
- Keep the elements that contain at least one of the characters between the brackets (here g or t):
grep(pattern="[gt]",
x=c("genomics", "proteomics", "transcriptomics"),
value=TRUE)
- Keep the elements that start with one of the characters between the brackets:
grep(pattern="^[gt]",
x=c("genomics", "proteomics", "transcriptomics"),
value=TRUE)
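- Without value=TRUE, grep() returns the positions of the matching elements instead of the elements themselves, and grepl() returns one TRUE/FALSE per element. A short sketch on the same three words, stored in a vector named `omics` (a name chosen only for this example):

```r
omics <- c("genomics", "proteomics", "transcriptomics")

# Positions of the elements starting with g or t
grep(pattern = "^[gt]", x = omics)            # 1 3
# One logical value per element
grepl(pattern = "^[gt]", x = omics)           # TRUE FALSE TRUE
# The logical vector can be used to subset the original vector
omics[grepl(pattern = "^[gt]", x = omics)]    # "genomics" "transcriptomics"
```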
- Create a vector of email addresses:
vec_ad <- c("marie.curie@yahoo.es", "albert.einstein01@hotmail.com",
"charles.darwin1809@gmail.com", "rosalind.franklin@aol.it")
- Keep only the email addresses ending with “es”:
grep(pattern="es$",
x=vec_ad,
value=TRUE)
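- The same special characters can target other parts of an address. Two further illustrative patterns (not part of the original steps): the first keeps addresses ending in ".com", escaping the dot with \\. because a bare . matches any single character; the second keeps addresses with four digits immediately before the @.

```r
vec_ad <- c("marie.curie@yahoo.es", "albert.einstein01@hotmail.com",
            "charles.darwin1809@gmail.com", "rosalind.franklin@aol.it")

# Ends with ".com" (the dot is escaped to match a literal dot)
grep(pattern = "\\.com$", x = vec_ad, value = TRUE)
# Four digits right before the @ ("01@" has only two, so it is skipped)
grep(pattern = "[0-9]{4}@", x = vec_ad, value = TRUE)
```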