Regular expressions are a great way to parse, match and modify strings of all sorts. Let’s learn some different ways that we can use them in Apex by way of some examples that you can run in the Developer Console.
Pattern.matches(…)
First, we will learn how to use Pattern.matches(…) for pattern matching. Let’s say that you want to test a string to see if it looks like a number. Try running the following as anonymous code in the Salesforce Developer Console:
/**
 * Takes a string and returns a boolean indicating whether or not it looks
 * numeric.
 */
static Boolean looksNumeric(String toTest) {
    return Pattern.matches('^-?(?:\\d+|(?=.))(?:\\.\\d+)?$', toTest);
}

// Different strings to test.
String[] stringsToTest = new String[]{
    'Hello 0 world',
    '40',
    '72.',
    '.56',
    '12345.67890',
    '00000.00000'
};

// Test each of the above strings to see if they look numeric.
for (String toTest : stringsToTest) {
    System.debug(String.format('"{0}" does{1} look numeric.', new String[]{
        toTest,
        looksNumeric(toTest) ? '' : ' NOT'
    }));
}
Running the above code will output the following:
"Hello 0 world" does NOT look numeric.
"40" does look numeric.
"72." does NOT look numeric.
".56" does look numeric.
"12345.67890" does look numeric.
"00000.00000" does look numeric.
As you can see, we can test whether a given string matches the regular expression, but this approach isn’t the most efficient: the regular expression is compiled on the fly every time the function is called. Let’s see how we can get around that in the next section.
Pattern.compile(…)
Now we are going to use Pattern.compile(…) along with Pattern#.matcher(…) and Matcher#.matches() for more efficient pattern matching. Let’s say that we want to do the same thing that we did above, but this time we don’t want to compile the regular expression on the fly every time. In that case we can do something like this:
// Define constant regular expression used by looksNumeric().
final Pattern PAT_NUMERIC = Pattern.compile('^-?(?:\\d+|(?=.))(?:\\.\\d+)?$');

/**
 * Takes a string and returns a boolean indicating whether or not it looks
 * numeric.
 */
static Boolean looksNumeric(String toTest) {
    return PAT_NUMERIC.matcher(toTest).matches();
}

// Different strings to test.
String[] stringsToTest = new String[]{
    'Hello 0 world',
    '40',
    '72.',
    '.56',
    '12345.67890',
    '00000.00000'
};

// Test each of the above strings to see if they look numeric.
for (String toTest : stringsToTest) {
    System.debug(String.format('"{0}" does{1} look numeric.', new String[]{
        toTest,
        looksNumeric(toTest) ? '' : ' NOT'
    }));
}
As this example shows, Matcher#.matches() is well suited to testing a regular expression against an entire string. It will not work, however, if you simply want to test part of a string against the regular expression.
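To make the whole-string versus partial-string distinction concrete, here is a minimal sketch that can be run as anonymous Apex. The digit pattern, the sample strings and the variable names are illustrative assumptions rather than part of the examples above.

```apex
// Hypothetical sample pattern: one or more digits.
Pattern patDigits = Pattern.compile('\\d+');

// matches() succeeds only when the ENTIRE string fits the pattern.
Boolean wholeDigits = patDigits.matcher('12345').matches();
Boolean wholeMixed = patDigits.matcher('Order 12345').matches();

// find() succeeds when ANY part of the string fits the pattern.
Boolean partMixed = patDigits.matcher('Order 12345').find();

System.debug(wholeDigits); // true
System.debug(wholeMixed);  // false
System.debug(partMixed);   // true
```

The same compiled Pattern is reused for all three checks; only the Matcher method decides whether the whole string or just a portion of it has to match.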
Matcher#.find()
By using an instance of a Matcher that is bound to a regular expression Pattern and a String you can determine whether or not the regular expression can be found in any part of the given string. Here is an example:
// Pattern for checking if a string contains a vowel.
static Pattern patVowel = Pattern.compile('(?i)[AEIOU]');

/**
 * Returns a boolean indicating if `strToTest` contains a vowel.
 */
static Boolean containsVowel(String strToTest) {
    Matcher m = patVowel.matcher(strToTest);
    return m.find();
}

// Test containsVowel().
for (String strToTest : new String[]{'123', 'Johnny', 'Rhythm'}) {
    String msg = JSON.serialize(strToTest)
        + ' does '
        + (containsVowel(strToTest) ? '' : 'NOT ')
        + 'contain any vowels.';
    System.debug(msg);
}
If you run the above code in the Salesforce Developer Console as anonymous Apex the following will be output:
"123" does NOT contain any vowels.
"Johnny" does contain any vowels.
"Rhythm" does NOT contain any vowels.
You may have noticed that the regular expression starts with (?i). This is a flag indicating that the regular expression is going to be case-insensitive.
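As a quick illustration of what the flag changes, the following sketch runs the same character class with and without (?i). The sample words and variable names are assumptions chosen for the demonstration.

```apex
// Same character class, with and without the inline (?i) flag.
Pattern patSensitive = Pattern.compile('[AEIOU]');
Pattern patInsensitive = Pattern.compile('(?i)[AEIOU]');

// 'tree' contains only lowercase vowels; 'sky' contains none at all.
Boolean sensitiveTree = patSensitive.matcher('tree').find();
Boolean insensitiveTree = patInsensitive.matcher('tree').find();
Boolean insensitiveSky = patInsensitive.matcher('sky').find();

System.debug(sensitiveTree);   // false
System.debug(insensitiveTree); // true
System.debug(insensitiveSky);  // false
```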
What else can we do with Matcher?
Matcher#.group(), Matcher#.start() and Matcher#.end()
In reality, by using the Matcher#.find() function we are looking to see if a first match can be found, but we do not have to stop there. Let’s say that we want to find all of the instances of a vowel. Building on our previous example, we can go further and use three additional functions to get specifics:
// Find the position of all of the vowel groupings in strToSearch.
String strToSearch = 'Beautiful';
Pattern patVowels = Pattern.compile('(?i)[AIEOU]([AEIOU]+)?');
Matcher mVowels = patVowels.matcher(strToSearch);
while (mVowels.find()) {
    // The map needs explicit type parameters; values are Strings and Integers.
    System.debug(JSON.serializePretty(new Map<String, Object> {
        'group()' => mVowels.group(),
        'start()' => mVowels.start(),
        'end()' => mVowels.end(),
        'group(1)' => mVowels.group(1),
        'start(1)' => mVowels.start(1),
        'end(1)' => mVowels.end(1)
    }));
}
Running the above will result in the following output:
{
"group()" : "eau",
"start()" : 1,
"end()" : 4,
"group(1)" : "au",
"start(1)" : 2,
"end(1)" : 4
}
{
"group()" : "i",
"start()" : 5,
"end()" : 6,
"group(1)" : null,
"start(1)" : -1,
"end(1)" : -1
}
{
"group()" : "u",
"start()" : 7,
"end()" : 8,
"group(1)" : null,
"start(1)" : -1,
"end(1)" : -1
}
As you can see from the results, the Matcher#.group(…) function can be used to get either the entire match that was found or a capture group (if you specify the index of the capture group). The Matcher#.start(…) function can be used to get either the starting index of the entire match or the starting index of a capture group (if you specify the index of the capture group). The Matcher#.end(…) function works the same way for the ending index, which is the position just past the last matched character. When a capture group did not take part in the match, group(…) returns null and start(…) and end(…) return -1. It is important to note that although you can specify a named capture group, you can only reference it by name when using replaceAll() or replaceFirst().
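One way to put these three functions together is sketched below: it collects every run of vowels into a list and checks that the start and end indexes describe the same text that group() returns. The variable names and the simplified pattern are assumptions made for this illustration.

```apex
// Hypothetical input; any string would do.
String word = 'Beautiful';
Matcher mRuns = Pattern.compile('(?i)[AEIOU]+').matcher(word);

List<String> vowelRuns = new List<String>();
while (mRuns.find()) {
    // substring(start, end) yields the same text as group().
    String viaIndexes = word.substring(mRuns.start(), mRuns.end());
    System.assertEquals(mRuns.group(), viaIndexes);
    vowelRuns.add(viaIndexes);
}

String joinedRuns = String.join(vowelRuns, ', ');
System.debug(joinedRuns); // eau, i, u
```

Because end() points just past the last matched character, it can be handed straight to String.substring() without adjusting it.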
Using Named Capture Groups
Let’s say that we want to consistently separate first names from last names in a string that contains both and we want to use named capture groups to do it:
// The names of the people to parse.
String[] names = new String[]{
    'Erin Lansdale',
    'McWire, Tobey Spider',
    'John Harold Lionel Jacobs',
    'Sparks, Janet'
};

// Loop through the names and parse them showing each first and last name.
for (String name : names) {
    // Groups f/l handle "First Last"; groups l2/f2 handle "Last, First".
    String newName = name.replaceFirst(
        '(?<f>\\w+)(?: .+)? (?<l>\\w+)|(?<l2>\\w+), (?<f2>\\w+)(?: .+)?',
        'First="${f}${f2}", Last="${l}${l2}"'
    );
    String msg = JSON.serialize(name)
        + ' turns into this: '
        + newName;
    System.debug(msg);
}
Of course, the above is a simple example and shouldn’t necessarily be used in production code but it shows us how to reference named capture groups. The output is as follows:
"Erin Lansdale" turns into this: First="Erin", Last="Lansdale"
"McWire, Tobey Spider" turns into this: First="Tobey", Last="McWire"
"John Harold Lionel Jacobs" turns into this: First="John", Last="Jacobs"
"Sparks, Janet" turns into this: First="Janet", Last="Sparks"
As you can see, we are able to reference named capture groups by using ${name}, where “name” is replaced with the actual name of the capture group. You could also reference the same capture group by the number of the group. For example, in the Apex code we could have used this instead:
// The names of the people to parse.
String[] names = new String[]{
    'Erin Lansdale',
    'McWire, Tobey Spider',
    'John Harold Lionel Jacobs',
    'Sparks, Janet'
};

// Loop through the names and parse them showing each first and last name.
for (String name : names) {
    // Group numbers follow the opening parentheses: f=1, l=2, l2=3, f2=4.
    String newName = name.replaceFirst(
        '(?<f>\\w+)(?: .+)? (?<l>\\w+)|(?<l2>\\w+), (?<f2>\\w+)(?: .+)?',
        'First="$1$4", Last="$2$3"'
    );
    String msg = JSON.serialize(name)
        + ' turns into this: '
        + newName;
    System.debug(msg);
}
That code would still produce the same output as before but, of course, it doesn’t make much sense to reference a named capture group by index unless you are just trying to save on the length of your code.
You can make backreferences to named capture groups by using \k<name>, where “name” is replaced with the actual name of the capture group. Here is an example showing how to use backreferences:
String input = 'He asked, "how are you?", but she is \'shy\'.';
String output = input.replaceAll(
    // Capture the opening quote, then require the same character to close it.
    '(?<quote>["\']).+?\\k<quote>',
    '[$0] (${quote})'
);
System.debug(output);
The above results in this being printed out:
He asked, ["how are you?"] ("), but she is ['shy'] (').
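Backreferences are not limited to named groups. Assuming Apex follows the usual Java regular expression syntax here, a numbered backreference such as \1 should behave the same way. The sketch below uses one to collapse an accidentally doubled word; the sentence and variable names are made up for illustration.

```apex
// Hypothetical input containing an accidentally doubled word.
String sentence = 'This is is a test.';

// \\1 must repeat exactly what capture group 1 matched.
String cleaned = sentence.replaceAll('\\b(\\w+) \\1\\b', '$1');

System.debug(cleaned); // This is a test.
```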
There are many other things that you can do with regular expressions in Apex. Hopefully this gives you a good head start. Happy coding!