Regular Expression Examples

One of the things that I love about string manipulation is the existence of regular expressions. For this reason, I have decided to share a few examples that may help those who are learning about regular expressions so that can understand them a bit better.


JavaScript – General Variable Name

In many languages, a variable must start off with a letter and may be followed by letters, numbers, and/or the underscore character.  Knowing this, we could use the following regular expression in JavaScript to match a variable name:

var expVarIFlag = /^[A-Z]\w*$/i;

Basically, the above regular expression will only match a string if it matches the pattern for a variable name. The reason that I only put [A-Z] and not [A-Za-z] is because in JavaScript you can specify an “i” flag after the regular expression which indicates that the expression will be case-insensitive. Another thing to note is that I used the \w class which basically represents a word. A word in regular expressions typically means any letter (A to Z regardless of case), any digit (0 to 9), or the underscore character. The reason I used the asterisk instead of the plus sign is because a variable may be just one letter.

NOTE: Although this regular expression may work for other languages, in JavaScript, a variable name can also start off with an underscore or dollar sign.


PostgreSQL – Date (MM/DD/YYYY)

Even though using a regular expression shouldn’t be the way to completely validate a date, you can do so partially with the following in PostgreSQL:

SELECT id, text
FROM answers
WHERE text ~ '^(0\\d|1[012])/([012]\\d|3[01])/\\d{4}$';

The above query will pull all of the answers with a text that matches the pattern to see if it looks like a valid date.

  1. First it specifies that the first two characters are a 0 followed by another digit or a 1 followed by either a 0, 1, or 2.
  2. Next should come a forward slash.
  3. Next should be either…
    1.  0, 1, or 2 followed by any digit
    2. or 3 followed by a 0 or 1
  4. Finally should be another forward slash followed by four digits.

One thing to notice is that in order to properly escape the class inside of a string (which is what we have to do here in PostgreSQL), you have to escape the backslash so that it will be interpreted as one backslash in front of the next character thus rendering “\\w” as “\w“.


PHP – Hexadecimal Color Code

In CSS, a color code can be in many different forms. One accepted form is hexadecimal. The hex form can be three characters or six characters long. It can start off with a number sign, but this symbol isn’t required. Knowing all of this, we could use the following in PHP to validate the hex color:

$pattern = '/^#?([0-9A-F]{3}){1,2}$/i';
$validHex = preg_match($pattern, $_GET['hex']);

The preg_match() function is used to validate the GET parameter called “hex” against our regular expression:

  1. First it specifies that the first character may be a number sign (#).
  2. Next I have defined a parenthesized group which matches any three hexadecimal digits.
  3. After that, I am specifying that my parenthesized group pattern may appear once or two times in a row and that no other characters should follow.
  4. Finally, you will notice that I am again using the “i” flag to indicate that this is a case-insensitive pattern.

Python – Simple Image File Names

Let’s use Python now to check to see if a file name looks like a valid image name:

# Import the regular expression library
import re

# Defining the compiled regular expression.
pat = "^[^/\\?%*:|\"<>]+\\.(jpg|png|gif|bmp)$"
reImg = re.compile(pat, re.I)

# Getting the file name from the user
fileName = raw_input("File name:  ")

# Determine if the file name is an image name
isImage = reImg.match(fileName) is not None

The regular expression created does the following:

  1. First makes sure that the string starts off with one more characters which are none of the following:  /  \  ?  %  *  :  |  ”  <  >
  2. In the end it checks that a dot is found followed by one of the following extensions which must appear at the end of the string:
    1. jpg
    2. png
    3. gif
    4. bmp
  3. It is also important to note that by using “re.I“, I specified that casing would be ignored.

The code should basically prompt the user for a file name and then validate the string entered to determine if it matches the regular expression for an image.  The boolean value indicating whether or not it is an image is stored in the isImage variable.


VBScript – Format Large Integer With Commas

The following is how you could use a regular expression to insert commas into a number (integer):

' Setup the RegExp for testing if input is an integer.
Dim re : Set re = new RegExp
re.Pattern = "^(0|-?[1-9]\d*)$"

' Get the input integer from the user.
input = InputBox("Enter an integer", "Your Integer", 123456789)

' If the input is an integer...
If re.Test(input) Then
  ' Modify the pattern to input the commas correctly.
  re.Pattern = "(\d)(?=(\d{3})+$)"
  re.Global = True

  ' Reformat the integer, if given.
  newInput = re.Replace(input, "$1,")

  ' Display the input formatted with commas.
  MsgBox input & " became " & newInput

' If the input is not an integer, tell the user so.
Else
  MsgBox "The input given wasn't recognized as an integer."
End If

The first regular expression basically tests to make sure that the input is either simply a zero or one or more digits with the first one being non-zero. In other words, the first pattern makes sure that the input is an integer that doesn’t start with a zero (unless it is zero). The second regular expression is what is used to insert the comma(s) in the right place(s). It finds every instance in which one digit is followed by at least one group of three digits. By starting the group off with “?=” I am ensuring that the matched group will not be skipped on the next pass through.

JavaScript – String.prototype.expandTabs() Revisited

New & Improved Definition

Last year I wrote a version of this function which was to mimic the Python expandtabs function. The funny thing is I didn’t remember that I wrote the function in the first place. Therefore while working on a project, I wrote the following function which is actually shorter and faster than the original:

String.prototype.expandTabs = function(tabSize) {
  var spaces = (new Array((tabSize = tabSize || 8 ) + 1)).join(" ");
  return this.replace(/([^\r\n\t]*)\t/g, function(a, b) {
    return b + spaces.slice(b.length % tabSize);
  });
};

Minimum Tab Size

If for some reason the above function, although it mimics my previous version (except it doesn’t limit the tab size to 32), is lacking in your mind because it doesn’t provide a way of specifying the minimum tabSize, you can alternatively use the following function definition:

String.prototype.expandTabs = function(tabSize, minTabSize) {
  tabSize = tabSize || 8;
  minTabSize = minTabSize || 1;
  var spaces = (new Array(Math.ceil((minTabSize - 1) / tabSize) * tabSize + tabSize + 1)).join(" ");
  return this.replace(/([^\r\n\t]*)\t/g, function(a, b) {
    return b + spaces.slice(0, Math.ceil((b.length + minTabSize) / tabSize) * tabSize - b.length);
  });
};

Although the above definition is more complete, I feel like there is a better calculation for the amount of spaces that I am simply overlooking. All-in-all, I am happy with both of these newer versions of String.prototype.expandTabs(). I guess having a bad memory has its advantages at times. 😉

String.prototype.expandTabs()

I personally love both JavaScript and Python. For this reason, from time to time I like to port functions that are available in Python, over to JavaScript. The following is a port of the expandTabs function.

String.prototype.expandTabs = function(tabSize) {
  tabSize = Math.min(Math.max(parseInt(+tabSize || 8), 1), 32);
  return this.replace(/^.*\t.*$/gm, function(line) {
    var allBefore = "";
    return line.replace(/(.*?)(\t+)/g, function(match, before, tabs) {
      allBefore += before;
      tabs = new Array(1 + tabSize * tabs.length - allBefore.length % tabSize).join(" ");
      allBefore += tabs;
      return before + tabs;
    });
  });
};

The biggest difference between this JavaScript version and the Python version is the limitation of a 32 character tab size. If you want to increase the maximum tab size, simply change 32 to something else.