One of the things that I love about string manipulation is the existence of regular expressions. For this reason, I have decided to share a few examples that may help those who are learning about regular expressions so that they can understand them a bit better.


JavaScript – General Variable Name

In many languages, a variable must start off with a letter and may be followed by letters, numbers, and/or the underscore character.  Knowing this, we could use the following regular expression in JavaScript to match a variable name:


var expVarIFlag = /^[A-Z]\w*$/i;

The above regular expression will only match a string if the whole string fits the pattern for a variable name. The reason that I only put [A-Z] and not [A-Za-z] is that in JavaScript you can specify an “i” flag after the regular expression, which makes the expression case-insensitive. Another thing to note is that I used the \w class, which matches a single word character. A word character in regular expressions typically means any letter (A to Z regardless of case), any digit (0 to 9), or the underscore character. The reason I used the asterisk instead of the plus sign is that [A-Z] already matches the first letter, so \w* has to allow zero further characters for a variable that is just one letter.

NOTE: Although this regular expression may work for other languages, in JavaScript, a variable name can also start off with an underscore or dollar sign.
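As a rough cross-check, the same pattern can be tried out in Python, which I also use later in this post. This is only a sketch: the constant and function names are made up for illustration, and the second pattern is my own guess at a variant that covers the NOTE above rather than a full JavaScript identifier check (it ignores reserved words, for example). The re.ASCII flag is there because Python's \w would otherwise also match letters outside A to Z.

```python
import re

# Same pattern as the JavaScript example; re.I plays the role of the "i" flag.
# re.ASCII keeps \w limited to A-Z, a-z, 0-9 and the underscore.
GENERAL_NAME = re.compile(r'^[A-Z]\w*$', re.I | re.ASCII)

# Hypothetical variant for the NOTE above: also allows a leading _ or $,
# and a $ later in the name.
JS_LIKE_NAME = re.compile(r'^[A-Z_$][\w$]*$', re.I | re.ASCII)


def is_general_name(text):
    """Return True if text starts with a letter followed by word characters."""
    return GENERAL_NAME.match(text) is not None


def is_js_like_name(text):
    """Return True if text also passes with a leading underscore or dollar sign."""
    return JS_LIKE_NAME.match(text) is not None


for name in ["x", "total_2", "2fast", "_private", "$el"]:
    print(name, is_general_name(name), is_js_like_name(name))
```

A one-letter name such as "x" passes both checks, while "_private" and "$el" only pass the second one.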


PostgreSQL – Date (MM/DD/YYYY)

Even though using a regular expression shouldn’t be the way to completely validate a date, you can do so partially with the following in PostgreSQL:


SELECT id, text
FROM answers
WHERE text ~ E'^(0\\d|1[012])/([012]\\d|3[01])/\\d{4}$';

The above query will pull all of the answers whose text looks like a date in MM/DD/YYYY form. It only checks the shape of the string, so impossible dates such as 02/31/2020 or 00/00/2020 still match.

  1. First it specifies that the first two characters are a 0 followed by another digit or a 1 followed by either a 0, 1, or 2.
  2. Next should come a forward slash.
  3. Next should be either…
    1.  0, 1, or 2 followed by any digit
    2. or 3 followed by a 0 or 1
  4. Finally should be another forward slash followed by four digits.

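One thing to notice is the doubled backslashes. When PostgreSQL treats the backslash as an escape character inside a string literal (an E'...' escape string, or a plain string on a server with standard_conforming_strings turned off, which was the default before PostgreSQL 9.1), you have to escape the backslash itself so that it reaches the regular expression engine as one backslash in front of the next character, thus rendering “\\d” as “\d”. In a plain string with standard_conforming_strings turned on, a single backslash is enough.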
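Since the pattern only checks the shape of the string, here is a small Python sketch of how the shape check could be paired with a real calendar check. The function names are hypothetical, and I am assuming that datetime.strptime is an acceptable second step; the same idea would apply in whatever language sits in front of the database.

```python
import re
from datetime import datetime

# The same MM/DD/YYYY shape as the PostgreSQL pattern above.
DATE_SHAPE = re.compile(r'^(0\d|1[012])/([012]\d|3[01])/\d{4}$')


def looks_like_date(text):
    """Shape check only: two digits, slash, two digits, slash, four digits."""
    return DATE_SHAPE.match(text) is not None


def is_real_date(text):
    """Shape check followed by an actual calendar check."""
    if not looks_like_date(text):
        return False
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        # The shape was fine, but the date does not exist (e.g. 02/31/2020).
        return False
    return True


for sample in ["12/25/2020", "02/31/2020", "00/15/2020", "13/01/2020"]:
    print(sample, looks_like_date(sample), is_real_date(sample))
```

Here 02/31/2020 and 00/15/2020 get through the regular expression but are rejected by the calendar check, which is why the pattern alone is only a partial validation.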


PHP – Hexadecimal Color Code

In CSS, a color code can be in many different forms. One accepted form is hexadecimal. The hex form can be three characters or six characters long. It can start off with a number sign, but this symbol isn’t required. Knowing all of this, we could use the following in PHP to validate the hex color:


$pattern = '/^#?([0-9A-F]{3}){1,2}$/i';
$validHex = preg_match($pattern, $_GET['hex']);

The preg_match() function is used to validate the GET parameter called “hex” against our regular expression:

  1. First it specifies that the first character may be a number sign (#).
  2. Next I have defined a parenthesized group which matches any three hexadecimal digits.
  3. After that, I am specifying that my parenthesized group pattern may appear once or two times in a row and that no other characters should follow.
  4. Finally, you will notice that I am again using the “i” flag to indicate that this is a case-insensitive pattern.
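To see which inputs the pattern accepts, here is the same expression tried out in Python. This is just a sketch for experimenting: the function name is made up, and the sample values are my own.

```python
import re

# Same pattern as the PHP example; re.I replaces the "i" flag.
HEX_COLOR = re.compile(r'^#?([0-9A-F]{3}){1,2}$', re.I)


def is_hex_color(text):
    """Return True for 3 or 6 hex digits with an optional leading #."""
    return HEX_COLOR.match(text) is not None


for sample in ["#fff", "FFAA00", "#ffff", "#12345", "#ggg"]:
    print(sample, is_hex_color(sample))
```

Because the group of three digits may only appear once or twice, four-digit and five-digit values such as "#ffff" and "#12345" are rejected.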

Python – Simple Image File Names

Let’s use Python now to check to see if a file name looks like a valid image name:


# Import the regular expression library
import re

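# Define and compile the regular expression. A raw string keeps the
# backslashes intact, so \\ inside the class excludes a literal backslash.
pat = r'^[^/\\?%*:|"<>]+\.(jpg|png|gif|bmp)$'
reImg = re.compile(pat, re.I)

# Get the file name from the user (use raw_input() instead on Python 2)
fileName = input("File name:  ")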

# Determine if the file name is an image name
isImage = reImg.match(fileName) is not None

The regular expression created does the following:

  1. First makes sure that the string starts off with one or more characters which are none of the following:  /  \  ?  %  *  :  |  "  <  >
  2. In the end it checks that a dot is found followed by one of the following extensions which must appear at the end of the string:
    1. jpg
    2. png
    3. gif
    4. bmp
  3. It is also important to note that by using “re.I”, I specified that casing would be ignored.

The code should basically prompt the user for a file name and then validate the string entered to determine if it matches the regular expression for an image.  The boolean value indicating whether or not it is an image is stored in the isImage variable.
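A quick way to sanity-check a pattern like this is to run it over a handful of names instead of typing them in one at a time. The sketch below does that; the function name and the sample names are made up, and the pattern is written as a raw string so that the \\ inside the character class really means a literal backslash.

```python
import re

# Raw string: \\ inside the class is a literal backslash, \. is a literal dot.
IMAGE_NAME = re.compile(r'^[^/\\?%*:|"<>]+\.(jpg|png|gif|bmp)$', re.I)


def is_image_name(file_name):
    """Return True if file_name has no forbidden characters and an image extension."""
    return IMAGE_NAME.match(file_name) is not None


samples = ["photo.JPG", "my cat.png", "archive.tar.gif",
           "notes.txt", "a/b.gif", "a\\b.gif", "what?.png", ".bmp"]
for name in samples:
    print(repr(name), is_image_name(name))
```

Note that ".bmp" is rejected because at least one character has to come before the final dot, while "archive.tar.gif" is accepted because a dot is not one of the forbidden characters.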


VBScript – Format Large Integer With Commas

The following is how you could use a regular expression to insert commas into a number (integer):


' Setup the RegExp for testing if input is an integer.
Dim re : Set re = new RegExp
re.Pattern = "^(0|-?[1-9]\d*)$"

' Get the input integer from the user.
input = InputBox("Enter an integer", "Your Integer", 123456789)

' If the input is an integer...
If re.Test(input) Then
  ' Modify the pattern to input the commas correctly.
  re.Pattern = "(\d)(?=(\d{3})+$)"
  re.Global = True

  ' Reformat the integer, if given.
  newInput = re.Replace(input, "$1,")

  ' Display the input formatted with commas.
  MsgBox input & " became " & newInput

' If the input is not an integer, tell the user so.
Else
  MsgBox "The input given wasn't recognized as an integer."
End If

The first regular expression tests that the input is either simply a zero or an optional minus sign followed by one or more digits, with the first digit being non-zero. In other words, the first pattern makes sure that the input is an integer that doesn’t start with a zero (unless it is zero). The second regular expression is what is used to insert the comma(s) in the right place(s). It finds every digit that is followed by one or more complete groups of three digits running to the end of the string. Because the group starts with “?=”, it is a lookahead: the digits inside it are checked but not consumed, so they are still available to be matched on the next pass through.
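The same two-step idea carries over to other languages almost unchanged. Here is a sketch in Python, with made-up names, that uses the same two patterns; returning None for anything that is not an integer is my own choice for the example.

```python
import re

# Step 1: the input must be 0, or an optional minus sign and a non-zero-led integer.
INTEGER = re.compile(r'^(0|-?[1-9]\d*)$')

# Step 2: a digit followed (lookahead only) by complete groups of three digits.
THOUSANDS = re.compile(r'(\d)(?=(\d{3})+$)')


def add_commas(text):
    """Return text with thousands separators, or None if it is not an integer."""
    if INTEGER.match(text) is None:
        return None
    # \1 puts the matched digit back; the lookahead digits were never consumed.
    return THOUSANDS.sub(r'\1,', text)


print(add_commas("123456789"))  # 123,456,789
print(add_commas("-1234567"))   # -1,234,567
print(add_commas("0123"))       # None
```

If the lookahead were an ordinary group, each match would swallow the digits after it and the later comma positions would be skipped.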

