Tag Archives: Regular Expressions

JavaScript – Getting Function Parameter Names

Two years ago I wrote a post about how to pass arguments by name in JavaScript. Recently I started ramping up a new project called YourJS and found that I needed to read the names of a given function’s parameters. The following getParamNames() function takes an arbitrary function and returns an array of its parameter names:
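Here is a minimal sketch of how such a function could be written (not necessarily the definition used in YourJS). It reads the function’s source text through Function.prototype.toString, strips comments, and splits whatever sits between the first pair of parentheses. It assumes ordinary functions with plain parameter lists; default values, destructuring, and arrow functions written without parentheses are not handled:

```javascript
// Sketch: read parameter names from a function's source text.
// Assumes a plain parameter list (no defaults, destructuring, or rest params).
function getParamNames(fn) {
  var source = Function.prototype.toString.call(fn)
    // Strip block comments, then line comments, so they are not read as names.
    .replace(/\/\*[\s\S]*?\*\//g, '')
    .replace(/\/\/.*$/gm, '');
  // Capture the text between the first "(" and the next ")".
  var match = /^[^(]*\(([^)]*)\)/.exec(source);
  if (!match) {
    return [];
  }
  return match[1].split(',')
    .map(function(name) { return name.trim(); })
    .filter(function(name) { return name !== ''; });
}
```

Because the parameter of this sketch is named fn, calling it on itself yields ["fn"], matching the output shown further down.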

Using this function is quite simple. Let’s say that getParamNames() and the function below are defined:

function repeat(string, times, opt_delimiter) {
  opt_delimiter = arguments.length > 2 ? opt_delimiter + '' : '';
  return new Array(times + 1).join(opt_delimiter + string).replace(opt_delimiter, '');
}

Running getParamNames(repeat) will result in the following:

>>> getParamNames(repeat)
["string", "times", "opt_delimiter"]

Running getParamNames(getParamNames) will result in the following:

>>> getParamNames(getParamNames)
["fn"]

Pretty cool, right?!?! Have fun! :cool:

JavaScript Snippet – String.prototype.after()

WARNING:
Extending native prototypes is frowned upon by many JS engineers, but it can be helpful as long as the extensions are properly documented in the codebase.
SCRIPTER’S DISCRETION IS ADVISED. :lol:

Yesterday I added a post about String.prototype.before(...). As is explained within the post, this function can be used to easily extract a substring before a given target. Of course, if you can get what comes before a target you should be able to get what comes after a target too, right? Here is a function that makes that possible:

As indicated in the comments, this function takes a target (a string or a regular expression) that marks where the desired substring begins. It also takes an optional second argument indicating which occurrence of the target to use. For instance, if you wanted to get the string that comes after the second comma in "Uno, dos, tres, cuatro, y cinco", you could use the following code:

var str = "Uno, dos, tres, cuatro, y cinco";
var afterFirstTwo = str.after(',', 2);  // " tres, cuatro, y cinco"
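A minimal sketch of such a method follows; it is not necessarily the original definition. It assumes the occurrence argument is 1-based and defaults to 1, and that the empty string is returned when the target does not occur often enough. String targets are escaped so they match literally:

```javascript
// Sketch of String.prototype.after(target, opt_occurrence).
// Assumptions: opt_occurrence is 1-based and defaults to 1, and '' is
// returned when the target does not occur that many times.
String.prototype.after = function(target, opt_occurrence) {
  var str = String(this);
  var occurrence = opt_occurrence || 1;
  // Build a global RegExp so exec() can walk through the occurrences in order;
  // string targets are escaped so that they match literally.
  var re = target instanceof RegExp
    ? new RegExp(target.source, target.flags.replace(/[gy]/g, '') + 'g')
    : new RegExp(String(target).replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');
  var count = 0;
  var match;
  while ((match = re.exec(str)) !== null) {
    if (++count === occurrence) {
      // Everything after the end of the requested occurrence.
      return str.slice(match.index + match[0].length);
    }
    if (match[0] === '') {
      re.lastIndex++; // step past empty matches so the loop always ends
    }
  }
  return '';
};

var str = "Uno, dos, tres, cuatro, y cinco";
str.after(',', 2);    // " tres, cuatro, y cinco"
str.after(/,\s*/, 3); // "cuatro, y cinco"
```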

Here are some other examples that I used to test this function:

As usual, feel free to use this function in your own projects. Have fun! :cool:

JavaScript Snippet – String.prototype.before()

WARNING:
Extending native prototypes is frowned upon by many JS engineers, but it can be helpful as long as the extensions are properly documented in the codebase.
SCRIPTER’S DISCRETION IS ADVISED. :lol:

Even though it’s NOT encouraged to extend native prototypes, at times you may find that doing so is pretty useful. One extension that you may find useful is String.prototype.before(...), which returns the substring before a specified target. Here is the definition:

First, I think it’s worth mentioning that, depending on your preference, you could rename this function String.prototype.leftOf(...) to clearly identify what it does. This function requires a target to find in the string, which can be either a string or a regular expression. You may also pass in an optional second argument indicating which occurrence of the target to use. For example, if you wanted to extract the substring before the second comma in "one, two, three, and four", you could do something like the following:

var str = "one, two, three, and four";
var firstTwo = str.before(',', 2);  // "one, two"
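Here is a minimal sketch of how before() could be defined; it is not necessarily the original code. It makes the same assumptions as a mirror image of after(): the occurrence is 1-based and defaults to 1, string targets match literally, and the empty string is returned when the target is not found often enough:

```javascript
// Sketch of String.prototype.before(target, opt_occurrence).
// Assumptions: opt_occurrence is 1-based and defaults to 1, and '' is
// returned when the target does not occur that many times.
String.prototype.before = function(target, opt_occurrence) {
  var str = String(this);
  var occurrence = opt_occurrence || 1;
  // Global RegExp so exec() visits each occurrence; strings are escaped.
  var re = target instanceof RegExp
    ? new RegExp(target.source, target.flags.replace(/[gy]/g, '') + 'g')
    : new RegExp(String(target).replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');
  var count = 0;
  var match;
  while ((match = re.exec(str)) !== null) {
    if (++count === occurrence) {
      // Everything up to the start of the requested occurrence.
      return str.slice(0, match.index);
    }
    if (match[0] === '') {
      re.lastIndex++; // step past empty matches so the loop always ends
    }
  }
  return '';
};

var str = "one, two, three, and four";
str.before(',', 2);    // "one, two"
str.before(/,\s*and/); // "one, two, three"
```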

Of course, that is just one simple example of what you can do with this prototype extension. Check out some tests that exemplify how to use this helpful function:

If you like this function, you will most likely also want to check out the String.prototype.after(...) post. Enjoy using this helpful utility function. :cool:

JavaScript Snippet – Simple CSV Parser

As I mentioned in yesterday’s post, I was recently working on a quick way to parse CSVs into an array of arrays and alternatively into an array of dictionaries keyed on the values in the first row. I ended up landing on the following definition:

As you can see, the function is annotated for anyone who may be interested in using it. Let’s look at an example of how it could be used. Here is an example string that can be parsed:

ID,First Name,Last Name,Address,Last Purchase Date,Purchase Amount,Comment,Return Customer
1,Don,Knots,"123 Main St.,
Duggietown, ET 12342",10/23/2013,23.43,"""Doesn't like cheese"" according to his mom.",Y
2,Cher,Vega,"92 Victor Ln.
Rutrow, DA 39252",01/12/2013,588.1,,N
3,Tina,Ray,"1111 Yomdip Circle
Bribloop, EV 92341",02/03/2013,234.2,,Y
4,Charlie,Bucket,"745 Caca Pl.
Hastiville, JS 92293",05/06/2013,345.4,,N

Below is an example of processing the above CSV first as an array of arrays, then as an array of dictionaries (objects), and lastly as an array of dictionaries with typed values:
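Here is a rough sketch of a parser along these lines (not the original definition, and without the typed-values option). It walks the input with one regular expression match per cell, treats \r\n as a single line break, unescapes doubled quotes inside quoted cells, and stops at the end of the input so that no extra empty cell or row is produced. The opt_useHeaders argument name is hypothetical:

```javascript
// Sketch only: hypothetical parseCSV(str, opt_useHeaders), not the original.
function parseCSV(str, opt_useHeaders) {
  // Each match is one cell plus the delimiter that ends it:
  // group 1 = raw cell, group 2 = inside of a quoted cell, group 3 = delimiter.
  var re = /("((?:[^"]|"")*)"|[^",\r\n]*)(,|\r\n?|\n|$)/g;
  var rows = [];
  var row = [];
  var match;
  while ((match = re.exec(str)) !== null) {
    // Inside quotes, "" stands for one literal double quote.
    row.push(match[2] !== undefined ? match[2].replace(/""/g, '"') : match[1]);
    if (match[3] !== ',') {
      // A line break or the end of the input finishes the current row.
      rows.push(row);
      row = [];
      // Stop once the input is used up, so a zero-length match at the very
      // end cannot add an extra empty cell or row.
      if (re.lastIndex >= str.length) {
        break;
      }
    }
  }
  if (!opt_useHeaders) {
    return rows;
  }
  // Use the first row as keys for one object per remaining row.
  var headers = rows.shift() || [];
  return rows.map(function(values) {
    var record = {};
    headers.forEach(function(header, i) {
      record[header] = values[i];
    });
    return record;
  });
}

var sample = 'ID,First Name,Comment\n' +
  '1,Don,"""Doesn\'t like cheese"" according to his mom."';
console.log(parseCSV(sample));       // two rows: the headers and Don's values
console.log(parseCSV(sample, true)); // one object keyed on ID, First Name, Comment
```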

As you can see from the jPaq Proof above, this parser works well with the majority of the CSVs that you would need to process. Still, in the case that you need a fully-fledged CSV parser, Papa Parse seems to be a pretty good solution. Have fun! 8-)

Regular Expressions – Extra Ending Match

A few days ago I was working on a very primitive CSV reader for a quick JS project, and while testing my regex I noticed that I was getting an extra match at the end. Here is the JavaScript function that I had:

function parseCSV(str) {
  var row = [], rows = [row];
  str.replace(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g, function(match, cell, quoted, delimiter) {
    row.push(quoted ? quoted.replace(/""/g, '"') : cell);
    if (delimiter && delimiter != ',') {
      rows.push(row = []);
    }
  });
  return rows;
}

Interestingly, if you pass "Name,DOB" into parseCSV(), an extra empty cell is added:

> parseCSV('Name,DOB')
[["Name", "DOB", ""]]

Without digging too deeply into why the function should work with a good regex, notice that finding all of the matches for my CSV-parsing regex produces an interesting result:

> 'Name,DOB'.match(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g)
["Name,", "DOB", ""]

After thoroughly analyzing my regex, I started to think there was something wrong with the JS regular expression engine. I also thought there might be something wrong with my regular expression, so I made a simpler one and saw the following:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello,", "world", ""]

That seems pretty strange, right? I continued investigating and asked around the office to see if others thought it was weird, and they agreed. So I decided to try it in Python to see whether this was just a peculiarity of the JS engine:

>>> import re
>>> re.findall('[^,]*(?:,|$)', 'hello,world')
['hello,', 'world', '']

Finally, I started thinking about how I would build a regular expression engine that could find every instance of the empty string without getting stuck in an infinite loop. I did this because I knew that running the following doesn’t result in an infinite loop in Python:

>>> import re
>>> re.findall('', 'Yo')
['', '', '']

I tried the same thing in JS:

> 'Yo'.match(/(?:)/g)
["", "", ""]

It seems that if the matched substring has a length of zero, the next search will start one character past the match’s starting/ending index. On the other hand, if the match contains at least one character, the next search will start at the index after the last character matched. Therefore, let’s consider my simplified regular expression again:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello,", "world", ""]
  1. The first time the regex engine tries to find a match, it uses a greedy search to find zero or more non-comma characters followed by a comma or the end of the string.
  2. It finds "hello," and the end index is 6.
  3. Since the last match was not the empty string, the regex engine doesn’t advance the starting position of the next search and simply evaluates against "world".
  4. It finds "world" and the end index is 11 (5 plus the offset of 6).
  5. Again, since the last match was not the empty string, the regex engine doesn’t advance the starting position of the next search and simply evaluates against the empty string at the end.
  6. Since the empty string matches the regular expression, the third string found is the empty string and the ending index remains 11.
  7. Finally, because the previous match was the empty string, the regex engine tries to advance the starting index by one, but that is outside the bounds of the string, so there is no need to continue.
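The loop that a global match performs can be sketched with RegExp#exec. The allMatches() helper below is hypothetical; it bumps lastIndex by one after a zero-length match, which mirrors what String.prototype.match does for a global regex without the u flag (with u, the engine advances by a whole code point instead):

```javascript
// Hypothetical helper that mimics a global match with RegExp#exec.
function allMatches(re, str) {
  if (!re.global) {
    // Without the g flag exec() ignores lastIndex and would loop forever.
    throw new TypeError('allMatches() expects a global RegExp');
  }
  var results = [];
  var match;
  re.lastIndex = 0;
  while ((match = re.exec(str)) !== null) {
    results.push({ text: match[0], index: match.index });
    // After a non-empty match, lastIndex already points past it. After an
    // empty match it equals the match position, so step forward one.
    if (match[0] === '') {
      re.lastIndex++;
    }
  }
  return results;
}

// "hello," at 0, "world" at 6, and "" at 11
console.log(allMatches(/[^,]*(?:,|$)/g, 'hello,world'));
// "hello," at 0 and "world" at 6; (?!$) rejects the empty match at the end
console.log(allMatches(/(?!$)[^,]*(?:,|$)/g, 'hello,world'));
```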

The steps listed above simply illustrate why regex engines find three matches when matching /[^,]*(?:,|$)/g against "hello,world". If I wanted a similar regex that doesn’t allow the empty match at the end, I could use /(?!$)[^,]*(?:,|$)/g. In conclusion, even though I thought I knew all of the strange edge cases for regexes in JS, I found that I still have more to learn! 8-)

JavaScript Snippet – Undo Camel Case

It was late, so rather than write it myself I went looking for code to uncamelize (undo camel-casing) a string. I came across what claimed to be a solution in PHP, but unfortunately it did nothing but lowercase my string. So I decided to write my own solution:

function uncamelize(s) {
  return s.replace(/[A-Z]/g, '_$&').toLowerCase();
}

Believe it or not, the solution is that simple. Here is an example of using it:
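A few sample calls using the definition above follow (the inputs are illustrations, not the original examples). A string that begins with a capital letter also gets a leading underscore, so the sketch includes a hypothetical uncamelizeWith() variant that skips a capital at the very start and accepts a custom separator:

```javascript
// The uncamelize() definition from above.
function uncamelize(s) {
  return s.replace(/[A-Z]/g, '_$&').toLowerCase();
}

// Hypothetical variant: a configurable separator, and no separator is
// inserted before a capital letter at the very start of the string.
function uncamelizeWith(s, separator) {
  separator = separator === undefined ? '_' : separator;
  // A replacement function avoids "$" patterns being read from the separator.
  return s.replace(/(?!^)[A-Z]/g, function(letter) {
    return separator + letter;
  }).toLowerCase();
}

uncamelize('backgroundColor');          // "background_color"
uncamelize('BackgroundColor');          // "_background_color"
uncamelizeWith('BackgroundColor', '-'); // "background-color"
```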

An interesting thing to note about this uncamelize implementation is that it uses the $& replacement pattern to reuse the substring matched by the regular expression. Even though this pattern is not commonly used, it is a documented feature of String.prototype.replace().

JavaScript – Browser Differences When Splitting With RegExp

At times I use the String#split function in order to create an array from a string based on a RegExp delimiter. Unfortunately, today I realized that some browsers act differently than others. Let’s consider the following example:

var splitWithRegExpWorks = ','.split(/,/).length == 2;

In most browsers, the value of the above variable would be true, which is what you would probably expect. Unfortunately, I found that the value in IE8 is false.

In fact, all of the following in most browsers will alert true, but in IE8 (and perhaps other strange browsers) false is alerted:

You can test any of the above snippets by simply clicking on the code block. FYI, you will not be able to run the code snippets as-is in the console on this site, because I’m using SyntaxHighlighter, which seems to redefine the split function (thus fixing the issue here :-D).

In conclusion, it is important to realize that depending on the JavaScript engine being used, String.prototype.split can return different results if using a regular expression as the delimiter.
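To make the standard behavior concrete, here is a sketch of the ES5 split algorithm for a RegExp separator. specSplit() is a hypothetical name; it ignores the optional limit argument and relies on the ES2015 sticky flag and RegExp#flags, so it documents what standards-compliant engines return rather than serving as a fix for IE8:

```javascript
// Sketch of the standard split algorithm for a RegExp separator.
function specSplit(str, separator) {
  // Sticky copy: exec() only tries to match exactly at lastIndex.
  var re = new RegExp(separator.source, separator.flags.replace(/[gy]/g, '') + 'y');
  if (str.length === 0) {
    // An empty string splits to [] only if the separator matches it.
    return re.exec(str) ? [] : [str];
  }
  var output = [];
  var p = 0; // start of the piece currently being built
  var q = 0; // position where the next match is attempted
  while (q < str.length) {
    re.lastIndex = q;
    var match = re.exec(str);
    // No match here, or an empty match right where the piece began: move on.
    if (!match || re.lastIndex === p) {
      q++;
      continue;
    }
    output.push(str.slice(p, q));
    // Captured groups are spliced into the result.
    output.push.apply(output, match.slice(1));
    p = q = re.lastIndex;
  }
  output.push(str.slice(p));
  return output;
}

console.log(specSplit(',', /,/));     // ["", ""], so the length is 2
console.log(specSplit('a,b', /(,)/)); // ["a", ",", "b"], captured groups included
```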

JavaScript – Modify URL Parameters

Recently I had to develop a solution that involved changing the HREF attributes of links in the navigation bar of a site. This had to be done on the JavaScript side because at my company we are the 3rd-party JavaScript solution for modifying other companies’ sites. My solution only needed to handle one URL parameter, since it was always the same, but I figured I would publicly release a version of the code that can modify the value of any URL parameter:

/**
 * @license Copyright 2013 - Chris West - MIT Licensed
 */
var modURLParam = (function(expCharsToEscape, expEscapedSpace, expNoStart, undefined) {
  /**
   * Modifies the given URL, returning it with the given parameter
   * changed to the given value.  The parameter is added if it didn't
   * already exist.  The parameter is removed if null or undefined is
   * specified as the value.
   * @param {string} url  The URL to be modified.
   * @param {string} paramName  The URL parameter whose value will be
   *     modified.
   * @param {?string=} paramValue  The value to assign.  This will be
   *     escaped using encodeURIComponent.
   * @return {string}  The updated URL.
   */
  return function(url, paramName, paramValue) {
    // Set aside any #fragment so that only the query string is searched
    // and so that new parameters are added before the fragment.
    var hashIndex = url.indexOf('#');
    var hash = hashIndex < 0 ? '' : url.slice(hashIndex);
    url = hashIndex < 0 ? url : url.slice(0, hashIndex);

    paramValue = paramValue != undefined
      ? encodeURIComponent(paramValue).replace(expEscapedSpace, '+')
      : paramValue;
    // Matches "?name=value" or "&name=value"; group 1 is "?name=" or "&name=".
    var pattern = new RegExp(
      '([?&]'
      + paramName.replace(expCharsToEscape, '\\$1')
      + '=)[^&]*'
    );
    if (pattern.test(url)) {
      url = url.replace(
        pattern,
        function($0, $1) {
          // Keep the prefix and swap the value, or drop the pair entirely.
          return paramValue != undefined ? $1 + paramValue : '';
        }
      // If the first parameter was removed, turn the new leading "&" into "?".
      ).replace(expNoStart, '$1?');
    }
    else if (paramValue != undefined) {
      url += (url.indexOf('?') + 1 ? '&' : '?') + paramName + '=' + paramValue;
    }
    return url + hash;
  };
})(/([\\\/\[\]{}().*+?|^$])/g, /%20/g, /^([^?]+)&/);

The following are some example calls to this function:

// Initial URL
var url = 'http://example.com/';
alert(url);

// http://example.com/?q=search+term
url = modURLParam(url, 'q', 'search term');
alert(url);

// http://example.com/?q=search+term&name=Guillermo
url = modURLParam(url, 'name', 'Guillermo');
alert(url);

// http://example.com/?q=search+term+2&name=Guillermo
url = modURLParam(url, 'q', 'search term 2');
alert(url);

// http://example.com/?name=Guillermo
url = modURLParam(url, 'q');
alert(url);

// http://example.com/?name=Guillermo&q=termino
url = modURLParam(url, 'q', 'termino');
alert(url);

// http://example.com/?name=Guillermo
url = modURLParam(url, 'q', null);
alert(url);
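In environments that support the standard URL API (modern browsers and Node.js, but not older versions of IE), a similar effect can be sketched with URLSearchParams. setURLParam() is a hypothetical name, and the sketch assumes an absolute URL. Unlike modURLParam(), it re-serializes the whole query string, so the encoding of untouched parameters may change:

```javascript
// Alternative sketch using the standard URL / URLSearchParams API.
// Pass null or undefined as the value to remove the parameter.
function setURLParam(url, name, value) {
  var parsed = new URL(url); // throws for relative URLs without a base
  if (value == null) {
    parsed.searchParams.delete(name);
  } else {
    // set() updates the first "name" entry in place and drops any others.
    parsed.searchParams.set(name, value);
  }
  return parsed.toString();
}

var url = setURLParam('http://example.com/', 'q', 'search term');
// http://example.com/?q=search+term
url = setURLParam(url, 'name', 'Guillermo');
// http://example.com/?q=search+term&name=Guillermo
url = setURLParam(url, 'q');
// http://example.com/?name=Guillermo
```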

Feel free to re-use the code! 8)

JavaScript – Substitution Groups & Functions

One of the things that I often end up doing is using JavaScript’s replace function with a regular expression and a callback function. Recently, though, I have been thinking about redeveloping an HTML application (HTA) that I made a long time ago for PCs. This HTA gave me the ability to write regular expressions and replacement strings which could alter the matched groups in ways that normal string substitution doesn’t allow without using callback functions. Let’s take the following file names as examples:

  • 001-the-nephews-come-to-town.mpg
  • 002-scroogey-scroogers.mp4
  • 32-hewi’s-lost-pet.mp4
  • 109-copped-by-the-coppers.mpeg

The file renamer that I used to have would allow these files to be renamed as follows by using /^(\d+)(.+)(?=\.\w+$)/ as the regular expression and "Episode ${1,RLZ} -${2,D2S,PROPER}" as the replacement:

  • Episode 1 - The Nephews Come To Town.mpg
  • Episode 2 - Scroogey Scroogers.mp4
  • Episode 32 - Hewi’s Lost Pet.mp4
  • Episode 109 - Copped By The Coppers.mpeg

As the first step towards achieving my goal, I wrote the following sub function which can accomplish this:

(function() {
  var functions = {};
  var hasOwnProperty = functions.hasOwnProperty;
  this.sub = function(subject, reTarget, strReplacement, objFns) {
    if(!reTarget && !strReplacement && !objFns) {
      for(var key in subject) {
        if(hasOwnProperty.call(subject, key)) {
          functions[key] = subject[key];
        }
      }
    }
    else {
      return subject.replace(reTarget, function(match) {
        var args = arguments;
        var reGroups = [];
        var i = args.length - 2;
        while(--i > 0) {
          reGroups.push(i);
        }
        reGroups = '(' + reGroups.join('|') + ')';
        reGroups = new RegExp('\\$(?:' + reGroups + '|\\{' + reGroups + '((?:,\\w+)*)\\})', 'g');
        return strReplacement.replace(reGroups, function(match, index, index2, fnsToUse) {
          fnsToUse = fnsToUse ? fnsToUse.slice(1).split(',') : [];
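          // Groups that did not participate arrive as undefined; use '' instead,
          // just like the native $n replacement patterns do.
          var ret = args[index || index2] || '';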
          for(var i = 0, len = fnsToUse.length; i < len; i++) {
            var fnName = fnsToUse[i];
            var fn;
            if((objFns && hasOwnProperty.call(objFns, fnName) && (fn = objFns[fnName]))
                || (hasOwnProperty.call(functions, fnName) && (fn = functions[fnName]))) {
              ret = fn(ret, args);
            }
          }
          return ret;
        });
      });
    }
  };
})();

This function actually acts differently depending on the parameters that are supplied. The first way in which the function can be called is with four parameters:

  1. subject – string:
    The string that is to be modified.
  2. reTarget – RegExp:
    The regular expression used to capture substrings.
  3. strReplacement – string:
    The replacement string, which can contain the normal captured group expressions (e.g. $1, $3), the new type of captured group expressions (e.g. ${1}, ${3,RLZ}) and/or normal substrings.
  4. objFns – Object:
    Optional object whose keys correspond to functions referenced in the new type of group expressions. Each value should be a function where the first parameter passed to it will be the matched group and the second will be the arguments object which is normally passed to the callback function. The functions should return the string replacement for the captured group expression.

Here are two examples of calling sub() with all four parameters:

var filename = sub(
  "023-we-are-the-tigers.jpg",
  /^(\d+)(.+)(?=\.\w+$)/,
  "Episode ${1,RLZ} -${2,D2S,PROPER}",
  {
    RLZ: function(match) {
      return match.replace(/^0+(?!$)/, '');
    },
    D2S: function(match) {
      return match.replace(/-/g, ' ');
    },
    PROPER: function(match) {
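      // toProperCase() is not a built-in String method, so capitalize each
      // letter that starts the string or follows whitespace instead.
      return match.replace(/(^|\s)[a-z]/g, function(s) { return s.toUpperCase(); });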
    },
    UPPER: function(match) {
      return match.toUpperCase();
    },
    CHARCODE: function(match) {
      return '<' + match + '=' + match.charCodeAt(0) + '>';
    }
  }
);
alert(filename);  // "Episode 23 - We Are The Tigers.jpg"

var str = sub(
  "abcdefghijklmnopqrstuvwxyz",
  /([aeiou])/g,
  "<${1,UPPER}>",
  {
    UPPER: function(match) {
      return match.toUpperCase();
    }
  }
);
alert(str);  // "<A>bcd<E>fgh<I>jklmn<O>pqrst<U>vwxyz" 

The fourth parameter is optional because the captured group expression replacement functions can be predefined for all calls to sub(). If the function is called with just one parameter, that parameter should be an object like the fourth parameter described above. The registered replacement functions persist for the rest of the page/program session:

sub({
  RLZ: function(match) {
    return match.replace(/^0+(?!$)/, '');
  },
  D2S: function(match) {
    return match.replace(/-/g, ' ');
  },
  PROPER: function(match) {
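    // toProperCase() is not a built-in String method, so capitalize each
    // letter that starts the string or follows whitespace instead.
    return match.replace(/(^|\s)[a-z]/g, function(s) { return s.toUpperCase(); });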
  }
});
var filenames = [
  "001-the-nephews-come-to-town.mpg",
  "002-scroogey-scroogers.mp4",
  "32-hewi's-lost-pet.mp4",
  "109-copped-by-the-coppers.mpeg"
];
var exp = /^(\d+)(.+)(?=\.\w+$)/;
var replacement = "Episode ${1,RLZ} -${2,D2S,PROPER}";
for(var i = 0; i < filenames.length; i++) {
  filenames[i] = sub(filenames[i], exp, replacement);
}
alert(filenames.join('\n'));

The above code will result in the following filenames:

Episode 1 - The Nephews Come To Town.mpg
Episode 2 - Scroogey Scroogers.mp4
Episode 32 - Hewi's Lost Pet.mp4
Episode 109 - Copped By The Coppers.mpeg

Well, now that I have this code out of the way, hopefully the next step will be to actually make the File Renamer. When I do finish creating it, you can be sure to find it on this blog. 8)

JavaScript – String.prototype.matchAll(regexp)

One of the nice things that many people don’t know about JavaScript’s String replace() function is that the second parameter can be either a string or a callback function. This callback function receives the entire matched substring, each parenthesized group (undefined for a group that did not participate in the match, although some older browsers pass the empty string instead), the index of the match within the original string, and the original string. You can get these values using the String match() function, but not for each instance of a globally matched regular expression. In that case, one could use the following code to define a matchAll() function:

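String.prototype.matchAll = function(regexp) {
  var matches = [];
  this.replace(regexp, function() {
    // The callback receives the match, each group, the offset and the input
    // string, plus a trailing groups object when the regexp has named groups.
    var arr = ([]).slice.call(arguments, 0);
    var hasNamedGroups = typeof arr[arr.length - 1] === 'object';
    var extras = arr.splice(hasNamedGroups ? -3 : -2);
    arr.index = extras[0];
    arr.input = extras[1];
    matches.push(arr);
  });
  return matches.length ? matches : null;
};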

If a global regular expression is passed to the function defined above, an array of arrays similar to those that come from the native match() will be returned. As occurs with the native match() function, each sub-array will have an index property and an input property. The index property indicates the position where this particular match began in the original string. The input property indicates the string that the matches were found in.

The following is an example of running this function:

var str = 'Hello world!!!';
var regexp = /(\w+)\W*/g;
console.log(str.matchAll(regexp));

An array containing two arrays will be generated:

[
  {
    0: "Hello ",
    1: "Hello",
    index: 0,
    input: "Hello world!!!"
  },
  {
    0: "world!!!",
    1: "world",
    index: 6,
    input: "Hello world!!!"
  }
]

Even though the sub-arrays above are written as object literals, the arrays that come back from this matchAll() function are true arrays with those two properties (index and input) set on each. Hopefully this helps. 8)
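Engines that implement ES2020 provide a native String.prototype.matchAll(), which returns an iterator of match arrays (each with index and input) and throws a TypeError when given a non-global regular expression. Because the assignment above would replace that built-in, here is a short sketch of the same example using the native method instead:

```javascript
var str = 'Hello world!!!';
var regexp = /(\w+)\W*/g;
// Native matchAll() returns an iterator; Array.from() collects the match
// arrays, each of which already carries index and input properties.
var matches = Array.from(str.matchAll(regexp));
matches.forEach(function(match) {
  // First: "Hello " / "Hello" at index 0; second: "world!!!" / "world" at 6.
  console.log(match[0], match[1], match.index);
});
```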