Tag Archives: String

JavaScript Snippet – isValidVarName()

Now Available in YourJS

Recently I was working on a function which needed to determine whether or not a string could be used as a variable name. Variable name validation can get tricky so instead of using a crazy regular expression which includes all keywords (which may change over time) and instead of testing for strange unicode characters which are rarely used for variable names, I decided to leverage the JavaScript Function constructor:
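The listing itself is not shown here, so what follows is only a minimal sketch of the approach, assuming the function hands the sanitized string to the Function constructor as a parameter name and treats a syntax error as "invalid". The replacement pattern is the one discussed below; the rest is an assumption and not necessarily the original code.

```javascript
// Sketch only: relies on the engine's own parser instead of a hand-written regex.
function isValidVarName(varName) {
  try {
    // Whitespace, commas, slashes and the empty string are replaced with '.',
    // a character that can never appear in an identifier, so those inputs
    // are forced to fail instead of being silently accepted.
    Function(varName.replace(/[\s\xA0,\/]|^$/g, '.'), '');
    return true;
  } catch (e) {
    // The parameter string did not parse (or varName was not a string).
    return false;
  }
}

console.log(isValidVarName('D3'));   // -> true
console.log(isValidVarName('var'));  // -> false
```

In engines that support ES2015 parameter syntax, strings such as 'a=1' or '...rest' also parse as parameter lists, so this sketch accepts them; a stricter check would be needed to rule those out.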

The above function takes the string in question and returns true if the string can be used as a variable name. If the string cannot be used as a variable name, false is returned.

Some may wonder why I’m doing the following:

varName.replace(/[\s\xA0,\/]|^$/g, '.')

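The reason I included the above replacement is to avoid false positives in the case of an empty string, extra spacing, commas, and forward slashes. The Function constructor would otherwise accept all of these in a parameter string: 'a, b' declares two parameters, surrounding whitespace is ignored, and '/*x*/' is just a comment.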

Security

Others have attempted to make the same function using the evil eval() function, which allows for JS injection. Even though the Function constructor can also be used for evil, here the string is only supplied as a parameter name: it is parsed as a parameter list and the resulting function is never called, so a string that tries to close the parameter list with a parenthesis and inject code causes a syntax error instead of running.

Examples

The following is an example of what will happen when running the function against a number of strings:

console.log(isValidVarName(''));           // -> false
console.log(isValidVarName('3'));           // -> false
console.log(isValidVarName('3d'));          // -> false
console.log(isValidVarName('D3'));          // -> true
console.log(isValidVarName('D3 '));         // -> false
console.log(isValidVarName('D3,Q'));        // -> false
console.log(isValidVarName('D3/*Qs*/'));   // -> false
console.log(isValidVarName('D3Q'));         // -> true
console.log(isValidVarName('var'));         // -> false
console.log(isValidVarName('true'));        // -> false
console.log(isValidVarName('undefined'));   // -> true
console.log(isValidVarName('null'));        // -> false
console.log(isValidVarName('coolio.pop'));  // -> false
console.log(isValidVarName('coolio'));      // -> true
console.log(isValidVarName('coolio_pop'));  // -> true
console.log(isValidVarName('$'));           // -> true
console.log(isValidVarName('$á'));          // -> true
console.log(isValidVarName('áÑ'));          // -> true
console.log(isValidVarName('_'));           // -> true


JavaScript – Getting Function Parameter Names

Two years ago I wrote a post about how to pass arguments by name in JavaScript. Recently I have started to ramp up a new project called YourJS and found a need to be able to read the names of the parameters of a given function. The following getParamNames() function takes an arbitrary function and returns an array of its parameter names:
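The listing is not reproduced here. As a rough sketch of the idea (not necessarily the original implementation), the names can be read from the function’s source text as returned by toString(). This version assumes a plain parameter list: default values, destructuring, and arrow functions written without parentheses are not handled.

```javascript
// Sketch only: reads parameter names from the function's source text.
function getParamNames(fn) {
  // Drop block and line comments so they cannot be mistaken for names.
  var source = Function.prototype.toString.call(fn)
    .replace(/\/\*[\s\S]*?\*\/|\/\/[^\r\n]*/g, '');
  // Capture everything between the first pair of parentheses.
  var match = /\(([^)]*)\)/.exec(source);
  if (!match) {
    return [];
  }
  return match[1].split(',')
    .map(function (name) { return name.replace(/^\s+|\s+$/g, ''); })
    .filter(function (name) { return name.length > 0; });
}

// Hypothetical function used only to show the call.
function move(element, x, y) {}

console.log(getParamNames(move));          // -> ["element", "x", "y"]
console.log(getParamNames(getParamNames)); // -> ["fn"]
```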

Using this function is quite simple. Let’s say that getParamNames() and the function below are defined:

function repeat(string, times, opt_delimiter) {
  opt_delimiter = arguments.length > 2 ? opt_delimiter + '' : '';
  return new Array(times + 1).join(opt_delimiter + string).replace(opt_delimiter, '');
}

Running getParamNames(repeat) will result in the following:

>>> getParamNames(repeat)
["string", "times", "opt_delimiter"]

Running getParamNames(getParamNames) will result in the following:

>>> getParamNames(getParamNames)
["fn"]

Pretty cool, right?!?! Have fun!

JavaScript Snippet – String.prototype.after()

WARNING:
Extending native prototypes is frowned upon by many JS engineers but can be helpful as long as the extensions are properly documented in the codebase.
SCRIPTER’S DISCRETION IS ADVISED.

Yesterday I added a post about String.prototype.before(...). As is explained within the post, this function can be used to easily extract a substring before a given target. Of course, if you can get what comes before a target you should be able to get what comes after a target too, right? Here is a function that makes that possible:

As indicated in the comments, this function can take a target (string or regular expression) to key off of to extract the desired substring. It also takes an optional second argument used for indicating the occurrence of the target to key off of. For instance, if you wanted to get the string that comes after the second comma in "Uno, dos, tres, cuatro, y cinco" you could use the following code:

var str = "Uno, dos, tres, cuatro, y cinco";
var afterFirstTwo = str.after(',', 2);
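To make the described behavior concrete, here is a minimal sketch written as a standalone function instead of a prototype extension. The names substringAfter and toGlobalRegExp are hypothetical, and returning undefined when the requested occurrence does not exist is an assumption, since the original definition is not shown.

```javascript
// Hypothetical helper: builds a global RegExp from a string or RegExp target.
function toGlobalRegExp(target) {
  if (target instanceof RegExp) {
    var flags = target.flags.indexOf('g') < 0 ? target.flags + 'g' : target.flags;
    return new RegExp(target.source, flags);
  }
  // Escape regex metacharacters so a string target is matched literally.
  return new RegExp(String(target).replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');
}

// Sketch: returns the text after the nth occurrence of target (default: first).
function substringAfter(str, target, opt_occurrence) {
  var remaining = opt_occurrence === undefined ? 1 : opt_occurrence;
  var regex = toGlobalRegExp(target);
  var match;
  while ((match = regex.exec(str)) !== null) {
    if (--remaining === 0) {
      return str.slice(match.index + match[0].length);
    }
    if (match[0] === '') {
      regex.lastIndex++; // avoid looping forever on empty matches
    }
  }
  return undefined; // assumption: the requested occurrence was not found
}

var str = "Uno, dos, tres, cuatro, y cinco";
console.log(substringAfter(str, ',', 2));     // -> " tres, cuatro, y cinco"
console.log(substringAfter(str, /,\s*/, 3));  // -> "cuatro, y cinco"
```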

Here are some other examples that I used to test this function:

As usual, feel free to use this function in your own projects. Have fun!

JavaScript Snippet – String.prototype.before()

WARNING:
Extending native prototypes is frowned upon by many JS engineers but can be helpful as long as the extensions are properly documented in the codebase.
SCRIPTER’S DISCRETION IS ADVISED.

Even though it’s NOT encouraged to extend native prototypes, at times you may find that doing so is pretty useful. One extension that you may find useful is String.prototype.before(...) which can be used to return the substring before a specified target. Here is the definition:

First I think it’s worth mentioning that depending on your preference you could choose to rename this function as String.prototype.leftOf(...) to clearly identify what it does. This function takes at least a target to find in the string, which is either a string or a regexp. You may also optionally pass in a second argument which indicates the occurrence of the target to key off of. For example, if you wanted to extract the substring before the second comma in "one, two, three, and four" you could do something like the following:

var str = "one, two, three, and four";
var firstTwo = str.before(',', 2);
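As a rough illustration of that behavior, here is a minimal sketch written as a standalone function instead of a prototype extension. The name substringBefore is hypothetical, and returning undefined when the requested occurrence is missing is an assumption, since the original definition is not shown.

```javascript
// Sketch: returns the text before the nth occurrence of target (default: first).
function substringBefore(str, target, opt_occurrence) {
  var occurrence = opt_occurrence === undefined ? 1 : opt_occurrence;
  var isRegExp = target instanceof RegExp;
  // String targets are escaped so they are matched literally.
  var pattern = isRegExp
    ? target.source
    : String(target).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var flags = isRegExp ? target.flags.replace('g', '') : '';
  var regex = new RegExp(pattern, flags + 'g');
  var seen = 0;
  var match;
  while ((match = regex.exec(str)) !== null) {
    seen++;
    if (seen === occurrence) {
      return str.slice(0, match.index);
    }
    if (match[0] === '') {
      regex.lastIndex++; // avoid looping forever on empty matches
    }
  }
  return undefined; // assumption: the requested occurrence was not found
}

var str = "one, two, three, and four";
console.log(substringBefore(str, ',', 2));        // -> "one, two"
console.log(substringBefore(str, /,\s*and\b/));   // -> "one, two, three"
```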

Of course, that is just one simple example of what you can do with this prototype extension. Check out some tests that exemplify how to use this helpful function:

Most likely if you like this function you will also want to check out the String.prototype.after(...) post. Enjoy using this helpful utility function.

Python – BeautifulSoup – Find All with Lambda Function for Attributes

Today, I had to figure out a way to parse an HTML string in Python in order to find all of the attribute values of attributes starting with a specific string. Since we already have BeautifulSoup installed, I started researching how to use a lambda function in conjunction with the attrs argument of BeautifulSoup#findAll(). Unfortunately, I didn’t figure out a way to use a callable with the attrs argument, but I did with the name:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
')
elems = soup.findAll(lambda tag: [a for a in tag.attrs if a[0].startswith('custom-')])

After running the above code to find all elements with attributes starting with custom-, I ended up with a list of the following two elements:

[
Click
, Jump]

In order to get a list of all of the attribute values, instead of traversing the attributes of each element returned, I decided to just add a little bit to the lambda function:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
')
custom_values = []
soup.findAll(lambda tag: [custom_values.append(a[1]) for a in tag.attrs if a[0].startswith('custom-')])
print custom_values

That resulted in this:

[u'Clicker', u'Jumper']

Pretty simple, right!

JavaScript Snippet – Simple CSV Parser

As I mentioned in yesterday’s post, I was recently working on a quick way to parse CSVs into an array of arrays and alternatively into an array of dictionaries keyed on the values in the first row. I ended up landing on the following definition:

As you can see, the function is annotated for anyone who may be interested in using it. Let’s see an example of how it could be used. Here is an example string that can be parsed:

ID,First Name,Last Name,Address,Last Purchase Date,Purchase Amount,Comment,Return Customer
1,Don,Knots,"123 Main St.,
Duggietown, ET 12342",10/23/2013,23.43,"""Doesn't like cheese"" according to his mom.",Y
2,Cher,Vega,"92 Victor Ln.
Rutrow, DA 39252",01/12/2013,588.1,,N
3,Tina,Ray,"1111 Yomdip Circle
Bribloop, EV 92341",02/03/2013,234.2,,Y
4,Charlie,Bucket,"745 Caca Pl.
Hastiville, JS 92293",05/06/2013,345.4,,N

Below is an example of processing the above CSV first as an array of arrays, then as an array of dictionaries (objects), and lastly as an array of dictionaries with typed values:
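The original demo is not reproduced here. As a minimal sketch of the first two modes (array of arrays, then array of dictionaries), the following uses a small character-by-character parser. The names parseCsvRows and rowsToDictionaries are hypothetical, this is not the parser defined in the post, and typed values are left out.

```javascript
// Sketch: splits CSV text into an array of rows (arrays of cell strings).
function parseCsvRows(text) {
  var rows = [], row = [], cell = '', inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"' && text.charAt(i + 1) === '"') {
        cell += '"'; // a doubled quote is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        cell += ch; // commas and line breaks are kept inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(cell);
      cell = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text.charAt(i + 1) === '\n') {
        i++; // treat \r\n as a single line break
      }
      row.push(cell);
      rows.push(row);
      row = [];
      cell = '';
    } else {
      cell += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (cell !== '' || row.length > 0) {
    row.push(cell);
    rows.push(row);
  }
  return rows;
}

// Sketch: turns rows into objects keyed on the values in the first row.
function rowsToDictionaries(rows) {
  var header = rows[0] || [];
  return rows.slice(1).map(function (row) {
    var record = {};
    header.forEach(function (key, index) {
      record[key] = row[index];
    });
    return record;
  });
}

var csv = 'ID,First Name,Address\n1,Don,"123 Main St.,\nDuggietown, ET 12342"\n2,Cher,';
var rows = parseCsvRows(csv);
console.log(rows.length);                               // -> 3
console.log(rowsToDictionaries(rows)[0]['First Name']); // -> "Don"
```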

As you can see from the jPaq Proof above, this parser works well with the majority of the CSVs that you would need to process. Still, in the case that you need a fully-fledged CSV parser, Papa Parse seems to be a pretty good solution. Have fun! 8-)

Regular Expressions – Extra Ending Match

A few days ago I was working on a very primitive version of a CSV reader for a quick JS project and while testing my regex out, I noticed that I was getting an extra match at the end. Here is the JavaScript function that I had:

function parseCSV(str) {
  var row = [], rows = [row];
  str.replace(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g, function(match, cell, quoted, delimiter) {
    row.push(quoted ? quoted.replace(/""/g, '"') : cell);
    if (delimiter && delimiter != ',') {
      rows.push(row = []);
    }
  });
  return rows;
}

Interestingly, if you pass "Name,DOB" into parseCSV() an extra cell will be added:

> parseCSV('Name,DOB')
[["Name", "DOB", ""]]

Without diving too far into the function itself, you will notice that simply finding all of the matches for my CSV parsing regex produces an interesting result:

> 'Name,DOB'.match(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g)
["Name,", "DOB", ""]

After thoroughly analyzing my regex, I started to think there was something wrong with the JS implementation of the regular expression engine. I also thought there might be something wrong with my regular expression so I made a simpler one and saw the following:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello,", "world", ""]

That seems pretty strange, right? I continued investigating and asked around the office to see if others thought it was weird and they agreed. Therefore I decided to try it out in Python to see if there was just a peculiarity in the JS engine:

>>> import re
>>> re.findall('[^,]*(?:,|$)', 'hello,world')
['hello,', 'world', '']

Finally, I started thinking about how I would create a regular expression engine that would look for all instances of the empty string and still not result in an infinite loop. The reason I did this was because I knew that running the following doesn’t result in an infinite loop in Python:

>>> import re
>>> re.findall('', 'Yo')
['', '', '']

I tried the same thing in JS:

> 'Yo'.match(/(?:)/g)
["", "", ""]

It seems that if the matched substring has a length of zero, the next search will start one character past the match’s starting/ending index. On the other hand, if the match contains at least one character, the next search will start at the index after the last character matched. Therefore, let’s consider my simplified regular expression again:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello,", "world", ""]
  1. The first time the regex engine tries to find a match it uses a greedy search to find zero or more non-comma characters followed by a comma or the end of the string.
  2. It finds "hello," (the comma is part of the match) and the end index is 6.
  3. Now since the last match was not the empty string the regex engine doesn’t try to advance the starting position of the next search and simply evaluates against "world".
  4. It finds "world" and the end index is 11 (5 plus the offset of 6).
  5. Again since the last match was not the empty string the regex engine doesn’t try to advance the starting position of the next search and simply evaluates against the empty string.
  6. Since the empty string matches the regular expression, the third string found is the empty string and the ending index remains 11.
  7. Finally the regex engine looks to see if the previous match was the empty string and since it was, it tries to advance the starting index by one but realizes that is outside the bounds of the string and therefore there is no need to continue.

The steps listed above are simply used to validate the reasoning behind why regex engines would find three matches when matching /[^,]*(?:,|$)/g against "hello,world". In the event that I would want to use a similar regex which doesn’t allow the empty string at the end, I could use /(?!$)[^,]*(?:,|$)/g. In conclusion, even though I thought I knew all of the strange edge cases for regexes in JS, I found that I still have more to learn! 8-)
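The walk-through above can be reproduced with an explicit exec() loop that records where each match starts and ends. This is only an illustrative sketch: findAllWithIndices is a hypothetical helper name, and the manual lastIndex bump mirrors the "advance by one after an empty match" behavior described above.

```javascript
// Sketch: collects every match together with its start and end index.
function findAllWithIndices(regex, str) {
  var globalRegex = new RegExp(regex.source, 'g');
  var results = [];
  var match;
  while ((match = globalRegex.exec(str)) !== null) {
    results.push({
      text: match[0],
      start: match.index,
      end: match.index + match[0].length
    });
    if (match[0] === '') {
      // After an empty match, move one character forward so the loop ends.
      globalRegex.lastIndex++;
    }
  }
  return results;
}

console.log(findAllWithIndices(/[^,]*(?:,|$)/, 'hello,world'));
// -> "hello," (0 to 6), "world" (6 to 11), "" (11 to 11)

console.log(findAllWithIndices(/(?!$)[^,]*(?:,|$)/, 'hello,world'));
// -> "hello," (0 to 6), "world" (6 to 11)
```

Note that the (?!$) variant also drops a legitimately empty last cell, as in "hello,", so it trades one edge case for another.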

JavaScript Snippet – Get Function Comments

Last year I wrote about having heredoc-like strings available in JavaScript. Today I figured I’d briefly bring the topic back, providing a solution for returning an array of all comments found in a function:
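The original listing is not shown here. A minimal sketch of the idea, assuming that the engine’s toString() keeps comments in the function source and that only block comments (/* ... */) need to be collected, could look like this:

```javascript
// Sketch only: returns the text of every block comment found in fn's source.
function getComments(fn) {
  var source = Function.prototype.toString.call(fn);
  var comments = [];
  // replace() is used purely to visit each match; its result is ignored.
  source.replace(/\/\*([\s\S]*?)\*\//g, function (match, body) {
    comments.push(body);
    return match;
  });
  return comments;
}

// Hypothetical function whose only job is to hold text in comments.
function messages() {
  /*Dear reader,
Thanks for stopping by!*/
  /*See you soon.*/
}

console.log(getComments(messages).length); // -> 2
console.log(getComments(messages)[1]);     // -> "See you soon."
```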

Why is this useful? As you may know, writing multiline strings in JavaScript isn’t always the prettiest especially when you have to use \n or \r\n. On the other hand, let’s imagine that we have some multiline strings stored inside of a function as comments:

As you can see in the above jPaq Proof, by using the getComments function on a function which contains comments you can pull the comments out as if they were HEREDOC strings. Have fun! 8-)

NOTE: It is important to remember that if you are minifying your code, these comments will most likely be stripped out. In this case you will want to find a different solution such as using the string escape characters (\r\n or \n).

JavaScript Snippet – Undo Camel Case

Probably due to it being so late, I was looking for code to uncamelize (undo camel-casing) any string. I came across what claimed to be a solution in PHP, but unfortunately it did nothing but lowercase my string. Therefore I decided to write my own solution:

function uncamelize(s) {
  return s.replace(/[A-Z]/g, '_$&').toLowerCase();
}

Believe it or not, the solution is that simple. Here is an example of using it:

An interesting thing to note about this uncamelize implementation is that it uses the $& pattern to reuse the substring matched by the regular expression. Even though this pattern is not commonly used, it is documented as shown here.
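A few sample calls show both the result and the role of $&. The input strings are arbitrary picks for illustration, and the function is repeated so that the snippet runs on its own:

```javascript
function uncamelize(s) {
  // '$&' in the replacement stands for the matched substring, here the
  // capital letter itself, so each capital is prefixed with an underscore.
  return s.replace(/[A-Z]/g, '_$&').toLowerCase();
}

console.log(uncamelize('backgroundColor')); // -> "background_color"
console.log(uncamelize('getElementById'));  // -> "get_element_by_id"
console.log(uncamelize('InnerHTML'));       // -> "_inner_h_t_m_l"
```

As the last call shows, a leading capital produces a leading underscore and every letter of an acronym is split apart, which may or may not be what you want.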

EcmaScript 6 – String.prototype.repeat()

One simple function that is on the roster to be released in EcmaScript 6 is String.prototype.repeat. The following can be used to define it in browsers which don’t currently have it defined natively:

String.prototype.repeat = String.prototype.repeat || function(count) {
    return Array(count >= 0 ? parseInt(count, 10) + 1 : -1).join(this);
};

Here are some examples of using this function (modified from the originals found on MDN):

alert("->".repeat(-1));     // RangeError
alert("->".repeat(0));      // ""
alert("->".repeat(1));      // "->"
alert("->".repeat(2));      // "->->"
alert("->".repeat(3.6));    // "->->->" (count will be converted to integer)
alert("->".repeat(1/0));    // RangeError

There is more information about this new JavaScript function here.