Python Quirks – List Concatenation or Mutation?

If you execute the following code, what do you think will be the result?

list1 = [1]
list2 = list1

list2 += [2, 3]
assert list1 == list2, '{} != {}'.format(list1, list2)

list1 = list1 + [4, 5]
assert list1 == list2, '{} != {}'.format(list1, list2)

Will an assert error occur? The answer is yes! The reason why is because list1 = list1 + list2 is actually different from list1 += list2 in Python. Let’s see what happened when I ran the above code in the command prompt:

>>> list1 = [1]
>>> list2 = list1
>>> 
>>> list2 += [2, 3]
>>> assert list1 == list2, '{} != {}'.format(list1, list2)
>>> 
>>> list1 = list1 + [4, 5]
>>> assert list1 == list2, '{} != {}'.format(list1, list2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: [1, 2, 3, 4, 5] != [1, 2, 3]

As is shown above, using `+=` is not assigning a concatenation of the two lists, but actually mutating the left-hand side list (list1). In other words, list1 += list2 is the same as list1.extend(list2). Interesting, right? :cool:

Python Snippet – Get List Item or Get Default

One of the many cool things about Python is that you can often use builtin functions to either get a value from a dictionary (or an object with getattr) or default to a specified value. Unfortunately lists do not provide such a function. The simplest use case would be to get the first value out of any list if it exists and if not simply return None. After fiddling around with a few different implementations, I landed on this one:

def get_at(list, index, default=None):
  return list[index] if max(~index, index) < len(list) else default

The above function can be used in order to either return the value at the specified index or a default value if the index doesn't exist:

lists = [[], ['Hello world!!!'], range(10)[3:]]

for list in lists:
  print 'list = {}'.format(list)
  print '- first value: {}'.format(get_at(list, 0))
  print '- second value: {}'.format(get_at(list, 1))
  print '- last value: {}'.format(get_at(list, -1))
  print '- 5th value (defaults to "missing"): {}'.format(get_at(list, 4, 'missing'))
  print ''

The above code outputs the following:

list = []
- first value: None
- second value: None
- last value: None
- 5th value (defaults to "missing"): missing

list = ['Hello world!!!']
- first value: Hello world!!!
- second value: None
- last value: Hello world!!!
- 5th value (defaults to "missing"): missing

list = [3, 4, 5, 6, 7, 8, 9]
- first value: 3
- second value: 4
- last value: 9
- 5th value (defaults to "missing"): 7

You may be thinking that there must be another way to do this. There are in fact various other ways to do this. One of them is to use `next(iter(list[index:]), default)` as the return value. The main reason I landed on the solution that uses max() and a ternary statement is because I wanted to minimize the amount of objects created. Using next() along with iter() requires a generator to be created for the sliced list. On the other hand, Python handles all of this pretty efficiently so its possible that the difference between the two solutions is minimal. Either way, now I have a succinct helper function that either gets me a list item or the default value I specify. :cool:

JavaScript Quirks – Array Slicing Node Lists

For a while many developers, including myself, suggested that Array.prototype.slice() be used to turn any array-like object into an array. Recently, I was alerted to a scenario in which this does not work as expected. Let’s take the following code for example:

var elems = document.body.getElementsByTagName('*');
var arrElems = Array.prototype.slice.call(elems, 0);

In most browsers, the above code would take the node list assigned to elems and create an array of those nodes and assign it to arrElems. Leave it to Internet Explorer to be the exception, right? Unfortunately, Internet Explorer 8 and lower throw a run-time error: JScript object expected.

For that reason, functions that are written to reliably turn array-like objects into arrays should use the good old-fashioned loop in order to accomplish the task:

function makeArray(o) {
  for (var i = 0, l = o.length, a = new Array(i); i < l; i++) {
    if (i in o) {
      a[i] = o[i];
    }
  }
  return a;
}

As you will notice, the above function creates an array of the same length and only adds the values where found (just as slice() does when copying arrays).

It might be argued that using hasOwnProperty() is a faster solution, but after doing this performance test it seems that the fastest implementation depends on the browser. On the other hand, going in reverse order seemed generally slightly slower. Therefore, I decided to settle with the above implementation which can be finely tuned depending on the target environment. Of course, you could use slice() and then if it fails, fall back to the for loop:

function makeArray(o) {
  try {
    return Array.prototype.slice.call(o);
  } catch (e) {}
  
  for (var i = 0, l = o.length, a = new Array(i); i < l; i++) {
    if (i in o) {
      a[i] = o[i];
    }
  }
  return a;
}

The aforementioned performance test shows that this solution is actually one of the more performant solutions across browsers. In the end it is all up to you. Have fun! :cool:

CSS – Expanding Underlines

The other day I was visiting Grid Style Sheets and was fascinated by the fact that the underlines of the links seemed to expand on mouse over. At first I tried simply using transition in conjunction with margin and padding but unfortunately the results were a bit shaky. Therefore I decided to actually examine the code and found they were using the ::before pseudo-selector. After some tinkering around, I landed on this solution for a simplified version of expanding lines for all links:

a:link, a:visited {
  color: #66F;
  text-decoration: none;
  position: relative;
  transition: all 1s;
}
a:hover {
  color: #00F;
}
a:link::before, a:visited::before {
  background: #DDF;
  box-shadow: 0 0 1px #00F;
  content: '';
  height: 2px;
  position: absolute;
  top: 100%;
  transition: all 1s;
  width: 80%;
  margin-left: 10%;
}
a:hover::before {
  background: #33F;
  box-shadow: 0 2px 2px #CCF;
  width: 110%;
  margin-left: -5%;
}

Here is how the end result looks (try moving your mouse over either Google or Bing:

Expanding Underlines

Pretty cool, right?! CSS3 Definitely adds some cool style to the web these days. :cool:

JavaScript Snippet – String.prototype.after()

WARNING:
Extending native prototypes is frowned upon by many JS engineers but can be helpful as long as the extensions are properly documented in the codebase.
SCRIPTER’S DISCRETION IS ADVISED. :lol:

Yesterday I added a post about String.prototype.before(...). As is explained within the post, this function can be used to easily extract a substring before a given target. Of course, if you can get what comes before a target you should be able to get what comes after a target too, right? Here is a function that makes that possible:

As indicated in the comments, this function can take a target (string or regular expression) to key off of to extract the desired substring. It also takes an optional second argument used for indicating the occurrence of the target to key off of. For instance, if you wanted to get the string that comes after the second comma in "Uno, dos, tres, cuatro, y cinco" you could use the following code:

var str = "Uno, dos, tres, cuatro, y cinco";
var afterFirstTwo = str.after(',', 2);

Here are some other examples that I used to test this function:

As usual, feel free to use this function in your own projects. Have fun! :cool:

JavaScript Snippet – String.prototype.before()

WARNING:
Extending native prototypes is frowned upon by many JS engineers but can be helpful as long as the extensions are properly documented in the codebase.
SCRIPTER’S DISCRETION IS ADVISED. :lol:

Even though its NOT encouraged to extend native prototypes, at times you may find that doing so is pretty useful. One extension that you may find useful is String.prototype.before(...) which can be used to return the substring before a specified target. Here is the definition:

First I think its worth mentioning that depending on your preference you could choose to rename this function as String.prototype.leftOf(...) to clearly identify what it does. This function takes at least a target to find in the string which is either a string or a regexp. You may also optionally pass in a second argument which indicates the occurrence of the target to key off of. For example, if you wanted to extract the substring before the second comma in "one, two, three, and four" you could do something like the following:

var str = "one, two, three, and four";
var firstTwo = str.before(',', 2);

Of course, that is just one simple example of what you can do with this prototype extension. Check out some tests that exemplify how to use this helpful function:

Most likely if you like this function you will also want to check out the String.prototype.after(...) post. Enjoy using this helpful utility function. :cool:

Python – BeautifulSoup – Find All with Lambda Function for Attributes

Today, I had to figure out a way to parse an HTML string in Python in order to find all of the attribute values of attributes starting with a specific string. Since we already have BeautifulSoup installed, I started researching how to use a lambda function in conjunction with the attrs argument of BeautifulSoup#findAll(). Unfortunately, I didn’t figure out a way to use a callable with the attrs argument, but I did with the name:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
') elems = soup.findAll(lambda tag:[a for a in tag.attrs if a[0].startswith('custom-')])

After running the above code to find all elements with attributes starting with custom-, I ended up with a list of the following two elements:

[
Click
, Jump]

In order to get a list of all of the attribute values, instead of traversing the attributes of each element returned, I decided to just add a little bit to the lambda function:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
') custom_values = [] soup.findAll(lambda tag:[custom_values.append(a[1]) for a in tag.attrs if a[0].startswith('custom-')]) print custom_values

That resulted in this:

[u'Clicker', u'Jumper']

Pretty simple, right! :cool:

JavaScript – Simplest Inheritance Before ES5

One of the great things about reading through Douglas Crockford’s site is that you get to learn about how certain things like prototypal inheritance was implemented prior to ECMAScript 5. Since then Object.create() made its way into JavaScript and has been available in most modern browsers for a while. If you haven’t read through Crockford’s site or seen how prototypal inheritance was done before, here is a quick overview:

You can click here to test out the code yourself. As you may have noticed, the key to prototypal inheritance is as follows (please ignore the names of these prototypes disguising themselves as classes :smile:):

SubClass.prototype = new Class();

Unfortunately, at times you may not be able to safely execute the constructor of the prototype which is being inherited (which in this case would be Class). In order to get around it, you can create a surrogate function whose prototype will become that of the prototype to be inherited:

function SurrogateClass(){}
SurrogateClass.prototype = Class.prototype;
SubClass.prototype = new SurrogateClass();

Unfortunately, it isn’t the most memorable sequence so let’s turn this into a simple function:

function inherit(Class, SubClass) {
    function SurrogateClass(){}
    SurrogateClass.prototype = Class.prototype;
    SubClass.prototype = new SurrogateClass();
}

Finally, let’s minify it:

function inherit(C,S,p){function O(){}O[p='prototype']=C[p];S[p]=new O}

OK, so this is minification to an extreme since I placed p in the function’s signature. I also went a bit far by sketchily defining p in the left-hand side of the assignment of the prototype to the surrogate function. Even though the above code seems to work, just to be safe, here is a slightly longer, more trustworthy definition:

In the end, both of these definitions are extremely small so I’d say even prior to ES5, prototypal inheritance wasn’t as difficult as one might have thought. :cool:

JavaScript Snippet – Simple CSV Parser

As I mentioned in yesterday’s post, I was recently working on a quick way to parse CSVs into an array of arrays and alternatively into an array of dictionaries keyed on the values in the first row. I ended up landing on the following definition:

As you can see the function is annotated for anyone that may be interested in using. Let’s see example of how it could be used. Here is an example string that can be parsed:

ID,First Name,Last Name,Address,Last Purchase Date,Purchase Amount,Comment,Return Customer
1,Don,Knots,"123 Main St.,
Duggietown, ET 12342",10/23/2013,23.43,"""Doesn't like cheese"" according to his mom.",Y
2,Cher,Vega,"92 Victor Ln.
Rutrow, DA 39252",01/12/2013,588.1,,N
3,Tina,Ray,"1111 Yomdip Circle
Bribloop, EV 92341",02/03/2013,234.2,,Y
4,Charlie,Bucket,"745 Caca Pl.
Hastiville, JS 92293",05/06/2013,345.4,,N

Below is an example of processing the above CSV first as an array of arrays, then as an array of dictionaries (objects), and lastly as an array of dictionaries with typed values:

As you can see from the jPaq Proof above, this parser works well with the majority of the CSVs that you would need to process. Still, in the case that you need a fully-fledged CSV parser, Papa Parse seems to be a pretty good solution. Have fun! 8-)

Regular Expressions – Extra Ending Match

A few days ago I was working on a very primitive version of a CSV reader for a quick JS project and while testing my regex out, I noticed that I was getting an extra match at the end. Here is the JavaScript function that I had:

function parseCSV(str) {
  var row = [], rows = [row];
  str.replace(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g, function(match, cell, quoted, delimiter) {
    row.push(quoted ? quoted.replace(/""/g, '"') : cell);
    if (delimiter && delimiter != ',') {
      rows.push(row = []);
    }
  });
  return rows;
}

Interestingly, if you pass "Name,DOB" into parseCSV() an extra cell will be added:

> parseCSV('Name,DOB')
[["Name", "DOB", ""]]

Without diving into the function too much to see why it should work with a good regex, you will notice that finding all of the matches for my CSV parsing regex produces an interesting result:

> 'Name,DOB'.match(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g)
["Name", "DOB", ""]

After thoroughly analyzing my regex, I started to think there was something wrong with the JS implementation of the regular expression engine. I also thought there might be something wrong with my regular expression so I made a simpler one and saw the following:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello", "world", ""]

That seems pretty strange, right? I continued investigating and asked around the office to see if others thought it was weird and they agreed. Therefore I decided to try it out in Python to see if there was just a peculiarity in the JS engine:

>>> import re
>>> re.findall('[^,]*(?:,|$)', 'hello,world')
['hello', 'world', '']

Finally, I started thinking about how I would create a regular expression engine that would look for all instances of the empty string and still not result in an infinite loop. The reason I did this was because I knew that running the following doesn’t result in an infinite loop in Python:

>>> import re
>>> re.findall('', 'Yo')
['', '', '']

I tried the same thing in JS:

> 'Yo'.match(/(?:)/g)
["", "", ""]

It seems that if the matched substring has a length of zero, the next search will start one character past the match’s starting/ending index. On the other hand, if the match contains at least one character, the next search will start at the index after the last character matched. Therefore, let’s consider my simplified regular expression again:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello", "world", ""]
  1. The first time the regex engine tries to find a match it uses a greedy search to find zero or more word characters right before a comma or the end of the string.
  2. It finds hello, and the end index is 6.
  3. Now since the last match was not the empty string the regex engine doesn’t try to advance the starting position of the next search and simply evaluates against "world".
  4. It finds world and the end index is 11 (5 plus the offset of 6).
  5. Again since the last match was not the empty string the regex engine doesn’t try to advance the starting position of the next search simply evaluates against the empty string.
  6. Since the empty string matches the regular expression, the third string found is the empty string and the ending index remains 11.
  7. Finally the regex engine looks to see if the previous match was the empty string and since it was, it tries to advance the starting index by one but realizes that is outside the bounds of the string and therefore there is no need to continue.

The steps listed above are simply used to validate the reasoning behind why regex engines would find three matches for matching /[^,]*(,|$)/g against "hello,world". In the event that I would want to use a similar regex which doesn’t allow the empty string at the end, I could use /(?!$)[^,]*(?:,|$)/g. In conclusion, even though I thought I knew all of the strange edge cases for regexes in JS, I found that I still have more to learn! 8-)

JavaScript, Math, and much more.