Category Archives: Python

Python Quirks – List Concatenation or Mutation?

If you execute the following code, what do you think will be the result?

list1 = [1]
list2 = list1

list2 += [2, 3]
assert list1 == list2, '{} != {}'.format(list1, list2)

list1 = list1 + [4, 5]
assert list1 == list2, '{} != {}'.format(list1, list2)

Will an assertion error occur? Yes! The reason is that list1 = list1 + list2 is actually different from list1 += list2 in Python. Let’s see what happened when I ran the above code at the interactive prompt:

>>> list1 = [1]
>>> list2 = list1
>>> 
>>> list2 += [2, 3]
>>> assert list1 == list2, '{} != {}'.format(list1, list2)
>>> 
>>> list1 = list1 + [4, 5]
>>> assert list1 == list2, '{} != {}'.format(list1, list2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: [1, 2, 3, 4, 5] != [1, 2, 3]

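As shown above, `+=` does not assign a new concatenated list; it mutates the list on the left-hand side in place. Since list1 and list2 referred to the same list object, list2 += [2, 3] changed what both names see, whereas list1 = list1 + [4, 5] built a brand-new list and rebound only list1. In other words, for lists, list1 += list2 behaves like list1.extend(list2). Interesting, right? :cool: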
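One way to see the difference is to check object identity. The sketch below reuses the variable names from the example and uses the `is` operator to show when the two names still point at the same list; the two `same_after_...` names are made up for this illustration.

```python
list1 = [1]
list2 = list1              # both names refer to the same list object

list2 += [2, 3]            # in place: the shared list is extended
same_after_iadd = list1 is list2

list1 = list1 + [4, 5]     # builds a new list and rebinds list1 only
same_after_add = list1 is list2

print(same_after_iadd, same_after_add)
print(list1, list2)
```

After the in-place `+=` the names still share one object; after the plain `+` they no longer do, which is exactly why the second assert fails.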

Python Snippet – Get List Item or Get Default

One of the many cool things about Python is that you can often use builtin functions to either get a value from a dictionary (or an object with getattr) or default to a specified value. Unfortunately lists do not provide such a function. The simplest use case would be to get the first value out of any list if it exists and if not simply return None. After fiddling around with a few different implementations, I landed on this one:

def get_at(list, index, default=None):
  return list[index] if max(~index, index) < len(list) else default

The above function can be used in order to either return the value at the specified index or a default value if the index doesn't exist:

lists = [[], ['Hello world!!!'], range(10)[3:]]

for list in lists:
  print 'list = {}'.format(list)
  print '- first value: {}'.format(get_at(list, 0))
  print '- second value: {}'.format(get_at(list, 1))
  print '- last value: {}'.format(get_at(list, -1))
  print '- 5th value (defaults to "missing"): {}'.format(get_at(list, 4, 'missing'))
  print ''

The above code outputs the following:

list = []
- first value: None
- second value: None
- last value: None
- 5th value (defaults to "missing"): missing

list = ['Hello world!!!']
- first value: Hello world!!!
- second value: None
- last value: Hello world!!!
- 5th value (defaults to "missing"): missing

list = [3, 4, 5, 6, 7, 8, 9]
- first value: 3
- second value: 4
- last value: 9
- 5th value (defaults to "missing"): 7

You may be thinking that there must be another way to do this, and there are in fact several. One of them is to use `next(iter(list[index:]), default)` as the return value. The main reason I landed on the solution that uses max() and a conditional expression is that I wanted to minimize the number of objects created. Using next() along with iter() requires slicing the list, which creates a new list, and then creating an iterator over that slice. On the other hand, Python handles all of this pretty efficiently, so it's possible that the difference between the two solutions is minimal. Either way, now I have a succinct helper function that either gets me a list item or the default value I specify. :cool:
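To compare the options side by side, here is a small sketch. The names `get_at_eafp` and `get_at_slice` are made up for this example, and the parameter is called `items` so that it does not shadow the built-in `list`. It also shows one case where the slice-based variant behaves differently: a negative index that is out of range.

```python
def get_at(items, index, default=None):
    # Same bounds check as the function above, with a renamed parameter.
    return items[index] if max(~index, index) < len(items) else default


def get_at_eafp(items, index, default=None):
    # Hypothetical alternative: try the lookup and fall back on failure.
    try:
        return items[index]
    except IndexError:
        return default


def get_at_slice(items, index, default=None):
    # The slice-based variant: builds a new list, then an iterator over it.
    return next(iter(items[index:]), default)


values = ['a', 'b']
print(get_at(values, -5, 'missing'))
print(get_at_eafp(values, -5, 'missing'))
print(get_at_slice(values, -5, 'missing'))
```

Because slicing clamps out-of-range bounds, `values[-5:]` is the whole list, so the slice-based variant returns the first item instead of the default in that case.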

Python – BeautifulSoup – Find All with Lambda Function for Attributes

Today, I had to figure out a way to parse an HTML string in Python in order to find all of the attribute values of attributes starting with a specific string. Since we already have BeautifulSoup installed, I started researching how to use a lambda function in conjunction with the attrs argument of BeautifulSoup#findAll(). Unfortunately, I didn’t figure out a way to use a callable with the attrs argument, but I did with the name:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
')
elems = soup.findAll(lambda tag: [a for a in tag.attrs if a[0].startswith('custom-')])

After running the above code to find all elements with attributes starting with custom-, I ended up with a list of the following two elements:

[
Click
, Jump]

In order to get a list of all of the attribute values, instead of traversing the attributes of each element returned, I decided to just add a little bit to the lambda function:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
')
custom_values = []
soup.findAll(lambda tag: [custom_values.append(a[1]) for a in tag.attrs if a[0].startswith('custom-')])
print custom_values

That resulted in this:

[u'Clicker', u'Jumper']

Pretty simple, right? :cool:
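If BeautifulSoup is not available, a similar result can be sketched with the standard library's `html.parser` module in Python 3. This is a swapped-in technique, not the BeautifulSoup approach above. The class name `CustomAttrCollector` and the markup string are invented for this illustration; in particular, the tag and attribute names are assumptions and are not the markup from the original example.

```python
from html.parser import HTMLParser


class CustomAttrCollector(HTMLParser):
    """Collects values of attributes whose names start with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        for name, value in attrs:
            if name.startswith(self.prefix):
                self.values.append(value)


# Hypothetical markup, made up for this sketch.
html = '<a custom-click="Clicker">Click</a><a custom-jump="Jumper">Jump</a>'
collector = CustomAttrCollector('custom-')
collector.feed(html)
print(collector.values)
```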

Regular Expressions – Extra Ending Match

A few days ago I was working on a very primitive version of a CSV reader for a quick JS project and while testing my regex out, I noticed that I was getting an extra match at the end. Here is the JavaScript function that I had:

function parseCSV(str) {
  var row = [], rows = [row];
  str.replace(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g, function(match, cell, quoted, delimiter) {
    row.push(quoted ? quoted.replace(/""/g, '"') : cell);
    if (delimiter && delimiter != ',') {
      rows.push(row = []);
    }
  });
  return rows;
}

Interestingly, if you pass "Name,DOB" into parseCSV() an extra cell will be added:

> parseCSV('Name,DOB')
[["Name", "DOB", ""]]

Without digging into why the function should work given a correct regex, notice that finding all of the matches for my CSV parsing regex produces an interesting result:

> 'Name,DOB'.match(/([^",\r\n]*|"((?:[^"]+|"")*)")(,|\r|\r?\n|$)/g)
["Name,", "DOB", ""]

After thoroughly analyzing my regex, I started to think there was something wrong with the JS implementation of the regular expression engine. I also thought there might be something wrong with my regular expression so I made a simpler one and saw the following:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello,", "world", ""]

That seems pretty strange, right? I continued investigating and asked around the office to see if others thought it was weird and they agreed. Therefore I decided to try it out in Python to see if there was just a peculiarity in the JS engine:

>>> import re
>>> re.findall('[^,]*(?:,|$)', 'hello,world')
['hello,', 'world', '']

Finally, I started thinking about how I would create a regular expression engine that would look for all instances of the empty string and still not result in an infinite loop. The reason I did this was because I knew that running the following doesn’t result in an infinite loop in Python:

>>> import re
>>> re.findall('', 'Yo')
['', '', '']

I tried the same thing in JS:

> 'Yo'.match(/(?:)/g)
["", "", ""]

It seems that if the matched substring has a length of zero, the next search will start one character past the match’s starting/ending index. On the other hand, if the match contains at least one character, the next search will start at the index after the last character matched. Therefore, let’s consider my simplified regular expression again:

> 'hello,world'.match(/[^,]*(?:,|$)/g)
["hello,", "world", ""]
  1. The first time the regex engine tries to find a match, it greedily matches zero or more non-comma characters followed by a comma or the end of the string.
  2. It finds "hello," and the end index is 6.
  3. Since the last match was not the empty string, the regex engine doesn’t advance the starting position of the next search and simply evaluates against "world".
  4. It finds "world" and the end index is 11 (5 plus the offset of 6).
  5. Again, since the last match was not the empty string, the regex engine doesn’t advance the starting position of the next search and simply evaluates against the empty string.
  6. Since the empty string matches the regular expression, the third string found is the empty string and the ending index remains 11.
  7. Finally, the regex engine checks whether the previous match was the empty string. Since it was, it tries to advance the starting index by one, but that is outside the bounds of the string, so there is no need to continue.

The steps listed above simply walk through why regex engines find three matches when matching /[^,]*(?:,|$)/g against "hello,world". If I wanted a similar regex that doesn’t produce the empty match at the end, I could use /(?!$)[^,]*(?:,|$)/g. In conclusion, even though I thought I knew all of the strange edge cases for regexes in JS, I found that I still have more to learn! 8-)
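The walk-through can be checked in Python by asking for the position of every match. This is a small sketch using `re.finditer`; the variable names are made up for the example.

```python
import re

text = 'hello,world'
pattern = re.compile(r'[^,]*(?:,|$)')

# Each tuple is (matched text, start index, end index).
matches = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]
print(matches)

# The negative lookahead refuses to start a match at the end of the string,
# which removes the trailing empty match.
no_trailing = re.findall(r'(?!$)[^,]*(?:,|$)', text)
print(no_trailing)
```

The third match starts and ends at index 11, which is the zero-length match at the end of the string described in the steps. Keep in mind that the lookahead version also drops a genuinely empty last cell, for example in "hello,".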

Python – List Comprehension of Updated Dictionaries

First of all, I’m sure that the title is a bit confusing, but I couldn’t think of a good way to state in a few words the scope of the solution I was looking for today. Basically, I had a list of dictionaries and I wanted to create a new list of dictionaries with an additional key-value pair. For example, let’s say I have a list of dictionaries which define a and b:

my_list = [
    { 'a': 1, 'b': 2 },
    { 'a': 2, 'b': 3 },
    { 'a': 3, 'b': 4 }
]

What I wanted to do was create a new list containing copies of the dictionaries with a new key-value pair. After finding this Stack Overflow answer, I came up with a solution similar to the following:

import math
list1 = [
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 3},
    {'a': 3, 'b': 4}
]
list2 = [dict(d.items() + [('c', math.hypot(*d.values()))]) for d in list1]
# [
#     {'a': 1, 'c': 2.23606797749979, 'b': 2},
#     {'a': 2, 'c': 3.605551275463989, 'b': 3},
#     {'a': 3, 'c': 5.0, 'b': 4}
# ]

Cool stuff, right? Do you know what is actually happening? In Python 2, d.items() turns each key-value pair into a tuple and returns them all as a list. Then we concatenate a one-element list holding our new tuple to make a list of three tuples. Calling math.hypot(*args) passes each item in the list as a separate argument (here is more information about using an apply-like function in Python). After that, dict(...) takes that list of tuples and creates a dictionary from them. Finally, this is done for each dictionary in list1, and each new dictionary is added to a new list which is assigned to list2.

How cool is that?!?!?! Even though I still am a big fan of JavaScript, the fact that you can write this type of code in Python is pretty inspiring! 8-)
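In Python 3, d.items() returns a view rather than a list, so it cannot be concatenated with a list using +. A sketch of the same idea for Python 3 follows. It names the keys explicitly instead of relying on the order of d.values(), and `list3` is an extra name introduced only to show a second spelling.

```python
import math

list1 = [
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 3},
    {'a': 3, 'b': 4},
]

# dict(d, c=...) copies d and adds the new key in one step.
list2 = [dict(d, c=math.hypot(d['a'], d['b'])) for d in list1]

# Equivalent spelling using dictionary unpacking (Python 3.5+).
list3 = [{**d, 'c': math.hypot(d['a'], d['b'])} for d in list1]

print(list2)
```

Both spellings create new dictionaries, so the ones in list1 are left untouched.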

Python – Convert Excel Column Name To Number

A few days ago I wrote a post about how to convert a cell address into coordinates (column number and row number). I am working on a quick Python script for work to import CSV data into a DB. In order to let the user enter either the column number or the column name, I used the following function to convert column names into numbers:

def colNameToNum(name):
    pow = 1
    colNum = 0
    for letter in name[::-1]:
        colNum += (int(letter, 36) - 9) * pow
        pow *= 26
    return colNum

As you can see, all you have to do is pass it the name of the column (letters) and the corresponding column number will be returned. Have fun! 8-)
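For comparison, here is a sketch of the same conversion that reads the letters left to right, treating the name as a base-26 number whose digits run from 1 (A) to 26 (Z). The name `col_name_to_num` is made up for this example and the function assumes the input contains only the letters A to Z.

```python
def col_name_to_num(name):
    """Convert an Excel column name such as 'AZ' to its 1-based number."""
    col_num = 0
    for letter in name.upper():
        # Shift what we have one base-26 place to the left, then add this
        # letter's value (A=1 ... Z=26).
        col_num = col_num * 26 + (ord(letter) - ord('A') + 1)
    return col_num


for name in ('A', 'Z', 'AA', 'AZ', 'XFD'):
    print(name, col_name_to_num(name))
```

The loop prints 1, 26, 27, 52 and 16384 for the five names.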

JavaScript – Degree & Radian Conversion

Two very simple operations that you may need when writing JavaScript that deals with trigonometry are converting degrees to radians and radians to degrees. JavaScript doesn’t provide Math.degrees() and Math.radians() out of the box, but these functions can be easily defined as follows:

// Converts from degrees to radians.
Math.radians = function(degrees) {
  return degrees * Math.PI / 180;
};

// Converts from radians to degrees.
Math.degrees = function(radians) {
  return radians * 180 / Math.PI;
};

The name of each function indicates the unit of the result. Using the above definitions, you could run code such as the following to get the results indicated in the comments:

alert(Math.radians(90));  // 1.5707963267948966
alert(Math.radians(180)); // 3.141592653589793

alert(Math.degrees(1.5707963267948966)); // 90
alert(Math.degrees(3.141592653589793));  // 180

Personally, I don’t have any use for this at the moment, but if you do, feel free to take it and use it in your code. I figured if it is available in a language like Python, it might as well be available in the best scripting language, JavaScript!!! 8-)
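For reference, the Python counterparts live in the standard `math` module. The short sketch below mirrors the JavaScript calls above; the variable names are made up for the example.

```python
import math

right_angle = math.radians(90)       # degrees -> radians
straight_angle = math.radians(180)
print(right_angle, straight_angle)

# radians -> degrees
print(math.degrees(right_angle), math.degrees(straight_angle))
```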

POW Answer – Appending = Adding Squares

This post gives the answer to last week’s Problem of the Week.

I have to admit that this problem actually came from the Quicker Maths blog, which in turn pulled the problem from IBM’s Ponder This section.

First of all, in order to write the equation algebraically, we could express the relationship as follows:
x² + y² = x × 10000 + y

Unfortunately, the difficulty with solving the problem with just pencil and paper is the fact that we have to solve for the variables when they are both integers. For this reason, let’s do it with a programming language such as Python:

arr = []
for x in range(1000, 10000):
    for y in range(x, 10000):
        sum = x * x + y * y
        if sum == x * 10000 + y:
            arr.append("x = {0}, y = {1} => {2}".format(x, y, sum))
        elif sum == y * 10000 + x:
            arr.append("x = {0}, y = {1} => {2}".format(y, x, sum))

print("x * x + y * y = x * 10000 + y where:")
print("\n".join(arr))

Using this code will result in the following being printed out:

x * x + y * y = x * 10000 + y where:
x = 9412, y = 2353 => 94122353

As you can see, this was the brute-force way of solving this problem, but sometimes the brute-force way is the fastest way to come to the correct answer. 8)
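The search can also be narrowed with a little algebra. Rearranging x² + y² = x × 10000 + y gives the quadratic y² - y + (x² - 10000x) = 0 in y, so y = (1 + √(1 + 4x(10000 - x))) / 2, and only values of x for which the square root is a whole number can work. The following is a sketch of that idea; the function name `appended_square_pairs` and its `digits` parameter are made up for this example, and `math.isqrt` needs Python 3.8 or later.

```python
import math


def appended_square_pairs(digits=4):
    """Find (x, y) with x*x + y*y == x * 10**digits + y, both `digits` long."""
    shift = 10 ** digits
    low, high = 10 ** (digits - 1), 10 ** digits
    pairs = []
    for x in range(low, high):
        # Discriminant of y*y - y + (x*x - shift*x) = 0; it is always odd.
        disc = 1 + 4 * x * (shift - x)
        root = math.isqrt(disc)
        if root * root != disc:
            continue  # no integer solution for this x
        y = (1 + root) // 2  # root is odd here, so the division is exact
        if low <= y < high:
            pairs.append((x, y))
    return pairs


print(appended_square_pairs())
```

This loops over x only, so it does roughly nine thousand iterations instead of tens of millions.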

Here Document

Something that you may not know about PHP and many other languages is that they provide heredoc syntax for building multi-line strings. What does this mean? Check out the following:

$title = 'Test HTML Page';
$myCode = <<<STR
<html>
  <head>
    <title>$title</title>
  </head>
  <body>
    <h1>$title</h1>
    <p>This is a test page to show that "heredoc" syntax in PHP actually works.</p>
  </body>
</html>
STR;

The above puts the following into the $myCode variable as a string:

<html>
  <head>
    <title>Test HTML Page</title>
  </head>
  <body>
    <h1>Test HTML Page</h1>
    <p>This is a test page to show that "heredoc" syntax in PHP actually works.</p>
  </body>
</html>

You can use this syntax in many different languages:

  • Bash
  • C++
  • Lua
  • Perl
  • PHP
  • Python
  • R
  • Racket
  • Ruby
  • Tcl

Other languages support this syntax as well, but I think it is important to note that neither Java nor JavaScript has a true heredoc syntax (Java's text blocks and JavaScript's template literals are the closest equivalents). For more information, you can check out the Wikipedia page.
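In Python, the closest equivalent is a triple-quoted string. Here is a minimal sketch of the PHP example above using a triple-quoted f-string (Python 3.6+) for the interpolation; the body is shortened and the variable names are adapted for this illustration.

```python
title = 'Test HTML Page'

# A triple-quoted f-string spans several lines and interpolates {title}.
my_code = f"""<html>
  <head>
    <title>{title}</title>
  </head>
  <body>
    <h1>{title}</h1>
  </body>
</html>"""

print(my_code)
```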

Regular Expression Examples

One of the things that I love about string manipulation is the existence of regular expressions. For this reason, I have decided to share a few examples that may help those who are learning about regular expressions so that they can understand them a bit better.


JavaScript – General Variable Name

In many languages, a variable must start off with a letter and may be followed by letters, numbers, and/or the underscore character.  Knowing this, we could use the following regular expression in JavaScript to match a variable name:

var expVarIFlag = /^[A-Z]\w*$/i;

Basically, the above regular expression will only match a string if it matches the pattern for a variable name. The reason that I only put [A-Z] and not [A-Za-z] is that in JavaScript you can specify an “i” flag after the regular expression, which makes the expression case-insensitive. Another thing to note is that I used the \w class, which represents a single word character. A word character in regular expressions typically means any letter (A to Z regardless of case), any digit (0 to 9), or the underscore character. The reason I used the asterisk instead of the plus sign is that a variable name may be just one letter.

NOTE: Although this regular expression may work for other languages, in JavaScript, a variable name can also start off with an underscore or dollar sign.
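The same two patterns can be tried out in Python. This is a sketch only: `general_name` and `js_name` are names made up for the example, the second pattern covers just ASCII identifiers, and neither one rejects reserved words.

```python
import re

# A letter followed by any number of word characters. re.ASCII keeps \w
# limited to A-Z, a-z, 0-9 and underscore.
general_name = re.compile(r'[A-Z]\w*', re.IGNORECASE | re.ASCII)

# Variant that also accepts a leading underscore or dollar sign, and dollar
# signs later in the name, as JavaScript does.
js_name = re.compile(r'[A-Z_$][\w$]*', re.IGNORECASE | re.ASCII)

for candidate in ('x', 'total2', '_private', '$el', '2fast'):
    print(candidate,
          bool(general_name.fullmatch(candidate)),
          bool(js_name.fullmatch(candidate)))
```

Using `fullmatch` plays the role of the ^ and $ anchors in the JavaScript version.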


PostgreSQL – Date (MM/DD/YYYY)

Even though using a regular expression shouldn’t be the way to completely validate a date, you can do so partially with the following in PostgreSQL:

SELECT id, text
FROM answers
WHERE text ~ '^(0\\d|1[012])/([012]\\d|3[01])/\\d{4}$';

The above query pulls all of the answers whose text matches the pattern, that is, text that looks like a valid date:

  1. First it specifies that the first two characters are a 0 followed by another digit or a 1 followed by either a 0, 1, or 2.
  2. Next should come a forward slash.
  3. Next should be either…
    1.  0, 1, or 2 followed by any digit
    2. or 3 followed by a 0 or 1
  4. Finally should be another forward slash followed by four digits.

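One thing to notice is that in order to properly escape the class inside of a string (which is what we have to do here in PostgreSQL), you have to escape the backslash so that it will be interpreted as one backslash in front of the next character, thus rendering “\\d” as “\d”. Note that this doubling applies only when backslashes are treated as escape characters in string literals; with standard_conforming_strings turned on (the default since PostgreSQL 9.1), a single backslash is enough in an ordinary quoted string.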
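To illustrate why the regular expression is only a partial check, here is a Python sketch that applies the same pattern and then lets `datetime.strptime` decide whether the calendar date really exists. The names `DATE_SHAPE`, `looks_like_date`, and `is_real_date` are made up for this example.

```python
import re
from datetime import datetime

# Same pattern as in the query; fullmatch replaces the ^ and $ anchors.
DATE_SHAPE = re.compile(r'(0\d|1[012])/([012]\d|3[01])/\d{4}', re.ASCII)


def looks_like_date(text):
    # Shape check only: two-digit month, two-digit day, four-digit year.
    return DATE_SHAPE.fullmatch(text) is not None


def is_real_date(text):
    # Full check: the shape must match and the calendar date must exist.
    if not looks_like_date(text):
        return False
    try:
        datetime.strptime(text, '%m/%d/%Y')
    except ValueError:
        return False
    return True


for sample in ('12/31/1999', '02/30/2020', '13/01/2020'):
    print(sample, looks_like_date(sample), is_real_date(sample))
```

A string such as 02/30/2020 passes the pattern but is rejected by the second function, since February never has 30 days.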


PHP – Hexadecimal Color Code

In CSS, a color code can be in many different forms. One accepted form is hexadecimal. The hex form can be three characters or six characters long. It can start off with a number sign, but this symbol isn’t required. Knowing all of this, we could use the following in PHP to validate the hex color:

$pattern = '/^#?([0-9A-F]{3}){1,2}$/i';
$validHex = preg_match($pattern, $_GET['hex']);

The preg_match() function is used to validate the GET parameter called “hex” against our regular expression:

  1. First it specifies that the first character may be a number sign (#).
  2. Next I have defined a parenthesized group which matches any three hexadecimal digits.
  3. After that, I am specifying that my parenthesized group pattern may appear once or twice in a row and that no other characters should follow.
  4. Finally, you will notice that I am again using the “i” flag to indicate that this is a case-insensitive pattern.
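The steps above can be sketched in Python with the same pattern. The names `HEX_COLOR` and `is_hex_color` are made up for this example, and `fullmatch` stands in for the ^ and $ anchors.

```python
import re

# Optional '#', then one or two groups of three hexadecimal digits.
HEX_COLOR = re.compile(r'#?([0-9A-F]{3}){1,2}', re.IGNORECASE)


def is_hex_color(value):
    return HEX_COLOR.fullmatch(value) is not None


for sample in ('#fff', 'A1B2C3', '#ffff', '#ggg'):
    print(sample, is_hex_color(sample))
```

Three and six digits are accepted, while four or five digits fail because the group of three must repeat a whole number of times.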

Python – Simple Image File Names

Let’s use Python now to check to see if a file name looks like a valid image name:

# Import the regular expression library
import re

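# Defining the compiled regular expression. A raw string is used so that
# \\ inside the character class really excludes the backslash character.
pat = r'^[^/\\?%*:|"<>]+\.(jpg|png|gif|bmp)$'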
reImg = re.compile(pat, re.I)

# Getting the file name from the user
fileName = raw_input("File name:  ")

# Determine if the file name is an image name
isImage = reImg.match(fileName) is not None

The regular expression created does the following:

  1. First makes sure that the string starts off with one or more characters which are none of the following:  /  \  ?  %  *  :  |  "  <  >
  2. In the end it checks that a dot is found followed by one of the following extensions which must appear at the end of the string:
    1. jpg
    2. png
    3. gif
    4. bmp
  3. It is also important to note that by using “re.I“, I specified that casing would be ignored.

The code should basically prompt the user for a file name and then validate the string entered to determine if it matches the regular expression for an image.  The boolean value indicating whether or not it is an image is stored in the isImage variable.
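For experimenting without a prompt, the check can be wrapped in a function. This is a sketch for Python 3; the names `IMAGE_NAME` and `is_image_name` are made up for the example. The pattern is written as a raw string, so the `\\` inside the character class stands for a literal backslash.

```python
import re

# One or more characters that are not / \ ? % * : | " < >, then a dot and
# one of the accepted extensions.
IMAGE_NAME = re.compile(r'[^/\\?%*:|"<>]+\.(jpg|png|gif|bmp)', re.IGNORECASE)


def is_image_name(file_name):
    return IMAGE_NAME.fullmatch(file_name) is not None


for sample in ('photo.JPG', 'a.b.png', 'notes.txt', 'dir/photo.png', 'dir\\photo.png'):
    print(sample, is_image_name(sample))
```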


VBScript – Format Large Integer With Commas

The following is how you could use a regular expression to insert commas into a number (integer):

' Setup the RegExp for testing if input is an integer.
Dim re : Set re = new RegExp
re.Pattern = "^(0|-?[1-9]\d*)$"

' Get the input integer from the user.
input = InputBox("Enter an integer", "Your Integer", 123456789)

' If the input is an integer...
If re.Test(input) Then
  ' Modify the pattern to input the commas correctly.
  re.Pattern = "(\d)(?=(\d{3})+$)"
  re.Global = True

  ' Reformat the integer, if given.
  newInput = re.Replace(input, "$1,")

  ' Display the input formatted with commas.
  MsgBox input & " became " & newInput

' If the input is not an integer, tell the user so.
Else
  MsgBox "The input given wasn't recognized as an integer."
End If

The first regular expression tests that the input is either simply a zero or an optional minus sign followed by one or more digits with the first one being non-zero. In other words, the first pattern makes sure that the input is an integer that doesn’t start with a zero (unless it is zero). The second regular expression is what is used to insert the comma(s) in the right place(s). It finds every digit that is followed by one or more complete groups of three digits running to the end of the input. By starting the group off with “?=” I make it a lookahead, which checks the following digits without consuming them, so they remain available for later matches.
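The same two patterns carry over to Python almost unchanged. Here is a sketch; `add_commas` is a name made up for the example, and the replacement uses Python's `\1` where VBScript uses `$1`.

```python
import re


def add_commas(number_text):
    # Only reformat strings that look like an integer without leading zeros.
    if not re.fullmatch(r'0|-?[1-9]\d*', number_text):
        raise ValueError('not an integer: {!r}'.format(number_text))
    # Insert a comma after every digit that is followed by one or more
    # complete groups of three digits running to the end of the string.
    return re.sub(r'(\d)(?=(\d{3})+$)', r'\1,', number_text)


print(add_commas('123456789'))
print(add_commas('-1234'))
```

When the value is already an int, the built-in `format(123456789, ',')` gives the same grouping without a regular expression.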