Today, I had to figure out a way to parse an HTML string in Python in order to find the values of all attributes whose names start with a specific string. Since we already have BeautifulSoup installed, I started researching how to use a lambda function in conjunction with the attrs argument of BeautifulSoup#findAll(). Unfortunately, I didn’t figure out a way to use a callable with the attrs argument, but I did find one that works with the name argument:

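from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 (Python 2)
soup = BeautifulSoup('<div><div custom-click="Clicker">Click</div><span custom-jump="Jumper">Jump</span></div>')
# In Beautiful Soup 3, tag.attrs is a list of (name, value) tuples; the lambda
# returns a non-empty (truthy) list for any tag with a "custom-" attribute.
elems = soup.findAll(lambda tag:[a for a in tag.attrs if a[0].startswith('custom-')])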

After running the above code to find all elements with an attribute whose name starts with custom-, I ended up with a list of these two elements:

[<div custom-click="Clicker">Click</div>, <span custom-jump="Jumper">Jump</span>]
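For comparison, here is a minimal sketch of the same prefix match without Beautiful Soup, assuming Python 3 and the standard library's html.parser module (a swapped-in technique, not what the code above uses). PrefixAttrFinder is a hypothetical helper name. Its handle_starttag() hook receives attributes as a list of (name, value) pairs, much like tag.attrs above, but it only records each matching tag's name and attributes, not the element's contents the way findAll() does.

```python
from html.parser import HTMLParser


class PrefixAttrFinder(HTMLParser):
    """Hypothetical helper (not part of Beautiful Soup): record every start
    tag that has at least one attribute whose name begins with `prefix`."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []  # list of (tag_name, [(attr_name, attr_value), ...])

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        if any(name.startswith(self.prefix) for name, _ in attrs):
            self.matches.append((tag, attrs))


html = ('<div><div custom-click="Clicker">Click</div>'
        '<span custom-jump="Jumper">Jump</span></div>')
finder = PrefixAttrFinder('custom-')
finder.feed(html)
finder.close()
print(finder.matches)
# [('div', [('custom-click', 'Clicker')]), ('span', [('custom-jump', 'Jumper')])]
```

Because html.parser lowercases attribute names, the prefix you pass in should be lowercase too.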

To get a list of the matching attribute values themselves, instead of traversing the attributes of each returned element, I decided to just add a little bit to the lambda function so that it appends each value to a list as it goes:

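from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 (Python 2)
soup = BeautifulSoup('<div><div custom-click="Clicker">Click</div><span custom-jump="Jumper">Jump</span></div>')
custom_values = []
# The comprehension is used only for its side effect: each matching attribute's
# value is appended to custom_values while findAll() walks the tree.
soup.findAll(lambda tag:[custom_values.append(a[1]) for a in tag.attrs if a[0].startswith('custom-')])
print custom_values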

That resulted in this:

[u'Clicker', u'Jumper']

Pretty simple, right? 😎
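If you'd rather not rely on a list comprehension that exists only for its append() side effect, here is a minimal sketch that collects the values with an ordinary loop, again assuming Python 3 and the standard library's html.parser in place of Beautiful Soup. prefixed_attr_values() is a hypothetical helper name, not a library function.

```python
from html.parser import HTMLParser


def prefixed_attr_values(html, prefix):
    """Hypothetical helper: return the values of every attribute whose name
    starts with `prefix`, in document order."""
    values = []

    class _Collector(HTMLParser):
        def handle_starttag(self, tag, attrs):
            # Plain loop instead of a comprehension used for its side effect
            for name, value in attrs:
                if name.startswith(prefix):
                    values.append(value)

    parser = _Collector()
    parser.feed(html)
    parser.close()
    return values


html = ('<div><div custom-click="Clicker">Click</div>'
        '<span custom-jump="Jumper">Jump</span></div>')
print(prefixed_attr_values(html, 'custom-'))  # ['Clicker', 'Jumper']
```

An attribute written without a value (for example <input custom-flag>) comes through from html.parser with a value of None, so it would show up as None in the returned list.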

