Today, I had to figure out a way to parse an HTML string in Python in order to find the values of all attributes whose names start with a specific string. Since we already have BeautifulSoup installed, I started researching how to use a lambda function in conjunction with the attrs argument of BeautifulSoup#findAll(). Unfortunately, I didn't figure out a way to use a callable with the attrs argument, but I did with the name argument:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('ClickJump')
elems = soup.findAll(lambda tag:[a for a in tag.attrs if a[0].startswith('custom-')])
After running the above code to find all elements with attributes starting with custom-, I ended up with a list of the following two elements:
[Click, Jump]
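For readers on Python 3, here is a minimal sketch of the same element lookup using the newer bs4 package, assuming it is installed. In bs4, tag.attrs is a dict rather than a list of pairs, and find_all() accepts a callable that receives each tag. The sample markup below is hypothetical: the tag names and attribute names (custom-name, custom-label) are made up for illustration and are not the original HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: tag and attribute names are illustrative only.
html = (
    '<a custom-name="Clicker">Click</a>'
    '<b custom-label="Jumper">Jump</b>'
    '<i id="plain">Skip</i>'
)
soup = BeautifulSoup(html, "html.parser")


def has_custom_attr(tag):
    """Return True if any attribute name on the tag starts with 'custom-'."""
    return any(name.startswith("custom-") for name in tag.attrs)


# find_all() calls the function once per tag and keeps the tags it accepts.
elems = soup.find_all(has_custom_attr)
texts = [tag.get_text() for tag in elems]
print(texts)
```

A named function is used here instead of a lambda only for readability; passing a lambda to find_all() works the same way.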
In order to get a list of all of the attribute values, instead of traversing the attributes of each element returned, I decided to just add a little bit to the lambda function:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('ClickJump')
custom_values = []
soup.findAll(lambda tag:[custom_values.append(a[1]) for a in tag.attrs if a[0].startswith('custom-')])
print custom_values
That resulted in this:
[u'Clicker', u'Jumper']
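If BeautifulSoup is not available, the same values can be gathered with the standard library alone. The sketch below swaps in a different technique, Python 3's html.parser.HTMLParser, rather than the lambda trick above. The class name PrefixedAttrCollector and the sample markup are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class PrefixedAttrCollector(HTMLParser):
    """Collect values of attributes whose names start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        for name, value in attrs:
            if name.startswith(self.prefix):
                self.values.append(value)


# Hypothetical markup: tag and attribute names are illustrative only.
html = (
    '<a custom-name="Clicker">Click</a>'
    '<b custom-label="Jumper">Jump</b>'
    '<i id="plain">Skip</i>'
)

collector = PrefixedAttrCollector("custom-")
collector.feed(html)
collector.close()

custom_values = collector.values
print(custom_values)
```

Unlike the lambda version, this keeps the collecting logic out of the search predicate, so nothing depends on a side effect inside a list comprehension.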
Pretty simple, right?