Today, I had to figure out how to parse an HTML string in Python and find the values of all attributes whose names start with a specific string. Since we already have BeautifulSoup installed, I started researching how to use a lambda function with the attrs argument of BeautifulSoup#findAll(). Unfortunately, I didn’t find a way to use a callable with the attrs argument, but I did find one with the name argument:


from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
')
elems = soup.findAll(lambda tag: [a for a in tag.attrs if a[0].startswith('custom-')])

After running the above code to find all elements with attributes starting with custom-, I ended up with a list of the following two elements:


[Click, Jump]
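
For anyone without BeautifulSoup at hand, the same name-prefix test can be sketched with the standard library's html.parser instead. This is not the approach used in this post, it is written for Python 3, and the sample markup (tag names and attribute names such as custom-action) is made up for illustration; only the custom- prefix and the two attribute values come from this post.

```python
from html.parser import HTMLParser


class PrefixAttrFinder(HTMLParser):
    """Collects start tags that carry an attribute whose name starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []  # (tag name, list of (attr name, attr value)) pairs

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        if any(name.startswith(self.prefix) for name, _ in attrs):
            self.matches.append((tag, attrs))


# Hypothetical sample markup; the tag and attribute names are assumptions.
html = ('<div><a href="#" custom-action="Clicker">Click</a>'
        '<span custom-target="Jumper">Jump</span><p>Plain</p></div>')

finder = PrefixAttrFinder('custom-')
finder.feed(html)
finder.close()

matched_tags = [tag for tag, _ in finder.matches]
print(matched_tags)
```

Only the elements carrying a custom- attribute are recorded; the wrapping div and the plain p are skipped.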

In order to get a list of all of the attribute values, instead of traversing the attributes of each element returned, I decided to just add a little bit to the lambda function:


from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('
Click
Jump
')
custom_values = []
soup.findAll(lambda tag: [custom_values.append(a[1]) for a in tag.attrs if a[0].startswith('custom-')])
print custom_values

That resulted in this:


[u'Clicker', u'Jumper']
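
The lambda above appends to custom_values as a side effect of matching. The alternative mentioned earlier, traversing the attributes of each returned element, can be sketched as a nested list comprehension. To keep the sketch runnable without BeautifulSoup, the elements are stood in for by plain lists of (name, value) tuples, which is the shape tag.attrs has in the code above. The helper name values_with_prefix and the attribute names are hypothetical.

```python
def values_with_prefix(attr_lists, prefix):
    """Flatten the values of all attributes whose name starts with prefix."""
    return [
        value
        for attrs in attr_lists        # one attrs list per matched element
        for name, value in attrs       # (name, value) tuples, as in tag.attrs
        if name.startswith(prefix)
    ]


# Stand-ins for [tag.attrs for tag in elems]; attribute names are assumptions.
attr_lists = [
    [('href', '#'), ('custom-action', 'Clicker')],
    [('custom-target', 'Jumper')],
]

custom_values = values_with_prefix(attr_lists, 'custom-')
print(custom_values)  # ['Clicker', 'Jumper']
```

With real elements, the first argument would be something like [tag.attrs for tag in elems]. It costs a second pass over the matches, but the matching function stays free of side effects.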

Pretty simple, right! 😎
