Soupy is a wrapper around BeautifulSoup that makes it easier to search through HTML and XML documents.
from soupy import Soupy, Q
html = """
<div id="main">
<div>The web is messy</div>
and full of traps
<div>but Soupy loves you</div>
</div>"""
print(Soupy(html).find(id='main').children
      .each(Q.text.strip())  # extract text from each node, trim whitespace
      .filter(len)           # remove empty strings
      .val())                # dump out of Soupy
[u'The web is messy', u'and full of traps', u'but Soupy loves you']
Compare to the same task in BeautifulSoup:
from bs4 import BeautifulSoup, NavigableString
html = """
<div id="main">
<div>The web is messy</div>
and full of traps
<div>but Soupy loves you</div>
</div>"""
result = []
for node in BeautifulSoup(html).find(id='main').children:
    if isinstance(node, NavigableString):
        text = node.strip()
    else:
        text = node.text.strip()
    if len(text):
        result.append(text)
print(result)
[u'The web is messy', u'and full of traps', u'but Soupy loves you']
Soupy uses BeautifulSoup under the hood and provides a very similar API while smoothing over some of BeautifulSoup's warts. It also adds a functional interface for chaining operations, handling failed searches gracefully, and extracting data into simpler formats.
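One way to picture graceful handling of failed searches is the null-object pattern: a failed lookup returns a placeholder that accepts further chained calls instead of returning None. The sketch below is only an illustration of that idea, not Soupy's implementation or API. The names Found, NotFound, and text_or are hypothetical, and the standard library's xml.etree.ElementTree stands in for BeautifulSoup so the example has no dependencies.

```python
import xml.etree.ElementTree as ET


class Found:
    """Hypothetical wrapper around an element that a search located."""

    def __init__(self, element):
        self.element = element

    def find(self, tag):
        # Element.find returns None on a miss; wrap that case in NotFound
        # so the caller can keep chaining without checking for None.
        child = self.element.find(tag)
        return Found(child) if child is not None else NotFound()

    def text_or(self, default):
        # A real match ignores the default and returns its own text.
        return (self.element.text or "").strip()


class NotFound:
    """Hypothetical placeholder returned when a search fails."""

    def find(self, tag):
        # Searching inside a missing node is still a miss.
        return self

    def text_or(self, default):
        return default


doc = Found(ET.fromstring("<div><p><b> bold </b></p></div>"))

hit = doc.find("p").find("b").text_or("n/a")        # every step matches
miss = doc.find("table").find("tr").text_or("n/a")  # first step fails

print(hit)
print(miss)
```

With plain None results, the second chain would raise AttributeError at the second find call. With the placeholder, the failure is carried to the end of the chain, where the caller chooses a fallback.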
Install Soupy with pip:
pip install soupy
Alternatively, download the source from GitHub.
Contents: