Welcome to the Soupy documentation

Soupy is a wrapper around BeautifulSoup that makes it easier to search through HTML and XML documents.

from soupy import Soupy, Q

html = """
<div id="main">
  <div>The web is messy</div>
  and full of traps
  <div>but Soupy loves you</div>
</div>"""

print(Soupy(html).find(id='main').children
      .each(Q.text.strip()) # extract text from each node, trim whitespace
      .filter(len)          # remove empty strings
      .val())               # dump out of Soupy
[u'The web is messy', u'and full of traps', u'but Soupy loves you']

Compare to the same task in BeautifulSoup:

from bs4 import BeautifulSoup, NavigableString

html = """
<div id="main">
  <div>The web is messy</div>
  and full of traps
  <div>but Soupy loves you</div>
</div>"""

result = []
for node in BeautifulSoup(html).find(id='main').children:
    if isinstance(node, NavigableString):
        text = node.strip()
    else:
        text = node.text.strip()
    if len(text):
        result.append(text)

print(result)
[u'The web is messy', u'and full of traps', u'but Soupy loves you']

Soupy uses BeautifulSoup under the hood and provides a very similar API, while smoothing over some of the warts in BeautifulSoup. Soupy also adds a functional interface for chaining together operations, gracefully dealing with failed searches, and extracting data into simpler formats.

Indices and tables

About

Soupy makes wrangling web documents better.

Useful Links

This Page