.. soupy documentation master file, created by
sphinx-quickstart on Thu Apr 2 18:06:34 2015.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to the Soupy documentation
==================================
.. currentmodule:: soupy
Soupy is a wrapper around `BeautifulSoup `_ that makes it easier to
search through HTML and XML documents.
.. testcode::
from soupy import Soupy, Q
html = """
The web is messy
and full of traps
but Soupy loves you
"""
print(Soupy(html).find(id='main').children
.each(Q.text.strip()) # extract text from each node, trim whitespace
.filter(len) # remove empty strings
.val()) # dump out of Soupy
.. testoutput::
[u'The web is messy', u'and full of traps', u'but Soupy loves you']
Compare to the same task in BeautifulSoup:
.. testcode::
from bs4 import BeautifulSoup, NavigableString
html = """
The web is messy
and full of traps
but Soupy loves you
"""
result = []
for node in BeautifulSoup(html).find(id='main').children:
if isinstance(node, NavigableString):
text = node.strip()
else:
text = node.text.strip()
if len(text):
result.append(text)
print(result)
.. testoutput::
[u'The web is messy', u'and full of traps', u'but Soupy loves you']
Soupy uses BeautifulSoup under the hood and provides a very similar API,
while smoothing over some of the warts in BeautifulSoup. Soupy also
adds a functional interface for chaining together operations,
gracefully dealing with failed searches, and extracting data into
simpler formats.
Installation
------------
::
pip install soupy
or download the `GitHub source `_.
Contents:
.. toctree::
:maxdepth: 3
getting_started.rst
api.rst
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`