Soupy API Documentation

Main Wrapper Classes

class soupy.Node(value)

The Node class is the main wrapper around BeautifulSoup elements like Tag. It implements many of the same properties and methods as BeautifulSoup for navigating through documents, like find, select, parents, etc.

dump(**kwargs)

Extract derived values into a Scalar(dict)

The keyword names passed to this function become keys in the resulting dictionary.

The keyword values are functions that are called on this Node.

Notes

  • The input functions are called on the Node, not the underlying BeautifulSoup element
  • If the function returns a wrapper, it will be unwrapped
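
The keyword-to-key rule above can be sketched in plain Python. This is a toy model rather than Soupy's implementation: `Tag` and `toy_dump` are hypothetical names, `operator.attrgetter` stands in for expressions such as Q.name, and the unwrapping step is left out.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for a wrapped element; not a Soupy class.
Tag = namedtuple("Tag", ["name", "text"])


def toy_dump(node, **extractors):
    """Each keyword becomes a key; each function is called on the node."""
    return {key: func(node) for key, func in extractors.items()}


tag = Tag(name="b", text="hi")
data = toy_dump(tag, name=attrgetter("name"), text=attrgetter("text"))
print(data == {"name": "b", "text": "hi"})  # True
```

Because every extractor receives the same node, adding a key to the output only requires adding a keyword argument.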

Example

>>> soup = Soupy("<b>hi</b>").find('b')
>>> data = soup.dump(name=Q.name, text=Q.text).val()
>>> data == {'text': 'hi', 'name': 'b'}
True

val()

Return the value inside a wrapper.

Raises NullValueError if called on a Null object.

orelse(value)

Provide a fallback value for failed matches.

Examples

>>> Scalar(5).orelse(10).val()
5
>>> Null().orelse(10).val()
10

nonnull()

Require that a node is not null.

Calling this on a Null value raises NullValueError, whereas calling it on a non-null value returns self.

This is useful for being strict about portions of a query.

Examples

node.find('a').nonnull().find('b').orelse(3)

This will raise an error if find('a') doesn't match, but provides a fallback if find('b') doesn't match.
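
The interplay of val(), orelse() and nonnull() can be illustrated with a toy null-object model. The classes below are hypothetical stand-ins written for this sketch, not Soupy's own code, and `ToyNullValueError` only mimics the role of NullValueError.

```python
class ToyNullValueError(ValueError):
    """Hypothetical stand-in for soupy.NullValueError."""


class ToyScalar:
    """A wrapper that holds a real value."""

    def __init__(self, value):
        self._value = value

    def val(self):
        return self._value

    def orelse(self, fallback):
        # A non-null wrapper ignores the fallback.
        return self

    def nonnull(self):
        return self


class ToyNull:
    """A wrapper that represents a failed match."""

    def val(self):
        raise ToyNullValueError("no value inside a null wrapper")

    def orelse(self, fallback):
        # A null wrapper is replaced by the wrapped fallback.
        return ToyScalar(fallback)

    def nonnull(self):
        raise ToyNullValueError("expected a non-null wrapper")


print(ToyScalar(5).orelse(10).val())  # 5
print(ToyNull().orelse(10).val())     # 10
```

In this model, orelse() recovers from a failed match, while nonnull() turns a failed match into an immediate error.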

require(func, msg=u'Requirement violated')

Assert that self.apply(func) is True.

Parameters:
  • func – func(wrapper)
  • msg – str. The error message to display on failure.

Returns:

If self.apply(func) is True, returns self. Otherwise, raises NullValueError.

attrs

A Scalar of this Node’s attribute dictionary

Example

>>> Soupy("<a val=3></a>").find('a').attrs
Scalar({'val': '3'})

children

A Collection of the child elements.

contents

A Collection of the child elements.

descendants

A Collection of all elements nested inside this Node.

find(*args, **kwargs)

Find a single Node among this Node’s descendants.

Returns NullNode if nothing matches.

The inputs to this function follow the same semantics as BeautifulSoup. See http://bit.ly/bs4doc for more info.

Examples

  • node.find('a') # look for <a> tags
  • node.find('a', 'foo') # look for <a> tags with class="foo"
  • node.find(func) # find tag where func(tag) is True
  • node.find(val=3) # look for tag like <a val=3>

find_all(*args, **kwargs)

Like find(), but selects all matches (not just the first one).

Returns a Collection.

If no elements match, this returns a Collection with no items.

find_next_sibling(*args, **kwargs)

Like find(), but searches through next_siblings

find_next_siblings(*args, **kwargs)

Like find_all(), but searches through next_siblings

find_parent(*args, **kwargs)

Like find(), but searches through parents

find_parents(*args, **kwargs)

Like find_all(), but searches through parents

find_previous_sibling(*args, **kwargs)

Like find(), but searches through previous_siblings

find_previous_siblings(*args, **kwargs)

Like find_all(), but searches through previous_siblings

name

A Scalar of this Node’s tag name.

Example

>>> node = Soupy('<p>hi there</p>').find('p')
>>> node
Node(<p>hi there</p>)
>>> node.name
Scalar('p')

next_sibling

The Node sibling after this, or NullNode

next_siblings

A Collection of all siblings after this node

parent

The parent Node, or NullNode

parents

A Collection of the parent elements.

previous_sibling

The Node sibling prior to this, or NullNode

previous_siblings

A Collection of all siblings before this node

select(selector)

Like find_all(), but takes a CSS selector string as input.

text

A Scalar of this Node’s text.

Example

>>> node = Soupy('<p>hi there</p>').find('p')
>>> node
Node(<p>hi there</p>)
>>> node.text
Scalar(u'hi there')

class soupy.Collection(items)

Collections store lists of other wrappers.

They support most of the list methods (len, iter, getitem, etc).

apply(func)

Call a function on a wrapper, and wrap the result if necessary.

Parameters: func – function(wrapper) -> val

Examples

>>> s = Scalar(5)
>>> s.apply(lambda val: isinstance(val, Scalar))
Scalar(True)

map(func)

Call a function on a wrapper’s value, and wrap the result if necessary.

Parameters: func – function(val) -> val

Examples

>>> s = Scalar(3)
>>> s.map(Q * 2)
Scalar(6)

all()

Scalar(True) if all items are truthy, or collection is empty.

any()

Scalar(True) if any items are truthy. False if empty.

count()

Return the number of items in the collection, as a Scalar

dictzip(keys)

Turn this collection into a Scalar(dict), by zipping keys and items.

Parameters: keys – list or Collection of NavigableStrings. The keys of the dictionary.

Examples

>>> c = Collection([Scalar(1), Scalar(2)])
>>> c.dictzip(['a', 'b']).val() == {'a': 1, 'b': 2}
True

dropwhile(func)

Return a new Collection with the first few items removed.

Parameters: func – function(Node) -> Node
Returns: A new Collection, discarding all items before the first item where bool(func(item)) == True

dump(*args, **kwargs)

Build a list of dicts, by calling Node.dump() on each item.

Each keyword provides a function that extracts a value from a Node.

Examples

>>> c = Collection([Scalar(1), Scalar(2)])
>>> c.dump(x2=Q*2, m1=Q-1).val()
[{'x2': 2, 'm1': 0}, {'x2': 4, 'm1': 1}]

each(*funcs)

Call one or more functions on each element in the collection.

If multiple functions are provided, each item in the output is a tuple holding the result of every function for that item.

Returns a new Collection.

Example

>>> col = Collection([Scalar(1), Scalar(2)])
>>> col.each(Q * 10)
Collection([Scalar(10), Scalar(20)])
>>> col.each(Q * 10, Q - 1)
Collection([Scalar((10, 0)), Scalar((20, 1))])

filter(func)

Return a new Collection with some items removed.

Parameters: func – function(Node) -> Node
Returns: A new Collection consisting of the items where bool(func(item)) == True

Examples

node.find_all('a').filter(Q['href'].startswith('http'))

first()

Return the first element of the collection, or Null

iter_val()

An iterator version of val()

none()

Scalar(True) if no items are truthy, or collection is empty.
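
The empty-collection rules for all(), any() and none() match the conventions of Python's built-in all and any. The sketch below works on plain unwrapped lists, so it returns bare booleans rather than Scalars; `toy_none` is a hypothetical helper introduced only for the comparison.

```python
def toy_none(items):
    """Hypothetical helper: True when no item is truthy (including no items)."""
    return not any(items)


samples = [[], [0, ""], [0, 1], [1, 2]]

# One row per sample: (all, any, none)
table = [(all(s), any(s), toy_none(s)) for s in samples]

print(table[0])  # (True, False, True)    empty list
print(table[1])  # (False, False, True)   nothing truthy
print(table[2])  # (False, True, False)   some items truthy
print(table[3])  # (True, True, False)    every item truthy
```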

takewhile(func)

Return a new Collection with the last few items removed.

Parameters: func – function(Node) -> Node
Returns: A new Collection, discarding all items at and after the first item where bool(func(item)) == False

Examples

node.find_all('tr').takewhile(Q.find_all('td').count() > 3)
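
The difference between filter() and takewhile() is easiest to see on plain values. The sketch below is a plain-Python analogue using a list comprehension and `itertools.takewhile`; it operates on bare integers instead of wrapped Nodes, and `is_small` is a hypothetical test function.

```python
from itertools import takewhile

values = [1, 2, 5, 1, 7]


def is_small(v):
    return v < 3


# filter: keep every item that passes the test, wherever it appears.
kept = [v for v in values if is_small(v)]

# takewhile: keep the leading run and stop at the first item that fails.
leading = list(takewhile(is_small, values))

print(kept)     # [1, 2, 1]
print(leading)  # [1, 2]
```

The second 1 survives filtering but not takewhile, because takewhile discards everything at and after the first failing item.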

val()

Unwrap each item in the collection, and return the results as a list.

zip(*others)

Zip the items of this collection with one or more other sequences, and wrap the result.

Unlike Python’s zip, all sequences must be the same length.

Parameters: others – One or more iterables or Collections
Returns: A new collection.
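
To see why the same-length requirement matters, compare Python's built-in zip, which stops at the shortest input, with a strict variant. `strict_zip` is a hypothetical helper for this sketch, and the exception type it raises is an assumption rather than a statement about what Soupy raises.

```python
def strict_zip(*seqs):
    """Zip sequences, refusing inputs of unequal length (a toy helper)."""
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        # Assumption: the exception type here is illustrative only.
        raise ValueError("all sequences must be the same length")
    return list(zip(*seqs))


print(list(zip([1, 2], [3])))      # [(1, 3)]  built-in zip truncates silently
print(strict_zip([1, 2], [3, 4]))  # [(1, 3), (2, 4)]
```

Rejecting mismatched lengths surfaces a scraping mistake, such as a missing table cell, instead of silently dropping data.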

Examples

>>> c1 = Collection([Scalar(1), Scalar(2)])
>>> c2 = Collection([Scalar(3), Scalar(4)])
>>> c1.zip(c2).val()
[(1, 3), (2, 4)]

class soupy.Scalar(value)

A wrapper around single values.

Scalars support comparison operators (<, ==, etc.), and use the wrapped value in the comparison. They return the result as a Scalar(bool).

Calling a Scalar calls the wrapped value, and wraps the result.

Examples

>>> s = Scalar(3)
>>> s > 2
Scalar(True)
>>> s.val()
3
>>> s + 5
Scalar(8)
>>> s + s
Scalar(6)
>>> bool(Scalar(3))
True
>>> Scalar(lambda x: x+2)(5)
Scalar(7)

Null Wrappers

class soupy.NullValueError

The NullValueError exception is raised when attempting to extract values from Null objects

class soupy.Null

The class for ill-defined Scalars.

class soupy.NullNode

NullNode is returned when a query doesn’t match any node in the document.
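
A toy model shows why returning a null object keeps chained queries from breaking. `ToyNode`, `ToyNullNode` and `is_null` are hypothetical names for this sketch, and a nested dict stands in for a parsed document; none of this is Soupy's implementation.

```python
class ToyNode:
    """Hypothetical wrapper around a nested dict (stands in for a document)."""

    def __init__(self, data):
        self.data = data

    def find(self, key):
        child = self.data.get(key) if isinstance(self.data, dict) else None
        return ToyNode(child) if child is not None else TOY_NULL_NODE

    def is_null(self):
        return False


class ToyNullNode:
    """Absorbs further lookups, so a failed match does not break the chain."""

    def find(self, key):
        return self

    def is_null(self):
        return True


TOY_NULL_NODE = ToyNullNode()

doc = ToyNode({"body": {"p": {"b": "hi"}}})
print(doc.find("body").find("p").find("b").data)          # hi
print(doc.find("table").find("tr").find("td").is_null())  # True
```

The second chain fails at its first step, yet every later call still has an object to act on, so the failure only needs to be handled once at the end.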

children

Returns the NullCollection

contents

Returns the NullCollection

descendants

Returns the NullCollection

dump(**kwargs)

Returns Null

find(*args, **kwargs)

Returns NullNode

find_all(*args, **kwargs)

Returns NullCollection

find_next_sibling(*args, **kwargs)

Returns NullNode

find_next_siblings(*args, **kwargs)

Returns NullCollection

find_parent(*args, **kwargs)

Returns NullNode

find_parents(*args, **kwargs)

Returns NullCollection

find_previous_sibling(*args, **kwargs)

Returns NullNode

find_previous_siblings(*args, **kwargs)

Returns NullCollection

next_sibling

Returns the NullNode

next_siblings

Returns the NullCollection

parent

Returns the NullNode

parents

Returns the NullCollection

previous_sibling

Returns the NullNode

previous_siblings

Returns the NullCollection

select(selector)

Returns NullCollection

class soupy.NullCollection

Represents an invalid Collection.

Returned by some methods on other Null objects.

Expressions

class soupy.Expression

Soupy expressions are a shorthand for building single-argument functions.

Users should use the Q object, which is just an instance of Expression.
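
One way such a shorthand can work is to record each attribute access, call, or operator as a deferred step, then replay the steps on a concrete value. The sketch below is a toy illustration under that assumption; `ToyExpr` and `TQ` are hypothetical names, and this is not the code behind Expression or Q.

```python
class ToyExpr:
    """Records operations now, applies them to a value later."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def _then(self, step):
        # Return a new expression with one more recorded operation.
        return ToyExpr(self._steps + (step,))

    def __getattr__(self, name):
        # Only reached for names not found normally, e.g. TQ.upper.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._then(lambda val: getattr(val, name))

    def __call__(self, *args, **kwargs):
        # Calling extends the expression instead of evaluating it.
        return self._then(lambda val: val(*args, **kwargs))

    def __mul__(self, other):
        return self._then(lambda val: val * other)

    def __getitem__(self, key):
        return self._then(lambda val: val[key])

    def eval_(self, val):
        # Run the recorded operations, in order, on a concrete value.
        for step in self._steps:
            val = step(val)
        return val


TQ = ToyExpr()  # hypothetical analogue of Q

print((TQ * 2).eval_(3))         # 6
print(TQ.upper().eval_("test"))  # TEST
print(TQ["href"].startswith("http").eval_({"href": "http://x"}))  # True
```

Because calling a toy expression records a call step, a separate method is needed to actually run it, which mirrors the role eval_ plays on Expression.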

debug_()

Returns debugging information for the previous error raised during expression evaluation.

Returns a QDebug namedtuple with four fields:

  • expr is the last full expression to have raised an exception
  • inner_expr is the specific sub-expression that raised the exception
  • val is the value that expr tried to evaluate.
  • inner_val is the value that inner_expr tried to evaluate

If no exceptions have been triggered from expression evaluation, then each field is None.

Examples

>>> Scalar('test').map(Q.upper().foo)
Traceback (most recent call last):
...
AttributeError: 'str' object has no attribute 'foo'
...
>>> dbg = Q.debug_()
>>> dbg.expr
Q.upper().foo
>>> dbg.inner_expr
.foo
>>> dbg.val
'test'
>>> dbg.inner_val
'TEST'

eval_(val)

Pass the argument val to the function, and return the result.

This special method is necessary because the __call__ method builds a new function instead of evaluating the current one.

class soupy.QDebug

Namedtuple that holds information about a failed expression evaluation.
