=======  ============================================
SEP      1
Title    API for populating item fields (comparison)
Author   Ismael Carnales, Pablo Hoffman, Daniel Grana
Created  2009-07-19
Status   Obsoleted by :ref:`sep-008`
=======  ============================================

This page shows different usage scenarios for the two newly proposed APIs for populating item field values (which will replace the old, deprecated ``RobustItem`` API) and compares them. One of them will be chosen as the recommended (and supported) mechanism in Scrapy 0.7.


Old ``RobustItem`` API:

- ``attribute(field_name, selector_or_value, **modifiers_and_adaptor_args)``

  .. note:: ``attribute()`` modifiers (like ``add=True``) are passed together
     with adaptor args as keyword arguments (this is ugly)


``ItemForm`` API:

- ``__init__(response, item=None, **adaptor_args)``: instantiates an ``ItemForm`` with an item instance and predefined adaptor arguments
- ``__setitem__(field_name, selector_or_value)``
- ``__getitem__(field_name)``: returns ``None`` if not set
- ``get_item()``

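
The item-style access listed above can be illustrated with a minimal sketch. It assumes values are kept as plain lists and ignores adaptors entirely; ``SketchItemForm`` and its internals are hypothetical and only show how assignment, lookup and ``+=`` might interact, not how the proposal implements them.

.. code-block:: python

    class SketchItemForm:
        """Hypothetical stand-in for ItemForm: dict-style access to field values."""

        def __init__(self, response, item=None, **adaptor_args):
            self.response = response
            # Assumption: a plain dict plays the role of the item instance.
            self.item = item if item is not None else {}
            self.adaptor_args = adaptor_args  # stored but unused in this sketch
            self._values = {}

        def __setitem__(self, field_name, value):
            # Assigning always replaces whatever was collected for the field.
            self._values[field_name] = value if isinstance(value, list) else [value]

        def __getitem__(self, field_name):
            # Returns None when the field has not been set.
            return self._values.get(field_name)

        def get_item(self):
            self.item.update(self._values)
            return self.item


    form = SketchItemForm(response=None)
    form["headline"] = "First"
    form["headline"] += ["Second"]  # __getitem__, list concatenation, __setitem__
    print(form["headline"])         # ['First', 'Second']
    print(form["missing"])          # None

Because ``+=`` goes through ``__getitem__`` and ``__setitem__``, adding to a field needs no dedicated method in this style.
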

``ItemBuilder`` API:

- ``__init__(response, item=None, **adaptor_args)``: instantiates an ``ItemBuilder`` with predefined adaptor arguments
- ``add_value(field_name, selector_or_value, **adaptor_args)``
- ``replace_value(field_name, selector_or_value, **adaptor_args)``
- ``get_value(field_name)``: returns ``None`` if not set
- ``get_item()``

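
For comparison, a minimal sketch of the method-based interface listed above, again with plain lists and no adaptors. ``SketchItemBuilder`` is a hypothetical name, and the storage details are assumptions made only to show the difference between adding and replacing.

.. code-block:: python

    class SketchItemBuilder:
        """Hypothetical stand-in for ItemBuilder: one method per action."""

        def __init__(self, response, item=None, **adaptor_args):
            self.response = response
            # Assumption: a plain dict plays the role of the item instance.
            self.item = item if item is not None else {}
            self.adaptor_args = adaptor_args  # stored but unused in this sketch
            self._values = {}

        def add_value(self, field_name, value, **adaptor_args):
            # Appends to whatever was already collected for the field.
            self._values.setdefault(field_name, []).append(value)

        def replace_value(self, field_name, value, **adaptor_args):
            # Discards earlier values for the field.
            self._values[field_name] = [value]

        def get_value(self, field_name):
            # Returns None when the field has not been set.
            return self._values.get(field_name)

        def get_item(self):
            self.item.update(self._values)
            return self.item


    builder = SketchItemBuilder(response=None)
    builder.add_value("headline", "First")
    builder.add_value("headline", "Second")
    print(builder.get_value("headline"))  # ['First', 'Second']
    builder.replace_value("headline", "Third")
    print(builder.get_item())             # {'headline': ['Third']}
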

Pros:

Cons:

Neutral:

- ``__add__`` and ``list.append()`` method

Pros:

Cons:

Neutral:


ItemForm

.. code-block:: python

    class NewsForm(ItemForm):
        item_class = NewsItem

        url = adaptor(extract, remove_tags(), unquote(), strip)
        headline = adaptor(extract, remove_tags(), unquote(), strip)


ItemBuilder

.. code-block:: python

    class NewsBuilder(ItemBuilder):
        item_class = NewsItem

        url = adaptor(extract, remove_tags(), unquote(), strip)
        headline = adaptor(extract, remove_tags(), unquote(), strip)

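
The declarations above suggest that ``adaptor(...)`` chains callables, each one receiving the previous one's output. A minimal sketch under that assumption follows; the ``adaptor``, ``remove_tags`` and ``strip`` defined here are simplified, hypothetical stand-ins and not the helpers the proposal refers to.

.. code-block:: python

    import re


    def adaptor(*functions):
        """Return a callable that applies `functions` from left to right."""
        def pipeline(value):
            for function in functions:
                value = function(value)
            return value
        return pipeline


    def remove_tags():
        # Factory, mirroring the remove_tags() call style used above.
        # Naive regex, good enough for a demonstration only.
        return lambda text: re.sub(r"<[^>]+>", "", text)


    def strip(text):
        return text.strip()


    headline = adaptor(remove_tags(), strip)
    print(headline("  <h1>Scrapy <b>0.7</b></h1> "))       # Scrapy 0.7

    # An existing adaptor can itself be a step in a longer chain.
    site_headline = adaptor(headline, str.upper)
    print(site_headline("  <h1>Scrapy <b>0.7</b></h1> "))  # SCRAPY 0.7

Since a pipeline is just another callable, a subclass could wrap a parent's adaptor and append further steps to it.
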

ItemForm

.. code-block:: python

    ia = NewsForm(response)
    ia["url"] = response.url
    ia["headline"] = x.x('//h1[@class="headline"]')

    # if we want to add another value to the same field
    ia["headline"] += x.x('//h1[@class="headline2"]')

    # if we want to replace the field value with another value
    ia["headline"] = x.x('//h1[@class="headline3"]')

    return ia.get_item()


ItemBuilder

.. code-block:: python

    il = NewsBuilder(response)
    il.add_value("url", response.url)
    il.add_value("headline", x.x('//h1[@class="headline"]'))
    il.add_value("headline", x.x('//h1[@class="headline2"]'))
    il.replace_value("headline", x.x('//h1[@class="headline3"]'))
    return il.get_item()


ItemForm

.. code-block:: python

    class SiteNewsForm(NewsForm):
        published = adaptor(HtmlNewsForm.published, to_date("%d.%m.%Y"))


ItemBuilder

.. code-block:: python

    class SiteNewsBuilder(NewsBuilder):
        published = adaptor(HtmlNewsBuilder.published, to_date("%d.%m.%Y"))


ItemForm

.. code-block:: python

    ia = NewsForm(response)
    ia["headline"] = x.x('//h1[@class="headline"]')
    if not ia["headline"]:
        ia["headline"] = x.x('//h1[@class="title"]')


ItemBuilder

.. code-block:: python

    il = NewsBuilder(response)
    il.add_value("headline", x.x('//h1[@class="headline"]'))
    if not il.get_value("headline"):
        il.add_value("headline", x.x('//h1[@class="title"]'))


ItemForm

.. code-block:: python

    ia["headline"] += x.x('//h1[@class="headline"]')


ItemBuilder

.. code-block:: python

    il.add_value("headline", x.x('//h1[@class="headline"]'))


ItemForm

.. code-block:: python

    # The only approach is to pass arguments when instantiating the form
    ia = NewsForm(response, default_unit="cm")
    ia["width"] = x.x('//p[@class="width"]')


ItemBuilder

.. code-block:: python

    # pass the argument for a single call
    il.add_value("width", x.x('//p[@class="width"]'), default_unit="cm")

    # alternatively, pass it when instantiating the builder
    il = NewsBuilder(response, default_unit="cm")
    il.add_value("width", x.x('//p[@class="width"]'))

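
The two ways of supplying ``default_unit`` shown above can be sketched as a merge of instance-level and per-call keyword arguments. Everything here is hypothetical: ``ArgsBuilder``, the ``to_length`` adaptor, and in particular the rule that per-call arguments take precedence are assumptions made for illustration, not behaviour stated by the proposal.

.. code-block:: python

    def to_length(value, default_unit="m"):
        """Hypothetical adaptor: append a unit when the value has none."""
        value = value.strip()
        return value if value[-1:].isalpha() else f"{value}{default_unit}"


    class ArgsBuilder:
        """Hypothetical builder that forwards keyword arguments to adaptors."""

        adaptors = {"width": to_length}

        def __init__(self, **adaptor_args):
            self.default_args = adaptor_args
            self.values = {}

        def add_value(self, field_name, value, **adaptor_args):
            # Assumption: per-call arguments override the instance-level ones.
            merged = {**self.default_args, **adaptor_args}
            adapted = self.adaptors[field_name](value, **merged)
            self.values.setdefault(field_name, []).append(adapted)


    builder = ArgsBuilder(default_unit="cm")
    builder.add_value("width", "20")                    # instance default
    builder.add_value("width", "3", default_unit="in")  # per-call override
    builder.add_value("width", "5mm")                   # already has a unit
    print(builder.values["width"])  # ['20cm', '3in', '5mm']
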

ItemForm

.. code-block:: python

    class MySiteForm(ItemForm):
        width = adaptor(ItemForm.width, default_unit="cm")
        volume = adaptor(ItemForm.width, default_unit="lt")

    ia["width"] = x.x('//p[@class="width"]')
    ia["volume"] = x.x('//p[@class="volume"]')

    # another example, passing parameters at instantiation
    ia = NewsForm(response, encoding="utf-8")
    ia["name"] = x.x('//p[@class="name"]')


ItemBuilder

.. code-block:: python

    il.add_value("width", x.x('//p[@class="width"]'), default_unit="cm")
    il.add_value("volume", x.x('//p[@class="volume"]'), default_unit="lt")