=======  =============================
SEP      7
Title    ItemLoader processors library
Author   Ismael Carnales
Created  2009-08-10
Status   Draft
=======  =============================


This SEP proposes a library of ItemLoader processors to ship with Scrapy.

``to_date``
  Converts a date string to a YYYY-MM-DD string suitable for DateField.

Decision: Obsolete. DateField no longer exists.
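
The conversion described for ``to_date`` can be sketched as follows. This is only an illustration written for Python 3: the ``fmt`` parameter and its default are assumptions, since the SEP does not say which input formats the processor accepted.

```python
from datetime import datetime


def to_date(value, fmt="%d/%m/%Y"):
    """Parse `value` with `fmt` (hypothetical parameter) and return YYYY-MM-DD."""
    return datetime.strptime(value, fmt).strftime("%Y-%m-%d")


print(to_date("10/08/2009"))
```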

``extract``
  This adaptor tries to extract data from the given locations. Any
  XPathSelector in it will be extracted, and any other data will be added
  as-is to the result.

Decision: Obsolete. Functionality included in XpathLoader.

``ExtractImageLinks``
  This adaptor receives either XPathSelectors pointing to the locations
  where image URLs should be searched for, or a list of XPath expressions
  (which are turned into selectors anyway).

Decision: XXX

``remove_tags``
  Factory that returns an adaptor for removing each tag in the ``tags``
  parameter found in the given value. If no tags are specified, all of them
  are removed.

Decision: XXX
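
A minimal sketch of the factory pattern described for ``remove_tags``, written for Python 3. It assumes that "removing a tag" means stripping the markup while keeping the enclosed text, and it uses a naive regular expression rather than a real HTML parser, so it is not meant as the actual implementation.

```python
import re


def remove_tags(tags=()):
    """Return an adaptor that strips the named tags, or every tag if none are named."""
    if tags:
        names = "|".join(re.escape(tag) for tag in tags)
        # Match opening and closing forms of the listed tag names only.
        pattern = re.compile(r"</?(?:%s)\b[^>]*>" % names, re.IGNORECASE)
    else:
        # No names given: match anything that looks like a tag.
        pattern = re.compile(r"<[^>]+>")

    def adaptor(value):
        return pattern.sub("", value)

    return adaptor


strip_bold = remove_tags(tags=("b",))
print(strip_bold("<p>Hello <b>world</b></p>"))
```

Returning a closure lets the pattern be compiled once when the loader is configured, instead of on every value.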

``remove_root``
  This adaptor removes the root tag of the given string/unicode, if one is
  found.

Decision: XXX
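
One possible reading of ``remove_root``, sketched for Python 3. It assumes the input is a single element wrapped in one opening and one closing tag; anything else is returned unchanged. The regular expression is illustrative and does not check that the opening and closing tags match.

```python
import re

# Outermost opening tag, captured inner content, outermost closing tag.
_ROOT_RE = re.compile(r"^\s*<[^>]+>(.*)</[^>]+>\s*$", re.DOTALL)


def remove_root(value):
    """Return the content of the root tag, or `value` unchanged if none is found."""
    match = _ROOT_RE.match(value)
    return match.group(1) if match else value


print(remove_root("<div><p>text</p></div>"))
```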

``replace_escape``
  Factory that returns an adaptor for removing/replacing each escape
  character in the ``wich_ones`` parameter found in the given value.

Decision: XXX
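
The ``replace_escape`` factory could look roughly like this in Python 3. The ``wich_ones`` name comes from the description above; its default value and the ``replace_by`` parameter are assumptions made for the example.

```python
def replace_escape(wich_ones=("\n", "\t", "\r"), replace_by=""):
    """Return an adaptor replacing each character in `wich_ones` with `replace_by`."""

    def adaptor(value):
        for char in wich_ones:
            value = value.replace(char, replace_by)
        return value

    return adaptor


newlines_to_spaces = replace_escape(wich_ones=("\n",), replace_by=" ")
print(newlines_to_spaces("first line\nsecond line"))
```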

``unquote``
  This factory returns an adaptor that receives a string or unicode, removes
  all CDATA sections and entities (except the entities inside CDATA sections
  and the ones you specify in the ``keep`` parameter), and returns a new
  string or unicode.

Decision: XXX

``to_unicode``
  Receives a string and converts it to unicode using the given encoding
  (utf-8 if none is specified), and returns a new unicode object.
  For example::

    >>> to_unicode('it costs 20\xe2\x82\xac, or 30\xc2\xa3')
    [u'it costs 20\u20ac, or 30\xa3']

Decision: XXX
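
In Python 3 terms, where byte strings and text are distinct types, the behaviour described for ``to_unicode`` might be sketched as below. Passing values that are already text through unchanged is an assumption, not something the SEP states.

```python
def to_unicode(value, encoding="utf-8"):
    """Decode `value` with `encoding` if it is bytes; return text unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_unicode(b"it costs 20\xe2\x82\xac, or 30\xc2\xa3"))
```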

``clean_spaces``
  Collapses runs of multiple spaces in the given string into single spaces.
  For example::

    >>> clean_spaces(u'Hello   sir')
    u'Hello sir'

Decision: XXX
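
A short Python 3 sketch of ``clean_spaces``. It assumes "multispaces" means runs of the space character only; whether tabs and newlines should also be collapsed is not specified above.

```python
import re

_MULTISPACE_RE = re.compile(r" {2,}")


def clean_spaces(value):
    """Collapse each run of two or more spaces into a single space."""
    return _MULTISPACE_RE.sub(" ", value)


print(clean_spaces("Hello   sir"))
```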

``drop_empty``
  Removes any item that evaluates to false (such as ``None``, ``0`` or
  ``False``) from the provided iterable. For example::

    >>> drop_empty([0, 'this', None, 'is', False, 'an example'])
    ['this', 'is', 'an example']

Decision: Obsolete. Functionality included in reducers.
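
The example for ``drop_empty`` drops ``0`` and ``False`` as well as ``None``, which suggests a plain truthiness filter. A minimal Python 3 sketch under that assumption:

```python
def drop_empty(values):
    """Return a list with every falsy item removed."""
    return [value for value in values if value]


print(drop_empty([0, "this", None, "is", False, "an example"]))
```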

``delist``
  This factory returns an adaptor that joins an iterable with the specified
  delimiter.

Decision: Obsolete. Functionality included in reducers.
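
The ``delist`` factory is essentially a wrapper around ``str.join``. In this Python 3 sketch the default delimiter is an assumption, and the iterable is expected to contain only strings.

```python
def delist(delimiter=" "):
    """Return an adaptor that joins an iterable of strings with `delimiter`."""

    def adaptor(values):
        return delimiter.join(values)

    return adaptor


join_with_commas = delist(", ")
print(join_with_commas(["red", "green", "blue"]))
```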

``Regex``
  This adaptor receives either a list of strings or an XPathSelector and
  returns a new list containing the matches of the given regular expression
  against those strings. The regular expression is passed as a keyword
  argument and is mandatory.

Decision: XXX
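
For the list-of-strings case, the ``Regex`` adaptor might be sketched in Python 3 as follows. The keyword name ``regex`` is an assumption, XPathSelector input is deliberately left out, and ``re.findall`` is used as one plausible way of collecting matches.

```python
import re


class Regex:
    """Adaptor collecting every match of a regular expression in a list of strings."""

    def __init__(self, *, regex):
        # Keyword-only and without a default, so the pattern is mandatory.
        self.regex = re.compile(regex)

    def __call__(self, values):
        matches = []
        for value in values:
            matches.extend(self.regex.findall(value))
        return matches


find_numbers = Regex(regex=r"\d+")
print(find_numbers(["price 20", "from 5 to 30", "none"]))
```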