=======  ==============================
SEP      12
Title    Spider name
Author   Ismael Carnales, Pablo Hoffman
Created  2009-12-01
Updated  2010-03-23
Status   Final
=======  ==============================
Spiders are currently referenced by their ``domain_name`` attribute. This SEP
proposes adding a ``name`` attribute to spiders and using it as their unique
identifier.
Currently, the only way to run two spiders against the same domain is to give
one of them a fake ``domain_name`` and put the real domains in its
``extra_domain_names`` attribute.

This SEP proposes to:

- add a ``name`` attribute to spiders and use it as their unique identifier
- merge the ``domain_name`` and ``extra_domain_names`` attributes into a
  single list, ``allowed_domains``

In general, all references to ``spider.domain_name`` will be replaced by
``spider.name``.
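As an illustrative sketch of the attribute change (the class and domain names
below are hypothetical, not taken from the SEP), a spider definition would
migrate like this:

```python
# Before this SEP: the spider is identified by its domain_name,
# with any additional domains listed in extra_domain_names.
class OldStyleSpider:
    domain_name = "example.com"
    extra_domain_names = ["example.org"]


# After this SEP: the spider is identified by name, and all of its
# domains live in a single allowed_domains list.
class NewStyleSpider:
    name = "example"
    allowed_domains = ["example.com", "example.org"]
```

Decoupling the identifier from the domain also removes the fake-domain
workaround: two spiders for the same domain simply get two different names.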
``OffsiteMiddleware`` will use ``spider.allowed_domains`` to determine the
domain names allowed for a spider.
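A minimal sketch of the kind of check ``OffsiteMiddleware`` could perform with
``allowed_domains`` (the function name and matching rule are assumptions for
illustration, not the middleware's actual implementation):

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True when the URL's host is outside every allowed domain.

    A host is considered on-site when it equals an allowed domain or is
    a subdomain of one (e.g. "sub.example.com" matches "example.com").
    """
    host = urlparse(url).hostname or ""
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )
```

Requests to allowed domains (or their subdomains) pass through; anything else
would be filtered out by the middleware.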
crawl
=====
The new syntax for the ``crawl`` command will be::

    crawl [options] <spider|url> ...
If you provide a URL, the command will try to find the spider that processes
it. If no spider is found, or more than one spider is found, it will raise an
error. In those cases you must specify the spider explicitly with the
``--spider`` option.
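The URL-to-spider resolution described above can be sketched as follows (the
function, the spider-record shape, and the domain-matching rule are all
assumptions for illustration, not the command's actual code):

```python
from urllib.parse import urlparse


def resolve_spider(url, spiders):
    """Pick the single spider whose allowed_domains match the URL's host.

    Raises ValueError when zero or several spiders match, mirroring the
    crawl command's behavior of requiring --spider in ambiguous cases.
    """
    host = urlparse(url).hostname or ""
    matches = [
        spider for spider in spiders
        if any(host == domain or host.endswith("." + domain)
               for domain in spider["allowed_domains"])
    ]
    if len(matches) != 1:
        raise ValueError(
            "%d spiders found for %s; use --spider to disambiguate"
            % (len(matches), url)
        )
    return matches[0]["name"]
```

For example, with one spider allowing ``google.com``, a URL on
``www.google.com`` resolves to that spider, while a URL on an unknown domain
raises an error.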
genspider
=========
The new signature for ``genspider`` will be::

    genspider [options] <name> <domain>
Example::

    $ scrapy-ctl genspider google google.com
    $ ls project/spiders/
    project/spiders/google.py
    $ cat project/spiders/google.py
.. code-block:: python

    class GooglecomSpider(BaseSpider):
        name = "google"
        allowed_domains = ["google.com"]
.. note:: ``allowed_domains`` becomes optional, as only ``OffsiteMiddleware``
   uses it.