
SEP-015: ScrapyManager and SpiderManager API refactoring


=======  ===============================================
SEP      15
Title    ScrapyManager and SpiderManager API refactoring
Author   Insophia Team
Created  2010-03-10
Status   Final
=======  ===============================================


This SEP proposes a refactoring of ScrapyManager and SpiderManager APIs.

SpiderManager

  • get(spider_name) -> Spider instance

  • find_by_request(request) -> list of spider names

  • list() -> list of spider names

  • remove fromdomain(), fromurl()
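The three proposed SpiderManager methods can be sketched as below. This is only an illustration of the interface listed above, not the Scrapy implementation: the `Spider` and `Request` stand-ins, the `allowed_domains` attribute, the domain-suffix matching rule in `find_by_request()`, and the `KeyError` raised for unknown names are all assumptions that the SEP does not specify.

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request object; only the URL matters here."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical stand-in for a spider with a name and a list of domains."""

    def __init__(self, name, allowed_domains):
        self.name = name
        self.allowed_domains = list(allowed_domains)


class SpiderManager:
    """Sketch of the proposed API: get(), find_by_request(), list()."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        # Returns a Spider instance; raising KeyError for unknown names
        # is an assumption, the SEP does not define the failure mode.
        return self._spiders[spider_name]

    def find_by_request(self, request):
        # Returns a list of spider NAMES (not instances) that can handle
        # the request. Matching on the URL host is an assumed rule.
        host = urlparse(request.url).hostname or ""
        return [
            name
            for name, spider in self._spiders.items()
            if any(host == d or host.endswith("." + d)
                   for d in spider.allowed_domains)
        ]

    def list(self):
        # Returns the names of all known spiders.
        return sorted(self._spiders)
```

Note that `find_by_request()` and `list()` return names while `get()` returns an instance, so callers resolve a name with `get()` only when they actually need the spider.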

ScrapyManager

  • crawl_request(request, spider=None)

    • calls SpiderManager.find_by_request(request) if spider is None
    • fails unless find_by_request() returns exactly one spider
  • crawl_spider(spider)

    • calls spider.start_requests()
  • crawl_spider_name(spider_name)

    • calls SpiderManager.get(spider_name)
    • calls spider.start_requests()
  • crawl_url(url)

    • calls spider.make_requests_from_url()
  • remove crawl(), runonce()
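The call chains listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Scrapy code: the `Request`, `Spider` and `SpiderManager` classes are hypothetical stand-ins, the `scheduled` list replaces the real engine, `LookupError` is an assumed failure mode, `make_requests_from_url()` is assumed to return a list, and `crawl_url()` is assumed to resolve its spider through `find_by_request()` (the SEP does not say how the spider is chosen).

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request object."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical spider bound to a single domain."""

    def __init__(self, name, domain, start_urls=()):
        self.name = name
        self.domain = domain
        self.start_urls = list(start_urls)

    def make_requests_from_url(self, url):
        return [Request(url)]  # assumed to return a list of requests

    def start_requests(self):
        requests = []
        for url in self.start_urls:
            requests.extend(self.make_requests_from_url(url))
        return requests


class SpiderManager:
    """Minimal stand-in exposing get() and find_by_request()."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        host = urlparse(request.url).hostname
        return [n for n, s in self._spiders.items() if s.domain == host]


class ScrapyManager:
    """Sketch of the proposed crawl_* methods."""

    def __init__(self, spider_manager):
        self.spiders = spider_manager
        self.scheduled = []  # (spider name, url) pairs; stands in for the engine

    def _schedule(self, requests, spider):
        for request in requests:
            self.scheduled.append((spider.name, request.url))

    def _resolve(self, request):
        # Fails unless exactly one spider matches the request.
        names = self.spiders.find_by_request(request)
        if len(names) != 1:
            raise LookupError("expected 1 spider, found %d" % len(names))
        return self.spiders.get(names[0])

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve(request)
        self._schedule([request], spider)

    def crawl_spider(self, spider):
        self._schedule(spider.start_requests(), spider)

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spiders.get(spider_name))

    def crawl_url(self, url):
        spider = self._resolve(Request(url))  # assumed resolution step
        self._schedule(spider.make_requests_from_url(url), spider)
```

In this sketch `crawl_spider_name()` is a thin wrapper over `crawl_spider()`, which keeps the name lookup in one place and leaves `crawl_spider()` usable for spiders that are not registered with the SpiderManager.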

Instead of using runonce(), commands (such as crawl/parse) would call crawl_* and then start().

Changes to Commands

  • if is_url(arg):
    • calls ScrapyManager.crawl_url(arg)
  • else:
    • calls ScrapyManager.crawl_spider_name(arg)
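The dispatch above, followed by the `start()` call that replaces `runonce()`, might look like this in a command. It is a sketch only: `is_url()` is named in the SEP but not defined there, so the scheme-based check is an assumption, and `RecordingManager` and `run_crawl_command()` are hypothetical names used to show the call order without a real engine.

```python
from urllib.parse import urlparse


def is_url(arg):
    # Assumed rule: anything with an http, https or file scheme is a URL;
    # everything else is treated as a spider name.
    return urlparse(arg).scheme in ("http", "https", "file")


class RecordingManager:
    """Hypothetical stand-in for ScrapyManager that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def crawl_url(self, url):
        self.calls.append(("crawl_url", url))

    def crawl_spider_name(self, spider_name):
        self.calls.append(("crawl_spider_name", spider_name))

    def start(self):
        self.calls.append(("start",))


def run_crawl_command(manager, args):
    # Schedule every argument first, then start the manager once.
    for arg in args:
        if is_url(arg):
            manager.crawl_url(arg)
        else:
            manager.crawl_spider_name(arg)
    manager.start()


manager = RecordingManager()
run_crawl_command(manager, ["http://example.com/", "myspider"])
```

Scheduling everything before a single `start()` is what lets the command drop `runonce()`: the crawl_* calls only queue work, and the manager is started explicitly afterwards.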

Pending issues

  • Should ScrapyManager.crawl_* be renamed to schedule_* or add_*?
  • Should the lookup be SpiderManager.find_by_request(request) or SpiderManager.search(request=request)?