docs/About-Projects.md
In most cases, a project is one script you write for one website.
- One project can import another as a module with `from projects import other_project`.
- A project has five statuses: `TODO`, `STOP`, `CHECKING`, `DEBUG` and `RUNNING`.
    - `TODO` - the script has just been created and is still to be written.
    - `STOP` - mark a project as `STOP` when you want it to stop.
    - `CHECKING` - when a running project is modified, its status is set to `CHECKING` automatically, so that a partially modified script is not run.
    - `DEBUG`/`RUNNING` - the spider treats these two statuses identically. It is good practice to mark a project as `DEBUG` on its first run, then switch it to `RUNNING` once it has been checked.
- The crawl rate is controlled by `rate` and `burst` using the token-bucket algorithm.
    - `rate` - how many requests are made per second.
    - `burst` - consider `rate/burst = 0.1/3`: the spider crawls 1 page every 10 seconds. Suppose all tasks are finished and the project checks for newly updated items every minute. If 3 new items are found, pyspider will "burst" and crawl the 3 tasks without waiting 3*10 seconds. The fourth task, however, has to wait 10 seconds.
- To delete a project, set `group` to `delete` and status to `STOP`, then wait 24 hours.

## `on_finished` callback

You can override the `on_finished` method in a project. It is triggered when the `task_queue` goes to 0.
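The trigger condition can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyProject`, `submit` and `complete` are hypothetical names invented for illustration and are not part of pyspider, whose scheduler is more involved. It shows a callback firing once, at the moment the count of outstanding tasks reaches 0.

```python
class ToyProject:
    """Hypothetical stand-in for a project's task queue (not pyspider code)."""

    def __init__(self, on_finished):
        self.pending = 0                # tasks submitted but not yet done
        self.on_finished = on_finished  # called when pending drops to 0

    def submit(self, count=1):
        """Add `count` new tasks to the queue."""
        self.pending += count

    def complete(self):
        """Mark one task as done (crawled, or failed after retries)."""
        self.pending -= 1
        if self.pending == 0:
            self.on_finished()


calls = []
project = ToyProject(on_finished=lambda: calls.append("finished"))
project.submit(100)        # a site with 100 pages
for _ in range(100):
    project.complete()
print(calls)
```

In this model, a task that re-submits itself before completing would keep `pending` above 0, so the callback would never run.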
Example 1: When you start a project to crawl a website with 100 pages, the `on_finished` callback fires once all 100 pages have either been crawled successfully or failed after retries.
Example 2: A project with `auto_recrawl` tasks will NEVER trigger the `on_finished` callback, because the time queue never becomes 0 while there are `auto_recrawl` tasks in it.
Example 3: A project with an `@every`-decorated method triggers the `on_finished` callback each time the newly submitted tasks finish.
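
Returning to the `rate` and `burst` settings described earlier, the arithmetic of the `0.1/3` scenario can be checked with a small token-bucket sketch. This is an illustration, not pyspider's implementation: the `TokenBucket` class and its `next_start` method are hypothetical names. It assumes the bucket starts full (the project has been idle) and uses explicit timestamps in seconds instead of a real clock.

```python
class TokenBucket:
    """Illustrative token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # assumption: an idle project has a full bucket
        self.last = 0.0             # time of the last refill, in seconds

    def _refill(self, now):
        # Add the tokens earned since the last refill, capped at `burst`.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def next_start(self, now):
        """Consume one token and return the earliest time >= now the task may start."""
        self._refill(now)
        if self.tokens < 1:
            # Not enough tokens: wait until exactly one token has accumulated.
            now += (1 - self.tokens) / self.rate
            self.tokens = 1.0
            self.last = now
        self.tokens -= 1
        return now


bucket = TokenBucket(rate=0.1, burst=3)
# Four tasks become ready at t = 0.
starts = [bucket.next_start(0.0) for _ in range(4)]
print(starts)
```

With these settings the first three tasks start immediately and the fourth starts 10 seconds later, which matches the description of `burst`.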