docs/About-Tasks.md
Tasks are the basic unit to be scheduled.
* A task is identified by its `taskid`. (Default: `md5(url)`; this can be changed by overriding the `get_taskid(self, task)` method.)
* Tasks are served in order of `priority`.

When a new task (never seen before) comes in:

* If `exetime` is set but has not arrived yet, the task is put into a time-based queue to wait.
* Otherwise it is accepted.

When the task is already in the queue:

* It is ignored unless `force_update` is set.

When a completed task comes out:

* If `age` is set and `last_crawl_time + age < now`, it is accepted. Otherwise it is discarded.
* If `itag` is set and is not equal to its previous value, it is accepted. Otherwise it is discarded.

When a fetch error or script error happens, the task will retry 3 times by default.
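The acceptance rules for new, queued, and completed tasks listed above can be sketched as one decision function. This is only an illustration and not pyspider's scheduler code: the function name `should_accept`, the plain-dict task layout, the `"queued"` status value, and the returned strings are all assumptions made for this sketch. It also assumes that for a completed task either the `age` rule or the `itag` rule is enough to accept it.

```python
import time


def should_accept(new_task, old_task=None, now=None):
    """Return 'accept', 'wait', 'ignore' or 'discard' for an incoming task.

    Hypothetical helper: both tasks are plain dicts, and `old_task` is
    None when the task has never been seen before.
    """
    now = time.time() if now is None else now

    # New task: wait in the time-based queue until `exetime` arrives.
    if old_task is None:
        exetime = new_task.get("exetime")
        if exetime is not None and exetime > now:
            return "wait"
        return "accept"

    # Task is already in the queue: ignore it unless `force_update` is set.
    if old_task.get("status") == "queued":
        return "accept" if new_task.get("force_update") else "ignore"

    # Completed task: accept when `age` is set and has expired ...
    age = new_task.get("age")
    if age is not None and old_task["last_crawl_time"] + age < now:
        return "accept"

    # ... or when `itag` is set and differs from its previous value.
    itag = new_task.get("itag")
    if itag is not None and itag != old_task.get("itag"):
        return "accept"

    return "discard"
```

Passing `now` explicitly keeps the function deterministic, which makes the individual rules easy to check in isolation.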
Successive retries are delayed by 30 seconds, 1 hour, 6 hours, and 12 hours; any further retries are postponed by 24 hours.
If `age` is specified, the retry delay will not be larger than `age`.
You can configure the retry delay by adding a variable named `retry_delay` to the handler. `retry_delay` is a dict that specifies the retry intervals. Its items have the form `{retried: seconds}`, and the special key `''` (empty string) sets the default delay for any retry count that is not listed.
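For example, the default `retry_delay` is declared like this:

    from pyspider.libs.base_handler import BaseHandler

    class MyHandler(BaseHandler):
        # Keys are the number of retries so far, values are delays in seconds.
        retry_delay = {
            0: 30,            # 30 seconds
            1: 1*60*60,       # 1 hour
            2: 6*60*60,       # 6 hours
            3: 12*60*60,      # 12 hours
            '': 24*60*60      # default for any further retry: 24 hours
        }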
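
To make the lookup rule concrete, here is a minimal sketch of how a `retry_delay` dict could be resolved into a delay, including the `''` fallback and the cap by `age` described above. The helper name `get_retry_delay` and the constant `DEFAULT_RETRY_DELAY` are hypothetical names for this example; the scheduler's actual implementation may differ.

```python
# Same values as the default retry_delay shown above.
DEFAULT_RETRY_DELAY = {
    0: 30,
    1: 1 * 60 * 60,
    2: 6 * 60 * 60,
    3: 12 * 60 * 60,
    "": 24 * 60 * 60,
}


def get_retry_delay(retried, retry_delay=DEFAULT_RETRY_DELAY, age=None):
    """Return the delay in seconds before the next retry (hypothetical helper)."""
    # Use the entry for this retry count, falling back to the '' default key.
    delay = retry_delay.get(retried, retry_delay.get("", DEFAULT_RETRY_DELAY[""]))
    # If `age` is specified, the delay must not be larger than `age`.
    if age is not None:
        delay = min(delay, age)
    return delay
```

With the default dict, `get_retry_delay(1, age=600)` gives 600 rather than 3600, because the one-hour delay is capped by `age`.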