# Task Scheduler
This document describes how the task system schedules and runs queue tasks across three deployment modes: local timer, per-daemon shared scheduler, and serverless. It includes design rationale (WHY) for each choice.
The task system is the single place that decides when scheduled work runs. Only tasks with the tag `queue` are polled by the scheduler; other tasks (approval, follow-up, etc.) are stored and executed only when explicitly triggered (e.g. a choice action or `executeTaskById`).
Why one scheduler: Recurring work (batcher drains, cron-like use) shares the same DB, same pause/resume, same visibility (getTaskStatus, nextRunAt, lastError). Retry and backoff live in one place to avoid infinite retry storms.
Why this document: The scheduler can run in three ways depending on how the host is deployed. Understanding the modes and the getTasks(agentIds) contract avoids bugs and makes serverless/daemon integration straightforward.
| Mode | When it applies | Who drives ticks | DB queries per tick |
|---|---|---|---|
| Local timer | Default: no daemon, not serverless | TaskService setInterval | 1 per runtime |
| Daemon | Host called startTaskScheduler(adapter) | Shared module timer | 1 for all registered runtimes |
| Serverless | runtime.serverless === true | Host calls taskService.runDueTasks() | On each runDueTasks() call |
Why three modes: Single-process apps keep a simple local timer. Multi-process or multi-agent daemons need one shared timer and one batched getTasks(agentIds) to avoid N queries per second. Serverless has no long-lived process, so the host (cron or request handler) must explicitly run due tasks.
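The mode decision described above can be sketched as a small selector. This is an illustrative function, not part of the actual @elizaos/core API; it assumes the serverless flag takes precedence over a registered daemon adapter (consistent with "no timer when serverless" below):

```ts
// Sketch only: the function name and signature are assumptions for illustration.
type SchedulerMode = "local" | "daemon" | "serverless";

function selectSchedulerMode(serverless: boolean, daemonAdapter: object | null): SchedulerMode {
  if (serverless) return "serverless";         // no timer; host calls runDueTasks()
  if (daemonAdapter !== null) return "daemon"; // shared timer, batched getTasks(agentIds)
  return "local";                              // per-runtime setInterval
}
```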
## Local timer (default)

- Applies when there is no daemon (`getTaskSchedulerAdapter()` returns `null`) and `runtime.serverless` is not true.
- TaskService starts its own `setInterval(..., TICK_INTERVAL)`. Every tick it calls `checkTasks()`: if `tasksDirty`, it fetches queue tasks for this agent and calls `runTick(tasks)`.
- The `tasksDirty` flag avoids redundant `getTasks` when nothing changed (e.g. no `createTask`/`updateTask`/`deleteTask` since last tick).

## Daemon (shared scheduler)

- The host calls `startTaskScheduler(databaseAdapter)` before starting agent runtimes. TaskService then sees `getTaskSchedulerAdapter() != null` and registers with the scheduler instead of starting a local timer.
- One shared module timer ticks every `TICK_INTERVAL_MS` (e.g. 1000 ms).
- Each tick collects the dirty agent ids (those with `markTaskSchedulerDirty(agentId)` called), then calls `adapter.getTasks({ tags: ["queue"], agentIds })` once.
- Results are grouped by `task.agentId`; each registered runtime's `TaskService.runTick(tasks)` is invoked with only that agent's tasks.
- Why one batched query: N runtimes would otherwise each call `getTasks` every second. Batching by `agentIds` in a single query reduces DB load and keeps scheduling logic in one place.
- Only dirty agents (those with `markDirty()` → `markTaskSchedulerDirty()`) are included. Why we still tick when dirty is empty: the first tick after registration uses the snapshot of dirty agents; if none, the tick no-ops. So we only query when there is at least one dirty agent.
- Exported API (`@elizaos/core` Node build): `startTaskScheduler`, `stopTaskScheduler`, `getTaskSchedulerAdapter`, `registerTaskSchedulerRuntime`, `unregisterTaskSchedulerRuntime`, `markTaskSchedulerDirty`.

Usage (host):
```ts
import { startTaskScheduler, stopTaskScheduler } from "@elizaos/core";
import { someDatabaseAdapter } from "./db";

startTaskScheduler(someDatabaseAdapter);
// … create runtimes, run agents …

// On shutdown:
stopTaskScheduler();
```
TaskService automatically registers on start and unregisters on stop when the daemon is present.
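The daemon's batched tick described above can be modeled with a short, self-contained sketch. The `Task`, adapter, and runtime shapes here are simplified stand-ins for the real @elizaos/core types, and `tick` is a hypothetical name for the internal tick function:

```ts
// Minimal stand-ins for the real types (assumptions for illustration).
type UUID = string;
interface Task { id: UUID; agentId: UUID; tags: string[] }
interface Adapter { getTasks(p: { tags: string[]; agentIds: UUID[] }): Promise<Task[]> }
interface Runtime { agentId: UUID; runTick(tasks: Task[]): Promise<void> }

const runtimes = new Map<UUID, Runtime>(); // registerTaskSchedulerRuntime(...)
const dirty = new Set<UUID>();             // markTaskSchedulerDirty(agentId)

// One daemon tick: snapshot the dirty set, issue a single batched getTasks,
// group results by agentId, and dispatch each group to its runtime.
async function tick(adapter: Adapter): Promise<void> {
  const agentIds = [...dirty];
  dirty.clear();
  if (agentIds.length === 0) return; // nothing changed: skip the DB entirely
  const tasks = await adapter.getTasks({ tags: ["queue"], agentIds });
  const byAgent = new Map<UUID, Task[]>();
  for (const t of tasks) {
    const group = byAgent.get(t.agentId) ?? [];
    group.push(t);
    byAgent.set(t.agentId, group);
  }
  for (const [agentId, group] of byAgent) {
    try {
      await runtimes.get(agentId)?.runTick(group);
    } catch {
      // errors swallowed here, matching the current daemon behavior
    }
  }
}
```

The key property is that one `getTasks` call serves every registered runtime, and a tick with an empty dirty snapshot touches the database not at all.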
## Serverless

- Applies when the runtime is created with `{ serverless: true }` (or equivalent). No long-lived process; no timer.
- The host calls `taskService.runDueTasks()` from a cron job or on each request to run due queue tasks once.
- Why: `runDueTasks()` gives the host full control over when and how often tasks run (e.g. once per request, or on a fixed cron schedule).
- Getting the service: `runtime.getService(ServiceType.TASK)`, then `(service as TaskService).runDueTasks()`. This performs one `getTasks({ tags: ["queue"], agentIds: [runtime.agentId] })` and then `runTick(tasks)`.
- Note: in serverless mode, `markDirty()` has no effect on when tasks run (there is no tick loop). It is safe to call but does not change behavior; the next `runDueTasks()` will query the DB anyway.
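A serverless host entry point might look like the following sketch. The service lookup string and the exact TaskService shape are assumptions drawn from this doc, not a verified API:

```ts
// Hypothetical serverless host sketch; types are simplified stand-ins.
interface TaskServiceLike { runDueTasks(): Promise<void> }
interface ServerlessRuntime { serverless: true; getService(type: string): TaskServiceLike }

// Cron entry point (or per-request hook): run due queue tasks exactly once.
async function cronHandler(runtime: ServerlessRuntime): Promise<void> {
  const service = runtime.getService("TASK");
  await service.runDueTasks(); // one getTasks + runTick for this agent
}
```

Because the host decides when `cronHandler` runs, the cadence of task execution is entirely in its hands.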
## The getTasks(agentIds) contract

All task queries used for scheduling or multi-tenant filtering use the batch API:
```ts
getTasks(params: {
  roomId?: UUID;
  tags?: string[];
  entityId?: UUID;
  agentIds: UUID[]; // required
  limit?: number;
  offset?: number;
}): Promise<Task[]>;
```
Why `agentIds` is required (an array, not an optional `agentId`):

- Multi-tenant safety: tasks are stored per `agent_id`; filtering by `agentIds` keeps queries efficient and prevents one agent from seeing another's tasks.
- Daemon batching: the daemon fetches for many agents at once, groups results by `task.agentId`, and dispatches to the right TaskService. A single `agentId` would force N separate calls for N agents.
- Explicitness: requiring `agentIds` (and using an array) forces every caller to pass a list (e.g. `[this.runtime.agentId]`). There is no implicit "current agent" that could be wrong in shared adapters.
- Empty `agentIds`: adapters return `[]` without querying. Why: avoids expensive "all tasks" queries by mistake; the daemon never passes an empty list because it only ticks when the dirty set is non-empty.
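An adapter implementing this contract might look like the in-memory sketch below (an assumption for illustration, not the real adapter), with the empty-`agentIds` short-circuit up front:

```ts
type UUID = string;
interface Task { id: UUID; agentId: UUID; tags: string[] }
interface GetTasksParams {
  roomId?: UUID;
  tags?: string[];
  entityId?: UUID;
  agentIds: UUID[]; // required
  limit?: number;
  offset?: number;
}

// In-memory adapter sketch showing the required empty-agentIds guard.
class MemoryTaskAdapter {
  constructor(private readonly tasks: Task[]) {}

  async getTasks(params: GetTasksParams): Promise<Task[]> {
    if (params.agentIds.length === 0) return []; // never fall through to "all tasks"
    const ids = new Set(params.agentIds);
    return this.tasks.filter(
      (t) => ids.has(t.agentId) && (params.tags ?? []).every((tag) => t.tags.includes(tag))
    );
  }
}
```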
Call sites: All call sites (TaskService, approval, follow-up, choice, status, autonomy, etc.) pass agentIds: [runtime.agentId] or the batch list from the daemon. See audit in codebase for the full list.
## runTick(tasks)

`runTick(tasks: Task[]): Promise<void>` runs the given queue tasks.

- Called from `checkTasks()` after fetching queue tasks (local or daemon).
- Daemon: one `getTasks(agentIds)`, grouped by agent; each group is passed to the corresponding `runTick(tasks)`.
- Serverless: `runDueTasks()` fetches queue tasks for this agent, then calls `runTick(tasks)`.
- Why separate fetch and runTick: so the daemon can do one fetch and then dispatch to many runtimes without each runtime doing its own fetch. TaskService stays agnostic of who provided the task list.
## runDueTasks()

`runDueTasks(): Promise<void>` fetches this agent's queue tasks once and calls `runTick(tasks)`. Why: a single entry point for "run due tasks now" without starting a timer.

Done:
- `getTasks` takes required `agentIds` only; all adapters and call sites updated; empty `agentIds` returns `[]`.
- `runTick(tasks)` extracted from `checkTasks()`; procedural daemon module with one timer and batched `getTasks(agentIds)`.
- `markDirty()` notifies the daemon; scheduler API exported from the Node build.
- Serverless support (`runtime.serverless`): no timer when serverless; `runDueTasks()` for host-driven execution.

Possible future work:
- Errors from `runTick` are swallowed in the daemon's catch. Adding a small logger or error callback would help operations.
- To skip `runDueTasks()` when no task mutations happened, `markDirty()` could be wired to a flag read by `runDueTasks()`. Not required for correctness.

Summary of key decisions:

| Topic | Decision | Why |
|---|---|---|
| Who runs the tick? | Local timer, daemon, or host (serverless) | Support single-process, multi-agent daemon, and serverless without a long-lived process. |
| getTasks filter | Required agentIds: UUID[] | Multi-tenant safety; daemon can batch one query for many agents. |
| runTick vs fetch | Caller fetches; runTick only runs | Daemon does one fetch, then dispatches to N runtimes. |
| runDueTasks() | One fetch + runTick for this agent | Serverless host can run due tasks on cron or per request. |
| markDirty in serverless | No-op for scheduling | No tick loop; next runDueTasks() will query anyway. |
For task metadata (`dueAt`, `repeat`, pause, backoff) and the public API (`executeTaskById`, `pauseTask`, `resumeTask`, `getTaskStatus`), see the main task system docs and `packages/typescript/README.md` (§ Task system).
## Converting setInterval plugins to queue tasks

Plugins that use `setInterval` for recurring work (e.g. polling, cleanup, reminders) can be converted to queue tasks so that:
- Scheduling is centralized (one `getTasks` per tick; no per-plugin timers).
- Work is visible via `getTaskStatus` / the status action; tasks can be paused or resumed.
- All three modes work (a serverless host calls `runDueTasks()`; the daemon batches `getTasks(agentIds)`).

Pattern:
1. Register a worker: `runtime.registerTaskWorker({ name: "PLUGIN_ACTION", execute, shouldRun? })`.
2. On start: call `runtime.getTasksByName(taskName)`, filter by `task.agentId === runtime.agentId` (adapters may return tasks for all agents), and only `runtime.createTask(...)` if none exists. Store the task id for stop.
3. Create the task with `tags: ["queue", "repeat"]` and `metadata: { updateInterval: ms, baseInterval: ms, updatedAt: Date.now() }`.
4. Remove the `setInterval`; the scheduler tick runs the task when due.

Reference: the full inventory of `setInterval` usages and which to convert is in the plan "setInterval Inventory and Task-Conversion Strategy". Batcher pattern: `packages/typescript/src/utils/prompt-batcher/batcher.ts` (`_ensureAffinityTask`).
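The pattern above can be sketched end to end. The runtime shape, `PLUGIN_CLEANUP` task name, and exact signatures of `registerTaskWorker`, `getTasksByName`, and `createTask` are assumptions based on the names cited in this doc:

```ts
// Sketch of the conversion pattern; all names and signatures are illustrative.
type UUID = string;
interface Task { id: UUID; agentId: UUID; name: string; tags: string[]; metadata: Record<string, unknown> }
interface RuntimeLike {
  agentId: UUID;
  registerTaskWorker(worker: { name: string; execute: () => Promise<void> }): void;
  getTasksByName(name: string): Promise<Task[]>;
  createTask(task: Omit<Task, "id">): Promise<UUID>;
}

const TASK_NAME = "PLUGIN_CLEANUP"; // hypothetical plugin task name

// Idempotent start: register the worker, then create the repeating queue task
// only if this agent does not already have one. Returns the task id for stop().
async function ensureCleanupTask(runtime: RuntimeLike, intervalMs: number): Promise<UUID> {
  runtime.registerTaskWorker({
    name: TASK_NAME,
    execute: async () => { /* the work previously done inside setInterval */ },
  });
  // Adapters may return tasks for all agents, so filter by agentId first.
  const mine = (await runtime.getTasksByName(TASK_NAME))
    .filter((t) => t.agentId === runtime.agentId);
  if (mine.length > 0) return mine[0].id;
  return runtime.createTask({
    agentId: runtime.agentId,
    name: TASK_NAME,
    tags: ["queue", "repeat"], // polled by the scheduler; re-runs when due
    metadata: { updateInterval: intervalMs, baseInterval: intervalMs, updatedAt: Date.now() },
  });
}
```

Calling this on every plugin start is safe: the second call finds the existing task and returns its id instead of creating a duplicate.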