Global Task Framework

The Global Task Framework (GTF) provides a unified way to manage background tasks. It handles task execution, progress tracking, cancellation, and deduplication for both synchronous and asynchronous execution. The framework uses distributed locking internally to ensure race-free operations—you don't need to worry about concurrent task creation or cancellation conflicts.

Enabling GTF

GTF is disabled by default and must be enabled via the GLOBAL_TASK_FRAMEWORK feature flag in your superset_config.py:

python

FEATURE_FLAGS = {
    "GLOBAL_TASK_FRAMEWORK": True,
}

When GTF is disabled:

The Task List UI menu item is hidden
The /api/v1/task/* endpoints return 404
Calling or scheduling a @task-decorated function raises GlobalTaskFrameworkDisabledError

:::note Future Migration When GTF is considered stable, it will replace legacy Celery tasks for built-in features like thumbnails and alerts & reports. Enabling this flag prepares your deployment for that migration. :::

Quick Start

Define a Task

python

from superset_core.tasks.decorators import task, get_context

@task
def process_data(dataset_id: int) -> None:
    ctx = get_context()

    @ctx.on_cleanup
    def cleanup():
        logger.info("Processing complete")

    data = fetch_dataset(dataset_id)
    process_and_cache(data)

Execute a Task

python

# Async execution - schedules on Celery worker
task = process_data.schedule(dataset_id=123)
print(task.status)  # "pending"

# Sync execution - runs inline in current process
task = process_data(dataset_id=123)
# ... blocks until complete
print(task.status)  # "success"

Async vs Sync Execution

Method	When to Use
`.schedule()`	Long-running operations, background processing, when you need to return immediately
Direct call	Short operations, when deduplication matters, when you need the result before responding

Both execution modes provide the same task features: deduplication, progress tracking, cancellation, and visibility in the Task List UI. The difference is whether execution happens in a Celery worker (async) or inline (sync).

Task Lifecycle

PENDING ──→ IN_PROGRESS ────→ SUCCESS
   │             │
   │             ├──────────→ FAILURE
   │             ↓                ↑
   │         ABORTING ────────────┘
   │             │
   │             ├──────────→ TIMED_OUT (timeout)
   │             │
   └─────────────┴──────────→ ABORTED (user cancel)

Status	Description
`PENDING`	Queued, awaiting execution
`IN_PROGRESS`	Executing
`ABORTING`	Abort/timeout triggered, abort handlers running
`SUCCESS`	Completed successfully
`FAILURE`	Failed with error or abort/cleanup handler exception
`ABORTED`	Cancelled by user/admin
`TIMED_OUT`	Exceeded configured timeout

Context API

Access task context via get_context() from within any @task function. The context provides methods for updating task metadata and registering handlers.

Updating Task Metadata

Use update_task() to report progress and store custom payload data:

python

@task
def my_task(items: list[int]) -> None:
    ctx = get_context()

    for i, item in enumerate(items):
        result = process(item)
        ctx.update_task(
            progress=(i + 1, len(items)),
            payload={"last_result": result}
        )

:::tip Call update_task() once per iteration for best performance. Frequent DB writes are throttled to limit metastore load, so batching progress and payload updates together in a single call ensures both are persisted at the same time. :::

Progress Formats

The progress parameter accepts three formats:

Format	Example	Display
`tuple[int, int]`	`progress=(3, 100)`	3 of 100 (3%) with ETA
`float` (0.0-1.0)	`progress=0.5`	50% with ETA
`int`	`progress=42`	42 processed

:::tip Use the tuple format (current, total) whenever possible. It provides the richest information to users: showing both the count and percentage, while still computing ETA automatically. :::

Payload

The payload parameter stores custom metadata that can help users understand what the task is doing. Each call to update_task() replaces the previous payload completely.

In the Task List UI, when a payload is defined, an info icon appears in the Details column. Users can hover over it to see the JSON content.

Handlers

Handler	When it runs	Use case
`on_cleanup`	Always (success, failure, abort)	Release resources, close connections
`on_abort`	When task is aborted	Set stop flag, cancel external operations

python

@task
def my_task() -> None:
    ctx = get_context()

    @ctx.on_cleanup
    def cleanup():
        logger.info("Task ended, cleaning up")

    @ctx.on_abort
    def handle_abort():
        logger.info("Abort requested")

    # ... task logic

Multiple handlers of the same type execute in LIFO order (last registered runs first). Abort handlers run first when abort is detected, then cleanup handlers run when the task ends.

Best-Effort Execution

All registered handlers will always be attempted, even if one fails. This ensures that a failure in one handler doesn't prevent other handlers from running their cleanup logic.

For example, if you have three cleanup handlers and the second one throws an exception:

Handler 3 runs ✓
Handler 2 throws an exception ✗ (logged, but execution continues)
Handler 1 runs ✓

If any handler fails, the task is marked as FAILURE with combined error details showing all handler failures.

:::tip Write handlers to be independent and self-contained. Don't assume previous handlers succeeded, and don't rely on shared state between handlers. :::

Making Tasks Abortable

When users click Cancel in the Task List, the system decides whether to abort (stop) the task or unsubscribe (remove the user from a shared task). Abort occurs when:

It's a private or system task
It's a shared task and the user is the last subscriber
An admin checks Force abort to stop the task for all subscribers

Pending tasks can always be aborted: they simply won't start. In-progress tasks require an abort handler to be abortable:

python

@task
def abortable_task(items: list[str]) -> None:
    ctx = get_context()
    should_stop = False

    @ctx.on_abort
    def handle_abort():
        nonlocal should_stop
        should_stop = True
        logger.info("Abort signal received")

    @ctx.on_cleanup
    def cleanup():
        logger.info("Task ended, cleaning up")

    for item in items:
        if should_stop:
            return  # Exit gracefully
        process(item)

Key points:

Registering on_abort marks the task as abortable and starts the abort listener
The abort handler fires automatically when abort is triggered
Use a flag pattern to gracefully stop processing at safe points
Without an abort handler, in-progress tasks cannot be aborted: the Cancel button in the Task List UI will be disabled

The framework automatically skips execution if a task was aborted while pending: no manual check needed at task start.

:::tip Always implement an abort handler for long-running tasks. This allows users to cancel unneeded tasks and free up worker capacity for other operations. :::

Timeouts

Set a timeout to automatically abort tasks that run too long:

python

from superset_core.tasks.decorators import task, get_context
from superset_core.tasks.types import TaskOptions

# Set default timeout in decorator
@task(timeout=300)  # 5 minutes
def process_data(dataset_id: int) -> None:
    ctx = get_context()
    should_stop = False

    @ctx.on_abort
    def handle_abort():
        nonlocal should_stop
        should_stop = True

    for chunk in fetch_large_dataset(dataset_id):
        if should_stop:
            return
        process(chunk)

# Override timeout at call time
task = process_data.schedule(
    dataset_id=123,
    options=TaskOptions(timeout=600)  # Override to 10 minutes
)

How Timeouts Work

The timeout timer starts when the task begins executing (status changes to IN_PROGRESS). When the timeout expires:

With an abort handler registered: The task transitions to ABORTING, abort handlers run, then cleanup handlers run. The final status depends on handler execution:
- If handlers complete successfully → TIMED_OUT status
- If handlers throw an exception → FAILURE status
Without an abort handler: The framework cannot forcibly terminate the task. A warning is logged, and the task continues running. The Task List UI shows a warning indicator (⚠️) in the Details column to alert users that the timeout cannot be enforced.

Timeout Precedence

Source	Priority	Example
`TaskOptions.timeout`	Highest	`options=TaskOptions(timeout=600)`
`@task(timeout=...)`	Default	`@task(timeout=300)`
Not set	No timeout	Task runs indefinitely

Call-time options always override decorator defaults, allowing tasks to have sensible defaults while permitting callers to extend or shorten the timeout for specific use cases.

:::warning Timeouts require an abort handler to be effective. Without one, the timeout triggers only a warning and the task continues running. Always implement an abort handler when using timeouts. :::

Deduplication

Use task_key to prevent duplicate task execution:

python

from superset_core.tasks.types import TaskOptions

# Without key - creates new task each time (random UUID)
task1 = my_task.schedule(x=1)
task2 = my_task.schedule(x=1)  # Different task

# With key - joins existing task if active
task1 = my_task.schedule(x=1, options=TaskOptions(task_key="report_123"))
task2 = my_task.schedule(x=1, options=TaskOptions(task_key="report_123"))  # Returns same task

When a task with matching key already exists, the user is added as a subscriber and the existing task is returned. This behavior is consistent across all scopes—private tasks naturally have only one subscriber since their deduplication key includes the user ID.

Deduplication only applies to active tasks (pending/in-progress). Once a task completes, a new task with the same key can be created.

Sync Join-and-Wait

When a sync call joins an existing task, it blocks until the task completes:

python

# Schedule async task
task = my_task.schedule(options=TaskOptions(task_key="report_123"))

# Later sync call with same key blocks until completion of the active task
task2 = my_task(options=TaskOptions(task_key="report_123"))
assert task.uuid == task2.uuid  # True
print(task2.status)  # "success" (terminal status)

Task Scopes

python

from superset_core.tasks.decorators import task
from superset_core.tasks.types import TaskScope

@task  # Private by default
def private_task(): ...

@task(scope=TaskScope.SHARED)  # Multiple users can subscribe
def shared_task(): ...

@task(scope=TaskScope.SYSTEM)  # Admin-only visibility
def system_task(): ...

Scope	Visibility	Cancel Behavior
`PRIVATE`	Creator only	Cancels immediately
`SHARED`	All subscribers	Last subscriber cancels; others unsubscribe
`SYSTEM`	Admins only	Admin cancels

Task Cleanup

Completed tasks accumulate in the database over time. Configure a scheduled prune job to automatically remove old tasks:

python

# In your superset_config.py, add to your Celery beat schedule:
CELERY_CONFIG.beat_schedule["prune_tasks"] = {
    "task": "prune_tasks",
    "schedule": crontab(minute=0, hour=0),  # Run daily at midnight
    "kwargs": {
        "retention_period_days": 90,   # Keep tasks for 90 days
        "max_rows_per_run": 10000,     # Limit deletions per run
    },
}

The prune job only removes tasks in terminal states (SUCCESS, FAILURE, ABORTED, TIMED_OUT). Active tasks (PENDING, IN_PROGRESS, ABORTING) are never pruned.

See superset/config.py for a complete example configuration.

:::tip Distributed Coordination for Faster Notifications By default, abort detection and sync join-and-wait use database polling. Configure DISTRIBUTED_COORDINATION_CONFIG to enable Redis pub/sub for real-time notifications. See Distributed Coordination Backend for configuration details. :::

API Reference

@task Decorator

python

@task(
    name: str | None = None,
    scope: TaskScope = TaskScope.PRIVATE,
    timeout: int | None = None
)

name: Task identifier (defaults to function name)
scope: PRIVATE, SHARED, or SYSTEM
timeout: Default timeout in seconds (can be overridden via TaskOptions)

TaskContext Methods

Method	Description
`update_task(progress, payload)`	Update progress and/or custom payload
`on_cleanup(handler)`	Register cleanup handler
`on_abort(handler)`	Register abort handler (makes task abortable)

TaskOptions

python

TaskOptions(
    task_key: str | None = None,
    task_name: str | None = None,
    timeout: int | None = None
)

task_key: Deduplication key (also used as display name if task_name is not set)
task_name: Human-readable display name for the Task List UI
timeout: Timeout in seconds (overrides decorator default)

:::tip Provide a descriptive task_name for better readability in the Task List UI. While task_key is used for deduplication and may be technical (e.g., chart_export_123), task_name can be user-friendly (e.g., "Export Sales Chart 123"). :::

Error Handling

Let exceptions propagate: the framework captures them automatically and sets task status to FAILURE:

python

@task
def risky_task() -> None:
    # No try/catch needed - framework handles it
    result = operation_that_might_fail()

On failure, the framework records:

error_message: Exception message
exception_type: Exception class name
stack_trace: Full traceback (visible when SHOW_STACKTRACE=True)

In the Task List UI, failed tasks show error details when hovering over the status. When stack traces are enabled, a separate bug icon appears in the Details column for viewing the full traceback.

Cleanup handlers still run after an exception, so resources can be properly released as necessary.

:::tip Use descriptive exception messages. In environments where stack traces are hidden (SHOW_STACKTRACE=False), users see only the error message and exception type when hovering over failed tasks. Clear messages help users troubleshoot issues without administrator assistance. :::