.. Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
``EdgeExecutor`` is an option if you want to distribute tasks to workers located in different places.
You can also use it in parallel with other executors if needed. Change your ``airflow.cfg`` to point
the ``executor`` parameter to ``EdgeExecutor`` and provide the related settings. The ``EdgeExecutor`` is the component
that schedules tasks to the edge workers. The edge workers need to be set up separately, as described in :doc:`deployment`.
The configuration parameters of the Edge Executor can be found in the Edge provider's :doc:`configurations-ref`.
To understand the setup of the Edge Executor, please also take a look at :doc:`architecture`.
For more details on running executors in parallel, see the Airflow documentation on
:ref:`apache-airflow:using-multiple-executors-concurrently`.
.. _edge_executor:queue:
When using the ``EdgeExecutor``, you can specify the workers that tasks are
sent to. ``queue`` is an attribute of ``BaseOperator``, so any
task can be assigned to any queue. The default queue for the environment
is defined in ``airflow.cfg`` under ``operators -> default_queue``. This defines
the queue that tasks are assigned to when none is specified, as well as the
queue that Airflow workers listen to when started.
Workers can listen to one or multiple queues of tasks. When a worker is
started (using the command ``airflow edge worker``), a set of comma-delimited queue
names (with no whitespace) can be given (e.g. ``airflow edge worker -q remote,wisconsin_site``).
This worker will then only pick up tasks assigned to the specified queue(s).
If no queue is given when the worker is started, it will pick up tasks from all queues.
This can be useful if you need specialized workers, either from a resource perspective (for, say, very lightweight tasks where one worker could take thousands of tasks without a problem) or from an environment perspective (you want a worker running from a specific location where the required infrastructure is available).
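The queue matching described above can be modelled in a few lines of plain Python. This is only an illustrative sketch, not the provider's implementation: ``parse_queues``, ``QueuedTask`` and ``worker_accepts`` are hypothetical names introduced here, and the queue names are the ones from the example command.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


def parse_queues(arg: Optional[str]) -> Optional[FrozenSet[str]]:
    """Split a comma-delimited queue string; None stands for 'all queues'."""
    if not arg:
        return None
    return frozenset(arg.split(","))


@dataclass
class QueuedTask:
    # Hypothetical stand-in for a task instance with a queue attribute.
    task_id: str
    queue: str


def worker_accepts(worker_queues: Optional[FrozenSet[str]], task: QueuedTask) -> bool:
    """A worker without a queue restriction accepts tasks from every queue."""
    return worker_queues is None or task.queue in worker_queues


tasks = [QueuedTask("extract", "remote"), QueuedTask("report", "default")]

# Worker started with "-q remote,wisconsin_site"
site_worker = parse_queues("remote,wisconsin_site")
picked = [t.task_id for t in tasks if worker_accepts(site_worker, t)]

# Worker started without a queue restriction
any_worker = parse_queues(None)
picked_all = [t.task_id for t in tasks if worker_accepts(any_worker, t)]

print(picked)      # ['extract']
print(picked_all)  # ['extract', 'report']
```

The restricted worker only sees the task on the ``remote`` queue, while the unrestricted worker sees both.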
When using ``EdgeExecutor`` in addition to other executors, and ``EdgeExecutor`` is not the default executor
(that is to say, the first one in the list of executors), remember to also define ``EdgeExecutor``
as the executor at task or Dag level, in addition to the queues you are targeting.
For more details on multiple executors, please see :ref:`apache-airflow:using-multiple-executors-concurrently`.
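To illustrate why the executor has to be named explicitly, the following sketch models the rule that the first configured executor is the default. It is a simplified assumption for illustration only: ``resolve_executor`` is a hypothetical helper, and the short executor names stand in for whatever is actually configured in your environment.

```python
from typing import Optional


def resolve_executor(configured: str, task_executor: Optional[str] = None) -> str:
    """Return the executor a task would run on in this simplified model.

    configured: comma-separated executor list; the first entry is the default.
    task_executor: executor named at task or Dag level, if any.
    """
    executors = [name.strip() for name in configured.split(",") if name.strip()]
    if task_executor is None:
        return executors[0]
    if task_executor not in executors:
        raise ValueError(f"{task_executor!r} is not a configured executor")
    return task_executor


configured = "LocalExecutor,EdgeExecutor"  # hypothetical configuration value

# Without an explicit executor the task lands on the default (first) executor.
default_choice = resolve_executor(configured)

# Naming EdgeExecutor at task level routes the task to the edge workers.
edge_choice = resolve_executor(configured, task_executor="EdgeExecutor")

print(default_choice)  # LocalExecutor
print(edge_choice)     # EdgeExecutor
```

In this model, setting only a queue would not be enough: the task would still be handed to the default executor.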
.. _edge_executor:concurrency_slots:
Some tasks may need more resources than other tasks. To handle these use cases, the Edge worker supports
concurrency slot handling. The logic behind this is the same as the pool slot feature,
see :doc:`apache-airflow:administration-and-deployment/pools`.
The Edge worker reuses the ``pool_slots`` of the task instance to keep the number of task instance parameters as low as possible.
The ``pool_slots`` value works together with the ``worker_concurrency`` value, which is defined when the worker is started.
If a task needs more resources, the ``pool_slots`` value can be increased to reduce the number of tasks running in parallel.
The value can be used to block other tasks from being executed in parallel on the same worker.
A ``pool_slots`` of 2 and a ``worker_concurrency`` of 3 means
that a worker which executes this task can only execute one more job with a ``pool_slots`` of 1 in parallel.
If no ``pool_slots`` is defined for a task, the default value is 1. The ``pool_slots`` value only supports
integer values.
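The slot arithmetic can be checked with a small sketch. It is a plain-Python model under the assumption that a worker simply sums the ``pool_slots`` of its running tasks and compares the total against ``worker_concurrency``; ``free_slots`` and ``can_start`` are hypothetical helper names, not part of the provider.

```python
from typing import Sequence


def free_slots(worker_concurrency: int, running_pool_slots: Sequence[int]) -> int:
    """Slots still available on a worker given the pool_slots of running tasks."""
    return worker_concurrency - sum(running_pool_slots)


def can_start(worker_concurrency: int, running_pool_slots: Sequence[int], pool_slots: int = 1) -> bool:
    """A task fits if its pool_slots do not exceed the free slots (default is 1)."""
    return pool_slots <= free_slots(worker_concurrency, running_pool_slots)


# Worker started with a worker_concurrency of 3, currently running one task with pool_slots=2.
running = [2]

fits_small = can_start(3, running, pool_slots=1)  # one slot is left
fits_large = can_start(3, running, pool_slots=2)  # would need two slots
fits_when_full = can_start(3, [2, 1])             # no slot is left

print(free_slots(3, running))  # 1
print(fits_small)              # True
print(fits_large)              # False
print(fits_when_full)          # False
```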
Here is an example setting pool_slots for a task:
.. code-block:: python

    import os

    import pendulum

    from airflow import DAG
    from airflow.decorators import task
    from airflow.example_dags.libs.helper import print_stuff
    from airflow.settings import AIRFLOW_HOME

    with DAG(
        dag_id="example_edge_pool_slots",
        schedule=None,
        start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
        catchup=False,
        tags=["example"],
    ) as dag:

        @task(executor="EdgeExecutor", pool_slots=2)
        def task_with_template():
            print_stuff()

        task_with_template()

Some known limitations