Back to Airflow

Metadata Database Updates

contributing-docs/14_metadata_database_updates.rst

3.2.07.0 KB
Original Source

.. Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Metadata Database Updates

When developing features, you may need to persist information to the metadata database. Airflow has Alembic <https://github.com/sqlalchemy/alembic>__ built-in module to handle all schema changes. Alembic must be installed on your development machine before continuing with migration. If you had made changes to the ORM, you will need to generate a new migration file. This file will contain the changes to the database schema that you have made. To generate a new migration file, run the following:

.. code-block:: bash

# starting at the root of the project
# Use breeze:
$ breeze generate-migration-file -m "add new field to db"
# Or, go to the airflow directory and use alembic directly:
$ breeze --backend postgres
$ cd airflow-core/src/airflow
$ alembic revision -m "add new field to db" --autogenerate

   Generating
~/airflow-core/src/airflow/migrations/versions/a1e23c41f123_add_new_field_to_db.py

Note that migration file names are standardized by prek hook update-migration-references, so that they sort alphabetically and indicate the Airflow version in which they first appear (the alembic revision ID is removed). As a result you should expect to see a prek failure on the first attempt. Just stage the modified file and commit again (or run the hook manually before committing).

After your new migration file is run through prek hook it will look like this:

.. code-block::

1234_A_B_C_add_new_field_to_db.py

This represents that your migration is the 1234th migration and expected for release in Airflow version A.B.C.

.. warning::

In rare cases, you may need to manually modify the migration logic of your auto-generated migration script. If you must make manual changes to your migration script, you must ensure you're not referencing any ORM classes within your migration script. Directly referring to an ORM class definition within a migration script can lead to unexpected and / or broken downgrade pathways in the future, as described here <https://github.com/apache/airflow/issues/59871>_.

How to Resolve Conflicts When Rebasing

When rebasing your branch onto the latest main, you may encounter conflicts in certain files. This often happens when another PR updates the Metadata Database and is merged before yours.

The affected files may include:

  • docs/apache-airflow/migrations-ref.rst

  • airflow/migrations/versions/1234_A_B_C_<your_migration_name>.py

    There should be another file, 1234_A_B_C_<other_migration_name>.py, with the same 1234_A_B_C prefix.

To resolve these conflicts:

  1. First, resolve all conflicts except those in the files listed above. This includes conflicts in other .py files within the airflow/ or tests/ directories.
  2. Then, run the following command to automatically update the affected files:

.. code-block:: bash

prek update-migration-references --all-files

3. Add the updated files to the staging area and continue with the rebase.

.. note::

The ERD diagram (``airflow_erd.svg``) is no longer committed to the repository. It is
automatically generated during the documentation build by the ``generate_erd`` Sphinx extension.

Running migration CI tests locally

The various CI migration tests are defined in .github/actions/migration_tests/action.yml. These tests ensure the database upgrades and downgrades are still functional from the lowest supported source migration version, to the latest version, and back down to the former. To run any of those CI tests on your machine, you can:

  1. Copy the relevant command (specified by the run key for the relevant CI job), and replace the environment variable references with their literal values defined in the sibling env section.
  2. Run the command you created from step 1, troubleshooting errors as needed.

How to hook your application into Airflow's migration process

Airflow 3.0.0 introduces a new feature that allows you to hook your application into Airflow's migration process. This feature is useful if you have a custom database schema that you want to migrate along with Airflow's schema. This guide will show you how to hook your application into Airflow's migration process.

Subclass the BaseDBManager

To hook your application into Airflow's migration process, you need to subclass the ``BaseDBManager`` class from the
``airflow.utils.db_manager`` module. This class provides methods for running Alembic migrations.

Create Alembic migration scripts for your application

At the root of your application, run "alembic init migrations" to create a new migrations directory. Set the version_table variable in the env.py file to the name of the table that stores the migration history. Specify this version_table in the version_table argument of the alembic's context.configure method of the run_migration_online and run_migration_offline functions. This will ensure that your application's migrations are stored in a separate table from Airflow's migrations.

Next, define an include_object function in the env.py that ensures that only your application's metadata is included in the application's migrations. This too should be specified in the context.configure method of the run_migration_online and run_migration_offline.

Next, set the config_file not to disable existing loggers:

.. code-block:: python

if config.config_file_name is not None:
    fileConfig(config.config_file_name, disable_existing_loggers=False)

Replace the content of your application's alembic.ini file with Airflow's alembic.ini copy.

If the above is not clear, you might want to look at the FAB implementation of this migration.

After setting up those, and you want Airflow to run the migration for you when running airflow db migrate then you need to add your DBManager to the [core] external_db_managers configuration.


You can also learn how to setup your Node environment <15_node_environment_setup.rst>__ if you want to develop Airflow UI.