.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.
The Informatica provider enables automatic lineage tracking for Airflow tasks that define inlets and outlets.
The Informatica plugin automatically detects tasks with lineage support and sends inlet/outlet information to Informatica EDC when tasks succeed. No additional configuration is required beyond defining inlets and outlets in your tasks.
The provider consists of several key components:

Hooks
    ``InformaticaEDCHook`` provides low-level EDC API access for authentication, object retrieval, and lineage creation.

Extractors
    ``InformaticaLineageExtractor`` handles lineage data extraction and conversion to Airflow-compatible formats.

Plugins
    ``InformaticaProviderPlugin`` registers listeners that monitor task lifecycle events and trigger lineage operations.

Listeners
    Event-driven listeners that respond to task success/failure events and process lineage information.
#. Install the provider:

   .. code-block:: bash

      pip install apache-airflow-providers-informatica

#. Configure a connection:

   Create an HTTP connection in the Airflow UI with the EDC server details and the security domain in the connection extras.

#. Add lineage to tasks:

   Define inlets and outlets in your tasks using EDC object URIs.

#. Run your DAG:

   The provider automatically handles lineage extraction when tasks succeed.
.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.standard.operators.python import PythonOperator


    def my_python_task(**kwargs):
        print("Hello Informatica Lineage!")


    with DAG(
        dag_id="example_informatica_lineage_dag",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        python_task = PythonOperator(
            task_id="my_python_task",
            python_callable=my_python_task,
            inlets=[{"dataset_uri": "edc://object/source_table_abc123"}],
            outlets=[{"dataset_uri": "edc://object/target_table_xyz789"}],
        )
When this task succeeds, the provider automatically creates a lineage link between the source and target objects in EDC.
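Connections can also be created without the UI, for example through an environment variable in Airflow's connection URI format. The connection id, host, credentials, and the ``security_domain`` extra key below are illustrative assumptions; adjust them to match your EDC deployment.

.. code-block:: bash

    # Hypothetical connection id and extras -- substitute your own EDC
    # host, port, credentials, and security domain.
    export AIRFLOW_CONN_MY_CONNECTION='http://edc_user:edc_pass@edc-host:9085/?security_domain=Native'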
InformaticaEDCHook
^^^^^^^^^^^^^^^^^^
The hook provides low-level access to Informatica EDC API.
.. code-block:: python

    from airflow.providers.informatica.hooks.edc import InformaticaEDCHook

    hook = InformaticaEDCHook(informatica_edc_conn_id="my_connection")
    object_data = hook.get_object("edc://object/table_123")
    result = hook.create_lineage_link("source_id", "target_id")
The ``InformaticaProviderPlugin`` automatically registers listeners that monitor task lifecycle events and send inlet/outlet lineage to EDC when tasks succeed.
No manual intervention is required. The plugin works transparently with any task that defines inlets and outlets.
Inlets and outlets can be defined in either of two formats:

* As a plain URI string: ``"edc://object/table_name"``
* As a dictionary: ``{"dataset_uri": "edc://object/table_name"}``

The plugin automatically handles both formats and resolves them to EDC object IDs.