RDI uses pipelines to implement change data capture (CDC). (See the [architecture overview]({{< relref "/integrate/redis-data-integration/architecture#overview" >}}) for an introduction to pipelines.) The sections below explain how pipelines work and give an overview of how to configure and deploy them.
An RDI pipeline captures change data records from the source database, and transforms them into Redis data structures. It writes each of these new structures to a Redis target database under its own key.
By default, RDI transforms the source data into [hashes]({{< relref "/develop/data-types/hashes" >}}) or [JSON objects]({{< relref "/develop/data-types/json" >}}) for the target with a standard data mapping and a standard format for the key. However, you can also provide your own custom transformation jobs for each source table, using your own data mapping and key pattern. You specify these jobs declaratively with YAML configuration files that require no coding.
Data transformation involves two stages: the data captured during CDC is first converted to an intermediate JSON format, and this JSON data is then passed to the transformation engine, which maps it to the target Redis data structures.
The diagram below shows the flow of data through the pipeline:
{{< image filename="/images/rdi/ingest/RDIPipeDataflow.webp" >}}
You can provide a job file for each source table that needs a custom transformation. You can also add a default job file for any tables that don't have their own. You must specify the full name of the source table in the job file (or the special name "*" in the default job), and you can also include filtering logic to skip data that matches a particular condition. As part of the transformation, you can choose the Redis data type used to store the data (such as a hash, JSON object, set, sorted set, stream, or string).
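For example, a job file for a hypothetical `employees` source table might look like the sketch below. The table name, filter expression, and key pattern are illustrative assumptions, not values you must use:

```yaml
# Hypothetical job file for an "employees" source table (illustrative only).
source:
  table: employees          # Full name of the source table.
transform:
  - uses: filter            # Skip any rows that don't match the condition.
    with:
      language: sql
      expression: country = 'UK'
output:
  - uses: redis.write
    with:
      data_type: json       # Store each record as a JSON object.
      key:
        expression: concat(['emp:', id])
        language: jmespath
```

A default job would use `"*"` as the table name instead, so it applies to any table without a job file of its own.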
After you deploy a pipeline, it first runs an initial *snapshot* phase, where it copies the existing data from the source database to the target. It then enters the *change streaming* phase, where it captures ongoing changes in the source and applies them to the target as they happen.
Follow the steps described in the sections below to prepare and run an RDI pipeline.
Before using the pipeline, you must first prepare your source database to use the Debezium connector for change data capture (CDC). See the [architecture overview]({{< relref "/integrate/redis-data-integration/architecture#overview" >}}) for more information about CDC. Each database type has a different set of preparation steps. You can find the preparation guides for the databases that RDI supports in the [Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}}) section.
RDI uses a set of YAML files to configure each pipeline. The folder structure of the configuration is shown below:
```
(root)
├── config.yaml           # The main pipeline configuration file.
└── jobs                  # Folder containing optional job files.
    ├── default-job.yaml  # A default job.
    ├── job1.yaml         # Each job file must have a unique name.
    └── ...               # Other job files, if required.
```
The main configuration for the pipeline is in the `config.yaml` file. This specifies the connection details for the source database (such as host, username, and password) and the queries that RDI uses to extract the required data. Place job files in the `jobs` folder if you want to specify your own data transformations.
See
[Pipeline configuration file]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config" >}})
for a full description of the config.yaml file and some example configurations.
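As a rough sketch, a minimal `config.yaml` might look something like the following. The source type, host, port, database name, and target details here are placeholder assumptions; see the full reference linked above for the actual options:

```yaml
# Minimal illustrative sketch of a pipeline configuration (values are placeholders).
sources:
  mydb:
    type: cdc
    connection:
      type: postgresql          # Source database type.
      host: localhost
      port: 5432
      database: example
      user: postgres
      password: ${SOURCE_DB_PASSWORD}   # Secrets are typically injected, not hard-coded.
targets:
  target:
    connection:
      type: redis               # The Redis target database.
      host: localhost
      port: 12000
```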
You can use one or more job files to configure which fields from the source tables you want to use, and which data structure you want to write to the target. You can also optionally specify a transformation to apply to the data before writing it to the target. See the [Job files]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples" >}}) section for full details of the file format and examples of common tasks for job files.
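As an illustration of a transformation, a job could compute a new field before writing the record to the target. The sketch below assumes an `employees` table and a JMESPath expression; treat the names and expressions as examples rather than required values:

```yaml
# Illustrative job sketch: add a computed field before writing (assumed table/fields).
source:
  table: employees
transform:
  - uses: add_field             # Add a computed field to each record.
    with:
      fields:
        - field: full_name
          language: jmespath
          expression: concat([first_name, ' ', last_name])
output:
  - uses: redis.write
    with:
      data_type: hash           # Store each record as a Redis hash.
```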
When your configuration is ready, you must deploy it to start using the pipeline. See [Deploy a pipeline]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy" >}}) to learn how to do this.
See the other pages in this section for more information and examples: