import ChangeLog from '../changelog/connector-jdbc.md';

Redshift

JDBC Redshift Source Connector

Description

Reads data from an external data source through JDBC.

Supported engines

Spark

Flink

SeaTunnel Zeta

For Spark/Flink Engine

  1. Ensure that the JDBC driver jar package has been placed in the `${SEATUNNEL_HOME}/plugins/` directory.

For SeaTunnel Zeta Engine

  1. Ensure that the JDBC driver jar package has been placed in the `${SEATUNNEL_HOME}/lib/` directory.

Key features

Supports a query SQL statement, so you can achieve a projection effect (read only the selected columns).

Supported DataSource list

| Datasource | Supported versions | Driver | Url | Maven |
|---|---|---|---|---|
| redshift | Different dependency versions have different driver classes. | com.amazon.redshift.jdbc.Driver | jdbc:redshift://localhost:5439/database | Download |
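The `url` value follows the form `jdbc:redshift://host:port/database` shown in the Url column. Below is a minimal sketch for checking a configured `url` before launching a job. The Python helper `parse_redshift_url` and the `RedshiftJdbcUrl` type are hypothetical and are not part of SeaTunnel. The sketch assumes only the plain host:port/database form. It does not handle any other URL forms or connection properties that the Redshift driver may accept.

```python
import re
from typing import NamedTuple


class RedshiftJdbcUrl(NamedTuple):
    host: str
    port: int
    database: str


# Matches only the plain "jdbc:redshift://host:port/database" form from the table;
# anything after "?" or ";" (for example, connection properties) is ignored.
_URL_PATTERN = re.compile(
    r"^jdbc:redshift://(?P<host>[^:/]+):(?P<port>\d+)/(?P<database>[^?;/]+)"
)


def parse_redshift_url(url: str) -> RedshiftJdbcUrl:
    """Hypothetical helper: split a Redshift JDBC URL into host, port and database."""
    match = _URL_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a jdbc:redshift://host:port/database URL: {url!r}")
    return RedshiftJdbcUrl(match["host"], int(match["port"]), match["database"])


print(parse_redshift_url("jdbc:redshift://localhost:5439/dev"))
# RedshiftJdbcUrl(host='localhost', port=5439, database='dev')
```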

Database dependency

Download the driver listed in the 'Maven' column above and copy it to the `$SEATUNNEL_HOME/plugins/jdbc/lib/` working directory.

For example, for the Redshift data source: `cp RedshiftJDBC42-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/`

Data Type Mapping

| Redshift Data type | SeaTunnel Data type |
|---|---|
| SMALLINT<br/>INT2 | SHORT |
| INTEGER<br/>INT<br/>INT4 | INT |
| BIGINT<br/>INT8<br/>OID | LONG |
| DECIMAL<br/>NUMERIC | DECIMAL((the designated column's specified column size) + 1,<br/>(the designated column's number of digits to the right of the decimal point)) |
| REAL<br/>FLOAT4 | FLOAT |
| DOUBLE_PRECISION<br/>FLOAT8<br/>FLOAT | DOUBLE |
| BOOLEAN<br/>BOOL | BOOLEAN |
| CHAR<br/>CHARACTER<br/>NCHAR<br/>BPCHAR<br/>VARCHAR<br/>CHARACTER_VARYING<br/>NVARCHAR<br/>TEXT<br/>SUPER | STRING |
| VARBYTE<br/>BINARY_VARYING | BYTES |
| TIME<br/>TIME_WITH_TIME_ZONE<br/>TIMETZ | LOCALTIME |
| TIMESTAMP<br/>TIMESTAMP_WITH_OUT_TIME_ZONE<br/>TIMESTAMPTZ | LOCALDATETIME |
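The mapping table can also be read as a simple lookup. The minimal Python sketch below encodes it that way. The function name `to_seatunnel_type` and its `precision`/`scale` arguments are hypothetical. The sketch only illustrates the table and is not SeaTunnel's internal type converter. For `DECIMAL`/`NUMERIC`, `precision` is assumed to be the column size reported by the driver, and the sketch adds one to it as the table states.

```python
# Hypothetical illustration of the Data Type Mapping table; not SeaTunnel's own converter.
_MAPPING = {
    "SHORT": ("SMALLINT", "INT2"),
    "INT": ("INTEGER", "INT", "INT4"),
    "LONG": ("BIGINT", "INT8", "OID"),
    "FLOAT": ("REAL", "FLOAT4"),
    "DOUBLE": ("DOUBLE_PRECISION", "FLOAT8", "FLOAT"),
    "BOOLEAN": ("BOOLEAN", "BOOL"),
    "STRING": ("CHAR", "CHARACTER", "NCHAR", "BPCHAR", "VARCHAR",
               "CHARACTER_VARYING", "NVARCHAR", "TEXT", "SUPER"),
    "BYTES": ("VARBYTE", "BINARY_VARYING"),
    "LOCALTIME": ("TIME", "TIME_WITH_TIME_ZONE", "TIMETZ"),
    "LOCALDATETIME": ("TIMESTAMP", "TIMESTAMP_WITH_OUT_TIME_ZONE", "TIMESTAMPTZ"),
}

# Invert into a lookup: Redshift type name -> SeaTunnel type name.
REDSHIFT_TO_SEATUNNEL = {
    redshift: seatunnel
    for seatunnel, redshift_names in _MAPPING.items()
    for redshift in redshift_names
}


def to_seatunnel_type(redshift_type: str, precision: int = 0, scale: int = 0) -> str:
    """Map a Redshift type name (as spelled in the table) to its SeaTunnel type."""
    name = redshift_type.strip().upper()
    if name in ("DECIMAL", "NUMERIC"):
        # Per the table: SeaTunnel precision = column size + 1; scale is kept as-is.
        return f"DECIMAL({precision + 1}, {scale})"
    if name not in REDSHIFT_TO_SEATUNNEL:
        raise ValueError(f"type not in the mapping table: {redshift_type}")
    return REDSHIFT_TO_SEATUNNEL[name]


print(to_seatunnel_type("int4"))                            # INT
print(to_seatunnel_type("NUMERIC", precision=10, scale=2))  # DECIMAL(11, 2)
```

The sketch inverts the table, which is keyed by SeaTunnel type, into a dictionary keyed by Redshift name. Each lookup is then a single dictionary access, and `DECIMAL`/`NUMERIC` is the only case that needs column metadata.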

Example

Simple

This example reads the rows of `public.table2` whose `id` is greater than 100 from the `dev` database. It selects only the `id` and `name` columns and prints the result to the console. Change the column list in `query` to choose which fields are output.

```hocon
env {
  parallelism = 2
  job.mode = "BATCH"
}
source {
    Jdbc {
        url = "jdbc:redshift://localhost:5439/dev"
        driver = "com.amazon.redshift.jdbc.Driver"
        username = "root"
        password = "123456"

        table_path = "public.table2"
        # Use query to filter rows & columns
        query = "select id, name from public.table2 where id > 100"

        #split.size = 8096
        #split.even-distribution.factor.upper-bound = 100
        #split.even-distribution.factor.lower-bound = 0.05
        #split.sample-sharding.threshold = 1000
        #split.inverse-sampling.rate = 1000
    }
}

sink {
    Console {}
}
```

Multiple table read

Configuring `table_list` turns on automatic splitting. Use the `split.*` options to adjust the split strategy.

```hocon
env {
  job.mode = "BATCH"
  parallelism = 2
}
source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    username = "root"
    password = "123456"

    table_list = [
      {
        table_path = "public.table1"
      },
      {
        table_path = "public.table2"
        # Use query to filter rows & columns
        query = "select id, name from public.table2 where id > 100"
      }
    ]
    #split.size = 8096
    #split.even-distribution.factor.upper-bound = 100
    #split.even-distribution.factor.lower-bound = 0.05
    #split.sample-sharding.threshold = 1000
    #split.inverse-sampling.rate = 1000
  }
}

sink {
  Console {}
}
```
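The examples above leave the `split.*` options commented out at the values shown. The simplified sketch below shows one way these options could interact. It assumes the behavior described for these options in SeaTunnel's generic JDBC source documentation:

- A distribution factor of `(MAX(key) - MIN(key) + 1) / row count` is computed on a numeric split key.
- The factor is checked against the lower and upper bounds to decide whether the data is evenly distributed.
- Sample-based sharding is used when the data is uneven and the estimated split count `row count / split.size` exceeds `split.sample-sharding.threshold`.

The Python functions `choose_split_strategy` and `even_ranges` are hypothetical. The `"single"` and `"uneven"` branches are assumptions made for illustration, and the sketch does not implement sample-based sharding itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SplitOptions:
    # Defaults mirror the commented-out split.* values in the examples above.
    size: int = 8096                       # split.size
    factor_upper_bound: float = 100.0      # split.even-distribution.factor.upper-bound
    factor_lower_bound: float = 0.05       # split.even-distribution.factor.lower-bound
    sample_sharding_threshold: int = 1000  # split.sample-sharding.threshold
    inverse_sampling_rate: int = 1000      # split.inverse-sampling.rate (sampling not sketched)


def distribution_factor(min_key: int, max_key: int, row_count: int) -> float:
    """(MAX(key) - MIN(key) + 1) / row count for a numeric split key."""
    return (max_key - min_key + 1) / row_count


def choose_split_strategy(min_key: int, max_key: int, row_count: int,
                          opts: SplitOptions = SplitOptions()) -> str:
    if row_count <= opts.size:
        return "single"  # assumption: a table that fits in one split is not divided
    factor = distribution_factor(min_key, max_key, row_count)
    if opts.factor_lower_bound <= factor <= opts.factor_upper_bound:
        return "even"
    if row_count / opts.size > opts.sample_sharding_threshold:
        return "sample"
    return "uneven"  # assumption: fall back to data-driven split boundaries


def even_ranges(min_key: int, max_key: int, row_count: int,
                opts: SplitOptions = SplitOptions()) -> list[tuple[int, int]]:
    """Half-open [start, end) key ranges, each expected to hold about opts.size rows."""
    step = max(int(distribution_factor(min_key, max_key, row_count) * opts.size), 1)
    ranges = []
    start = min_key
    while start <= max_key:
        end = min(start + step, max_key + 1)
        ranges.append((start, end))
        start = end
    return ranges


print(choose_split_strategy(1, 100_000, 100_000))  # even
print(len(even_ranges(1, 100_000, 100_000)))       # 13
```

The key-range step is scaled by the distribution factor. As a result, a sparse but still "even" key space (for example, a factor of 2) gets wider key ranges, and each range still holds roughly `split.size` rows.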

Changelog

<ChangeLog />