docs/en/connectors/source/OceanBase.md
import ChangeLog from '../changelog/connector-jdbc.md';
JDBC OceanBase Source Connector
Spark
Flink
SeaTunnel Zeta
Read external data source data through JDBC.
| Datasource | Supported versions | Driver | Url | Maven |
|---|---|---|---|---|
| OceanBase | All OceanBase server versions. | com.oceanbase.jdbc.Driver | jdbc:oceanbase://localhost:2883/test | Download |
Please download the support list corresponding to 'Maven' and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory
For example: cp oceanbase-client-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/
| Mysql Data type | SeaTunnel Data type |
|---|---|
| BIT(1) | |
| TINYINT(1) | BOOLEAN |
| TINYINT | BYTE |
| TINYINT | |
| TINYINT UNSIGNED | SMALLINT |
| SMALLINT UNSIGNED | |
| MEDIUMINT | |
| MEDIUMINT UNSIGNED | |
| INT | |
| INTEGER | |
| YEAR | INT |
| INT UNSIGNED | |
| INTEGER UNSIGNED | |
| BIGINT | BIGINT |
| BIGINT UNSIGNED | DECIMAL(20,0) |
| DECIMAL(x,y)(Get the designated column's specified column size.<38) | DECIMAL(x,y) |
| DECIMAL(x,y)(Get the designated column's specified column size.>38) | DECIMAL(38,18) |
| DECIMAL UNSIGNED | DECIMAL((Get the designated column's specified column size)+1, |
| (Gets the designated column's number of digits to right of the decimal point.))) | |
| FLOAT | |
| FLOAT UNSIGNED | FLOAT |
| DOUBLE | |
| DOUBLE UNSIGNED | DOUBLE |
| CHAR | |
| VARCHAR | |
| TINYTEXT | |
| MEDIUMTEXT | |
| TEXT | |
| LONGTEXT | |
| JSON | |
| ENUM | STRING |
| DATE | DATE |
| TIME | TIME |
| DATETIME | |
| TIMESTAMP | TIMESTAMP |
| TINYBLOB | |
| MEDIUMBLOB | |
| BLOB | |
| LONGBLOB | |
| BINARY | |
| VARBINAR | |
| BIT(n) | |
| GEOMETRY | BYTES |
| Oracle Data type | SeaTunnel Data type |
|---|---|
| Integer | DECIMAL(38,0) |
| Number(p), p <= 9 | INT |
| Number(p), p <= 18 | BIGINT |
| Number(p), p > 18 | DECIMAL(38,18) |
| Number(p,s) | DECIMAL(p,s) |
| Float | DECIMAL(38,18) |
| REAL | |
| BINARY_FLOAT | FLOAT |
| BINARY_DOUBLE | DOUBLE |
| CHAR | |
| NCHAR | |
| VARCHAR | |
| VARCHAR2 | |
| NVARCHAR2 | |
| NCLOB | |
| CLOB | |
| LONG | |
| XML | |
| ROWID | STRING |
| DATE | TIMESTAMP |
| TIMESTAMP | |
| TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMP |
| BLOB | |
| RAW | |
| LONG RAW | |
| BFILE | BYTES |
| UNKNOWN | Not supported yet |
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | String | Yes | - | The URL of the JDBC connection. Refer to a case: jdbc:oceanbase://localhost:2883/test |
| driver | String | Yes | - | The jdbc class name used to connect to the remote data source, should be com.oceanbase.jdbc.Driver. |
| username | String | No | - | Connection instance user name |
| password | String | No | - | Connection instance password |
| compatible_mode | String | Yes | - | The compatible mode of OceanBase, can be 'mysql' or 'oracle'. |
| query | String | Yes | - | Query statement |
| connection_check_timeout_sec | Int | No | 30 | The time in seconds to wait for the database operation used to validate the connection to complete |
| partition_column | String | No | - | The column name for parallelism's partition, only support numeric type column and string type column. |
| partition_lower_bound | BigDecimal | No | - | The partition_column min value for scan, if not set SeaTunnel will query database get min value. |
| partition_upper_bound | BigDecimal | No | - | The partition_column max value for scan, if not set SeaTunnel will query database get max value. |
| partition_num | Int | No | job parallelism | The number of partition count, only support positive integer. Default value is job parallelism. |
| fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure |
| the row fetch size used in the query to improve performance by | ||||
| reducing the number database hits required to satisfy the selection criteria. | ||||
| Zero means use jdbc default value. | ||||
| properties | Map | No | - | Additional connection configuration parameters,when properties and URL have the same parameters, the priority is determined by the |
| specific implementation of the driver. For example, in MySQL, properties take precedence over the URL. | ||||
| common-options | No | - | Source plugin common parameters, please refer to Source Common Options for details |
If partition_column is not set, it will run in single concurrency, and if partition_column is set, it will be executed in parallel according to the concurrency of tasks.
env {
parallelism = 2
job.mode = "BATCH"
}
source {
Jdbc {
driver = "com.oceanbase.jdbc.Driver"
url = "jdbc:oceanbase://localhost:2883/test?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
username = "root"
password = ""
compatible_mode = "mysql"
query = "select * from source"
}
}
transform {
# If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
# please go to https://seatunnel.apache.org/docs/transform/sql
}
sink {
Console {}
}
Read your query table in parallel with the shard field you configured and the shard data. You can do this if you want to read the whole table
env {
parallelism = 10
job.mode = "BATCH"
}
source {
Jdbc {
driver = "com.oceanbase.jdbc.Driver"
url = "jdbc:oceanbase://localhost:2883/test?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
username = "root"
password = ""
compatible_mode = "mysql"
query = "select * from source"
# Parallel sharding reads fields
partition_column = "id"
# Number of fragments
partition_num = 10
}
}
sink {
Console {}
}
It is more efficient to read your data source according to the upper and lower boundaries you configured
source {
Jdbc {
driver = "com.oceanbase.jdbc.Driver"
url = "jdbc:oceanbase://localhost:2883/test?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
username = "root"
password = ""
compatible_mode = "mysql"
query = "select * from source"
partition_column = "id"
partition_num = 10
# Read start boundary
partition_lower_bound = 1
# Read end boundary
partition_upper_bound = 500
}
}