import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem";
To help users easily build data ingestion pipelines with million-level throughput, the TDengine connector provides a high-performance write feature. When this feature is enabled, the TDengine connector automatically creates write threads and dedicated queues, caches data sharded by sub-tables, and sends data in batches when the data volume threshold is reached or a timeout condition occurs. This approach reduces network requests and increases throughput, allowing users to achieve high-performance writes without needing to master multithreaded programming knowledge or data sharding techniques.
The following introduces how to use the efficient writing feature of each connector:
<Tabs defaultValue="java" groupId="lang"> <TabItem label="Java" value="java">Starting from version 3.6.0, the JDBC driver provides an efficient writing feature over WebSocket connections. The JDBC driver's efficient writing feature has the following characteristics:
- Supports calling the `executeUpdate` interface to obtain the number of written records; if an exception occurs during writing, it can be caught at that point.

The following details its usage. This section assumes that the user is already familiar with the JDBC standard parameter binding interface (refer to Parameter Binding).
For the JDBC connector, there are two ways to enable the efficient writing feature:
- Setting the connection property `TSDBDriver.PROPERTY_KEY_ASYNC_WRITE` to `stmt`, or adding `asyncWrite=stmt` to the JDBC URL, enables efficient writing on the connection. Once enabled, all `PreparedStatement` objects subsequently created on that connection use the efficient writing mode.
- When creating a `PreparedStatement`, using `ASYNC_INSERT INTO` instead of `INSERT INTO` in the SQL statement enables efficient writing for that parameter-bound object.

The client application uses the JDBC standard `addBatch` method to add a record and `executeBatch` to submit all added records. In efficient writing mode, the `executeUpdate` method can be called to synchronously obtain the number of successfully written records; if a write fails, calling `executeUpdate` throws an exception that can be caught at that point.
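As a sketch, the two ways to enable the feature might look like the following. The URL, the table schema, and the literal key string `"asyncWrite"` are assumptions for illustration; real code should use the `TSDBDriver.PROPERTY_KEY_ASYNC_WRITE` constant from the taos-jdbcdriver package and an actual connection.

```java
import java.util.Properties;

class EnableEfficientWrite {
    // Assumed local TDengine WebSocket endpoint; adjust to your environment.
    static final String URL_WITH_FLAG =
            "jdbc:TAOS-WS://localhost:6041/?user=root&password=taosdata&asyncWrite=stmt";

    // Approach 1: enable efficient writing for the whole connection.
    static Properties connectionProps() {
        Properties props = new Properties();
        // Key string assumed here; prefer TSDBDriver.PROPERTY_KEY_ASYNC_WRITE in real code.
        props.setProperty("asyncWrite", "stmt");
        return props;
    }

    // Approach 2: enable efficient writing for a single PreparedStatement by
    // prefixing the SQL with ASYNC_INSERT INTO instead of INSERT INTO.
    static String asyncInsertSql() {
        return "ASYNC_INSERT INTO meters (tbname, ts, current, voltage, phase) VALUES (?,?,?,?,?)";
    }
}
```

With either approach, the application still calls `addBatch` to buffer a row, `executeBatch` to submit, and `executeUpdate` to synchronously obtain the number of rows written or catch a write exception.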
- `TSDBDriver.PROPERTY_KEY_BACKEND_WRITE_THREAD_NUM`: The number of background write threads in efficient writing mode. It only takes effect when using a WebSocket connection. The default value is 10.
- `TSDBDriver.PROPERTY_KEY_BATCH_SIZE_BY_ROW`: The batch size of the written data in efficient writing mode, in rows. It only takes effect when using a WebSocket connection. The default value is 1000.
- `TSDBDriver.PROPERTY_KEY_CACHE_SIZE_BY_ROW`: The cache size in efficient writing mode, in rows. It only takes effect when using a WebSocket connection. The default value is 10000.
- `TSDBDriver.PROPERTY_KEY_ENABLE_AUTO_RECONNECT`: Whether to enable automatic reconnection. It only takes effect when using a WebSocket connection. `true` means enabled, `false` means disabled. The default is `false`. Enabling it is recommended in efficient writing mode.
- `TSDBDriver.PROPERTY_KEY_RECONNECT_INTERVAL_MS`: The retry interval for automatic reconnection, in milliseconds. The default value is 2000. It only takes effect when `PROPERTY_KEY_ENABLE_AUTO_RECONNECT` is `true`.
- `TSDBDriver.PROPERTY_KEY_RECONNECT_RETRY_COUNT`: The number of retry attempts for automatic reconnection. The default value is 3. It only takes effect when `PROPERTY_KEY_ENABLE_AUTO_RECONNECT` is `true`.
For other configuration parameters, please refer to Efficient Writing Configuration.
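A minimal sketch of setting these options together is shown below. The literal key strings (`backendWriteThreadNum`, `batchSizeByRow`, and so on) are assumptions mirroring the constant names above; real code should use the `TSDBDriver.PROPERTY_KEY_*` constants directly.

```java
import java.util.Properties;

class EfficientWriteTuning {
    // Builds connection properties tuned for efficient writing.
    // Key strings are assumed; prefer the TSDBDriver PROPERTY_KEY_* constants.
    static Properties tunedProps() {
        Properties props = new Properties();
        props.setProperty("backendWriteThreadNum", "10"); // background write threads
        props.setProperty("batchSizeByRow", "1000");      // rows per batch
        props.setProperty("cacheSizeByRow", "10000");     // cache size, in rows
        props.setProperty("enableAutoReconnect", "true"); // recommended for efficient writing
        props.setProperty("reconnectIntervalMs", "2000"); // retry interval, ms
        props.setProperty("reconnectRetryCount", "3");    // retry attempts
        return props;
    }
}
```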
The following is a simple example of using JDBC efficient writing, which illustrates the relevant configurations and interfaces for efficient writing.
<details> <summary>Sample of Using JDBC Efficient Writing</summary> ```java {{#include docs/examples/JDBC/JDBCDemo/src/main/java/com/taos/example/WSHighVolumeDemo.java:efficient_writing}} ``` </details> </TabItem> </Tabs>

The following sample program demonstrates how to efficiently write data, with the scenario designed as follows:
This section provides sample code for the above scenario. The principle of efficient writing is the same for other scenarios, but the code needs to be modified accordingly.
This sample code assumes that the source data belongs to different subtables of the same supertable (meters). The program has already created this supertable in the test database before starting to write data. For subtables, they will be automatically created by the application according to the received data. If the actual scenario involves multiple supertables, only the code for automatic table creation in the write task needs to be modified.
<Tabs defaultValue="java" groupId="lang"> <TabItem label="Java" value="java">Program Listing:
| Class Name | Functional Description |
|---|---|
| FastWriteExample | The main program responsible for command-line argument parsing, thread pool creation, and waiting for task completion. |
| WorkTask | Reads data from a simulated source and writes it using the JDBC standard interface. |
| MockDataSource | Simulates and generates data for a certain number of meters child tables. |
| DataBaseMonitor | Tracks write speed and prints the current write speed to the console every 10 seconds. |
| CreateSubTableTask | Creates child tables within a specified range for invocation by the main program. |
| Meters | Provides serialization and deserialization of single records in the meters table, used for sending messages to Kafka and receiving messages from Kafka. |
| ProducerTask | A producer that sends messages to Kafka. |
| ConsumerTask | A consumer that receives messages from Kafka, writes data to TDengine using the JDBC efficient writing interface, and commits offsets according to progress. |
| Util | Provides basic functionalities, including creating connections, creating Kafka topics, and counting write operations. |
Below are the complete codes and more detailed function descriptions for each class.
<details> <summary>FastWriteExample</summary>Introduction to Main Program Command-Line Arguments:
-b,--batchSizeByRow <arg> Specifies the `batchSizeByRow` parameter for Efficient Writing, default is 1000
-c,--cacheSizeByRow <arg> Specifies the `cacheSizeByRow` parameter for Efficient Writing, default is 10000
-d,--dbName <arg> Specifies the database name, default is `test`
--help Prints help information
-K,--useKafka Enables Kafka mode, creating a producer to send messages and a consumer to receive messages for writing to TDengine. Otherwise, uses worker threads to subscribe to simulated data for writing.
-r,--readThreadCount <arg> Specifies the number of worker threads, default is 5. In Kafka mode, this parameter also determines the number of producer and consumer threads.
-R,--rowsPerSubTable <arg> Specifies the number of rows to write per child table, default is 100
-s,--subTableNum <arg> Specifies the total number of child tables, default is 1000000
-w,--writeThreadPerReadThread <arg> Specifies the number of write threads per worker thread, default is 5
JDBC URL and Kafka Cluster Address Configuration:
The JDBC URL is configured via an environment variable, for example:
export TDENGINE_JDBC_URL="jdbc:TAOS-WS://localhost:6041?user=root&password=taosdata"
The Kafka cluster address is configured via an environment variable, for example:
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
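Reading these settings from the environment can be sketched as below; the fallback values are assumptions matching the defaults shown above, not required behavior of the sample program.

```java
class EnvConfig {
    // Reads the JDBC URL from the environment, falling back to the assumed
    // default URL for a locally deployed TDengine server.
    static String jdbcUrl() {
        String url = System.getenv("TDENGINE_JDBC_URL");
        return url != null ? url : "jdbc:TAOS-WS://localhost:6041?user=root&password=taosdata";
    }

    // Reads the Kafka bootstrap servers, falling back to a local single node.
    static String kafkaBootstrapServers() {
        String servers = System.getenv("KAFKA_BOOTSTRAP_SERVERS");
        return servers != null ? servers : "localhost:9092";
    }
}
```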
Usage:
1. Simulated data writing mode:
java -jar highVolume.jar -r 5 -w 5 -b 10000 -c 100000 -s 1000000 -R 1000
2. Kafka subscription writing mode:
java -jar highVolume.jar -r 5 -w 5 -b 10000 -c 100000 -s 1000000 -R 100 -K
Responsibilities of the Main Program:
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/FastWriteExample.java}}
The worker thread is responsible for reading data from the simulated data source. Each read task is associated with a simulated data source, which can generate data for a specific range of sub-tables. Different simulated data sources generate data for different tables.
The worker thread uses a blocking approach to invoke the JDBC standard interface addBatch. This means that if the corresponding efficient writing backend queue is full, the write operation will block.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/WorkTask.java}}
A simulated data generator that produces data for a certain range of sub-tables. To mimic real-world scenarios, it generates data in a round-robin fashion, one row per subtable.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/MockDataSource.java}}
Creates sub-tables within a specified range using a batch SQL creation approach.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/CreateSubTableTask.java}}
A data model class that provides serialization and deserialization methods for sending data to Kafka and receiving data from Kafka.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/Meters.java}}
A message producer that uses a hash method to distribute the data generated by the simulated data source across all partitions; note that this hash method differs from the one used by JDBC efficient writing.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/ProducerTask.java}}
A message consumer that receives messages from Kafka and writes them to TDengine.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/ConsumerTask.java}}
Provides a periodic function to count the number of written records.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/StatTask.java}}
A utility class that provides functions such as creating connections, creating databases, and creating topics.
{{#include docs/examples/JDBC/highvolume/src/main/java/com/taos/example/highvolume/Util.java}}
Execution Steps:
<details> <summary>Execute the Java Example Program</summary>Execute the example program in a local integrated development environment:
Clone the TDengine repository
git clone [email protected]:taosdata/TDengine.git --depth 1
Open the TDengine/docs/examples/JDBC/highvolume directory with the integrated development environment.
Configure the environment variable TDENGINE_JDBC_URL in the development environment. If the global environment variable TDENGINE_JDBC_URL has already been configured, you can skip this step.
If you want to run the Kafka example, you need to set the environment variable KAFKA_BOOTSTRAP_SERVERS for the Kafka cluster address.
Specify command-line arguments, such as -r 3 -w 3 -b 100 -c 1000 -s 1000 -R 100.
Run the class com.taos.example.highvolume.FastWriteExample.
Execute the example program on a remote server:
To execute the example program on a server, follow these steps:
Package the sample code. Navigate to the directory TDengine/docs/examples/JDBC/highvolume and run the following command to generate highVolume.jar:
mvn package
Copy the program to the specified directory on the server:
scp -r .\target\highVolume.jar <user>@<host>:~/dest-path
Configure the environment variable.
Edit ~/.bash_profile or ~/.bashrc and add the following content for example:
export TDENGINE_JDBC_URL="jdbc:TAOS-WS://localhost:6041?user=root&password=taosdata"
The above uses the default JDBC URL for a locally deployed TDengine Server. Modify it according to your actual environment. If you want to use Kafka subscription mode, additionally configure the Kafka cluster environment variable:
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
Start the sample program with the Java command. Use the following template (append -K for Kafka subscription mode):
java -jar highVolume.jar -r 5 -w 5 -b 10000 -c 100000 -s 1000000 -R 1000
Terminate the test program. The program does not exit automatically. Once a stable write speed is achieved under the current configuration, press <kbd>CTRL</kbd> + <kbd>C</kbd> to terminate it. Below is a sample log output from an actual run on a machine with a 40-core CPU, 256GB RAM, and SSD storage.
$ java -jar highVolume.jar -r 2 -w 10 -b 10000 -c 100000 -s 1000000 -R 100
[INFO ] 2025-03-24 18:03:17.980 com.taos.example.highvolume.FastWriteExample main 309 main readThreadCount=2, writeThreadPerReadThread=10 batchSizeByRow=10000 cacheSizeByRow=100000, subTableNum=1000000, rowsPerSubTable=100
[INFO ] 2025-03-24 18:03:17.983 com.taos.example.highvolume.FastWriteExample main 312 main create database begin.
[INFO ] 2025-03-24 18:03:34.499 com.taos.example.highvolume.FastWriteExample main 315 main create database end.
[INFO ] 2025-03-24 18:03:34.500 com.taos.example.highvolume.FastWriteExample main 317 main create sub tables start.
[INFO ] 2025-03-24 18:03:34.502 com.taos.example.highvolume.FastWriteExample createSubTables 73 main create sub table task started.
[INFO ] 2025-03-24 18:03:55.777 com.taos.example.highvolume.FastWriteExample createSubTables 82 main create sub table task finished.
[INFO ] 2025-03-24 18:03:55.778 com.taos.example.highvolume.FastWriteExample main 319 main create sub tables end.
[INFO ] 2025-03-24 18:03:55.781 com.taos.example.highvolume.WorkTask run 41 FW-work-thread-2 started
[INFO ] 2025-03-24 18:03:55.781 com.taos.example.highvolume.WorkTask run 41 FW-work-thread-1 started
[INFO ] 2025-03-24 18:04:06.580 com.taos.example.highvolume.StatTask run 36 pool-1-thread-1 numberOfTable=1000000 count=12235906 speed=1223590
[INFO ] 2025-03-24 18:04:17.531 com.taos.example.highvolume.StatTask run 36 pool-1-thread-1 numberOfTable=1000000 count=31185614 speed=1894970
[INFO ] 2025-03-24 18:04:28.490 com.taos.example.highvolume.StatTask run 36 pool-1-thread-1 numberOfTable=1000000 count=51464904 speed=2027929
[INFO ] 2025-03-24 18:04:40.851 com.taos.example.highvolume.StatTask run 36 pool-1-thread-1 numberOfTable=1000000 count=71498113 speed=2003320
[INFO ] 2025-03-24 18:04:51.948 com.taos.example.highvolume.StatTask run 36 pool-1-thread-1 numberOfTable=1000000 count=91242103 speed=1974399
From the client application's perspective, several factors affect write performance: the write method (parameter binding versus SQL), whether sub-tables are created in advance, whether each batch writes to a single table (or sub-table), the batch size, and the number of concurrent writing threads.

Client applications should fully and appropriately utilize these factors. For example, choosing parameter binding, pre-creating sub-tables, writing to a single table (or sub-table) in each batch, and tuning the batch size and concurrent thread count through testing can achieve the optimal write speed for the current system.
Client applications usually need to read data from a data source before writing it to TDengine. From the data source's perspective, the following situations require adding a queue between the reading and writing threads:
If the data source for the writing application is Kafka, and the writing application itself is a Kafka consumer, then Kafka's features can be utilized for efficient writing. For example:
First, consider several important performance-related parameters in the database creation options:
- `cachemodel`: caching both `last_row` and each column's last value adds overhead during writing; reduce the impact by changing the option from `both` to `last_row` or `last_value`.
- `stt_trigger`: `stt_trigger = 1` is suitable for scenarios with few tables and a high write frequency; `stt_trigger > 1` is better for scenarios with many tables and a low write frequency.

For other parameters, refer to Database Management.
Next, consider performance-related parameters in the taosd configuration:
- `debugFlag`: controls the log output level. Higher log levels increase output pressure and impact write performance, so the default configuration is recommended.

For other parameters, refer to Server Configuration.
From the factors affecting write performance discussed above, developing high-performance data writing programs requires knowledge of multithreaded programming and data sharding, posing a technical threshold. To reduce user development costs, the TDengine connector provides the efficient writing feature, allowing users to leverage TDengine's powerful writing capabilities without dealing with underlying thread management and data sharding logic.
Below is a schematic diagram of the connector's efficient writing feature implementation:
Automatic Thread and Queue Creation:
The connector dynamically creates independent write threads and corresponding write queues based on configuration parameters. Each queue is bound to a sub-table, forming a processing chain of "sub-table data - dedicated queue - independent thread".
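The queue-binding idea can be illustrated with a small sketch. This is not the connector's actual implementation, only the concept: hashing the sub-table name picks a fixed queue, so each sub-table's data always lands in the same queue, which a dedicated thread would drain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch (not the connector's real code): rows are routed to a
// fixed set of write queues by hashing the sub-table name.
class QueueRouter {
    private final List<BlockingQueue<String>> queues = new ArrayList<>();

    QueueRouter(int writeThreadNum, int cacheSizeByRow) {
        for (int i = 0; i < writeThreadNum; i++) {
            queues.add(new ArrayBlockingQueue<>(cacheSizeByRow));
        }
    }

    /** Routes a row to the queue bound to its sub-table; blocks if the queue is full. */
    int route(String subTable, String row) {
        int idx = Math.floorMod(subTable.hashCode(), queues.size());
        try {
            // A full queue back-pressures the caller, mirroring the blocking
            // behavior of addBatch described earlier in this page.
            queues.get(idx).put(row);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return idx;
    }
}
```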
Data Sharding and Batch Triggering:
When the application writes data, the connector automatically shards the data by sub-table and caches it in the corresponding queue. Batch sending is triggered when either of the following conditions is met:

- The amount of cached data reaches the configured batch size threshold.
- The cached data reaches the configured timeout, even if the batch is not yet full.
This mechanism significantly improves write throughput by reducing the number of network requests.
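The two trigger conditions can be modeled with a minimal buffer that flushes on size or elapsed time. The class and the explicit time parameter are illustrative assumptions to keep the sketch self-contained, not the connector's code.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustration of the batch-trigger rule: flush when the buffer
// reaches batchSizeByRow, or when the oldest cached row has waited longer
// than the timeout. Time is passed in explicitly to keep the sketch testable.
class BatchTrigger {
    private final int batchSizeByRow;
    private final long timeoutMs;
    private final List<String> buffer = new ArrayList<>();
    private long oldestRowTimeMs = -1;

    BatchTrigger(int batchSizeByRow, long timeoutMs) {
        this.batchSizeByRow = batchSizeByRow;
        this.timeoutMs = timeoutMs;
    }

    /** Adds a row; returns the flushed batch if a trigger fired, else null. */
    List<String> add(String row, long nowMs) {
        if (buffer.isEmpty()) {
            oldestRowTimeMs = nowMs;
        }
        buffer.add(row);
        boolean sizeHit = buffer.size() >= batchSizeByRow;
        boolean timeoutHit = nowMs - oldestRowTimeMs >= timeoutMs;
        if (sizeHit || timeoutHit) {
            List<String> batch = new ArrayList<>(buffer);
            buffer.clear();
            return batch;
        }
        return null;
    }
}
```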
:::note
The connector's efficient writing feature only supports writing to supertables, not ordinary tables.
:::