taosgen is a performance benchmarking tool for time-series data products, supporting data generation, write performance testing, and more. taosgen uses "jobs" as the basic unit, where a job is user-defined and consists of a set of operations to accomplish a specific task. Each job contains one or more steps and can be connected to other jobs via dependencies, forming a Directed Acyclic Graph (DAG) execution flow for flexible and efficient task orchestration.
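To show how job dependencies form a DAG, the following Python sketch groups jobs into execution batches using the standard-library `graphlib` module (Python 3.9+). It is an illustration, not taosgen code. The job names are hypothetical, and the idea that jobs in the same batch may run in parallel is an assumption made for the sketch.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical jobs, modeled on the job "needs" attribute:
# each job maps to the list of jobs it depends on.
jobs = {
    "create-database": [],
    "create-meters": ["create-database"],
    "create-sensors": ["create-database"],
    "insert-all": ["create-meters", "create-sensors"],
}


def execution_batches(dependencies):
    """Return jobs grouped into batches; every job in a batch has all of its
    dependencies satisfied by earlier batches."""
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()  # detects dependency cycles before anything runs
    except CycleError as exc:
        raise ValueError("job dependencies contain a cycle") from exc
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(execution_batches(jobs))
# [['create-database'], ['create-meters', 'create-sensors'], ['insert-all']]
```

Because the dependency graph must be acyclic, a cycle (job A needs B, B needs A) is rejected before any job runs.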
taosgen currently supports Linux and macOS systems.
Compared to taosBenchmark, taosgen offers the following advantages and improvements:
taosgen solves the problems of inflexible configuration, limited data generation methods, and poor extensibility in taosBenchmark, making it more suitable for modern IoT and industrial internet big data testing needs.
Download the taosgen binary release package and extract it locally. For convenient access, you can create a symbolic link in a system executable directory (writing to `/usr/bin` usually requires root privileges). For example, on Linux:

```shell
tar zxvf tsgen-v0.3.0-linux-amd64.tar.gz
cd tsgen
ln -sf "$(pwd)/taosgen" /usr/bin/taosgen
```
taosgen supports parameter configuration via command line or configuration file. For parameters specified both ways, command line options take precedence.
:::tip
Before running taosgen, ensure that all target TDengine TSDB clusters are running normally.
:::
Example startup:

```shell
taosgen -h 127.0.0.1 -c config.yaml
```
| Parameter | Description |
|---|---|
| -h/--host | Hostname or IP address of the server to connect to, default is localhost |
| -P/--port | Port number of the server to connect to, default is 6030 |
| -u/--user | Username for connecting to the server, default is root |
| -p/--password | Password for connecting to the server, default is taosdata |
| -c/--config-file | Path to the YAML configuration file |
| -d/--log-dir | Specify log output directory, default is ./log |
| -f/--log-file | Specify complete log file path (overrides --log-dir) |
| -?/--help | Show help information and exit |
| -V/--version | Show version information and exit. Cannot be used with other parameters |
Tip: If no parameters are specified when running taosgen, taosgen will create the TDengine database tsbench, the super table meters, 10,000 child tables, and batch write 10,000 rows of data to each child table.
The configuration file is divided into the following sections: `tdengine`, `mqtt`, `kafka`, `schema`, `concurrency`, and `jobs`.
A job is user-defined and contains an ordered set of steps. Each job has a unique identifier (its key name) and can declare dependencies (`needs`) that control its execution order relative to other jobs. Job attributes include:
A step is the basic operation unit in a job, representing the execution process of a specific operation type. Each step runs in order and can reference a predefined Action to perform a specific function. Step attributes include:
When only `log_dir` is set, the log file is written to `<log_dir>/taosgen.log`. These parameters can also be set with the command-line options `--log-dir` and `--log-file`. The priority order, from highest to lowest, is:

1. `--log-file` (CLI)
2. `--log-dir` (CLI)
3. `log_file` (YAML config)
4. `log_dir` (YAML config)
5. Default: `log/taosgen.log`

The database options `precision ms vgroups 20 replica 3 keep 3650` set the time precision to milliseconds, the number of virtual groups (vgroups) to 20, the replica count to 3, and the data retention period to 3650 days.
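The log-file priority order listed above can be expressed as a short lookup. The Python sketch below is an illustration only: the function name is hypothetical, each source is assumed to be either unset (`None`) or a non-empty string, and a directory setting resolves to `taosgen.log` inside that directory, as described for `log_dir`.

```python
import os

DEFAULT_LOG_FILE = os.path.join("log", "taosgen.log")


def resolve_log_file(cli_log_file=None, cli_log_dir=None, yaml_log_file=None, yaml_log_dir=None):
    """Pick the log file path following the documented priority order.
    A directory setting resolves to <dir>/taosgen.log."""
    candidates = [
        (cli_log_file, False),   # --log-file (CLI), highest priority
        (cli_log_dir, True),     # --log-dir (CLI)
        (yaml_log_file, False),  # log_file (YAML config)
        (yaml_log_dir, True),    # log_dir (YAML config)
    ]
    for value, is_dir in candidates:
        if value:
            return os.path.join(value, "taosgen.log") if is_dir else value
    return DEFAULT_LOG_FILE


# A CLI directory outranks a YAML file setting.
print(resolve_log_file(cli_log_dir="/tmp/tg", yaml_log_file="run.log"))  # /tmp/tg/taosgen.log
```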
bootstrap_servers (string): A list of Kafka cluster addresses in "host:port" format, separated by commas.
client_id (string): Client identifier prefix, default: taosgen.
topic (string): The name of the Kafka Topic to write to.
rdkafka_options (map): Optional parameters supported by the underlying librdkafka library, such as `security.protocol`, `sasl.mechanisms`, `sasl.username`, and `sasl.password` (librdkafka also accepts `sasl.mechanism` as an alias of `sasl.mechanisms`).
security.protocol (string): Specifies the security protocol for communication between the client and the Kafka cluster. Options:
sasl.mechanism (string): When security.protocol is set to "sasl_plaintext" or "sasl_ssl", specifies the SASL authentication mechanism to use. Common options:
Note: This field must be configured along with security.protocol, and its value depends on the SASL mechanisms enabled on the Kafka Broker.
sasl.username (string): The username for SASL authentication. Required for "PLAIN" or "SCRAM" mechanisms.
sasl.password (string): The password for SASL authentication.
For more parameters, refer to the librdkafka configuration documentation.
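To illustrate how the SASL settings above depend on one another, the following Python sketch checks an `rdkafka_options` mapping for consistency. It is not part of taosgen: the function name is hypothetical, the `plaintext` default mirrors librdkafka's default `security.protocol`, and requiring `sasl.password` alongside `sasl.username` for PLAIN and SCRAM is an assumption.

```python
SASL_PROTOCOLS = {"sasl_plaintext", "sasl_ssl"}
ALL_PROTOCOLS = {"plaintext", "ssl"} | SASL_PROTOCOLS


def check_rdkafka_options(options):
    """Return a list of problems in an rdkafka_options mapping (empty if it looks consistent)."""
    problems = []
    protocol = options.get("security.protocol", "plaintext").lower()
    # Accept either spelling of the mechanism key.
    mechanism = options.get("sasl.mechanism") or options.get("sasl.mechanisms")
    if protocol not in ALL_PROTOCOLS:
        problems.append(f"unknown security.protocol: {protocol}")
    if mechanism:
        # A SASL mechanism only makes sense with a SASL security protocol.
        if protocol not in SASL_PROTOCOLS:
            problems.append("sasl.mechanism needs security.protocol sasl_plaintext or sasl_ssl")
        if mechanism.upper() == "PLAIN" or mechanism.upper().startswith("SCRAM"):
            for key in ("sasl.username", "sasl.password"):
                if not options.get(key):
                    problems.append(f"{key} is required for {mechanism}")
    return problems


print(check_rdkafka_options({"sasl.mechanism": "PLAIN"}))
```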
Time offsets use the format `±[value][unit]` (for example, `+1d3h30m` adds 1 day, 3 hours, and 30 minutes). Supported units: `y` (year), `M` (month, uppercase), `d` (day), `h` (hour), `m` (minute, lowercase), `s` (second).

Each column includes:
name (string): Column name. If count > 1, name is the prefix (e.g., name: current, count: 3 yields current1, current2, current3).
type (string): Data type, supports TDengine-compatible types (case-insensitive):
Currently, the following data types are not supported: json, geometry, varbinary, decimal, blob.
count (int): Number of consecutive columns of this type (e.g., count: 4096 creates 4096 columns).
props (string): Column property info for TDengine, may include:
gen_type (string): Data generation method for this column, default: random. Supported types:
random: Random generation
order: Sequential natural number growth (integer only), wraps to min after reaching max
expression: Generated by expression (supports integer, float, and string types). If gen_type is not explicitly specified but the expr attribute is detected, gen_type will be automatically set to expression.
expr (string): Expression content, written in Lua. Built-in variables:

- `_i`: Call index, starting from 0, e.g., `"2 + math.sin(_i/10)"`.
- `_table`: The table for which the expression generates data.
- `_last`: The value last returned by this expression (numeric types only); initial value `0.0`.

Example of a complex expression:

```lua
(math.sin(_i / 7) * math.cos(_i / 13) + 0.5 * (math.random(80, 120) / 100)) * ((_i % 50 < 25) and (1 + 0.3 * math.sin(_i / 3)) or 0.7) + 10 * (math.floor(_i / 100) % 2)
```
This combines various math functions, conditional logic, periodic behavior, and random noise to simulate a nonlinear, noisy, segmented dynamic data generation process, structured as (A + B) × C + D. Breakdown:
| Part | Content | Type | Function |
|---|---|---|---|
| A | math.sin(_i / 7) * math.cos(_i / 13) | Base signal | Dual-frequency modulation, generates complex waveform |
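| B | 0.5 * (math.random(80, 120) / 100) | Noise | Adds a random offset of 0.40 to 0.60 (0.5 scaled by a random factor of 80% to 120%) |
| C | ((_i % 50 < 25) and (1 + 0.3 * math.sin(_i / 3)) or 0.7) | Dynamic gain | Alternates every 25 calls within a 50-call cycle: the first 25 calls use an oscillating gain of 1 + 0.3 * sin(_i / 3), the next 25 a fixed gain of 0.7 |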
| D | 10 * (math.floor(_i / 100) % 2) | Baseline step | Switches baseline every 100 calls (0 or 10), simulates peaks |
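
The column attributes above can be approximated in Python to see what they produce. This is a sketch, not taosgen's generator: `expand_columns` follows the `name`/`count` prefix rule, `order_values` assumes `max` is inclusive when wrapping back to `min`, and `example_expr` ports the Lua expression above, using Python's `random.Random.randint` (inclusive, like Lua's `math.random(m, n)`) for the noise term. All function names are hypothetical.

```python
import math
import random


def expand_columns(name, count=1):
    """Column names for one column entry: with count > 1, name is a prefix
    (current, 3 -> current1, current2, current3)."""
    if count <= 1:
        return [name]
    return [f"{name}{k}" for k in range(1, count + 1)]


def order_values(lo, hi, n):
    """gen_type 'order': consecutive integers that wrap back to lo after hi.
    Assumes hi is inclusive."""
    span = hi - lo + 1
    return [lo + i % span for i in range(n)]


def example_expr(i, rng):
    """Python port of the example expression, structured as (A + B) * C + D.
    i plays the role of _i (call index starting at 0)."""
    a = math.sin(i / 7) * math.cos(i / 13)          # A: base signal
    b = 0.5 * (rng.randint(80, 120) / 100)          # B: noise in [0.40, 0.60]
    # C: Lua's `cond and x or y` acts as a ternary here because x is never nil/false.
    c = (1 + 0.3 * math.sin(i / 3)) if i % 50 < 25 else 0.7
    d = 10 * (math.floor(i / 100) % 2)              # D: baseline step, 0 or 10
    return (a + b) * c + d


rng = random.Random(42)  # seeded so repeated runs give the same series
series = [example_expr(i, rng) for i in range(200)]
print(expand_columns("current", 3))  # ['current1', 'current2', 'current3']
print(order_values(1, 3, 7))         # [1, 2, 3, 1, 2, 3, 1]
```

Because term D adds 0 for calls 0 to 99 and 10 for calls 100 to 199, the series jumps by roughly 10 at call 100, which is the step behavior described in the table.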
Actions are reusable, encapsulated units of operation, each implementing a specific function. Each action represents an independent piece of logic and can be called from different steps. Abstracting common operations into standardized action modules makes the system extensible and flexible to configure. The same action type can be used in multiple steps, either in parallel or repeatedly, to support diverse workflow orchestration. For example, creating databases, defining super tables, creating child tables, and writing data are each handled by a corresponding action. Currently supported built-in actions:
- `tdengine/create-database`: Create a TDengine database
- `tdengine/create-super-table`: Create a TDengine super table
- `tdengine/create-child-table`: Create child tables for a TDengine super table
- `tdengine/insert`: Write data to a TDengine database
- `mqtt/publish`: Publish data to an MQTT Broker
- `kafka/produce`: Publish data to a Kafka Broker
Each action receives its parameters through the `with` field; the available parameters vary by action type.

:::note
tdengine/insert-data was the old name used in v0.7.x and earlier. Using it from v0.8.0 onward will show: "Action 'tdengine/insert-data' is deprecated and will be removed in future versions. Please use 'tdengine/insert' instead". This name is no longer supported as of v0.8.3 (corresponding to TDengine TSDB 3.3.6.39/3.3.8.16/3.4.0.2).
:::
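The version rules in the note above can be summarized as a small lookup. This Python sketch is illustrative only: the function name is hypothetical, and version strings are assumed to be plain `major.minor.patch` numbers, optionally prefixed with `v`.

```python
def insert_data_alias_status(version):
    """Classify how a taosgen version treats the old action name
    'tdengine/insert-data' (rules taken from the note above)."""
    parts = tuple(int(p) for p in version.lstrip("v").split("."))
    if parts < (0, 8, 0):
        return "accepted"      # v0.7.x and earlier: the name in use at the time
    if parts < (0, 8, 3):
        return "deprecated"    # v0.8.0 to v0.8.2: works but logs a deprecation warning
    return "unsupported"       # v0.8.3 and later: use 'tdengine/insert' instead


for v in ("0.7.5", "v0.8.1", "0.8.3"):
    print(v, insert_data_alias_status(v))
```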
The `tdengine/create-database` action creates a new database on the specified TDengine server. Using connection information and database parameters, users can define database properties such as the name, whether to drop an existing database with the same name, and the time precision.
The `tdengine/create-super-table` action creates a new super table in the specified database. Using the required connection information and super table parameters, users can define properties such as the table name, normal columns, and tag columns.
The `tdengine/create-child-table` action creates multiple child tables in the target database based on the specified super table. Each child table can have its own name and tag values, which helps classify and manage time-series data. Child table names and tag data can come from a generator or a CSV file.
The `tdengine/insert` action writes data to the specified child tables. Child table names and normal column data can come from a generator or a CSV file, and several strategies are available for controlling timestamp attributes. The action also provides write control strategies for optimizing write performance.
The `mqtt/publish` action publishes data to the specified topic. Data can come from a generator or a CSV file, and several strategies are available for controlling timestamp attributes. The action also provides publish control strategies for optimizing the publishing process.
The topic supports dynamic values through placeholder syntax:

- `{table}`: Table name
- `{column}`: Column data, where `column` is the column field name

The `kafka/produce` action publishes data to the specified topic. Data can come from a generator or a CSV file, and several strategies are available for controlling timestamp attributes. The action also provides control strategies for optimizing the publishing process.

The topic supports the same placeholder syntax:

- `{table}`: Table name
- `{column}`: Column data, where `column` is the column field name

This example demonstrates how to use taosgen to simulate 10,000 smart meters, each collecting current, voltage, and phase. Each meter generates a record every 5 minutes, with current data generated randomly and voltage simulated using a sine wave. The generated data is written to the meters super table in the tsbench database of TDengine TSDB via WebSocket.
Configuration details:
Scenario description:
This configuration is designed for TDengine database performance benchmarking. It is suitable for simulating large-scale IoT devices (such as meters, sensors) continuously generating high-frequency data, and can be used for:
{{#include docs/examples/taosgen/taosgen_config.yaml:tdengine_gen_stmt_insert_config}}
The parameters `tdengine`, `schema::name`, `schema::tbname`, `schema::tags`, `tdengine/create-child-table::batch`, and `tdengine/insert::concurrency` can be omitted so that their default values apply, further simplifying the configuration.
{{#include docs/examples/taosgen/taosgen_config.yaml:tdengine_gen_stmt_insert_simple}}
This example demonstrates how to use taosgen to simulate 10,000 smart meters, each collecting current, voltage, and phase. Each meter generates a record every 5 minutes, with measurement data read from a CSV file and written to the meters super table in the tsbench database of TDengine TSDB via WebSocket.
Configuration details:
Scenario description:
This configuration is designed for importing device metadata and historical data from existing CSV files into TDengine. It is suitable for:
{{#include docs/examples/taosgen/taosgen_config.yaml:tdengine_csv_stmt_insert_config}}
Where:
`ctb-tags.csv` file format:

```csv
groupid,location,tbname
1,California.Campbell,d1
2,Texas.Austin,d2
3,NewYork.NewYorkCity,d3
```

`ctb-data.csv` file format:

```csv
tbname,ts,current,voltage,phase
d1,1700000010000,5.23,221.5,146.2
d3,1700000010000,8.76,219.8,148.7
d2,1700000010000,12.45,223.1,147.3
d3,1700000310000,9.12,220.3,149.1
d2,1700000310000,11.87,222.7,145.8
d1,1700000310000,4.98,220.9,147.9
```
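To show how the two files relate, the following Python sketch joins the sample data rows to the sample tag rows through the shared `tbname` column and sorts each table's rows by timestamp. It is an illustration, not how taosgen parses CSV files, and the function name is hypothetical. Note that the sample data interleaves rows from different tables, and that consecutive timestamps for the same table are 300,000 ms (5 minutes) apart, matching the scenario.

```python
import csv
import io
from collections import defaultdict

TAGS_CSV = """groupid,location,tbname
1,California.Campbell,d1
2,Texas.Austin,d2
3,NewYork.NewYorkCity,d3
"""

DATA_CSV = """tbname,ts,current,voltage,phase
d1,1700000010000,5.23,221.5,146.2
d3,1700000010000,8.76,219.8,148.7
d2,1700000010000,12.45,223.1,147.3
d3,1700000310000,9.12,220.3,149.1
d2,1700000310000,11.87,222.7,145.8
d1,1700000310000,4.98,220.9,147.9
"""


def rows_by_table(tags_text, data_text):
    """Index tag rows by tbname and group data rows under the table they belong to."""
    tags = {row["tbname"]: row for row in csv.DictReader(io.StringIO(tags_text))}
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(data_text)):
        if row["tbname"] not in tags:
            raise KeyError(f"data row references unknown table {row['tbname']}")
        grouped[row["tbname"]].append(
            (int(row["ts"]), float(row["current"]), float(row["voltage"]), float(row["phase"]))
        )
    for rows in grouped.values():
        rows.sort()  # order each table's rows by timestamp (first tuple element)
    return tags, dict(grouped)


tags, data = rows_by_table(TAGS_CSV, DATA_CSV)
for tbname, rows in sorted(data.items()):
    print(tbname, tags[tbname]["location"], [ts for ts, *_ in rows])
```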
This example demonstrates how to use taosgen to simulate 10,000 smart meters, each collecting current, voltage, phase, and location. Each meter generates a record every 5 minutes, with current data generated randomly, voltage simulated using a sine wave, and the generated data published via MQTT.
Configuration details:
The topic is `factory/{table}/{location}`, where:

- The `{table}` placeholder is replaced with the generated child table name.
- The `{location}` placeholder is replaced with the generated `location` column value, so data is published to different topics by device location.

Scenario description:
This configuration is designed for publishing simulated device data to an MQTT message broker. It is suitable for:
{{#include docs/examples/taosgen/taosgen_config.yaml:mqtt_publish_config}}
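The placeholder substitution described for the topic can be sketched as follows. This Python function is a hypothetical illustration rather than taosgen's implementation; it assumes placeholder names consist of word characters and raises an error when a placeholder has no matching column.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_topic(template, table, row):
    """Replace {table} with the child table name and {column} with that column's value."""
    def substitute(match):
        key = match.group(1)
        if key == "table":
            return table
        if key not in row:
            raise KeyError(f"no column named {key!r} for topic placeholder")
        return str(row[key])

    return PLACEHOLDER.sub(substitute, template)


row = {"current": 5.23, "voltage": 221.5, "phase": 146.2, "location": "California.Campbell"}
print(resolve_topic("factory/{table}/{location}", "d1", row))
# factory/d1/California.Campbell
```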
This example demonstrates how to use taosgen to simulate 10,000 smart meters, each collecting current, voltage, phase, and location. Each meter generates a record every 5 minutes, with current data generated randomly, voltage simulated using a sine wave, and the generated data published to Kafka.
Configuration details:
Kafka configuration:

- `bootstrap_servers` describes the connection to the Kafka Broker.
- `factory-electric-meter`.

schema configuration

Data publishing: 8 threads concurrently publish to the Kafka Broker for higher throughput.
Scenario description:
This configuration is designed for publishing simulated device data to a Kafka message broker. It is suitable for:
`{table}` for device ID) to test message filtering, multiplexing, and dynamic routing based on device tags.

{{#include docs/examples/taosgen/taosgen_config.yaml:kafka_produce_config}}
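The example uses 8 publishing threads for 10,000 meters. How taosgen assigns tables to threads is not described here; the Python sketch below shows one plausible scheme, a hypothetical contiguous split in which each thread owns a fixed range of tables, so all rows of a given table are produced by a single thread.

```python
def split_tables(table_count, workers):
    """Split table indices 0..table_count-1 into `workers` contiguous ranges
    whose sizes differ by at most one."""
    base, extra = divmod(table_count, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


for w, tables in enumerate(split_tables(10_000, 8)):
    print(f"thread {w}: tables {tables.start}..{tables.stop - 1} ({len(tables)} tables)")
```

Keeping each table on one thread is one way to preserve per-table timestamp order; this is a design consideration for the sketch, not a documented taosgen guarantee.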