docs/en/using_starrocks/Flat_json.md
import Beta from '../_assets/commonMarkdown/_beta.mdx'
This article introduces the basic concept of Flat JSON and how to use this feature.
Starting from version 2.2.0, StarRocks supports the JSON data type to allow more flexible data storage. However, when querying JSON, most scenarios do not involve directly reading the entire JSON data but rather accessing data at specified paths. For example:
-- Store required fields in logs as fixed fields, and package other fields that frequently change with business as JSON.
SELECT
time,
event,
user,
get_json_string(remain_json, "$.from_system"),
get_json_string(remain_json, "$.tag")
FROM logs;
Due to the special nature of the JSON type, its performance in queries is not as good as standard types (INT, STRING, etc.). The reasons include:
StarRocks introduces the Flat JSON feature to improve JSON data query efficiency and reduce the complexity of using JSON.
The core principle of Flat JSON is to detect JSON data during load and extract common fields from JSON data for storage as standard type data. When querying JSON, these common fields optimize the query speed of JSON. Example data:
1, {"a": 1, "b": 21, "c": 3, "d": 4}
2, {"a": 2, "b": 22, "d": 4}
3, {"a": 3, "b": 23, "d": [1, 2, 3, 4]}
4, {"a": 4, "b": 24, "d": null}
5, {"a": 5, "b": 25, "d": null}
6, {"c": 6, "d": 1}
When loading the above JSON data, fields a and b are present in most JSON data and have similar data types (both INT). Therefore, the data of fields a and b can be extracted from JSON and stored separately as two INT columns. When these two columns are used in queries, their data can be directly read without needing to process additional JSON fields, reducing the computational overhead of handling JSON structures.
Flat JSON is enabled globally by default from v4.0 onwards. For versions earlier than v4.0, you must manually enable it.
From v4.0, this feature can be configured on the table level.
Modify BE configuration: enable_json_flat, which defaults to false before v4.0. For modification methods, refer to
Configure BE parameters.
Enable FE pruning feature:
SET GLOBAL cbo_prune_json_subfield = true;
Setting Flat JSON-related properties on table level is supported from v4.0 onwards.
When creating the table, you can set flat_json.enable and other Flat JSON-related properties. For detailed instructions, see CREATE TABLE.
Alternatively, you can set these properties using ALTER TABLE.
Example:
ALTER TABLE t1 SET ("flat_json.enable" = "true");
ALTER TABLE t1 SET ("flat_json.null.factor" = "0.1");
ALTER TABLE t1 SET ("flat_json.sparsity.factor" = "0.8");
ALTER TABLE t1 SET ("flat_json.column.max" = "90");
Enable FE pruning feature:
SET GLOBAL cbo_prune_json_subfield = true;
After loading data, you can query the extracted sub-columns of the corresponding column:
SELECT flat_json_meta(<json_column>)
FROM <table_name>[_META_];
You can verify whether the executed query benefits from Flat JSON optimization through the Query Profile by observing the following metrics:
PushdownAccessPaths: The number of sub-field paths pushed down to storage.AccessPathHits: The number of times Flat JSON sub-fields are hit, with detailed information on the specific JSON hit.AccessPathUnhits: The number of times Flat JSON sub-fields are not hit, with detailed information on the specific JSON not hit.JsonFlattern: The time taken to extract sub-columns on-site when Flat JSON is not hit.Method 1: Configure Flat JSON properties when creating a table with JSON columns. Supported from v4.0 onwards.
CREATE TABLE `t1` (
`k1` int,
`k2` JSON,
`k3` VARCHAR(20),
`k4` JSON
)
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 2
PROPERTIES (
"replication_num" = "3",
"flat_json.enable" = "true",
"flat_json.null.factor" = "0.5",
"flat_json.sparsity.factor" = "0.5",
"flat_json.column.max" = "50");
INSERT INTO t1 (k1,k2) VALUES
(11,parse_json('{"str":"test_flat_json","Integer":123456,"Double":3.14158,"Object":{"c":"d"},"arr":[10,20,30],"Bool":false,"null":null}')),
(15,parse_json('{"str":"test_str0","Integer":11,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(15,parse_json('{"str":"test_str1","Integer":111,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(15,parse_json('{"str":"test_str2","Integer":222,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(15,parse_json('{"str":"test_str2","Integer":222,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(16,parse_json('{"str":"test_str3","Integer":333,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(17,parse_json('{"str":"test_str3","Integer":333,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(18,parse_json('{"str":"test_str5","Integer":444,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(19,parse_json('{"str":"test_str6","Integer":444,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(20,parse_json('{"str":"test_str6","Integer":444,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}'));
Method 2: Enable Flat JSON before creating the table.
Enable the feature (refer to other sections)
Create a table with JSON columns. In this example, use INSERT INTO to load JSON data into the table.
CREATE TABLE `t1` (
`k1` int,
`k2` JSON,
`k3` VARCHAR(20),
`k4` JSON
)
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 2
PROPERTIES ("replication_num" = "3");
INSERT INTO t1 (k1,k2) VALUES
(11,parse_json('{"str":"test_flat_json","Integer":123456,"Double":3.14158,"Object":{"c":"d"},"arr":[10,20,30],"Bool":false,"null":null}')),
(15,parse_json('{"str":"test_str0","Integer":11,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(15,parse_json('{"str":"test_str1","Integer":111,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(15,parse_json('{"str":"test_str2","Integer":222,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(15,parse_json('{"str":"test_str2","Integer":222,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(16,parse_json('{"str":"test_str3","Integer":333,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(17,parse_json('{"str":"test_str3","Integer":333,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(18,parse_json('{"str":"test_str5","Integer":444,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(19,parse_json('{"str":"test_str6","Integer":444,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}')),
(20,parse_json('{"str":"test_str6","Integer":444,"Double":3.14,"Object":{"a":"b"},"arr":[1,2,3],"Bool":true,"null":null}'));
View the extracted sub-columns for the k2 column.
SELECT flat_json_meta(k2) FROM t1[_META_];
+---------------------------------------------------------------------------------------------------------------------------+
| flat_json_meta(k2) |
+---------------------------------------------------------------------------------------------------------------------------+
| ["nulls(TINYINT)","Integer(BIGINT)","Double(DOUBLE)","str(VARCHAR)","Bool(JSON)","Object(JSON)","arr(JSON)","null(JSON)"] |
+---------------------------------------------------------------------------------------------------------------------------+
Execute data queries.
SELECT * FROM t1;
SELECT get_json_string(k2,'\$.Integer') FROM t1 WHERE k2->'str' = 'test_flat_json';
SELECT get_json_string(k2,'\$.Double') FROM t1 WHERE k2->'Integer' = 123456;
SELECT get_json_string(k2,'\$.Object') FROM t1 WHERE k2->'Double' = 3.14158;
SELECT get_json_string(k2,'\$.arr') FROM t1 WHERE k2->'Object' = to_json(map{'c':'d'});
SELECT get_json_string(k2,'\$.Bool') FROM t1 WHERE k2->'arr' = '[10,20,30]';
View Flat JSON-related metrics in the Query Profile
PushdownAccessPaths: 2
- Table: t1
- AccessPathHits: 2
- __MAX_OF_AccessPathHits: 1
- __MIN_OF_AccessPathHits: 1
- /k2: 2
- __MAX_OF_/k2: 1
- __MIN_OF_/k2: 1
- AccessPathUnhits: 0
- JsonFlattern: 0ns
cbo_json_v2_rewrite (default: true): Enables JSON v2 path rewrite so that expressions like get_json_* can be rewritten to direct access of Flat JSON subcolumns, unlocking predicate pushdown and column pruning.cbo_json_v2_dict_opt (default: true): Enables low-cardinality dictionary optimization for Flat JSON extended string subcolumns produced by the rewrite, which can speed up string expressions, GROUP BY, and JOIN.Example:
SET cbo_json_v2_rewrite = true;
SET cbo_json_v2_dict_opt = true;
StarRocks shared-nothing clusters support Flat JSON starting from v3.3.0, and shared-data clusters support it from v3.3.3.
In versions v3.3.0, v3.3.1, and v3.3.2:
Starting from version v3.3.3: