# Query and loading
import BEConfigMethod from '../../../_assets/commonMarkdown/BE_config_method.mdx'
import CNConfigMethod from '../../../_assets/commonMarkdown/CN_config_method.mdx'
import PostBEConfig from '../../../_assets/commonMarkdown/BE_dynamic_note.mdx'
import StaticBEConfigNote from '../../../_assets/commonMarkdown/StaticBE_config_note.mdx'
You can view the BE configuration items using the following command:

```SQL
SELECT * FROM information_schema.be_configs [WHERE NAME LIKE "%<name_pattern>%"]
```
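For example, you can inspect one item and, where supported, change a mutable item at runtime. This is a usage sketch only; whether `UPDATE` on `information_schema.be_configs` is available depends on your StarRocks version.

```SQL
-- Inspect the current value of a single BE configuration item on every BE.
SELECT * FROM information_schema.be_configs WHERE NAME LIKE "%late_materialization_ratio%";

-- Assumption: your StarRocks version allows UPDATE on information_schema.be_configs
-- for mutable items. This forces late materialization for all reads at runtime.
UPDATE information_schema.be_configs SET VALUE = "1000" WHERE NAME = "late_materialization_ratio";
```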
This topic introduces the BE configuration items related to query and loading:
- `dictionary_speculate_min_chunk_size`: When the number of buffered rows reaches this threshold, the writer runs speculation immediately and sets an encoding (DICT, PLAIN, or BIT_SHUFFLE) rather than buffering more rows. Speculation uses `dictionary_encoding_ratio` for string columns and `dictionary_encoding_ratio_for_non_string_column` for numeric/non-string columns to decide whether dictionary encoding is beneficial. A column `byte_size` larger than or equal to UINT32_MAX also forces immediate speculation to avoid `BinaryColumn<uint32_t>` overflow.
- `disable_storage_page_cache`: Whether to disable PageCache. `true` indicates disabling PageCache. The default value has been changed from `true` to `false` since StarRocks v2.4.
- `enable_lazy_dynamic_flat_json`: When set to `true`, StarRocks postpones the Flat JSON operation to the calculation process instead of the read process.
- `enable_string_prefix_zonemap`: Whether to enable prefix zonemaps for string columns. The prefix length is controlled by `string_prefix_zonemap_prefix_len`.
- `exec_state_report_max_threads`: The maximum number of threads used by `ExecStateReporter` to asynchronously send non-priority execution status reports (such as fragment completion and error status) from BE to FE via RPC. The actual pool size at startup is `max(1, exec_state_report_max_threads)`. Changing this item at runtime triggers `update_max_threads` on the pool in every executor set (shared and exclusive). The pool has a fixed task queue size of 1000; report submissions are silently dropped when all threads are busy and the queue is full. Paired with `priority_exec_state_report_max_threads` for the high-priority pool. Increase this value when delayed or dropped exec-state reports are observed under high query concurrency.
- `hdfs_client_hedged_read_threadpool_size`: The size of the hedged read thread pool on your HDFS client. It is equivalent to the `dfs.client.hedged.read.threadpool.size` parameter in the hdfs-site.xml file of your HDFS cluster.
- `hdfs_client_hedged_read_threshold_millis`: The threshold, in milliseconds, for starting a hedged read. For example, if this parameter is set to 30 and a read from a block has not returned within 30 milliseconds, your HDFS client immediately starts up a new read against a different block replica. It is equivalent to the `dfs.client.hedged.read.threshold.millis` parameter in the hdfs-site.xml file of your HDFS cluster.
- `jit_lru_cache_size`: The size of the LRU cache for JIT compilation. If it is set to 0 or a negative value, the cache size is determined adaptively as `jit_lru_cache_size = min(mem_limit*0.01, 1GB)` (the `mem_limit` of the node must be greater than or equal to 16 GB).
- Several Flat JSON-related items take effect only when `enable_json_flat` is set to `true`.
- `late_materialization_ratio`: `0` (or any value less than or equal to 0) disables late materialization; `1000` (or any value greater than or equal to 1000) forces late materialization for all reads. Values greater than 0 and less than 1000 enable a conditional strategy in which both late and early materialization contexts are prepared and the iterator selects the behavior based on predicate filter ratios (higher values favor late materialization). When a segment contains complex metric types, StarRocks uses `metric_late_materialization_ratio` instead. If `lake_io_opts.cache_file_only` is set, late materialization is disabled.
- `metric_late_materialization_ratio`: `0` disables late materialization; `1000` forces late materialization for all applicable reads. Values from 1 to 999 enable a conditional strategy in which both late and early materialization contexts are prepared and chosen at runtime based on predicate selectivity. When complex metric types exist, `metric_late_materialization_ratio` overrides the general `late_materialization_ratio`. Note that the `cache_file_only` I/O mode disables late materialization regardless of this setting.
- `object_storage_connect_timeout_ms` and `object_storage_request_timeout_ms`: `-1` indicates using the default timeout duration of the SDK configurations.
- `parquet_late_materialization_enable`: `true` indicates enabling late materialization for Parquet reading, and `false` indicates disabling it.
- `parquet_page_index_enable`: `true` indicates enabling the Parquet page index, and `false` indicates disabling it.
- `parquet_reader_bloom_filter_enable`: `true` indicates enabling the bloom filter, and `false` indicates disabling it. You can also control this behavior at the session level using the system variable `enable_parquet_reader_bloom_filter` (see the session-level example after this list). Bloom filters in Parquet are maintained at the column level within each row group. If a Parquet file contains bloom filters for certain columns, queries can use predicates on those columns to efficiently skip row groups.
- `pipeline_poller_timeout_guard_ms`: When set to a value greater than `0`, if a driver takes longer than `pipeline_poller_timeout_guard_ms` for a single dispatch in the poller, the information of the driver and operator is printed.
- `pipeline_prepare_thread_pool_thread_num`: The number of threads used to prepare pipeline plan fragments. `0` indicates the value is equal to the number of system vCPU cores.
- `pipeline_prepare_timeout_guard_ms`: When set to a value greater than `0`, if a plan fragment exceeds `pipeline_prepare_timeout_guard_ms` during the PREPARE process, a stack trace of the plan fragment is printed.
- `pk_index_parallel_get_threadpool_max_queue_size`: The maximum number of tasks that may be queued for the primary key index parallel get thread pool. The number of worker threads is controlled by `pk_index_parallel_get_threadpool_max_threads`; this setting only limits how many tasks may be queued awaiting execution. The very large default (2^20) effectively makes the queue unbounded; lowering it prevents excessive memory growth from queued tasks but may cause task submissions to block or fail when the queue is full. Tune it together with `pk_index_parallel_get_threadpool_max_threads` based on workload concurrency and memory constraints.
- `priority_exec_state_report_max_threads`: The maximum number of threads used by `ExecStateReporter` to asynchronously send high-priority execution status reports (such as urgent fragment failures) from BE to FE via RPC. Unlike the normal exec-state-report pool, this pool has an unbounded task queue. The actual pool size at startup is `max(1, priority_exec_state_report_max_threads)`. Changing this item at runtime triggers `update_max_threads` on the priority pool in every executor set (shared and exclusive). Paired with `exec_state_report_max_threads` for the normal pool. Increase this value when high-priority reports are delayed under heavy concurrent query loads.
- `priority_queue_remaining_tasks_increased_frequency`: The priority thread pool queue maintains an internal `_upgrade_counter`; when `_upgrade_counter` exceeds `priority_queue_remaining_tasks_increased_frequency`, the queue increments every element's priority, rebuilds the heap, and resets the counter. Lower values cause more frequent priority aging (reducing starvation but increasing CPU cost due to iterating and re-heapifying); higher values reduce that overhead but delay priority adjustments. The value is a simple operation count threshold, not a time duration.
- `query_pool_spill_mem_limit_threshold`: When memory usage exceeds `query_pool` memory limit * `query_pool_spill_mem_limit_threshold`, intermediate result spilling is triggered.
- `spill_local_storage_dir`: The local directories that store intermediate results of spilling. Multiple directories are separated with semicolons (for example, `/mnt/ssd1/tmp;/mnt/ssd2/tmp`); the default is under `${STARROCKS_HOME}`. Directories should be accessible and writable by the BE process and have sufficient free space; StarRocks picks among them to distribute spill I/O. Changes require a restart to take effect. If a directory is missing, not writable, or full, spilling may fail or degrade query performance.
- `string_prefix_zonemap_prefix_len`: The prefix length used for string prefix zonemaps. It takes effect only when `enable_string_prefix_zonemap` is enabled.
- `update_memory_limit_percent`: The percentage of process memory available to update (primary key) workloads. `GlobalEnv` computes the MemTracker for updates as `process_mem_limit * clamp(update_memory_limit_percent, 0, 100) / 100`. `UpdateManager` also uses this percentage to size its primary-index/index-cache capacity (index cache capacity = `GlobalEnv::process_mem_limit * update_memory_limit_percent / 100`). The HTTP config update logic registers a callback that calls `update_primary_index_memory_limit` on the update managers, so runtime changes are applied to the update subsystem. Increasing this value gives more memory to update/primary-index paths (reducing memory available for other pools); decreasing it reduces update memory and cache capacity. Values are clamped to the range 0 to 100.
- One item controls the number of newly inserted rows processed in each batch. If it is set to `0` or a negative value, it is clamped to `1` to avoid an infinite loop. Larger values can improve write performance but consume more memory.
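For the `parquet_reader_bloom_filter_enable` item above, the same behavior can be toggled per session through the system variable named in its description, which is often more convenient than changing the BE configuration. A minimal sketch:

```SQL
-- Disable Parquet bloom filter usage for the current session only,
-- leaving the BE-level configuration untouched.
SET enable_parquet_reader_bloom_filter = false;

-- Re-enable it for subsequent queries in this session.
SET enable_parquet_reader_bloom_filter = true;
```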
- One item controls whether to fall back to LIST requests on metadata files to compute the base size. `true` means falling back to LIST metadata files to compute the base size (the historical behavior, with more accurate size estimation). `false` means skipping LIST and using `base_size = 0`, which reduces LIST object requests but may delay immutable marking because of less accurate size estimation.
- `lake_flush_thread_num_per_store`: When this value is set to `0`, the system uses twice the CPU core count as the value. When this value is set to less than `0`, the system uses the product of its absolute value and the CPU core count as the value.
- `transaction_publish_version_worker_count`: When this value is set to `0` or a negative value, the system uses the CPU core count as the value, so as to avoid insufficient thread resources when import concurrency is high but only a fixed number of threads are used. From v2.5, the default value has been changed from `8` to `0`.
- `enable_load_channel_rpc_async`: When enabled, handling of the load channel open RPC (`PTabletWriterOpen`) is offloaded from the BRPC worker to a dedicated thread pool: the request handler creates a `ChannelOpenTask` and submits it to the internal `_async_rpc_pool` instead of running `LoadChannelMgr::_open` inline. This reduces work and blocking inside BRPC threads and allows tuning concurrency via `load_channel_rpc_thread_pool_num` and `load_channel_rpc_thread_pool_queue_size` (see the query after this list). If the thread pool submission fails (because the pool is full or shut down), the request is canceled and an error status is returned. The pool is shut down on `LoadChannelMgr::close()`, so consider capacity and lifecycle when enabling this feature to avoid request rejections or delayed processing.
- `enable_load_diagnose`: When enabled, a timed-out load RPC triggers a diagnose: the sender builds a `PLoadDiagnoseRequest` and sends an RPC to the remote LoadChannel to collect a profile and/or stack trace (controlled by `load_diagnose_rpc_timeout_profile_threshold_ms` and `load_diagnose_rpc_timeout_stack_trace_threshold_ms`). The diagnose RPC uses `load_diagnose_send_rpc_timeout_ms` as its timeout. Diagnosis is skipped if a diagnose request is already in progress. Enabling this produces additional RPCs and profiling work on target nodes; disable it on sensitive production workloads to avoid the extra overhead.
- One item enables parallel segment load; it uses a `_parallel_load` naming scheme and disables partial compaction when enabled. Consider the implications for operations that rely on segment order.
- `enable_streaming_load_thread_pool`: When enabled and the load job type is `TLoadJobType::STREAM_LOAD`, `ConnectorScanNode` submits scanner tasks to the `streaming_load_thread_pool` (which is configured with INT32_MAX threads and queue sizes, that is, effectively unbounded). When disabled, scanners use the general `thread_pool` and its `PriorityThreadPool` submission logic (priority computation, try_offer/offer behavior). Enabling it isolates streaming-load work from regular query execution to reduce interference; however, because the dedicated pool is effectively unbounded, enabling it may increase concurrent threads and resource usage under heavy streaming-load traffic. This option is on by default and typically does not require modification.
- `es_scroll_keepalive`: Controls the keep-alive duration of the Elasticsearch scroll context.
- `es_index_max_result_window`: The ES reader uses `min(es_index_max_result_window, chunk_size)` when building `KEY_BATCH_SIZE`. If an ES request exceeds the Elasticsearch index setting `index.max_result_window`, Elasticsearch returns HTTP 400 (Bad Request). Adjust this value when scanning large indexes, or increase `index.max_result_window` on the Elasticsearch side to permit larger single requests.
- `ignore_load_tablet_failure`: When set to `false`, the system treats any tablet header load failures (non-NotFound and non-AlreadyExist errors) as fatal: the code logs the error and calls `LOG(FATAL)` to stop the BE process. When set to `true`, the BE continues startup despite such per-tablet load errors; failed tablet IDs are recorded and skipped while successful tablets are still loaded. Note that this parameter does NOT suppress fatal errors from the RocksDB meta scan itself, which always cause the process to quit.
- One item controls how long aborted load channels stay recorded in `_aborted_load_channels`. When a load job is cancelled or fails, the load ID stays recorded so that any late-arriving load RPCs can be rejected immediately; once the delay expires, the entry is cleaned during the periodic background sweep (minimum sweep interval is 60 seconds). Setting the delay too low risks accepting stray RPCs after an abort, while setting it too high may retain state and consume resources longer than necessary. Tune it to balance correctness of late-request rejection against resource retention for aborted loads.
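The load channel RPC items referenced above (`enable_load_channel_rpc_async`, `load_channel_rpc_thread_pool_num`, and `load_channel_rpc_thread_pool_queue_size`) are usually reviewed together before tuning. A small sketch that relies only on the `information_schema.be_configs` view shown earlier:

```SQL
-- Review the async load channel RPC settings on every BE in one result set.
SELECT *
FROM information_schema.be_configs
WHERE NAME = "enable_load_channel_rpc_async"
   OR NAME LIKE "%load_channel_rpc_thread_pool%";
```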
- `load_channel_rpc_thread_pool_num`: The size of the thread pool that asynchronously handles load channel RPCs. When set to the default (`-1`), the pool size is auto-set to the number of CPU cores (`CpuInfo::num_cores()`). The configured value is used as ThreadPoolBuilder's max threads, and the pool's min threads is set to `min(5, max_threads)`. The pool queue size is controlled separately by `load_channel_rpc_thread_pool_queue_size`. This setting was introduced to align the async RPC pool size with the brpc workers' default (`brpc_num_threads`) so behavior remains compatible after switching load RPC handling from synchronous to asynchronous. Changing this item at runtime triggers `ExecEnv::GetInstance()->load_channel_mgr()->async_rpc_pool()->update_max_threads(...)`.
- `load_channel_rpc_thread_pool_queue_size`: The queue size of the thread pool that handles load channel open requests when `enable_load_channel_rpc_async` is enabled; the pool size is paired with `load_channel_rpc_thread_pool_num`. The large default (1024000) aligns with brpc workers' defaults to preserve behavior after switching from synchronous to asynchronous handling. If the queue is full, `ThreadPool::submit()` fails and the incoming open RPC is cancelled with an error, causing the caller to receive a rejection. Increase this value to buffer larger bursts of concurrent open requests; reducing it tightens backpressure but may cause more rejections under load.
- `load_diagnose_rpc_timeout_profile_threshold_ms`: When `enable_load_diagnose` is `true`, this threshold controls whether a full profiling diagnose is requested. If the request-level RPC timeout `_rpc_timeout_ms` is greater than `load_diagnose_rpc_timeout_profile_threshold_ms`, profiling is enabled for that diagnose. For smaller `_rpc_timeout_ms` values, profiling is sampled once every 20 timeouts to avoid frequent heavy diagnostics for real-time/short-timeout loads (see the worked illustration after this list). This value affects the profile flag in the `PLoadDiagnoseRequest` sent; stack-trace behavior is controlled separately by `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, and the send timeout by `load_diagnose_send_rpc_timeout_ms`.
- `load_diagnose_rpc_timeout_stack_trace_threshold_ms`: When load RPC timeouts exceed this interval, `OlapTableSink`/`NodeChannel` includes `stack_trace=true` in a load_diagnose RPC to the target BE so the BE can return stack traces for debugging. `LocalTabletsChannel::SecondaryReplicasWaiter` also triggers a best-effort stack-trace diagnose from the primary if waiting for secondary replicas exceeds this interval. This behavior requires `enable_load_diagnose` and uses `load_diagnose_send_rpc_timeout_ms` for the diagnose RPC timeout; profiling is gated separately by `load_diagnose_rpc_timeout_profile_threshold_ms`. Lowering this value increases how aggressively stack traces are requested.
- `load_diagnose_send_rpc_timeout_ms`: The RPC timeout used for load_diagnose RPCs (sent by NodeChannel/OlapTableSink when a LoadChannel brpc call times out) and for replica-status queries (used by `SecondaryReplicasWaiter`/`LocalTabletsChannel` when checking primary replica state). Choose a value high enough to allow the remote side to respond with profile or stack-trace data, but not so high that failure handling is delayed. This parameter works together with `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, and `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, which control when and what diagnostic information is requested.
- One item overrides the brpc timeout when the `node_channel_set_brpc_timeout` fail point is triggered. If set to a positive value, NodeChannel sets its internal `_rpc_timeout_ms` to this value (in milliseconds), causing open/add-chunk/cancel RPCs to use the shorter timeout and enabling simulation of brpc timeouts that produce the `[E1008]Reached timeout` error. The default (`-1`) disables the override. Changing this value is intended for testing and fault injection; small values may produce false timeouts and trigger load diagnostics (see `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, and `load_diagnose_send_rpc_timeout_ms`).
- Another fault-injection item delays write completion; the default (`-1`) disables the injection. It is intended for testing fault handling, timeouts, and replica synchronization behavior. Do not enable it in normal production workloads, as it delays write completion and can trigger upstream timeouts or replica aborts.
- `load_segment_thread_pool_num_max`: The maximum number of threads for the `load_rowset_pool` and `load_segment_pool` in exec_env.cpp, controlling concurrency for processing loaded rowsets and segments (for example, decoding, indexing, and writing) during streaming and batch loads. Increasing this value raises parallelism and can improve load throughput but also increases CPU, memory usage, and potential contention; decreasing it limits concurrent load processing and may reduce throughput. Tune it together with `load_segment_thread_pool_queue_size` and `streaming_load_thread_pool_idle_time_ms`. Changes require a BE restart.
- `load_segment_thread_pool_queue_size`: The queue size of the load rowset/segment thread pools. These pools use `load_segment_thread_pool_num_max` for their max thread count, and this item controls how many load segment/rowset tasks can be buffered before the ThreadPool's overflow policy takes effect (further submissions may be rejected or blocked depending on the ThreadPool implementation). Increase it to allow more pending load work (which uses more memory and can raise latency); decrease it to limit buffered load concurrency and reduce memory usage.
- `max_pulsar_consumer_num_per_group`: The maximum number of Pulsar consumers in one routine load consumer group. If the number of partitions in `pulsar_info->partitions` exceeds this value, group creation fails with an error advising to increase `max_pulsar_consumer_num_per_group` on the BE or add more BEs. This limit is enforced when constructing a `PulsarDataConsumerGroup` and prevents a BE from hosting more than this many consumers for one routine load group. For Kafka routine load, `max_consumer_num_per_group` is used instead.
- `routine_load_kafka_timeout_second`: Used as the default RPC timeout (converted to milliseconds) for `get_info`. It is also used as the per-call consume poll timeout for the librdkafka consumer (converted to milliseconds and capped by the remaining runtime). Note that the internal `get_info` path reduces this value to 80% before passing it to librdkafka to avoid FE-side timeout races. Set this to a value that balances timely failure reporting and sufficient time for network/broker responses; changes require a restart because the setting is not mutable.
- `routine_load_pulsar_timeout_second`: `PInternalServiceImplBase::get_pulsar_info` multiplies this value by 1000 to form the millisecond timeout passed to the routine load task executor methods that fetch Pulsar partition metadata and backlog. Increase it to allow slower Pulsar responses at the cost of longer failure detection; decrease it to fail faster on slow brokers. Analogous to `routine_load_kafka_timeout_second` used for Kafka.
- `streaming_load_thread_pool_idle_time_ms`: The idle timeout for threads in the `stream_load_io` pool and also in the `load_rowset_pool` and `load_segment_pool`. Threads in these pools are reclaimed when idle for this duration; lower values reduce idle resource usage but increase thread creation overhead, while higher values keep threads alive longer. The `stream_load_io` pool is used when `enable_streaming_load_thread_pool` is enabled.
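As a worked illustration of the profile-threshold rule described above for `load_diagnose_rpc_timeout_profile_threshold_ms` (the numbers are hypothetical and the real decision is made inside the BE, not in SQL):

```SQL
-- Hypothetical numbers: a request-level RPC timeout of 120000 ms against a
-- profile threshold of 60000 ms. Full profiling is requested because the request
-- timeout exceeds the threshold; otherwise profiling is sampled 1 in 20 timeouts.
SELECT CASE
           WHEN 120000 > 60000 THEN "profile enabled for this diagnose"
           ELSE "profile sampled once every 20 timeouts"
       END AS diagnose_profile_decision;
```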
- The thread count item of the `stream_load_io` pool: the pool is configured with `set_max_threads(INT32_MAX)` and `set_max_queue_size(INT32_MAX)`, so it is effectively unbounded to avoid deadlocks for concurrent streaming loads. A value of `0` lets the pool start with no threads and grow on demand; setting a positive value reserves that many threads at startup. This pool is used when `enable_streaming_load_thread_pool` is `true`, and its idle timeout is controlled by `streaming_load_thread_pool_idle_time_ms`. Overall concurrency is still constrained by `fragment_pool_thread_num_max` and `webserver_num_workers`; changing this value is rarely necessary and may increase resource usage if set too high.
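To check how the streaming load pool is currently configured across BEs before adjusting it (a usage sketch over `information_schema.be_configs` only):

```SQL
-- List the streaming load thread pool items (including the enable switch) on every BE.
SELECT * FROM information_schema.be_configs WHERE NAME LIKE "%streaming_load_thread_pool%";
```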