crates/object-capacity/README.md
rustfs-object-capacity is the core object-capacity statistics component in RustFS. It scans local data directories, maintains a capacity cache, triggers incremental refreshes after writes, and provides the admin layer with a used-capacity result that is as inexpensive and resilient as possible.
This crate is not meant to measure total filesystem capacity. Its job is to answer: "How many bytes are currently occupied by RustFS object data?" It makes practical tradeoffs between accuracy, freshness, and scan cost.
HybridCapacityManager cache with scheduled refresh, write-triggered refresh, foreground blocking refresh, and background refresh.

src/lib.rs
Re-exports scan_used_capacity_disks, CapacityDiskRef, and CapacityScanSummary.
src/types.rs
Defines scan input/output types, including CapacityDiskRef, the internal CapacityScanResult, and the public CapacityScanSummary.
src/scan.rs
Implements directory traversal, sampled estimation, timeout/stall detection, multi-disk concurrent scans, and conversion into CapacityUpdate.
src/capacity_manager.rs
Owns caching, write-frequency tracking, singleflight refresh coordination, background tasks, dirty-subset merge logic, and the global singleton manager.
src/capacity_scope.rs
Tracks "which disks were touched by a write", including token-bound local scopes and the global dirty-scope registry.
benches/capacity_scan.rs
Exercises the public scan API with benchmark scenarios for exact, sampled, and multi-disk scans.

CapacityDiskRef

```rust
pub struct CapacityDiskRef {
    pub endpoint: String,
    pub drive_path: String,
}
```
This is the minimal unit required for a scan:
endpoint is used to distinguish metrics and logs.
drive_path is the local disk root path.

CapacityScanSummary

```rust
pub struct CapacityScanSummary {
    pub used_bytes: u64,
    pub file_count: usize,
    pub sampled_count: usize,
    pub is_estimated: bool,
    pub had_partial_errors: bool,
    pub scan_duration: Duration,
}
```
Field meanings:
used_bytes: the computed or estimated used capacity.
file_count: the number of regular files traversed.
sampled_count: the number of overflow files sampled after crossing the threshold.
is_estimated: whether the result is estimated instead of exact.
had_partial_errors: whether traversal encountered local errors while still producing a result.
scan_duration: total scan duration.

The directory scan lives in scan.rs::get_dir_size_async and works as follows:
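Run the traversal inside tokio::task::spawn_blocking so the async runtime is not blocked.
Walk the directory with WalkDir and count only regular files.
While the file count stays within DEFAULT_MAX_FILES_THRESHOLD (default 200_000), add every file size exactly.
Beyond that, keep the first max_files_threshold files as an exact prefix.
Sample every sample_rate-th file after that and estimate the overflow portion from sampled bytes.
If the traversal makes no progress within stall_timeout, treat the traversal as stalled.
If local errors occur but a result can still be produced, set had_partial_errors = true.
Multi-disk scans run concurrently through buffer_unordered.
Concurrency is limited to 4 disks.

This crate intentionally does not treat a timeout as a hard failure.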
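The exact-prefix and sampled-overflow steps described above can be illustrated with a small standalone function. This is only a sketch under stated assumptions: `SizeEstimate` and `estimate_used_bytes` are hypothetical names, the input is a plain iterator of file sizes rather than a directory walk, and the extrapolation formula (mean sampled size times overflow count) is an assumption, not taken from scan.rs.

```rust
/// Hypothetical result type; it mirrors a few CapacityScanSummary fields.
#[derive(Debug, PartialEq)]
struct SizeEstimate {
    used_bytes: u64,
    file_count: usize,
    sampled_count: usize,
    is_estimated: bool,
}

/// Sums the first `max_files_threshold` sizes exactly, then samples every
/// `sample_rate`-th overflow file and scales the sampled mean to the number
/// of overflow files. The scaling rule is an assumption for illustration.
fn estimate_used_bytes(
    sizes: impl IntoIterator<Item = u64>,
    max_files_threshold: usize,
    sample_rate: usize,
) -> SizeEstimate {
    let sample_rate = sample_rate.max(1); // guard against a zero interval
    let (mut exact_bytes, mut sampled_bytes) = (0u64, 0u64);
    let (mut file_count, mut overflow, mut sampled_count) = (0usize, 0usize, 0usize);

    for size in sizes {
        file_count += 1;
        if file_count <= max_files_threshold {
            // Exact prefix: every file counts in full.
            exact_bytes += size;
        } else {
            // Overflow: only every `sample_rate`-th file is measured.
            if overflow % sample_rate == 0 {
                sampled_bytes += size;
                sampled_count += 1;
            }
            overflow += 1;
        }
    }

    let estimated_overflow = if sampled_count == 0 {
        0
    } else {
        // Mean sampled size multiplied by the overflow file count.
        (sampled_bytes as u128 * overflow as u128 / sampled_count as u128) as u64
    };

    SizeEstimate {
        used_bytes: exact_bytes + estimated_overflow,
        file_count,
        sampled_count,
        is_estimated: overflow > 0,
    }
}
```

With ten 100-byte files, a threshold of 4 and a sampling interval of 3, the first four files are summed exactly and two of the six overflow files are sampled, so the estimate matches the true total only because all files happen to be the same size.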
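Symlinks are not followed by default: RUSTFS_CAPACITY_FOLLOW_SYMLINKS=false.
When symlink following is enabled, the maximum follow depth defaults to 3.

HybridCapacityManager is the state center of this crate.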
Its cached state covers total_used, last_update, file_count, is_estimated, DataSource, and disk_cache.

DataSource
RealTime
Foreground real-time refresh when no cache exists yet.
Scheduled
Background refresh triggered by the scheduled task.
WriteTriggered
Refresh triggered when write frequency is high and the cache is old enough.
Fallback
Fallback to externally supplied disk-used capacity when all scans fail.

refresh_or_join
A singleflight foreground refresh. If another refresh is already running, callers join and wait for the shared result.
spawn_refresh_if_needed
A background refresh. If another refresh is already running, it is skipped.
start_background_task
Starts two background tasks.

refresh_or_join and spawn_refresh_if_needed use a watch channel to coordinate refresh cycles.
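The join-versus-skip behaviour can be sketched without an async runtime. The example below is a simplified, synchronous stand-in: it uses `std::sync::Mutex` and `Condvar` instead of the watch channel the crate uses, `RefreshGate` and `refresh_if_idle` are hypothetical names, and the refresh result is reduced to a single `u64`. It also assumes the scan closure does not panic.

```rust
use std::sync::{Condvar, Mutex};

struct GateState {
    running: bool,    // true while one refresh is in flight
    generation: u64,  // bumped every time a refresh finishes
    last_result: u64, // result published by the most recent refresh
}

/// Hypothetical stand-in for the crate's refresh coordination.
struct RefreshGate {
    state: Mutex<GateState>,
    done: Condvar,
}

impl RefreshGate {
    fn new() -> Self {
        Self {
            state: Mutex::new(GateState { running: false, generation: 0, last_result: 0 }),
            done: Condvar::new(),
        }
    }

    /// Publishes a finished refresh and wakes every waiting caller.
    fn finish(&self, result: u64) {
        let mut st = self.state.lock().unwrap();
        st.running = false;
        st.generation += 1;
        st.last_result = result;
        self.done.notify_all();
    }

    /// Foreground path: run `scan` when idle, otherwise wait for the
    /// in-flight refresh and return its shared result.
    fn refresh_or_join(&self, scan: impl FnOnce() -> u64) -> u64 {
        let mut st = self.state.lock().unwrap();
        if st.running {
            let seen = st.generation;
            while st.generation == seen {
                st = self.done.wait(st).unwrap();
            }
            return st.last_result;
        }
        st.running = true;
        drop(st); // never hold the lock while scanning
        let result = scan();
        self.finish(result);
        result
    }

    /// Background path: skip when a refresh is in flight.
    /// Returns whether this call actually ran `scan`.
    fn refresh_if_idle(&self, scan: impl FnOnce() -> u64) -> bool {
        {
            let mut st = self.state.lock().unwrap();
            if st.running {
                return false;
            }
            st.running = true;
        }
        let result = scan();
        self.finish(result);
        true
    }
}
```

A caller that arrives while a refresh is running never starts a second scan: the foreground path blocks until the shared result is published, and the background path returns immediately.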
One of the main optimizations in this crate is "refresh only the disks dirtied by writes".
capacity_scope.rs provides two ways to propagate dirty disks:
A write path registers its scope with record_capacity_scope(token, scope).
record_write_operation_with_scope_token(Some(token)) consumes that scope and marks the disks dirty.
record_global_dirty_scope(scope) records dirty disks directly in the global registry.
Dirty disks are then retrieved with get_dirty_disks().

Refreshing only dirty disks is safe only when:
disk_cache_complete == true

If the per-disk cache is incomplete, or there are no dirty disks, the system falls back to a full refresh.
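The fallback rule can be written as a small decision function, paired with a merge step for the dirty-subset case. This is a sketch, not the crate's code: `RefreshPlan`, `plan_refresh`, and `merge_subset` are hypothetical names, disks are keyed by a plain `String`, and only the `disk_cache_complete` condition is modelled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical description of what the next refresh should scan.
#[derive(Debug, PartialEq)]
enum RefreshPlan {
    Full,
    DirtySubset(BTreeSet<String>),
}

/// Chooses a subset refresh only when the per-disk cache is complete and
/// at least one disk is dirty; otherwise falls back to a full refresh.
fn plan_refresh(disk_cache_complete: bool, dirty_disks: &BTreeSet<String>) -> RefreshPlan {
    if disk_cache_complete && !dirty_disks.is_empty() {
        RefreshPlan::DirtySubset(dirty_disks.clone())
    } else {
        RefreshPlan::Full
    }
}

/// Overwrites the refreshed disks in the cache and returns the total
/// recomputed from the whole cache, not from the subset alone.
fn merge_subset(disk_cache: &mut BTreeMap<String, u64>, refreshed: BTreeMap<String, u64>) -> u64 {
    disk_cache.extend(refreshed);
    disk_cache.values().sum()
}
```

Recomputing the total from the merged cache keeps clean disks in the sum even though they were not rescanned.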
After a full refresh, per_disk replaces the entire disk_cache.
After a dirty-subset refresh, the total is recomputed from the merged disk_cache instead of trusting the subset sum directly.

This crate provides capacity primitives only. The actual RustFS integration lives in rustfs/src/capacity/service.rs.
The high-level flow is:
Call init_capacity_management_for_local_disks().
Start capacity_manager::start_background_task(...).
Serve capacity queries from the HybridCapacityManager cache.
When all scans fail, report the Fallback source.

crates/ecstore/src/set_disk.rs is responsible for recording capacity scopes during object writes, heal operations, data movement, and related flows, so this crate can learn which disks were affected.
Calling the scan API directly is useful for benchmarks, operational tooling, or isolated validation.

```rust
use rustfs_object_capacity::{CapacityDiskRef, scan_used_capacity_disks};

// Describe each local disk root that should be scanned.
let disks = vec![
    CapacityDiskRef {
        endpoint: "node-a".to_string(),
        drive_path: "/data/disk1".to_string(),
    },
];

// Scan the listed disks and print the aggregated summary.
let summary = scan_used_capacity_disks(&disks).await?;
println!(
    "used={} files={} estimated={}",
    summary.used_bytes, summary.file_count, summary.is_estimated
);
# Ok::<(), Box<dyn std::error::Error>>(())
```
Using the global capacity manager is useful for in-service caching and refresh orchestration.

```rust
use rustfs_object_capacity::capacity_manager::{DataSource, get_capacity_manager};

let manager = get_capacity_manager();

// Read the cached value, if one exists yet.
if let Some(cached) = manager.get_capacity().await {
    println!("cached bytes={}", cached.total_used);
}

// Feed the write-frequency tracker.
manager.record_write_operation().await;

// Run a refresh, or join one that is already in flight.
let _ = manager
    .refresh_or_join(DataSource::Scheduled, || async {
        rustfs_object_capacity::scan::refresh_capacity_with_scope(
            vec![rustfs_object_capacity::CapacityDiskRef {
                endpoint: "node-a".to_string(),
                drive_path: "/data/disk1".to_string(),
            }],
            false,
        )
        .await
    })
    .await;
```
```rust
use rustfs_object_capacity::capacity_scope::{
    CapacityScope, CapacityScopeDisk, record_capacity_scope,
};
use rustfs_object_capacity::capacity_manager::get_capacity_manager;
use uuid::Uuid;

// Bind the disks touched by this write to a token.
let token = Uuid::new_v4();
record_capacity_scope(
    token,
    CapacityScope {
        disks: vec![CapacityScopeDisk {
            endpoint: "node-a".to_string(),
            drive_path: "/data/disk1".to_string(),
        }],
    },
);

// Consume the scope and mark those disks dirty.
get_capacity_manager()
    .record_write_operation_with_scope_token(Some(token))
    .await;
```
The configuration constants are defined in crates/config/src/constants/capacity.rs.
| Environment Variable | Default | Description |
|---|---|---|
| RUSTFS_CAPACITY_SCHEDULED_INTERVAL | 120s | Scheduled refresh interval |
| RUSTFS_CAPACITY_WRITE_TRIGGER_DELAY | 5s | Debounce delay after writes |
| RUSTFS_CAPACITY_WRITE_FREQUENCY_THRESHOLD | 5 | Recent 60-second write-frequency threshold |
| RUSTFS_CAPACITY_FAST_UPDATE_THRESHOLD | 30s | Cache age required before fast refresh is considered |
| RUSTFS_CAPACITY_MAX_FILES_THRESHOLD | 200000 | Exact-count file threshold |
| RUSTFS_CAPACITY_STAT_TIMEOUT | 3s | Base scan timeout |
| RUSTFS_CAPACITY_SAMPLE_RATE | 200 | Overflow-file sampling interval |
| RUSTFS_CAPACITY_METRICS_INTERVAL | 600s | Runtime summary emission interval |
| RUSTFS_CAPACITY_FOLLOW_SYMLINKS | false | Whether to follow symlinks |
| RUSTFS_CAPACITY_MAX_SYMLINK_DEPTH | 3 | Maximum symlink follow depth |
| RUSTFS_CAPACITY_ENABLE_DYNAMIC_TIMEOUT | true | Whether to enable dynamic timeout scaling |
| RUSTFS_CAPACITY_MIN_TIMEOUT | 2s | Dynamic-timeout lower bound |
| RUSTFS_CAPACITY_MAX_TIMEOUT | 15s | Dynamic-timeout upper bound |
| RUSTFS_CAPACITY_STALL_TIMEOUT | 20s | Stall-detection threshold |
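
As a rough illustration of how one of these variables might be read with its default and then cached for the lifetime of the process, consider the sketch below. The helper names `parse_or_default` and `max_files_threshold` are hypothetical, and falling back to the default on an unparsable value is an assumption; the real loading code lives in the constants file mentioned above.

```rust
use std::sync::OnceLock;

/// Parses an optional raw value, falling back to `default` when the value
/// is missing or is not a valid unsigned integer (assumed behaviour).
fn parse_or_default(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(default)
}

/// Process-wide cache for the exact-count file threshold.
static MAX_FILES_THRESHOLD: OnceLock<u64> = OnceLock::new();

/// Reads the environment once; later calls return the cached value even
/// if the variable is changed afterwards.
fn max_files_threshold() -> u64 {
    *MAX_FILES_THRESHOLD.get_or_init(|| {
        let raw = std::env::var("RUSTFS_CAPACITY_MAX_FILES_THRESHOLD").ok();
        parse_or_default(raw.as_deref(), 200_000)
    })
}
```

Because `get_or_init` runs its closure at most once, a value picked up at first use stays in effect until the process restarts.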
In non-test builds, configuration is cached behind OnceLock:
Changing RUSTFS_CAPACITY_* during runtime usually does not take effect immediately.

This crate reports multiple metric families to rustfs-io-metrics::capacity_metrics.
So this crate is both a capacity-calculation component and an important producer of runtime observability data.
Run the benchmark suite with:

```bash
cargo bench -p rustfs-object-capacity --bench capacity_scan
```

Current benchmark scenarios:
capacity_scan_exact
Single-disk exact scan over 10k files.
capacity_scan_sampled
Single-disk scan over 202,048 files that triggers sampled estimation.
capacity_scan_multi_disk
Four-disk exact scan with mixed directory sizes.

du.
had_partial_errors.