docs/research/rf-topological-sensing/05-sublinear-mincut-algorithms.md
Date: 2026-03-08 Context: RuVector v2.0.4 / RuvSense multistatic mesh — 16 ESP32 nodes, 120 link edges, 20 Hz update rate Scope: Algorithmic foundations for maintaining minimum cuts on dynamic RF link graphs under real-time constraints
A 16-node ESP32 multistatic mesh generates a complete weighted graph on C(16,2) = 120 edges, where each edge weight encodes the RF channel state information (CSI) attenuation or coherence between two nodes. Human bodies, moving objects, and environmental changes continuously perturb these weights. The minimum cut of this graph partitions the sensing field into regions of minimal RF coupling — directly useful for person segmentation, occupancy counting, and anomaly detection.
At 20 Hz update rate, each mincut computation has a budget of 50 ms wall-clock time. On a resource-constrained coordinator (ESP32-S3 at 240 MHz or a modest ARM host), classical algorithms are either too slow or carry too much overhead. This document surveys the algorithmic landscape from classical exact methods through sublinear approximations, dynamic maintenance, streaming, and sparsification — evaluating each for applicability to the RuVector RF sensing pipeline.
Throughout, V = 16 and E = 120 (complete graph). While these are small by general graph algorithm standards, the constraint is not problem size but update frequency and platform limitations. The goal is not asymptotic superiority but practical per-frame latency under 2 ms on the target hardware.
Given an undirected weighted graph G = (V, E, w) with w: E -> R+, the global minimum cut is a partition of V into two non-empty sets (S, V\S) minimizing the total weight of edges crossing the partition:
mincut(G) = min_{S subset V, S != empty, S != V} sum_{(u,v) in E, u in S, v in V\S} w(u,v)
For RF sensing, w(u,v) typically represents the CSI coherence or signal attenuation between nodes u and v. A minimum cut identifies the partition where RF coupling is weakest — corresponding to physical obstructions (human bodies, walls, large objects) that attenuate the RF field.
The Stoer-Wagner algorithm computes exact global minimum cut in O(VE + V^2 log V) time using a sequence of V-1 minimum s-t cut computations, each performed via a maximum adjacency ordering.
Procedure:
Complexity for our graph:
Practical assessment: For V = 16, Stoer-Wagner executes 15 phases, each scanning at most 120 edges. Total work is roughly 1,800 edge scans plus priority queue operations. On modern hardware this completes in microseconds. On ESP32 at 240 MHz, estimated wall time is 50-200 us — well within budget.
This is the baseline. The algorithm is exact, deterministic, and simple to implement. For V = 16, classical complexity is not actually the bottleneck.
Karger's algorithm randomly contracts edges, merging endpoints, until two vertices remain. The surviving edges form a cut. Repeating O(V^2 log V) times yields the minimum cut with high probability.
Single contraction round: O(E) time using union-find. Total for high-probability success: O(V^2 log V * E) = O(V^2 E log V). With the improved implementation: O(V^2 log^3 V).
For our graph:
Practical assessment: Karger is elegant but the constant factors from repeated trials make it slower than Stoer-Wagner for small V. Its value emerges at scale (V > 1000) where the randomized approach avoids worst-case deterministic behavior.
Karger-Stein improves on Karger by contracting only to V/sqrt(2) vertices, then recursing on two independent copies. This reduces the repetition count from O(V^2) to O(V^2 / 2^depth), yielding O(V^2 log V) total time.
For our graph:
Practical assessment: At V = 16, the recursion tree has ~4 levels with branching factor 2, yielding ~16 leaf problems each of size ~4. Total work is dominated by the initial contraction steps. Fast in practice but adds implementation complexity over Stoer-Wagner for no real benefit at this scale.
For a static 16-node graph, all classical algorithms complete in microseconds. The real challenge is not single-computation cost but:
The subsequent sections address these challenges with algorithms designed for dynamic, streaming, and approximate settings.
A sublinear-time algorithm runs in o(m) time, where m = |E|. For our graph with m = 120, "sublinear in m" means fewer than 120 edge reads. This is useful when:
The simplest sublinear approach: sample k edges uniformly at random, compute their total weight, and estimate the mincut value.
Karger's sampling theorem (1994): If we sample each edge independently with probability p = O(log V / (epsilon^2 * lambda)), where lambda is the minimum cut value, then with high probability every cut in the sampled graph has value within (1 +/- epsilon) of its value in the original graph, after scaling by 1/p.
For our setting:
This achieves a (1 +/- 0.1)-approximation by reading only 1/3 of the edges.
Algorithm:
1. Sample each edge with probability p
2. Run exact mincut on the sampled graph (Stoer-Wagner)
3. Scale result by 1/p
The key insight: Stoer-Wagner on a sparse sample with ~40 edges and 16 vertices runs in O(16 * 40) = O(640) operations — faster than on the full graph, and with provable approximation guarantees.
A cut sparsifier H of G is a sparse graph on the same vertex set where every cut value is preserved within (1 +/- epsilon). Benczur and Karger (1996) showed that O(V log V / epsilon^2) edges suffice.
For V = 16, epsilon = 0.1: O(16 * 4 / 0.01) = O(6400) edges. This exceeds our actual edge count of 120, so sparsification provides no benefit at this scale. However, it becomes critical for:
Spielman and Srivastava (2011) showed that spectrally sparsifying the graph Laplacian preserves all cut values. Their algorithm:
Result: O(V log V / epsilon^2) edges suffice, same as combinatorial sparsification, but the spectral guarantee is stronger — it preserves the entire spectrum of the Laplacian, not just cut values.
For RF sensing: The graph Laplacian eigenvectors correspond to spatial
modes of the RF field. Spectral sparsification preserves these modes, which
is useful beyond mincut — it preserves the spatial structure needed for
tomography and field modeling (RuvSense field_model.rs).
Recent work by Rubinstein, Schramm, and Weinberg (2018) achieves O(V polylog V)-time algorithms that query the graph adjacency/weight oracle rather than reading all edges. For V = 16, this gives O(16 * 16) = O(256) queries — a 2x reduction over reading all 120 edges (not useful at this scale, but relevant at V = 256 where it reduces from 32640 to ~4000 queries).
In the dynamic setting, the graph undergoes edge insertions, deletions, and weight updates, and we must maintain the minimum cut value (and optionally the cut itself) after each update.
For RF sensing, every CSI frame update changes all 120 edge weights simultaneously. This is a batch-dynamic setting: 120 updates arrive together, then we query the mincut.
Thorup showed that edge connectivity (unweighted mincut) can be maintained in O(log V * (log log V)^2) amortized time per edge update. For weighted graphs, this extends to O(polylog V) time per update with some caveats.
For our setting:
The savings are modest at V = 16 but the amortized bound means some frames are nearly free (when the mincut does not change) while others pay more.
Goranci, Henzinger, and Thorup (2018) maintain a (1+epsilon)-approximate minimum cut under edge insertions and deletions in O(polylog(V)/epsilon^2) amortized update time.
Key ideas:
For our setting:
This reveals an important practical point: dynamic algorithms have excellent asymptotic behavior but carry large constant factors that dominate at small V. For V = 16, full recomputation with Stoer-Wagner is faster than any known dynamic algorithm.
Dynamic algorithms become beneficial when:
For our RF mesh, a practical middle ground is:
Threshold-filtered updates: Only re-process edges whose weight changed by more than delta from the previous frame. If the RF field is relatively stable (people move slowly relative to 20 Hz), most edges change minimally. If only 10-20 edges exceed the delta threshold per frame, a partial Stoer-Wagner restart or local repair becomes attractive.
Algorithm: Lazy-Mincut-Update
Input: Previous mincut (S*, V\S*), new edge weights w'
Output: Updated mincut
1. Compute delta = sum of |w'(e) - w(e)| for edges crossing (S*, V\S*)
2. If delta < epsilon * mincut_value:
Return (S*, V\S*) unchanged // Cut value changed negligibly
3. Compute crossing_weight = sum w'(e) for edges crossing (S*, V\S*)
4. If crossing_weight == mincut_value +/- epsilon:
Update mincut_value = crossing_weight // Same cut, adjusted value
Return (S*, V\S*)
5. Else:
Run full Stoer-Wagner on G' = (V, E, w') // Recompute
Return new mincut
In practice, steps 1-4 handle >90% of frames (the minimum cut partition is spatially stable — people do not teleport), and full recomputation is triggered only when someone crosses the cut boundary. This reduces average per-frame cost to O(E) = O(120) for crossing-weight evaluation plus occasional O(VE) recomputation.
In the streaming model, edges arrive one at a time (or in a stream from multiple ESP32 nodes), and we must estimate the mincut using limited working memory — ideally O(V polylog V) space rather than O(V^2).
This is relevant when:
Ahn, Guha, and McGregor (2012) showed that a single-pass streaming algorithm can compute a (1+epsilon)-approximate mincut using O(V polylog V / epsilon^2) space by maintaining linear sketches of the graph.
Sketch construction:
For our setting:
With k passes over the stream, accuracy improves. Specifically, O(log V) passes suffice to compute exact mincut with O(V polylog V) space.
Practical algorithm (2-pass):
Pass 1: Build a cut sparsifier by sampling edges with probability
proportional to estimated effective resistance.
Pass 2: Refine the sparsifier using importance sampling based on
first-pass estimates.
Result: (1+epsilon)-approximate mincut from the refined sparsifier.
For our TDM protocol, each complete CSI scan across all 16 nodes constitutes one "pass." A two-pass approach means using two consecutive TDM cycles (100 ms total at 20 Hz) to build and refine the sparsifier — acceptable if we can tolerate 100 ms latency on the initial estimate.
In the turnstile model, edge weights can increase and decrease over time. This matches our RF sensing setting where CSI coherence fluctuates.
Ahn, Guha, and McGregor (2013) extended their sketching approach to the turnstile model. The key: L0-sampling sketches allow recovering edges from the sketch difference, enabling dynamic cut estimation.
Space complexity: O(V * polylog(V) / epsilon^2) — same as the insertion-only case.
For RF sensing: This means we can maintain a running sketch that processes CSI weight updates as they arrive from each node, without needing to store the full graph. The sketch naturally accommodates the continuous weight fluctuations of the RF field.
ESP32 Node i:
- Computes CSI for links to all other nodes
- Constructs local sketch S_i of incident edges
- Transmits S_i to coordinator (compact: ~400 bytes)
Coordinator:
- Receives S_1, ..., S_16
- Merges sketches: S = merge(S_1, ..., S_16)
- Extracts approximate mincut from S
- Latency: dominated by network round-trip, not computation
This architecture distributes the sketching computation across nodes, reducing coordinator load and enabling approximate mincut estimation even when some node reports are delayed or missing.
Theorem: For any undirected weighted graph G with V vertices, there exists a subgraph H with O(V log V / epsilon^2) edges such that for every cut (S, V\S):
(1 - epsilon) * w_G(S, V\S) <= w_H(S, V\S) <= (1 + epsilon) * w_G(S, V\S)
Construction algorithm:
Computing strong connectivity: This requires O(VE) time using max-flow computations — as expensive as solving mincut directly. However, approximate strong connectivity can be computed in O(E log^3 V) time using the sparsification itself (bootstrapping).
For our 16-node RF graph:
Static sparsification is unnecessary since E = 120 is already small. However, sparsification is useful as a noise filter:
Practical algorithm for RF:
1. Compute approximate connectivity for each edge using 2-3 rounds
of random spanning tree sampling.
2. Mark edges with connectivity below threshold as "unreliable."
3. Run mincut on the subgraph of reliable edges.
4. If mincut uses an unreliable edge, recompute on full graph.
This typically reduces effective edge count from 120 to 60-80 edges, providing a 1.5-2x speedup on Stoer-Wagner.
When edge weights change (every CSI frame), the sparsifier must be updated. Naive recomputation defeats the purpose. Efficient approaches:
Incremental update (Abraham, Durfee, et al. 2016):
Batch update strategy for RF:
Every frame:
1. Receive new edge weights w' from CSI processing.
2. For each edge e in sparsifier:
a. If |w'(e) - w(e)| / w(e) > epsilon: mark for re-evaluation.
3. Re-evaluate marked edges (update sampling decision).
4. Run mincut on updated sparsifier.
Expected re-evaluations per frame: 10-30 edges (most weights change incrementally). Mincut on sparsifier with ~70 edges and 16 vertices: O(16 * 70) = O(1120) operations.
The graph Laplacian L_G of the RF mesh encodes the complete spatial coupling structure. Its eigenvalues directly relate to cut values:
Spectral sparsification preserves all eigenvalues, meaning:
(1-epsilon) * L_G <= L_H <= (1+epsilon) * L_G (Loewner order)
This is strictly stronger than cut sparsification and preserves:
field_model.rs)pose_tracker.rs)gesture.rs)For the RuvSense pipeline, a spectral sparsifier serves double duty: mincut computation and spatial field modeling.
Classical mincut algorithms are global — they examine the entire graph. Local partitioning algorithms find cuts by exploring only a small region of the graph, running in time proportional to the size of the smaller side of the cut rather than the full graph.
For RF sensing, this is valuable when we want to detect a localized obstruction (a person standing in one area) without scanning the entire 120-edge graph.
Spielman and Teng introduced local graph partitioning via truncated random walks. Their algorithm:
Complexity: O(|S| * polylog V / phi), where phi is the target conductance. The algorithm never examines vertices far from the seed.
For RF sensing:
Andersen, Chung, and Lang (2006) refined local partitioning using personalized PageRank (PPR). The algorithm:
ApproximatePPR(seed, alpha, epsilon):
p = zero vector // PPR estimate
r = indicator(seed) // residual
While exists v with r(v) / degree(v) > epsilon:
Push(v):
p(v) += alpha * r(v)
For each neighbor u of v:
r(u) += (1 - alpha) * r(v) / (2 * degree(v))
r(v) = (1 - alpha) * r(v) / 2
Return p
Properties:
For RF sensing:
The pose_tracker.rs module maintains a Kalman-filtered estimate of
person positions. When the tracker predicts a person near certain nodes,
local partitioning can quickly confirm or refine the detection:
1. Tracker predicts person near nodes {5, 9, 12}.
2. Run PPR from each predicted node with alpha = 0.3.
3. Sweep-cut the PPR vectors to find local cuts.
4. If local cut conductance < threshold:
Person confirmed at predicted location.
5. Feed cut boundary back to tracker as measurement update.
This creates a feedback loop where the tracker guides the graph algorithm and the graph algorithm refines the tracker — running in O(1/alpha/epsilon) time rather than O(VE) for full mincut.
For multiple people, run local partitioning from multiple seeds simultaneously. With k people and V = 16 nodes, each person's local partition explores ~4-6 nodes, totaling ~O(k * 6 * degree) = O(k * 90) work. For k = 3 people, this is O(270) — less than half the cost of full Stoer-Wagner.
The challenge is handling overlapping partitions. Two approaches:
Monte Carlo algorithms return an answer that is correct with probability
= 1 - delta. Running time is fixed, accuracy is probabilistic.
Las Vegas algorithms always return the correct answer. Running time is probabilistic (expected polynomial), correctness is guaranteed.
For safety-critical RF sensing (mass casualty assessment via wifi-densepose-mat),
Las Vegas algorithms are preferred: the mincut answer is always correct, even
if occasionally slow.
Karger's contraction algorithm is Monte Carlo: a single trial finds the mincut with probability >= 2/V^2 = 2/256 ~ 0.78%. Running O(V^2 log V) trials boosts success probability to 1 - 1/V.
Amplification for reliability:
The Karger-Stein recursive contraction can be enhanced with early termination heuristics:
Karger-Stein-ET(G, best_known_cut):
If |V(G)| <= 6:
Return exact mincut via brute force
Contract G to G' with |V'| = |V| / sqrt(2) + 1
If crossing_edges(G') > best_known_cut * (1 + epsilon):
Prune this branch // Cannot improve on best known
Recurse on two independent copies of G'
Return minimum of recursive results
The pruning step eliminates branches early, reducing expected work. For our graph, this rarely helps (V = 16 is already small), but for V > 100 it can reduce the constant factor by 2-5x.
Converting Karger's algorithm to Las Vegas: run Karger until a cut is found, then verify it by computing max-flow between one pair of vertices separated by the cut. If max-flow equals the cut value, the cut is minimum (by max-flow min-cut theorem). Otherwise, continue.
Verification cost: O(V * E) for a single max-flow computation = O(1920). Expected number of verifications before success: O(V^2 / 2) = O(128). This is expensive and not recommended for real-time use.
Better approach: Use Stoer-Wagner (deterministic, always correct) and reserve randomized methods for approximate or multi-cut computations.
For MAT (Mass Casualty Assessment Tool, wifi-densepose-mat), mincut errors
could mean missing a survivor. Reliability requirements:
| Application | Max failure probability | Algorithm class |
|---|---|---|
| Occupancy counting | 10^-2 | Monte Carlo, any |
| Person segmentation | 10^-4 | Monte Carlo, amplified |
| Vital sign isolation | 10^-5 | Las Vegas or deterministic |
| MAT survivor detection | 10^-8 | Deterministic only |
Recommendation: Use deterministic Stoer-Wagner for all safety-critical applications. Use Monte Carlo approximations only for non-critical tasks like gesture recognition or activity classification where a missed frame is acceptable.
Beyond 2-way mincut, k-way partitioning (separating k people) can use randomized LP rounding:
For k = 3 people, the approximation ratio is 4/3 ~ 1.33. For k = 5, it is 8/5 = 1.6. This is practical for real-time person segmentation with known person count.
The implementation targets the ruvector-mincut crate, which already
provides a DynamicPersonMatcher in metrics.rs. The mincut algorithm
should integrate cleanly with existing infrastructure.
Key constraints:
no_std with optional alloc for embedded targets.std::simd or packed_simd2) for batch edge weight updates.Fixed-size adjacency matrix:
/// Adjacency matrix for a complete graph with compile-time size.
/// V = 16 nodes, stored as upper triangular (120 entries).
pub struct RfGraph<const V: usize> {
/// Edge weights stored in upper-triangular order.
/// Index for edge (i, j) where i < j: i * (2*V - i - 1) / 2 + (j - i - 1)
weights: [f32; V * (V - 1) / 2],
/// Cached mincut value (invalidated on weight update).
cached_mincut: Option<f32>,
/// Cached mincut partition (bitvector: bit i = 1 means node i in set S).
cached_partition: Option<u32>,
}
For V = 16, this uses 120 * 4 = 480 bytes for weights, plus 8 bytes for cached values. Total: 488 bytes — fits in a single cache line pair.
Stoer-Wagner state:
/// Reusable state for Stoer-Wagner algorithm.
/// Pre-allocated to avoid per-call allocation.
struct StoerWagnerState<const V: usize> {
/// Merged vertex sets (union-find).
parent: [u16; V],
/// Key values for maximum adjacency ordering.
key: [f32; V],
/// Whether vertex is in the current working set.
active: [bool; V],
/// Best cut found so far.
best_cut: f32,
/// Best partition found so far.
best_partition: u32,
}
impl<const V: usize> RfGraph<V> {
/// Compute exact global minimum cut using Stoer-Wagner.
/// Time: O(V^3) for dense graphs (V^2 phases, V work per phase).
/// For V=16: ~4000 operations, estimated 10-50 us.
pub fn minimum_cut(&mut self) -> (f32, u32) {
if let Some(val) = self.cached_mincut {
return (val, self.cached_partition.unwrap());
}
let mut state = StoerWagnerState::new();
let mut merged: [[f32; V]; V] = self.build_adjacency_matrix();
let mut best_cut = f32::MAX;
let mut best_partition: u32 = 0;
for phase in 0..(V - 1) {
let (s, t, cut_weight) = self.maximum_adjacency_phase(
&mut merged, &mut state, V - phase
);
if cut_weight < best_cut {
best_cut = cut_weight;
best_partition = state.current_partition(t);
}
// Merge s and t
self.merge_vertices(&mut merged, s, t);
}
self.cached_mincut = Some(best_cut);
self.cached_partition = Some(best_partition);
(best_cut, best_partition)
}
}
impl<const V: usize> RfGraph<V> {
/// Update edge weight and determine if mincut needs recomputation.
/// Returns true if the cached mincut is still valid.
pub fn update_edge(&mut self, i: usize, j: usize, new_weight: f32) -> bool {
let idx = self.edge_index(i, j);
let old_weight = self.weights[idx];
self.weights[idx] = new_weight;
// Check if this edge crosses the cached partition
if let Some(partition) = self.cached_partition {
let i_side = (partition >> i) & 1;
let j_side = (partition >> j) & 1;
if i_side != j_side {
// Edge crosses the cut — must update cut value
if let Some(ref mut cut_val) = self.cached_mincut {
*cut_val += new_weight - old_weight;
// Cut value changed but partition might still be optimal
// unless the new cut value exceeds some other cut
// Conservative: invalidate if change > epsilon * cut_val
if (new_weight - old_weight).abs() > 0.1 * *cut_val {
self.cached_mincut = None;
self.cached_partition = None;
return false;
}
return true;
}
}
// Edge does not cross the cut — partition still valid,
// but cut value might no longer be minimum
// Heuristic: if weight decreased significantly, invalidate
if new_weight < old_weight * 0.8 {
self.cached_mincut = None;
self.cached_partition = None;
return false;
}
return true;
}
false
}
/// Batch update all edges from new CSI frame.
/// Uses lazy recomputation: only recomputes if cached cut is invalidated.
pub fn update_frame(&mut self, new_weights: &[f32; V * (V - 1) / 2]) {
let mut needs_recompute = false;
for idx in 0..new_weights.len() {
let old = self.weights[idx];
let new_w = new_weights[idx];
self.weights[idx] = new_w;
if !needs_recompute {
if let Some(partition) = self.cached_partition {
let (i, j) = self.edge_vertices(idx);
let crosses = ((partition >> i) ^ (partition >> j)) & 1 == 1;
if crosses && (new_w - old).abs() > 0.05 * self.cached_mincut.unwrap_or(1.0) {
needs_recompute = true;
}
if !crosses && new_w < old * 0.7 {
needs_recompute = true;
}
} else {
needs_recompute = true;
}
}
}
if needs_recompute {
self.cached_mincut = None;
self.cached_partition = None;
}
}
}
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
impl<const V: usize> RfGraph<V> {
/// Update 4 edge weights at once using SSE.
/// Processes 120 edges in 30 SIMD iterations.
#[cfg(target_arch = "x86_64")]
pub unsafe fn update_weights_simd(
&mut self,
new_weights: &[f32; V * (V - 1) / 2]
) {
let n = V * (V - 1) / 2;
let mut i = 0;
while i + 4 <= n {
let old = _mm_loadu_ps(self.weights.as_ptr().add(i));
let new_v = _mm_loadu_ps(new_weights.as_ptr().add(i));
_mm_storeu_ps(self.weights.as_mut_ptr().add(i), new_v);
// Compute absolute difference for cache invalidation check
let diff = _mm_sub_ps(new_v, old);
let abs_diff = _mm_andnot_ps(_mm_set1_ps(-0.0), diff);
let threshold = _mm_set1_ps(0.05);
let exceeds = _mm_cmpgt_ps(abs_diff, threshold);
if _mm_movemask_ps(exceeds) != 0 {
self.cached_mincut = None;
self.cached_partition = None;
}
i += 4;
}
// Handle remaining edges
while i < n {
self.weights[i] = new_weights[i];
i += 1;
}
}
}
For larger deployments (V > 32), Stoer-Wagner's maximum adjacency ordering can be parallelized:
#[cfg(feature = "parallel")]
use rayon::prelude::*;
impl<const V: usize> RfGraph<V>
where
[(); V * (V - 1) / 2]:,
{
/// Parallel maximum adjacency ordering phase.
/// Splits key-value computation across threads.
#[cfg(feature = "parallel")]
fn parallel_max_adjacency_phase(
&self,
merged: &[[f32; V]; V],
active: &[bool; V],
n_active: usize,
) -> (usize, usize, f32) {
let mut in_set = [false; V];
let mut key = [0.0f32; V];
let mut order = Vec::with_capacity(n_active);
// Start from first active vertex
let start = active.iter().position(|&a| a).unwrap();
in_set[start] = true;
order.push(start);
// Update keys in parallel
for _ in 1..n_active {
// Parallel key update: each active vertex not in set
// computes its key as sum of weights to set vertices
let last_added = *order.last().unwrap();
(0..V)
.into_par_iter()
.filter(|&v| active[v] && !in_set[v])
.for_each(|v| {
// Safety: each thread writes to distinct key[v]
unsafe {
let key_ptr = &key[v] as *const f32 as *mut f32;
*key_ptr += merged[v][last_added];
}
});
// Find max key (sequential — V is small)
let next = (0..V)
.filter(|&v| active[v] && !in_set[v])
.max_by(|&a, &b| key[a].partial_cmp(&key[b]).unwrap())
.unwrap();
in_set[next] = true;
order.push(next);
}
let t = order[n_active - 1];
let s = order[n_active - 2];
let cut_weight = key[t];
(s, t, cut_weight)
}
}
The DynamicPersonMatcher in ruvector-mincut/src/metrics.rs uses mincut
for person segmentation. Integration:
use wifi_densepose_signal::rf_graph::RfGraph;
impl DynamicPersonMatcher {
/// Update the RF graph with new CSI data and detect person boundaries.
pub fn update_with_csi_frame(
&mut self,
csi_weights: &[f32; 120], // 16-node complete graph
) -> Vec<PersonSegment> {
// Update graph weights (lazy invalidation)
self.rf_graph.update_frame(csi_weights);
// Get current minimum cut
let (cut_value, partition) = self.rf_graph.minimum_cut();
// Convert partition bitmask to person segments
let segments = self.partition_to_segments(partition, cut_value);
// Feed segments to Kalman tracker
for segment in &segments {
self.pose_tracker.update_measurement(segment);
}
segments
}
/// Hierarchical multi-cut for multiple people.
/// Recursively bisects the graph until all segments have
/// internal connectivity above threshold.
pub fn hierarchical_cut(
&mut self,
max_people: usize,
) -> Vec<PersonSegment> {
let mut segments = vec![Segment::all(16)];
let mut result = Vec::new();
while let Some(segment) = segments.pop() {
if segment.size() <= 2 || result.len() >= max_people {
result.push(segment);
continue;
}
// Build subgraph for this segment
let subgraph = self.rf_graph.subgraph(&segment.nodes);
let (cut_value, partition) = subgraph.minimum_cut();
// Normalized cut threshold: cut_value / min(|S|, |V\S|)
let smaller_side = partition.count_ones().min(
(segment.size() as u32 - partition.count_ones())
);
let normalized_cut = cut_value / smaller_side as f32;
if normalized_cut > self.connectivity_threshold {
// Segment is internally well-connected — one person or empty
result.push(segment);
} else {
// Split into two sub-segments and continue
let (left, right) = segment.split(partition);
segments.push(left);
segments.push(right);
}
}
result
}
}
| Operation | V=16 | V=32 | V=64 | V=128 |
|---|---|---|---|---|
| Stoer-Wagner (full) | 15 us | 120 us | 1.2 ms | 15 ms |
| Lazy update (no recompute) | 0.5 us | 1 us | 3 us | 10 us |
| Lazy update (recompute) | 15 us | 120 us | 1.2 ms | 15 ms |
| PPR local cut | 5 us | 15 us | 40 us | 100 us |
| SIMD batch weight update | 0.2 us | 0.8 us | 3 us | 12 us |
| Hierarchical multi-cut (k=3) | 40 us | 300 us | 3 ms | 35 ms |
20 Hz budget: 50 ms per frame. At V = 16, all operations fit comfortably within budget. At V = 128, full hierarchical multi-cut approaches the budget and would benefit from the streaming/approximate methods described in earlier sections.
#[cfg(test)]
mod tests {
use super::*;
/// Verify Stoer-Wagner on known graph with documented mincut.
#[test]
fn test_stoer_wagner_known_graph() {
let mut graph = RfGraph::<8>::from_edges(&[
(0, 1, 2.0), (0, 4, 3.0), (1, 2, 3.0), (1, 4, 2.0),
(1, 5, 2.0), (2, 3, 4.0), (2, 6, 2.0), (3, 6, 2.0),
(3, 7, 2.0), (4, 5, 3.0), (5, 6, 1.0), (6, 7, 3.0),
]);
let (cut_val, _) = graph.minimum_cut();
assert!((cut_val - 4.0).abs() < 1e-6);
}
/// Verify lazy update correctness: cache invalidation triggers
/// recomputation when crossing-edge weight changes significantly.
#[test]
fn test_lazy_update_invalidation() { /* ... */ }
/// Verify SIMD and scalar paths produce identical results.
#[test]
fn test_simd_scalar_equivalence() { /* ... */ }
/// Benchmark: 10,000 frames at 20 Hz with random weight perturbations.
/// Verify average per-frame time < 100 us for V=16.
#[test]
fn bench_20hz_sustained() { /* ... */ }
/// Property test: mincut value <= minimum vertex weighted degree.
#[test]
fn prop_mincut_bounded_by_min_degree() { /* ... */ }
}
| Criterion | Stoer-Wagner | Karger-Stein | Dynamic (Thorup) | Streaming | Local PPR | Lazy Hybrid |
|---|---|---|---|---|---|---|
| Exact result | Yes | Prob. | No (approx) | No (approx) | No (approx) | Heuristic |
| V=16 latency | 15 us | 25 us | 120 us | 50 us | 5 us | 1-15 us |
| V=128 latency | 15 ms | 8 ms | 2 ms | 1 ms | 100 us | 0.1-15 ms |
| Incremental | No | No | Yes | Yes | Yes | Yes |
| Safety-critical | Yes | No | No | No | No | Heuristic |
| Implementation complexity | Low | Medium | High | High | Medium | Low |
Primary path (V <= 32):
Secondary path (V > 32 or multi-cut needed):
Safety-critical path (MAT/vital signs):
Distributed mincut: Each ESP32 node computes a sketch of its local view. The coordinator merges sketches for approximate global mincut. Reduces coordinator bottleneck and enables graceful degradation.
GPU-accelerated mincut: For cloud-hosted deployments, batch multiple frames into a GPU kernel for parallel Stoer-Wagner computation across time windows.
Learning-augmented algorithms: Train a small neural network to predict the mincut partition from CSI features, using exact Stoer-Wagner as ground truth. The network predicts in O(1) time; Stoer-Wagner verifies periodically.
Hypergraph mincut: Model multi-body RF interactions (where three or more nodes are simultaneously affected) as hyperedges. Hypergraph mincut algorithms capture higher-order spatial structure.