docs/advanced/parallel-reindexing.md
DataHub's parallel reindexing feature speeds up Elasticsearch index rebuilds during system upgrades by reindexing multiple indices concurrently, while adaptive health monitoring and cost-based tier classification keep the cluster stable. This reduces upgrade downtime for large deployments.
Key Benefits:
The parallel reindex system consists of three major components:
Phase 1: Index Classification
Cost Formula: (documentCount × primaryShards) / dataNodeCount
Example: 1M docs, 5 shards, 3 data nodes
Cost = (1,000,000 × 5) / 3 = 1,666,667
Classification:
- NORMAL tier (cost < threshold): Can run in parallel (2-4 concurrent)
- LARGE tier (cost ≥ threshold): Runs with tighter concurrency (1-2 concurrent; fully serial when set to 1)
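The cost formula and tier split can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, not DataHub's `IndexCostEstimator`; the `IndexInfo` type and function names are hypothetical, and the 500,000 threshold is the documented default for `normalIndexCostThreshold`.

```python
from dataclasses import dataclass

# Documented default for normalIndexCostThreshold.
NORMAL_INDEX_COST_THRESHOLD = 500_000


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical container for the inputs of the cost formula."""
    name: str
    document_count: int
    primary_shards: int


def estimate_cost(index: IndexInfo, data_node_count: int) -> float:
    """Cost = (documentCount x primaryShards) / dataNodeCount."""
    if data_node_count < 1:
        raise ValueError("data_node_count must be at least 1")
    return index.document_count * index.primary_shards / data_node_count


def classify(index: IndexInfo, data_node_count: int,
             threshold: float = NORMAL_INDEX_COST_THRESHOLD) -> str:
    """Return "NORMAL" when cost < threshold, otherwise "LARGE"."""
    cost = estimate_cost(index, data_node_count)
    return "NORMAL" if cost < threshold else "LARGE"
```

With the example above (1M docs, 5 shards, 3 data nodes), `classify` returns `"LARGE"` because the cost of about 1.67M exceeds the default threshold.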
Phase 2: Tier-Based Execution
The orchestrator runs tiers sequentially:
Queue (99 system indices)
          ↓
LARGE TIER (40-50 indices)           NORMAL TIER (50 indices)
├─ Index 1 (serial)                  ├─ Index 1 ─┐
├─ Index 2 (serial)  ────────→       ├─ Index 2 ─┤ Parallel (max 4)
├─ Index 3 (serial)                  ├─ Index 3 ─┤
                                     └─ Index 4 ─┘
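A minimal sketch of tier-based execution, assuming a caller-supplied `reindex_one` function (hypothetical) that reindexes a single index and returns its result. It uses a bounded thread pool per tier and finishes the LARGE tier before starting the NORMAL tier; the real orchestrator also interleaves health checks, which are omitted here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tier(index_names, reindex_one, max_concurrent):
    """Reindex every index in one tier, at most max_concurrent at a time."""
    names = list(index_names)
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() yields results in input order; the pool size bounds concurrency.
        return dict(zip(names, pool.map(reindex_one, names)))


def run_all(large, normal, reindex_one, max_large=2, max_normal=4):
    """Run the LARGE tier to completion, then the NORMAL tier."""
    results = run_tier(large, reindex_one, max_large)
    results.update(run_tier(normal, reindex_one, max_normal))
    return results
```

The `max_large` and `max_normal` defaults mirror `maxConcurrentLargeReindex` and `maxConcurrentNormalReindex`.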
Phase 3: Adaptive Health Monitoring
Every 10-30 seconds, the circuit breaker evaluates cluster health:
Health Check
     ↓
┌─────────────────────────────────┐
│ RED State?                      │
│ - ES status RED, OR             │
│ - Heap ≥ 90%, OR                │
│ - Write rejections ≥ 50%        │
└─────────────────────────────────┘
     │ YES                │ NO
     ↓                    ↓
   PAUSE             Check YELLOW
 submissions
     │                    ├─ ES YELLOW, OR
     │                    ├─ Heap 75-90%, OR
     │                    ├─ Write rejections elevated
     │                    │
     │                    ├─ YES → YELLOW (throttle)
     │                    │        RPS: 500 req/s
     │                    │        Refresh: 60s
     │                    │
     │                    └─ NO → GREEN (full speed)
     │                             RPS: unlimited
     │                             Refresh: disabled (-1)
     │
     └─→ Rethrottle all active tasks
         Update refresh_interval
         Apply new RPS to ES
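The decision flow above can be written as a small function. This is a sketch under stated assumptions, not the actual `CircuitBreakerState` code: the function name and parameters are hypothetical, and because the document does not define "elevated" write rejections numerically, the sketch treats any rejection rate above `rejection_yellow` (default 0) as elevated.

```python
def evaluate_health(es_status, heap_percent, write_rejection_percent,
                    heap_red=90, heap_yellow=75,
                    rejection_red=50, rejection_yellow=0):
    """Map raw cluster signals to "RED", "YELLOW", or "GREEN"."""
    status = es_status.lower()
    # RED wins as soon as any single critical condition holds.
    if (status == "red" or heap_percent >= heap_red
            or write_rejection_percent >= rejection_red):
        return "RED"
    # YELLOW: degraded but not critical. The rejection cutoff is an assumption.
    if (status == "yellow" or heap_percent >= heap_yellow
            or write_rejection_percent > rejection_yellow):
        return "YELLOW"
    return "GREEN"
```

For example, `evaluate_health("green", 78, 5)` yields `"YELLOW"` because heap usage falls in the 75-90% band.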
Phase 4: Dynamic Rethrottling
When health state changes, ALL active tasks are immediately rethrottled in parallel without restarting:
Current State: GREEN → Health degrades → YELLOW
Action:
1. Calculate new RPS: 500 req/s (vs unlimited before)
2. Parallel rethrottle 4 active tasks via ES API:
POST /_reindex/{task1}/_rethrottle?requests_per_second=500
POST /_reindex/{task2}/_rethrottle?requests_per_second=500
... (all 4 in parallel)
3. Update destination index refresh_interval: 60s
4. Continue monitoring, no task restart needed
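The parallel rethrottle step might look like the following sketch. The `post` argument is a hypothetical callable (path in, response out) standing in for whatever HTTP client is used; only the `_rethrottle` path and its `requests_per_second` parameter come from the Elasticsearch API, where `-1` removes the limit. The pool size default mirrors `rethrottleExecutorPoolSize`.

```python
from concurrent.futures import ThreadPoolExecutor


def rethrottle_path(task_id, requests_per_second):
    """Build the rethrottle path; -1 tells Elasticsearch to remove the limit."""
    return (f"/_reindex/{task_id}/_rethrottle"
            f"?requests_per_second={requests_per_second}")


def rethrottle_all(task_ids, requests_per_second, post, pool_size=8):
    """POST a rethrottle request for every active task in parallel.

    `post` is a hypothetical callable supplied by the caller.
    Running tasks keep going; only their rate limit changes.
    """
    paths = [rethrottle_path(t, requests_per_second) for t in task_ids]
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(post, paths))
```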
┌─────────────────────────────────────────────────────────┐
│ DataHub System Upgrade Triggered │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Step 1: Index Classification (IndexCostEstimator) │
│ - Calculate cost for each of 99 system indices │
│ - Split into LARGE (serial) and NORMAL (parallel) │
│ - Separate queues by tier │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Step 2: Execute LARGE Tier Sequentially │
│ - Submit up to 2 indices at a time │
│ - Monitor health every 30 seconds │
│ - Rethrottle active tasks if health changes │
│ - Pause submissions if cluster enters RED state │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Step 3: Execute NORMAL Tier in Parallel │
│ - Submit up to 4 indices at a time │
│ - Monitor health every 30 seconds │
│ - Rethrottle active tasks if health changes │
│ - Pause submissions if cluster enters RED state │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Step 4: Finalization for Each Completed Task │
│ - Verify document count matches source │
│ - Swap alias to new index │
│ - Restore settings (replicas, refresh_interval) │
│ - Delete old temporary index │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Reindex Complete: All 99 indices ready for production │
└─────────────────────────────────────────────────────────┘
The circuit breaker prevents rapid state oscillations by requiring stability before transitioning:
Current State: GREEN
Stability required: 30 seconds
Timeline:
t=0s: Health degrades → YELLOW signal detected
State remains GREEN (waiting for stability)
t=15s: Cluster recovers → GREEN signal
Timer resets (no state change yet)
t=30s: Cluster degrades again → YELLOW signal
Stability window: 30s - 15s = 15s (not met)
t=45s: Cluster continues degrading
Stability window: 45s - 15s = 30s (MET)
State transitions: GREEN → YELLOW (finalize)
Rethrottle all active tasks to YELLOW RPS
Stability Windows (configurable):
- yellowStabilitySeconds: 30s - Wait 30s before transitioning to YELLOW
- greenStabilitySeconds: 30s - Wait 30s before transitioning back to GREEN
- redRecoverySeconds: 30s - Wait 30s in RED before attempting to return to YELLOW
This prevents flapping when the cluster is on the boundary between health states.
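One way to read the timeline is that the stability timer restarts whenever a signal agrees with the current state, and a transition happens once disagreeing signals have outlasted the window since that restart. The sketch below implements that reading; the `StabilityGate` class is hypothetical, uses a single window for all transitions, and may differ from how `CircuitBreakerState` actually measures stability.

```python
class StabilityGate:
    """Hold the current state until disagreement outlasts the window."""

    def __init__(self, initial_state, window_seconds, now=0.0):
        self.state = initial_state
        self.window_seconds = window_seconds
        # Last time a signal agreed with the current state.
        self.timer_start = now

    def observe(self, signal, now):
        """Feed one health signal; return the (possibly updated) state."""
        if signal == self.state:
            self.timer_start = now      # agreement resets the timer
        elif now - self.timer_start >= self.window_seconds:
            self.state = signal         # disagreement lasted long enough
            self.timer_start = now
        return self.state
```

Replaying the timeline with a 30-second window (YELLOW at t=0, GREEN at t=15, YELLOW at t=30 and t=45) keeps the gate in GREEN until t=45, when it switches to YELLOW.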
When a reindex task completes, it enters the finalization phase:
SUCCESS PATH (Document count matches):
- Restore refresh_interval (60s in normal ops)
- Restore number_of_replicas (1+)
- Return status REINDEXED
Result: New destination ready for queries/ingestion immediately
FAILURE PATH (Document count mismatch or exception):
- Restore refresh_interval
- Restore number_of_replicas
- Return FAILED_DOC_COUNT_MISMATCH or error status
Result: Old destination with normal settings ready for retry
Key Guarantee: Regardless of success or failure, the active destination index always has normal (non-optimized) settings after finalization.
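The guarantee maps naturally onto a `try`/`finally` structure. The following is a simplified sketch, not the orchestrator's code: `swap_alias` and `restore_settings` are hypothetical callables, the comparison is an exact match (the real validation may apply a tolerance), and exceptions propagate to the caller instead of being mapped to an error status.

```python
def finalize(source_docs, dest_docs, swap_alias, restore_settings):
    """Return a result status; always restore settings on the active index.

    swap_alias() and restore_settings(which) are hypothetical callables.
    `which` is "new" after a successful alias swap, otherwise "old".
    """
    active = "old"
    try:
        if source_docs != dest_docs:
            return "FAILED_DOC_COUNT_MISMATCH"
        swap_alias()
        active = "new"
        return "REINDEXED"
    finally:
        # Runs on success, mismatch, and exceptions alike.
        restore_settings(active)
```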
All parameters are configured via environment variables or application.yml under elasticSearch.buildIndices.
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| enableParallelReindex | ELASTICSEARCH_BUILD_INDICES_ENABLE_PARALLEL_REINDEX | false | Enable/disable parallel reindexing (set true to activate) |
| taskCheckIntervalSeconds | ELASTICSEARCH_BUILD_INDICES_TASK_CHECK_INTERVAL_SECONDS | 15 | How often to check task status and cluster health (seconds) |
| maxReindexHours | ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS | 12 | Maximum total reindex time before timeout (hours) |
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| normalIndexCostThreshold | ELASTICSEARCH_NORMAL_INDEX_COST_THRESHOLD | 500,000 | Cost threshold for NORMAL vs LARGE tier (formula: docCount × shards / nodes) |
| maxConcurrentNormalReindex | ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_NORMAL_REINDEX | 4 | Max concurrent reindex operations for NORMAL tier indices |
| maxConcurrentLargeReindex | ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_LARGE_REINDEX | 2 | Max concurrent reindex operations for LARGE tier indices |
Tier Classification Examples:
Scenario: 3-node cluster, 500K documents, 2 primary shards
Cost = (500,000 × 2) / 3 = 333,333
Classification: NORMAL tier (< 500K threshold)
Execution: Can run up to 4 concurrent with other NORMAL indices
---
Scenario: 3-node cluster, 1.5M documents, 5 primary shards
Cost = (1,500,000 × 5) / 3 = 2,500,000
Classification: LARGE tier (≥ 500K threshold)
Execution: Runs serially (max 1-2 concurrent)
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| clusterHealthCheckIntervalSeconds | ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEALTH_CHECK_INTERVAL_SECONDS | 30 | How often to check cluster health (seconds) |
| clusterHeapThresholdPercent | ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_THRESHOLD_PERCENT | 90 | Heap usage threshold for RED state |
| clusterHeapYellowThresholdPercent | ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_YELLOW_THRESHOLD_PERCENT | 75 | Heap usage threshold for YELLOW state |
| writeRejectionRedThreshold | ELASTICSEARCH_BUILD_INDICES_WRITE_REJECTION_RED_THRESHOLD | 50 | Write rejection % threshold for RED state |
Health State Thresholds:
RED State triggered by:
- ES cluster status RED, OR
- Heap usage ≥ 90%, OR
- Write rejections ≥ 50%
Action: Pause new task submissions, rethrottle to 100 req/s, refresh: 30s
YELLOW State triggered by:
- ES cluster status YELLOW (with unassigned replicas), OR
- Heap usage 75-90%, OR
- Write rejections elevated (< 50%)
Action: Continue submissions with standard rate, rethrottle to 500 req/s, refresh: 60s
GREEN State:
- ES cluster status GREEN, AND
- Heap usage < 75%, AND
- Write rejections normal
Action: Full speed, unlimited req/s, refresh: disabled (-1)
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| normalTierRequestsPerSecond | ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REQUESTS_PER_SECOND | 500 | Request rate limit during YELLOW health state |
| throttledTierRequestsPerSecond | ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REQUESTS_PER_SECOND | 100 | Request rate limit during RED health state |
| normalTierRefreshInterval | ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REFRESH_INTERVAL | 60s | Refresh interval during YELLOW state |
| throttledTierRefreshInterval | ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REFRESH_INTERVAL | 30s | Refresh interval during RED state |
| rethrottleExecutorPoolSize | ELASTICSEARCH_BUILD_INDICES_RETHROTTLE_EXECUTOR_POOL_SIZE | 8 | Max parallel rethrottle operations |
RPS and Refresh Behavior:
GREEN state (healthy cluster):
- RPS: -1 (unlimited) - submit requests as fast as possible
- Refresh: -1 (disabled) - maximize throughput, no memory overhead
- Usage: Normal operation with high load tolerance
YELLOW state (elevated load):
- RPS: 500 req/s - moderate throttling
- Refresh: 60s - periodic segment flushes reduce memory
- Usage: Some cluster pressure, balance throughput/stability
RED state (critical):
- RPS: 100 req/s - aggressive throttling
- Refresh: 30s - aggressive flushes for immediate heap relief
- Usage: Cluster near limits, pause submissions, stabilize
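The per-state behavior can be captured as a lookup table. This sketch uses the documented defaults; the names `THROTTLE_BY_STATE`, `refresh_settings_body`, and `submissions_allowed` are hypothetical. The settings body follows the shape accepted by the Elasticsearch update index settings API.

```python
# Documented defaults for each health state.
THROTTLE_BY_STATE = {
    "GREEN": {"requests_per_second": -1, "refresh_interval": "-1"},
    "YELLOW": {"requests_per_second": 500, "refresh_interval": "60s"},
    "RED": {"requests_per_second": 100, "refresh_interval": "30s"},
}


def refresh_settings_body(state):
    """Build a settings body for PUT /<destination-index>/_settings."""
    interval = THROTTLE_BY_STATE[state]["refresh_interval"]
    return {"index": {"refresh_interval": interval}}


def submissions_allowed(state):
    """New reindex tasks are paused only while the cluster is RED."""
    return state != "RED"
```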
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| yellowStabilitySeconds | ELASTICSEARCH_BUILD_INDICES_YELLOW_STABILITY_SECONDS | 30 | Seconds cluster must be in YELLOW before state transition |
| greenStabilitySeconds | ELASTICSEARCH_BUILD_INDICES_GREEN_STABILITY_SECONDS | 30 | Seconds cluster must be healthy before returning to GREEN |
| redRecoverySeconds | ELASTICSEARCH_BUILD_INDICES_RED_RECOVERY_SECONDS | 30 | Seconds in RED before attempting recovery to YELLOW |
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| docCountValidationRetryCount | ELASTICSEARCH_BUILD_INDICES_DOC_COUNT_VALIDATION_RETRY_COUNT | 10 | Retries for document count validation post-reindex |
| docCountValidationRetrySleepMs | ELASTICSEARCH_BUILD_INDICES_DOC_COUNT_VALIDATION_RETRY_SLEEP_MS | 2000 | Sleep between validation retries (ms) |
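The two validation parameters describe a simple poll-and-retry loop, sketched below. `fetch_count` is a hypothetical callable returning the destination index's current document count; the sketch treats the retry count as the total number of attempts and requires an exact match, both of which are assumptions.

```python
import time


def validate_doc_count(expected, fetch_count, retries=10, sleep_ms=2000,
                       sleep=time.sleep):
    """Poll fetch_count() until it equals expected or attempts run out."""
    for attempt in range(retries):
        if fetch_count() == expected:
            return True
        if attempt < retries - 1:
            # Give the destination index time to refresh before re-counting.
            sleep(sleep_ms / 1000)
    return False
```

With the defaults, a persistent mismatch is reported after roughly 18 seconds of waiting (nine sleeps of two seconds).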
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| maxConcurrentFinalizations | ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_FINALIZATIONS | 5 | Max concurrent alias swap/finalization operations |
| replicaSyncTimeoutMinutes | ELASTICSEARCH_BUILD_INDICES_REPLICA_SYNC_TIMEOUT_MINUTES | 1 | Max time to wait for replica sync before promoting primary |
| minimumReplicasForPromotion | ELASTICSEARCH_BUILD_INDICES_MINIMUM_REPLICAS_FOR_PROMOTION | 1 | Minimum replica count required before index promotion |
elasticSearch:
  buildIndices:
    # Core execution
    enableParallelReindex: ${ELASTICSEARCH_BUILD_INDICES_ENABLE_PARALLEL_REINDEX:false}
    taskCheckIntervalSeconds: ${ELASTICSEARCH_BUILD_INDICES_TASK_CHECK_INTERVAL_SECONDS:15}
    maxReindexHours: ${ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS:12}
    # Index tier classification
    normalIndexCostThreshold: ${ELASTICSEARCH_NORMAL_INDEX_COST_THRESHOLD:500000}
    maxConcurrentNormalReindex: ${ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_NORMAL_REINDEX:4}
    maxConcurrentLargeReindex: ${ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_LARGE_REINDEX:2}
    # Cluster health monitoring
    clusterHealthCheckIntervalSeconds: ${ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEALTH_CHECK_INTERVAL_SECONDS:30}
    clusterHeapThresholdPercent: ${ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_THRESHOLD_PERCENT:90}
    clusterHeapYellowThresholdPercent: ${ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_YELLOW_THRESHOLD_PERCENT:75}
    writeRejectionRedThreshold: ${ELASTICSEARCH_BUILD_INDICES_WRITE_REJECTION_RED_THRESHOLD:50}
    # Adaptive throttling
    normalTierRequestsPerSecond: ${ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REQUESTS_PER_SECOND:500}
    throttledTierRequestsPerSecond: ${ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REQUESTS_PER_SECOND:100}
    normalTierRefreshInterval: ${ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REFRESH_INTERVAL:60s}
    throttledTierRefreshInterval: ${ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REFRESH_INTERVAL:30s}
    rethrottleExecutorPoolSize: ${ELASTICSEARCH_BUILD_INDICES_RETHROTTLE_EXECUTOR_POOL_SIZE:8}
    # Circuit breaker stability
    yellowStabilitySeconds: ${ELASTICSEARCH_BUILD_INDICES_YELLOW_STABILITY_SECONDS:30}
    greenStabilitySeconds: ${ELASTICSEARCH_BUILD_INDICES_GREEN_STABILITY_SECONDS:30}
    redRecoverySeconds: ${ELASTICSEARCH_BUILD_INDICES_RED_RECOVERY_SECONDS:30}
    # Document validation
    docCountValidationRetryCount: ${ELASTICSEARCH_BUILD_INDICES_DOC_COUNT_VALIDATION_RETRY_COUNT:10}
    docCountValidationRetrySleepMs: ${ELASTICSEARCH_BUILD_INDICES_DOC_COUNT_VALIDATION_RETRY_SLEEP_MS:2000}
    # Other
    maxConcurrentFinalizations: ${ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_FINALIZATIONS:5}
    replicaSyncTimeoutMinutes: ${ELASTICSEARCH_BUILD_INDICES_REPLICA_SYNC_TIMEOUT_MINUTES:1}
    minimumReplicasForPromotion: ${ELASTICSEARCH_BUILD_INDICES_MINIMUM_REPLICAS_FOR_PROMOTION:1}
# Enable parallel reindex
ELASTICSEARCH_BUILD_INDICES_ENABLE_PARALLEL_REINDEX=true
# Core execution settings
ELASTICSEARCH_BUILD_INDICES_TASK_CHECK_INTERVAL_SECONDS=15
ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS=12
# Tier classification
ELASTICSEARCH_NORMAL_INDEX_COST_THRESHOLD=500000
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_NORMAL_REINDEX=4
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_LARGE_REINDEX=2
# Cluster health
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEALTH_CHECK_INTERVAL_SECONDS=30
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_THRESHOLD_PERCENT=90
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_YELLOW_THRESHOLD_PERCENT=75
ELASTICSEARCH_BUILD_INDICES_WRITE_REJECTION_RED_THRESHOLD=50
# Adaptive throttling
ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REQUESTS_PER_SECOND=500
ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REQUESTS_PER_SECOND=100
ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REFRESH_INTERVAL=60s
ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REFRESH_INTERVAL=30s
# Circuit breaker
ELASTICSEARCH_BUILD_INDICES_YELLOW_STABILITY_SECONDS=30
ELASTICSEARCH_BUILD_INDICES_GREEN_STABILITY_SECONDS=30
ELASTICSEARCH_BUILD_INDICES_RED_RECOVERY_SECONDS=30
# Conservative concurrency
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_NORMAL_REINDEX=1
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_LARGE_REINDEX=1
# Strict health thresholds
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_THRESHOLD_PERCENT=75
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_YELLOW_THRESHOLD_PERCENT=60
# Fast health checks
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEALTH_CHECK_INTERVAL_SECONDS=10
ELASTICSEARCH_BUILD_INDICES_TASK_CHECK_INTERVAL_SECONDS=10
# Short timeout
ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS=4
# Slower throttling for single-node
ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REQUESTS_PER_SECOND=200
ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REQUESTS_PER_SECOND=50
Why: Single-node clusters have minimal resources. Strict health thresholds and low concurrency prevent circuit breaker trips and OOM errors.
# Balanced concurrency
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_NORMAL_REINDEX=2
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_LARGE_REINDEX=1
# Moderate health thresholds
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_THRESHOLD_PERCENT=85
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_YELLOW_THRESHOLD_PERCENT=70
# Standard health checks
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEALTH_CHECK_INTERVAL_SECONDS=30
ELASTICSEARCH_BUILD_INDICES_TASK_CHECK_INTERVAL_SECONDS=30
# Standard timeout
ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS=12
Why: Small clusters need balanced parallelism. Running 2 concurrent NORMAL indices with moderate health thresholds prevents resource saturation while still speeding up the reindex.
# Aggressive concurrency
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_NORMAL_REINDEX=4
ELASTICSEARCH_BUILD_INDICES_MAX_CONCURRENT_LARGE_REINDEX=2
# Relaxed health thresholds
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_THRESHOLD_PERCENT=90
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEAP_YELLOW_THRESHOLD_PERCENT=80
# Standard health checks (less frequent)
ELASTICSEARCH_BUILD_INDICES_CLUSTER_HEALTH_CHECK_INTERVAL_SECONDS=60
ELASTICSEARCH_BUILD_INDICES_TASK_CHECK_INTERVAL_SECONDS=60
# Long timeout for large datasets
ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS=24
# Higher throttling for stable clusters
ELASTICSEARCH_BUILD_INDICES_NORMAL_TIER_REQUESTS_PER_SECOND=1000
ELASTICSEARCH_BUILD_INDICES_THROTTLED_TIER_REQUESTS_PER_SECOND=200
Why: Large clusters benefit from high parallelism. Relaxed health thresholds account for legitimate background activity. Higher RPS limits take advantage of cluster capacity.
To fall back to sequential reindexing:
ELASTICSEARCH_BUILD_INDICES_ENABLE_PARALLEL_REINDEX=false
When to disable:
| Configuration | Mode | Time | Improvement |
|---|---|---|---|
| Default (4 NORMAL, 2 LARGE) | Parallel | 27m 29s | baseline |
| Disabled (sequential) | Sequential | 52m 17s | 90% slower |
| Aggressive (4 NORMAL, 2 LARGE, 1000 RPS) | Parallel | 22m 15s | 19% faster |
Note: Actual performance depends heavily on cluster capacity, hardware, data complexity, and concurrent load.
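The percentages in the Improvement column follow directly from the measured times. A short check of the arithmetic (the helper name `to_seconds` is illustrative only):

```python
def to_seconds(text):
    """Parse a duration such as "27m 29s" into seconds."""
    minutes, seconds = text.replace("m", "").replace("s", "").split()
    return int(minutes) * 60 + int(seconds)


baseline = to_seconds("27m 29s")     # default parallel run
sequential = to_seconds("52m 17s")   # parallel reindex disabled
aggressive = to_seconds("22m 15s")   # aggressive parallel run

# Relative to the default parallel baseline.
slower_pct = (sequential / baseline - 1) * 100   # about 90% slower
faster_pct = (1 - aggressive / baseline) * 100   # about 19% faster
```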
Cluster-Level Metrics:
Per-Task Metrics:
Reindex Orchestrator Metrics:
Normal operation:
INFO ParallelReindexOrchestrator - Starting parallel reindex for 99 indices
INFO ParallelReindexOrchestrator - Cost classification: 50 LARGE tier, 49 NORMAL tier
INFO ParallelReindexOrchestrator - Executing LARGE tier indices with max concurrency 2
INFO ParallelReindexOrchestrator - Submitted reindex for datasetindex_v2 (task: abc123) - 1/2 active
INFO ParallelReindexOrchestrator - Monitoring progress: 2 active, 48 pending, 0 completed, 0 failed
INFO CircuitBreakerState - Health check: GREEN (heap=62%, rejections=0%)
INFO ParallelReindexOrchestrator - Reindex completed for datasetindex_v2 (1.2M docs, result: REINDEXED)
INFO ParallelReindexOrchestrator - Executing NORMAL tier indices with max concurrency 4
INFO ParallelReindexOrchestrator - Parallel reindex complete: 50 succeeded, 0 failed
Health state changes:
WARN CircuitBreakerState - Health degraded: GREEN → YELLOW (heap=78%, rejections=5%)
INFO ParallelReindexOrchestrator - Rethrottling 4 active tasks from unlimited to 500 req/s
INFO ParallelReindexOrchestrator - Updated refresh_interval to 60s on 4 destination indices
WARN CircuitBreakerState - Health critical: YELLOW → RED (heap=92%, rejections=48%)
INFO ParallelReindexOrchestrator - Pausing submissions (cluster in RED state)
INFO ParallelReindexOrchestrator - Rethrottling 3 active tasks from 500 to 100 req/s
INFO CircuitBreakerState - Health recovered: RED → YELLOW (heap=81%, rejections=8%)
Failures:
ERROR ParallelReindexOrchestrator - Document count mismatch for chartindex_v2: expected=500000, actual=499800, diff=200
ERROR ParallelReindexOrchestrator - Reindex timeout! 2 tasks still active after 12 hours
WARN ParallelReindexOrchestrator - Failed to cleanup temp index datasetindex_v2_1234567890
WARN HealthCheckPoller - Cluster health check failed, circuit breaker defaulting to RED (safe mode)
Symptoms:
ERROR Reindex timeout! 3 tasks still active after 12 hours
Diagnosis:
Solutions:
- Increase maxReindexHours for very large datasets (100M+ docs)
- Reduce maxConcurrentNormalReindex and maxConcurrentLargeReindex to give each task more resources
- Check cluster health: GET /_cluster/health
Symptoms:
WARN CircuitBreakerState - Health degraded: GREEN → YELLOW
... (5 seconds later)
WARN CircuitBreakerState - Health recovered: YELLOW → GREEN
... (repeats every few seconds)
Diagnosis: Cluster is hovering right around a threshold (e.g., heap at 75%).
Solutions:
- Increase the stability window: ELASTICSEARCH_BUILD_INDICES_YELLOW_STABILITY_SECONDS=60
Symptoms:
ERROR Document count mismatch for datasetindex_v2: expected=1000000, actual=999800
ERROR Reindex failed: FAILED_DOC_COUNT_MISMATCH
Understanding Tolerance:
Solutions:
- Set ELASTICSEARCH_BUILD_INDICES_DOC_COUNT_TOLERANCE_PERCENT=0.5 only for known concurrent write scenarios
Symptoms:
ERROR Failed to submit reindex for chartindex_v2: CircuitBreakingException: [parent] Data too large
Solutions:
- Reduce maxConcurrentNormalReindex (start with 2, increase gradually)
- Check fielddata usage: GET /_stats/fielddata
- Set ELASTICSEARCH_BUILD_INDICES_YELLOW_STABILITY_SECONDS=60
Symptoms:
WARN CircuitBreakerState - Health degraded to RED (write rejections: 52% > 50% threshold)
INFO ParallelReindexOrchestrator - Pausing submissions
Solutions:
- Raise ELASTICSEARCH_BUILD_INDICES_WRITE_REJECTION_RED_THRESHOLD (e.g., 70%) if rejections are expected
- Check thread pool stats: GET /_nodes/stats/thread_pool
Symptoms:
WARN Failed to cleanup temp index datasetindex_v2_1234567890 after retries
Solutions:
- Look for leftover indices matching {index}_v2_{timestamp}
- Delete them manually: DELETE /datasetindex_v2_1234567890
# Check active reindex tasks with progress
GET /_tasks?actions=*reindex&detailed=true
# Check cluster health
GET /_cluster/health
# Check node stats (heap, GC, rejections)
GET /_nodes/stats
# Check specific index stats
GET /datasetindex_v2/_stats
# Check write rejection rates
GET /_nodes/stats/indices/indexing
# Monitor circuit breaker limits
GET /_nodes/stats/breaker
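The `GET /_nodes/stats` response can be reduced to the two numbers the health check cares about. This is a sketch, not DataHub's `HealthCheckPoller`: the function name is hypothetical, and the rejection percentage shown here (rejected over rejected plus completed write operations) is an assumed definition. The `jvm.mem.heap_used_percent` and `thread_pool.write` fields are part of the node stats response; note that the thread pool counters are cumulative since node start.

```python
def summarize_node_stats(stats):
    """Return (max heap %, write rejection %) from a parsed _nodes/stats body."""
    max_heap = 0
    rejected = completed = 0
    for node in stats["nodes"].values():
        max_heap = max(max_heap, node["jvm"]["mem"]["heap_used_percent"])
        write_pool = node["thread_pool"]["write"]
        rejected += write_pool["rejected"]
        completed += write_pool["completed"]
    total = rejected + completed
    # Assumed definition: share of write operations that were rejected.
    rejection_pct = 100.0 * rejected / total if total else 0.0
    return max_heap, rejection_pct
```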
DataHub maintains 99 system-managed indices across multiple index services:
All 99 indices are reindexed as part of the system upgrade process.
Parallel reindexing is designed specifically for controlled system upgrade scenarios where:
Important: The implementation uses JVM-local locks (not distributed locks). This is safe for single-pod upgrade jobs but NOT recommended for multi-pod runtime scenarios.
Recommended approach:
- Set enableParallelReindex=true in staging
If parallel reindexing causes issues:
Mid-upgrade: Press Ctrl+C to interrupt
Post-upgrade: If reindex succeeded but caused subsequent issues:
- Set enableParallelReindex: false for the next upgrade attempt
Orphaned indices: Look for {index}_v2_{timestamp} patterns
- Delete them with DELETE /{pattern}*
- ParallelReindexOrchestrator: metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/indexbuilder/ParallelReindexOrchestrator.java
- IndexCostEstimator: metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/indexbuilder/IndexCostEstimator.java
- CircuitBreakerState: metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/indexbuilder/CircuitBreakerState.java
- HealthCheckPoller: metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/indexbuilder/HealthCheckPoller.java
- BuildIndicesConfiguration: metadata-service/configuration/src/main/java/com/linkedin/metadata/config/search/BuildIndicesConfiguration.java
The system tracks reindex outcomes with these result types:
- REINDEXED - Successful reindex with alias swap
- REINDEXED_SKIPPED_0DOCS - Skipped empty index (0 documents, no reindex needed)
- FAILED_TIMEOUT - Task exceeded maxReindexHours
- FAILED_DOC_COUNT_MISMATCH - Document count outside tolerance after reindex
- FAILED_SUBMISSION - Failed to submit reindex task to ES
- FAILED_SUBMISSION_IO - I/O error during task submission
- FAILED_MONITORING_ERROR - Error during task monitoring/status check
Post-reindex validation ensures data integrity:
Q: Will parallel reindexing speed up my upgrades?
A: Yes, typically 40-70% faster for large deployments. Actual speedup depends on cluster capacity, index distribution across NORMAL/LARGE tiers, and current cluster load. A 50-index reindex we tested showed ~27 min (parallel) vs ~52 min (sequential).
Q: What happens if I set concurrency too high?
A: ES cluster may experience high memory pressure, circuit breaker trips, or temporary slowness. The system includes adaptive throttling that automatically pauses submissions when cluster health degrades. Start conservative (default 2-4 NORMAL) and increase gradually based on monitoring.
Q: Can I use this for runtime reindexing outside upgrades?
A: Not recommended. The current implementation uses JVM-local locks suitable for single-pod upgrade jobs. For multi-pod runtime scenarios, you would need distributed locking (not currently implemented).
Q: What if a reindex task hangs indefinitely?
A: The orchestrator has a maxReindexHours timeout (default 12 hours) that will abort stuck tasks via ES task cancellation API and clean up temporary indices automatically.
Q: How do I know if my ES cluster can handle more concurrency?
A: Monitor these metrics during an upgrade:
If healthy, try increasing maxConcurrentNormalReindex by 1-2 for next upgrade. If health degrades, reduce concurrency.
Q: Does this work with OpenSearch?
A: Yes. The implementation uses the standard OpenSearch client and is compatible with both Elasticsearch 7.x+ and OpenSearch 1.x+.
Q: What's the difference between NORMAL and LARGE tier indices?
A: Cost = (docCount × primaryShards) / dataNodeCount. NORMAL tier indices (cost < 500K) can run multiple in parallel. LARGE tier indices (cost ≥ 500K) run serially to avoid cluster overload. A 50M-document index with 5 shards in a 3-node cluster would be LARGE and reindex serially.
Q: Can I change cost thresholds mid-upgrade?
A: You can change normalIndexCostThreshold before upgrade, but it will reclassify all indices and could change the execution plan. Recommended: finalize tuning in staging before production upgrade.