Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
Manages Elasticsearch indices in DataHub, particularly focusing on time-series data.
source:
  type: datahub-gc
  config:
    truncate_indices: true
    truncate_index_older_than_days: 30
    truncation_watch_until: 10000
    truncation_sleep_between_seconds: 30
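
As a rough illustration of how these settings might interact, the sketch below derives an age cutoff from `truncate_index_older_than_days` and then polls a document count until it falls to `truncation_watch_until`, pausing `truncation_sleep_between_seconds` between checks. This reading is inferred from the option names and is an assumption, not the source's actual implementation. `cutoff_millis`, `wait_for_truncation`, the `count_remaining` callback, and the `max_polls` guard are all hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def cutoff_millis(older_than_days: int, now: datetime) -> int:
    """Epoch milliseconds before which time-series documents count as old."""
    return int((now - timedelta(days=older_than_days)).timestamp() * 1000)


def wait_for_truncation(
    count_remaining: Callable[[], int],
    watch_until: int,
    sleep_seconds: float,
    sleep: Callable[[float], None],
    max_polls: int = 100,  # hypothetical safety guard, not a real option
) -> int:
    """Poll until at most `watch_until` old documents remain; return polls used."""
    for polls in range(1, max_polls + 1):
        if count_remaining() <= watch_until:
            return polls
        sleep(sleep_seconds)  # wait before checking the count again
    return max_polls


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("cutoff (ms):", cutoff_millis(30, now))
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting; in practice it would be `time.sleep`.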
Manages access tokens in DataHub to maintain security and prevent token accumulation.
source:
  type: datahub-gc
  config:
    cleanup_expired_tokens: true
Manages the lifecycle of data processes, jobs, and their instances (DPIs) within DataHub.
source:
  type: datahub-gc
  config:
    dataprocess_cleanup:
      enabled: true
      retention_days: 10
      keep_last_n: 5
      delete_empty_data_jobs: false
      delete_empty_data_flows: false
      hard_delete_entities: false
      batch_size: 500
      max_workers: 10
      delay: 0.25
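
The `batch_size`, `max_workers`, and `delay` options suggest a batched, multi-threaded loop with a pause between batches. The sketch below shows that general pattern using only the Python standard library. It is an assumption about how such options are typically used, not the source's actual code; `batches`, `process_in_batches`, and the `handle` callback are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(
    urns: Sequence[str],
    handle: Callable[[str], bool],
    batch_size: int = 500,
    max_workers: int = 10,
    delay: float = 0.25,
) -> int:
    """Run `handle` on every URN, batch by batch; return how many returned True."""
    handled = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for index, batch in enumerate(batches(urns, batch_size)):
            if index > 0:
                time.sleep(delay)  # pause between batches, not before the first
            # Each batch is fanned out across the worker threads.
            handled += sum(pool.map(handle, batch))
    return handled
```

Smaller batches and a longer delay trade throughput for lower load on the backend, which is presumably why both are exposed as settings.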
Manages DataHub execution request records to prevent accumulation of historical execution data.
source:
  type: datahub-gc
  config:
    execution_request_cleanup:
      enabled: true
      keep_history_min_count: 10
      keep_history_max_count: 1000
      keep_history_max_days: 30
      batch_read_size: 100
      runtime_limit_seconds: 3600
      max_read_errors: 10
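
One plausible reading of the three `keep_history_*` options is: per ingestion source, always keep the newest `keep_history_min_count` records, never keep more than `keep_history_max_count`, and in between drop records older than `keep_history_max_days`. The sketch below implements that reading. It is an assumption inferred from the option names, not a description of the source's actual logic; `ExecutionRecord` and `select_for_deletion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


@dataclass
class ExecutionRecord:  # hypothetical record shape for illustration
    source: str           # ingestion source the run belongs to
    requested_at_ms: int  # request time, epoch milliseconds


def select_for_deletion(
    records: Iterable[ExecutionRecord],
    now_ms: int,
    min_count: int = 10,
    max_count: int = 1000,
    max_days: int = 30,
) -> List[ExecutionRecord]:
    """Return the records to delete, newest first, under the assumed rules."""
    cutoff_ms = now_ms - max_days * MILLIS_PER_DAY
    seen: Dict[str, int] = {}  # per-source count of records visited so far
    doomed: List[ExecutionRecord] = []
    for record in sorted(records, key=lambda r: r.requested_at_ms, reverse=True):
        rank = seen.get(record.source, 0) + 1
        seen[record.source] = rank
        if rank > max_count:
            doomed.append(record)  # over the hard cap, regardless of age
        elif record.requested_at_ms < cutoff_ms and rank > min_count:
            doomed.append(record)  # too old and not in the protected minimum
    return doomed
```

Under this reading, a source that runs rarely still retains its last `keep_history_min_count` records even when all of them are older than the age limit.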
Manages the permanent removal of soft-deleted entities after a retention period.
source:
  type: datahub-gc
  config:
    soft_deleted_entities_cleanup:
      enabled: true
      retention_days: 10
      batch_size: 500
      max_workers: 10
      delay: 0.25
      entity_types: null # Optional list of entity types to clean
      platform: null # Optional platform filter
      env: null # Optional environment filter
      query: null # Optional custom query filter
      limit_entities_delete: 25000
      futures_max_at_time: 1000
      runtime_limit_seconds: 7200
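
The `limit_entities_delete` and `runtime_limit_seconds` options read as safety bounds on a single run. A minimal sketch of that idea, assuming the run simply stops at whichever bound is hit first, might look like the following. The function `delete_with_limits`, the `delete` callback, and the injectable `clock` are hypothetical and exist only for illustration; the URN strings in the usage are placeholders.

```python
import time
from typing import Callable, Iterable, List


def delete_with_limits(
    candidates: Iterable[str],
    delete: Callable[[str], None],
    limit_entities_delete: int = 25000,
    runtime_limit_seconds: float = 7200,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Delete candidates until the count cap or the time budget is reached."""
    started = clock()
    deleted: List[str] = []
    for urn in candidates:
        if len(deleted) >= limit_entities_delete:
            break  # count cap reached
        if clock() - started >= runtime_limit_seconds:
            break  # time budget exhausted
        delete(urn)
        deleted.append(urn)
    return deleted


if __name__ == "__main__":
    placeholder_urns = [f"urn:li:dataset:{i}" for i in range(5)]
    removed = delete_with_limits(placeholder_urns, print, limit_entities_delete=3)
    print(len(removed), "entities removed")
```

Bounding both count and runtime means a misconfigured filter cannot delete an unbounded number of entities in one pass; anything left over would be picked up by a later run.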
Each cleanup task maintains detailed reports including:
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.