doc/development/geo/blob_replication.md
Blobs, such as uploads, LFS objects, and CI job artifacts, are replicated to the secondary site
with the Self-Service Framework. To track the state of syncing, each model has a corresponding registry table.
For example, `Upload` has `Geo::UploadRegistry` in the PostgreSQL Geo Tracking Database.
Job artifacts are used in the diagrams below, as one example of a blob.
Primary site:
sequenceDiagram
participant R as Runner
participant P as Puma
participant DB as PostgreSQL
participant SsP as Secondary site PostgreSQL
R->>P: Upload artifact
P->>DB: Insert `ci_job_artifacts` row
P->>DB: Insert `geo_events` row
P->>DB: Insert `geo_event_log` row
DB->>SsP: Replicate rows
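1. A Runner uploads an artifact to Puma.
2. Puma inserts a `ci_job_artifacts` row.
3. Puma inserts a `geo_events` row with data like "Job Artifact with ID 123 was updated".
4. Puma inserts a `geo_event_log` row pointing to the `geo_events` row (because we built SSF on top of some legacy logic).
5. PostgreSQL replicates the rows to the secondary site's PostgreSQL.

Secondary site, after the PostgreSQL DB rows have been replicated: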
sequenceDiagram
participant DB as PostgreSQL
participant GLC as Geo Log Cursor
participant R as Redis
participant S as Sidekiq
participant TDB as PostgreSQL Tracking DB
participant PP as Primary site Puma
GLC->>DB: Query `geo_event_log`
GLC->>DB: Query `geo_events`
GLC->>R: Enqueue `Geo::EventWorker`
S->>R: Pick up `Geo::EventWorker`
S->>TDB: Insert to `job_artifact_registry`, "starting sync"
S->>PP: GET <primary site internal URL>/geo/retrieve/job_artifact/123
S->>TDB: Update `job_artifact_registry`, "synced"
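The registry transitions in the diagram above can be illustrated with a minimal sketch. It is not the GitLab implementation: the `RegistryRow` class, the `handle_event` function, the in-memory `registry` dictionary, and the `download` callable are hypothetical stand-ins, and the "failed" state on a download error is an assumption drawn from the "failed sync and ready to retry" rows mentioned later.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RegistryRow:
    """Hypothetical stand-in for one job_artifact_registry row."""
    artifact_id: int
    state: str = "pending"
    verification_state: str = "none"


def handle_event(
    registry: Dict[int, RegistryRow],
    artifact_id: int,
    download: Callable[[str], bytes],
    primary_url: str,
) -> RegistryRow:
    # Insert the registry row if it does not exist yet, then mark it started.
    row = registry.setdefault(artifact_id, RegistryRow(artifact_id))
    row.state = "started"
    try:
        # Retrieve the file from the primary site (URL shape taken from the diagram).
        download(f"{primary_url}/geo/retrieve/job_artifact/{artifact_id}")
    except Exception:
        # Assumption: a failed download leaves the row in a retryable failed state.
        row.state = "failed"
        raise
    # A successful sync also queues the row for verification.
    row.state = "synced"
    row.verification_state = "pending"
    return row
```

Passing the downloader in as a callable keeps the sketch free of network code, so the state transitions can be exercised with a fake.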
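1. Geo Log Cursor queries for the new `geo_event_log` row.
2. Geo Log Cursor queries the `geo_events` row it points to.
   1. Geo Log Cursor enqueues a `Geo::EventWorker` job, passing through the `geo_events` row data.
3. Sidekiq picks up the `Geo::EventWorker` job.
   1. Sidekiq inserts a `job_artifact_registry` row in the PostgreSQL Geo Tracking Database because it doesn't exist, and marks it "started sync".
   2. Sidekiq sends a GET request to `<primary site internal URL>/geo/retrieve/job_artifact/123` on the primary site.
   3. Sidekiq marks the `job_artifact_registry` row as "synced" and "pending verification".

Secondary site: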
There are two cron jobs running every minute: `Geo::Secondary::RegistryConsistencyWorker` and `Geo::RegistrySyncWorker`.
The workflow below is split into two parts, one for each worker.
sequenceDiagram
participant SC as Sidekiq-cron
participant R as Redis
participant S as Sidekiq
participant DB as PostgreSQL
participant TDB as PostgreSQL Tracking DB
SC->>R: Enqueue `Geo::Secondary::RegistryConsistencyWorker`
S->>R: Pick up `Geo::Secondary::RegistryConsistencyWorker`
S->>DB: Query `ci_job_artifacts`
S->>TDB: Query `job_artifact_registry`
S->>TDB: Insert to `job_artifact_registry`
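The comparison performed by the consistency worker can be pictured as a set difference between the IDs found in each table. The sketch below is only an illustration under that assumption: `reconcile` is a hypothetical name, and plain Python collections stand in for the two batches of rows queried over the same ID range.

```python
from typing import Iterable, Set


def reconcile(model_ids: Iterable[int], registry_ids: Set[int]) -> bool:
    """Bring registry_ids in line with model_ids for one batch.

    Mutates registry_ids in place and returns True if any row was created
    or deleted, which the caller could use to decide whether to re-enqueue.
    """
    model = set(model_ids)
    missing = model - registry_ids      # artifacts with no registry row yet
    orphaned = registry_ids - model     # registry rows whose artifact is gone
    registry_ids |= missing             # "insert" the missing registry rows
    registry_ids -= orphaned            # "delete" the orphaned registry rows
    return bool(missing or orphaned)
```

A `True` return value corresponds to the "actively doing work" condition, and `False` means the batch was already consistent.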
1. Sidekiq-cron enqueues a `Geo::Secondary::RegistryConsistencyWorker` job every minute. As long as it is actively doing work (creating and deleting rows), this job immediately re-enqueues itself. This job uses an exclusive lease to prevent multiple instances of itself from running simultaneously.
2. Sidekiq picks up the `Geo::Secondary::RegistryConsistencyWorker` job.
   1. Sidekiq queries the `ci_job_artifacts` table for up to 10000 rows.
   2. Sidekiq queries the `job_artifact_registry` table for up to 10000 rows.
   3. Sidekiq inserts a `job_artifact_registry` row in the PostgreSQL Geo Tracking Database corresponding to the existing Job Artifact.

sequenceDiagram
participant SC as Sidekiq-cron
participant R as Redis
participant S as Sidekiq
participant DB as PostgreSQL
participant TDB as PostgreSQL Tracking DB
participant PP as Primary site Puma
SC->>R: Enqueue `Geo::RegistrySyncWorker`
S->>R: Pick up `Geo::RegistrySyncWorker`
S->>TDB: Query `*_registry` tables
S->>R: Enqueue `Geo::EventWorker`s
S->>R: Pick up `Geo::EventWorker`
S->>TDB: Insert to `job_artifact_registry`, "starting sync"
S->>PP: GET <primary site internal URL>/geo/retrieve/job_artifact/123
S->>TDB: Update `job_artifact_registry`, "synced"
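How the sync worker might build its in-memory queue can be sketched as follows. This is an illustration rather than the actual scheduler: `interleave`, `build_queue`, and the `capacity` parameter (standing in for the "maximum concurrency limit" settings) are hypothetical names, and `upload_registry` in the usage example is only an illustrative table name.

```python
from itertools import chain, zip_longest
from typing import Dict, List, Tuple

_SKIP = object()  # sentinel for tables that ran out of rows


def interleave(rows_by_table: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Take one row from each table in turn until every table is exhausted."""
    columns = [[(table, row_id) for row_id in ids]
               for table, ids in rows_by_table.items()]
    rounds = zip_longest(*columns, fillvalue=_SKIP)
    return [item for item in chain.from_iterable(rounds) if item is not _SKIP]


def build_queue(
    never_synced: Dict[str, List[int]],
    failed_retryable: Dict[str, List[int]],
    capacity: int,
) -> List[Tuple[str, int]]:
    """Queue never-attempted rows first, then retryable failures, up to capacity."""
    queue = interleave(never_synced) + interleave(failed_retryable)
    return queue[:capacity]
```

For example, `interleave({"job_artifact_registry": [1, 2], "upload_registry": [10]})` alternates between the two tables, so one large table cannot starve the others of sync jobs.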
1. Sidekiq-cron enqueues a `Geo::RegistrySyncWorker` job every minute.
   As long as it is actively doing work, this job loops for up to an hour scheduling sync jobs. This job uses an exclusive
   lease to prevent multiple instances of itself from running simultaneously.
2. Sidekiq picks up the `Geo::RegistrySyncWorker` job.
   1. Sidekiq queries the registry tables in the PostgreSQL Geo Tracking Database for
      "never attempted sync" rows. It interleaves rows from each table and adds them to an in-memory queue.
   2. Sidekiq queries the registry tables for
      "failed sync and ready to retry" rows, interleaves those, and adds them to the in-memory queue.
   3. Sidekiq enqueues `Geo::EventWorker` jobs with arguments like "Job Artifact with ID 123 was updated" for
      each item in the queue, and tracks the enqueued Sidekiq job IDs.
   4. Sidekiq stops enqueuing `Geo::EventWorker` jobs when "maximum concurrency limit" settings are reached.
3. Sidekiq picks up the `Geo::EventWorker` job.
   1. Sidekiq marks the `job_artifact_registry` row as "started sync".
   2. Sidekiq sends a GET request to `<primary site internal URL>/geo/retrieve/job_artifact/123` on the primary site.
   3. Sidekiq marks the `job_artifact_registry` row as "synced" and "pending verification".

Primary site:
sequenceDiagram
participant Ru as Runner
participant P as Puma
participant DB as PostgreSQL
participant SC as Sidekiq-cron
participant Rd as Redis
participant S as Sidekiq
participant F as Filesystem
Ru->>P: Upload artifact
P->>DB: Insert `ci_job_artifacts`
P->>DB: Insert `ci_job_artifact_states`
SC->>Rd: Enqueue `Geo::VerificationCronWorker`
S->>Rd: Pick up `Geo::VerificationCronWorker`
S->>DB: Query `ci_job_artifact_states`
S->>Rd: Enqueue `Geo::VerificationBatchWorker`
S->>Rd: Pick up `Geo::VerificationBatchWorker`
S->>DB: Query `ci_job_artifact_states`
S->>DB: Update `ci_job_artifact_states` row, "started"
S->>F: Checksum file
S->>DB: Update `ci_job_artifact_states` row, "succeeded"
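The checksum step in the diagram above can be sketched like this. The choice of SHA-256, the dictionary standing in for a `ci_job_artifact_states` row, and the names `checksum_file`, `verify_on_primary`, `verification_checksum`, and `verification_failure` are assumptions made for illustration only.

```python
import hashlib
from typing import Dict


def checksum_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()  # assumption: SHA-256 is the checksum algorithm
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_on_primary(state_row: Dict[str, str], path: str) -> Dict[str, str]:
    """Walk one state row through started -> succeeded (or failed)."""
    state_row["verification_state"] = "started"
    try:
        state_row["verification_checksum"] = checksum_file(path)
    except OSError as error:
        # A missing or unreadable file leaves the row in a retryable failed state.
        state_row["verification_state"] = "failed"
        state_row["verification_failure"] = str(error)
        return state_row
    state_row["verification_state"] = "succeeded"
    return state_row
```

Recording the failure reason next to the state is a design choice of this sketch; it makes retries easier to reason about but is not taken from the text.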
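1. A Runner uploads an artifact to Puma.
2. Puma inserts a `ci_job_artifacts` row and a `ci_job_artifact_states` row to store verification state.
3. Sidekiq-cron enqueues a `Geo::VerificationCronWorker` job every minute.
4. Sidekiq picks up the `Geo::VerificationCronWorker` job.
   1. Sidekiq queries `ci_job_artifact_states` for the number of rows marked "pending verification" or
      "failed verification and ready to retry".
   2. Sidekiq enqueues `Geo::VerificationBatchWorker` jobs, limited by the "maximum verification concurrency"
      setting.
5. Sidekiq picks up the `Geo::VerificationBatchWorker` job.
   1. Sidekiq queries `ci_job_artifact_states` for rows marked "pending verification".
   2. Sidekiq queries `ci_job_artifact_states` for rows marked
      "failed verification and ready to retry".
   3. For each row, Sidekiq marks it "started", checksums the file, and marks the row "succeeded".

Secondary site: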
sequenceDiagram
participant SC as Sidekiq-cron
participant R as Redis
participant S as Sidekiq
participant TDB as PostgreSQL Tracking DB
participant F as Filesystem
participant DB as PostgreSQL
SC->>R: Enqueue `Geo::VerificationCronWorker`
S->>R: Pick up `Geo::VerificationCronWorker`
S->>TDB: Query `job_artifact_registry`
S->>R: Enqueue `Geo::VerificationBatchWorker`
S->>R: Pick up `Geo::VerificationBatchWorker`
S->>TDB: Query `job_artifact_registry`
S->>TDB: Update `job_artifact_registry` row, "started"
S->>F: Checksum file
S->>DB: Query `ci_job_artifact_states`
S->>TDB: Update `job_artifact_registry` row, "succeeded"
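1. Sidekiq-cron enqueues a `Geo::VerificationCronWorker` job every minute.
2. Sidekiq picks up the `Geo::VerificationCronWorker` job.
   1. Sidekiq queries `job_artifact_registry` in the PostgreSQL Geo Tracking Database
      for the number of rows marked "pending verification" or "failed verification and ready to retry".
   2. Sidekiq enqueues `Geo::VerificationBatchWorker` jobs, limited by the "maximum verification concurrency"
      setting.
3. Sidekiq picks up the `Geo::VerificationBatchWorker` job.
   1. Sidekiq queries `job_artifact_registry` in the PostgreSQL Geo Tracking Database for rows marked "pending verification".
   2. Sidekiq queries `job_artifact_registry` for rows marked
      "failed verification and ready to retry".
   3. For each row:
      1. Sidekiq marks the `job_artifact_registry` row "started".
      2. Sidekiq checksums the file.
      3. Sidekiq queries the `ci_job_artifact_states` row, which was replicated
         by PostgreSQL, to compare against the primary site's checksum.
      4. Sidekiq marks the `job_artifact_registry` row "succeeded verification".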
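
The comparison on the secondary site can be sketched as a simple equality check between the locally computed checksum and the one replicated from the primary site. This is a hedged illustration: `verify_on_secondary`, the dictionary standing in for a `job_artifact_registry` row, and the handling of a missing or mismatched primary checksum as "failed" are assumptions, not confirmed behavior.

```python
from typing import Dict, Optional


def verify_on_secondary(
    registry_row: Dict[str, str],
    local_checksum: str,
    primary_checksum: Optional[str],
) -> Dict[str, str]:
    """Compare the secondary's checksum with the primary's replicated one."""
    registry_row["verification_state"] = "started"
    registry_row["verification_checksum"] = local_checksum
    if primary_checksum is not None and local_checksum == primary_checksum:
        registry_row["verification_state"] = "succeeded"
    else:
        # Assumption: a missing or different primary checksum counts as a failure
        # that can be retried later.
        registry_row["verification_state"] = "failed"
    return registry_row
```

Keeping the local checksum on the row even when verification fails makes a mismatch easier to inspect afterwards.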