<head><meta name="docsearch:pagerank" content="100"/></head>

import Beta from '../_assets/commonMarkdown/_beta.mdx'
import ClusterSnapshotTerm from '../_assets/commonMarkdown/cluster_snapshot_term.mdx'
import ClusterSnapshotTermCRDR from '../_assets/commonMarkdown/cluster_snapshot_term_crdr.mdx'
import ClusterSnapshotSyntaxParam from '../_assets/commonMarkdown/cluster_snapshot_syntax_param.mdx'
import ClusterSnapshotPurge from '../_assets/commonMarkdown/cluster_snapshot_purge.mdx'
import ManualCreateDropClusterSnapshot from '../_assets/commonMarkdown/manual_cluster_snapshot.mdx'
import ClusterSnapshotWarning from '../_assets/commonMarkdown/cluster_snapshot_warning.mdx'
import ClusterSnapshotCrossRegionRecover from '../_assets/commonMarkdown/cluster_snapshot_cross_region_recover.mdx'
import ClusterSnapshotAfterRecover from '../_assets/commonMarkdown/cluster_snapshot_after_recover.mdx'
import ClusterSnapshotAppendix from '../_assets/commonMarkdown/cluster_snapshot_appendix.mdx'

Cluster Snapshot

<Beta />

This topic describes how to use Cluster Snapshot for disaster recovery on shared-data clusters.

This feature is supported from v3.4.2 onwards and is available only on shared-data clusters.

Overview

The fundamental idea of disaster recovery for shared-data clusters is to ensure that the full cluster state (including data and metadata) is stored in object storage. This way, if the cluster encounters a failure, it can be restored from the object storage as long as the data and metadata remain intact. Additionally, features like backups and cross-region replication offered by cloud providers can be used to achieve remote recovery and cross-region disaster recovery.

In shared-data clusters, the CN state (data) is stored in object storage, but the FE state (metadata) remains local. To ensure that object storage has all the cluster state for restoration, StarRocks now supports Cluster Snapshot for both data and metadata in object storage.

Workflow

Terms

  • Cluster snapshot

    A cluster snapshot is a snapshot of the cluster state at a specific moment. It contains all the objects in the cluster, such as catalogs, databases, tables, users and privileges, loading tasks, and more. It does not include external dependencies, such as configuration files of external catalogs and local UDF JAR packages.

  • Generating cluster snapshot

    The system automatically maintains a snapshot that closely follows the latest cluster state. Historical snapshots are dropped right after the latest one is created, so only one snapshot is available at any time.

    <ClusterSnapshotTerm />
  • Cluster Restore

    Restore the cluster from a snapshot.

<ClusterSnapshotTermCRDR />

Automated cluster snapshot

Automated Cluster Snapshot is disabled by default.

Use the following statement to enable this feature:

<ClusterSnapshotSyntaxParam />

Each time the FE creates a new metadata image after completing a metadata checkpoint, it automatically creates a snapshot. The snapshot name is generated by the system in the format automated_cluster_snapshot_{timestamp}.

Metadata snapshots are stored under /{storage_volume_locations}/{service_id}/meta/image/automated_cluster_snapshot_{timestamp}. Data snapshots are stored in the same location as the original data.
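
To make the naming scheme above concrete, the following shell sketch assembles a metadata snapshot path from its parts. The function name meta_snapshot_path and all sample values are hypothetical, and the timestamp is passed through as an opaque string because its exact format is not specified here.

```shell
# Hypothetical helper: build the metadata path of an automated snapshot.
# Arguments: storage volume location, service ID, snapshot timestamp.
# The timestamp is used as-is; its real format is an assumption left
# to the caller.
meta_snapshot_path() (
  location="${1%/}"   # drop one trailing slash, if present
  service_id="$2"
  timestamp="$3"
  printf '%s/%s/meta/image/automated_cluster_snapshot_%s\n' \
    "$location" "$service_id" "$timestamp"
)

# Example with made-up values:
meta_snapshot_path "s3://my-bucket/starrocks" "example-service-id" "1736935200000"
# prints: s3://my-bucket/starrocks/example-service-id/meta/image/automated_cluster_snapshot_1736935200000
```

A helper like this can be handy when checking that the expected metadata directory exists in object storage before attempting a restore.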

The FE configuration item automated_cluster_snapshot_interval_seconds controls the interval between automated snapshots. The default value is 600 seconds (10 minutes).
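
As a rough sketch, the effective interval can be read from an FE configuration file as shown below. This assumes the item is set in the file as a plain `key = value` line. The function name snapshot_interval_seconds is hypothetical, and the fallback of 600 is the default stated above.

```shell
# Hypothetical helper: print the snapshot interval found in an FE
# configuration file, or the documented default (600) when it is not set.
# Assumes the item appears as an uncommented "key = value" line.
snapshot_interval_seconds() (
  conf_file="$1"
  value=""
  if [ -f "$conf_file" ]; then
    # Keep the last uncommented assignment of the item, digits only.
    value=$(sed -n 's/^[[:space:]]*automated_cluster_snapshot_interval_seconds[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf_file" | tail -n 1)
  fi
  echo "${value:-600}"
)

# Example (the path is an assumption about the deployment layout):
# snapshot_interval_seconds ./fe/conf/fe.conf
```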

Disable automated cluster snapshot

Use the following statement to disable automated cluster snapshot:

SQL
ADMIN SET AUTOMATED CLUSTER SNAPSHOT OFF;

<ClusterSnapshotPurge />

<ManualCreateDropClusterSnapshot />

View cluster snapshot

You can query the view information_schema.cluster_snapshots to view the latest cluster snapshot and the snapshots yet to be dropped.

SQL
SELECT * FROM information_schema.cluster_snapshots;

Return:

| Field | Description |
| --- | --- |
| snapshot_name | The name of the snapshot. |
| snapshot_type | The type of the snapshot. Valid values: automated and manual. |
| created_time | The time at which the snapshot was created. |
| fe_journal_id | The ID of the FE journal. |
| starmgr_journal_id | The ID of the StarManager journal. |
| properties | Applies to a feature not yet available. |
| storage_volume | The storage volume where the snapshot is stored. |
| storage_path | The storage path under which the snapshot is stored. |
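
Using the fields listed above, one way to fetch only the most recent snapshot from a script is sketched below. It assumes a MySQL-compatible client named mysql is installed and that the FE accepts connections on the default query port 9030. The function name latest_snapshot and the environment variables FE_HOST, FE_QUERY_PORT, and FE_USER are hypothetical.

```shell
# Hypothetical helper: print the newest snapshot as one tab-separated row
# (snapshot_name, created_time, storage_path).
# FE_HOST, FE_QUERY_PORT, and FE_USER are illustrative variable names.
latest_snapshot() {
  # -N suppresses column headers, -B selects tab-separated batch output.
  mysql -h "${FE_HOST:-127.0.0.1}" -P "${FE_QUERY_PORT:-9030}" \
    -u "${FE_USER:-root}" -N -B \
    -e "SELECT snapshot_name, created_time, storage_path FROM information_schema.cluster_snapshots ORDER BY created_time DESC LIMIT 1;"
}

# Example:
# FE_HOST=192.0.2.10 latest_snapshot
```

Selecting the columns explicitly keeps the output stable for scripts even if the view gains more fields later.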

View cluster snapshot job

You can query the view information_schema.cluster_snapshot_jobs to view the job information of cluster snapshots.

SQL
SELECT * FROM information_schema.cluster_snapshot_jobs;

Return:

| Field | Description |
| --- | --- |
| snapshot_name | The name of the snapshot. |
| job_id | The ID of the job. |
| created_time | The time at which the job was created. |
| finished_time | The time at which the job was finished. |
| state | The state of the job. Valid values: INITIALIZING, SNAPSHOTING, FINISHED, EXPIRED, DELETED, and ERROR. |
| detail_info | The specific progress information of the current execution stage. |
| error_message | The error message (if any) of the job. |
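
For monitoring, the job view can be filtered for failures. The sketch below reads tab-separated rows and reports jobs in the ERROR state. The function name failed_snapshot_jobs is hypothetical, and the commented pipeline assumes a MySQL-compatible client and an FE reachable on the default query port 9030.

```shell
# Hypothetical helper: read tab-separated rows of
# (snapshot_name, state, error_message) on stdin and print one line
# per job whose state is ERROR.
failed_snapshot_jobs() {
  awk -F '\t' '$2 == "ERROR" { printf "%s: %s\n", $1, $3 }'
}

# Intended use (host, port, and user are assumptions):
# mysql -h 127.0.0.1 -P 9030 -u root -N -B \
#   -e "SELECT snapshot_name, state, error_message FROM information_schema.cluster_snapshot_jobs;" \
#   | failed_snapshot_jobs
```

An empty result means no job in the view is currently in the ERROR state.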

Restore the cluster

<ClusterSnapshotWarning />

Follow these steps to restore the cluster from a cluster snapshot.

<ClusterSnapshotCrossRegionRecover />
  1. Start the Leader FE node.

    Bash
    ./fe/bin/start_fe.sh --cluster_snapshot --daemon
    
  2. Start other FE nodes after cleaning the meta directories.

    Bash
    ./fe/bin/start_fe.sh --helper <leader_ip>:<leader_edit_log_port> --daemon
    
  3. Start CN nodes after cleaning the storage_root_path directories.

    Bash
    ./be/bin/start_cn.sh --daemon
    

If you modified cluster_snapshot.yaml in Step 1, the nodes and storage volumes are reconfigured in the new cluster according to the information in the file.
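
When scripting the steps above, the placeholders in the follower start command can be filled in with a small helper such as the one sketched here. The function name follower_fe_start_cmd is hypothetical, and the IP address and port in the example are illustrative values only. The helper prints the command rather than running it, so it can be reviewed first.

```shell
# Hypothetical helper: print the start command for a non-Leader FE node.
# Arguments: Leader FE IP address, Leader FE edit log port.
follower_fe_start_cmd() (
  leader_ip="$1"
  edit_log_port="$2"
  if [ -z "$leader_ip" ] || [ -z "$edit_log_port" ]; then
    echo "usage: follower_fe_start_cmd <leader_ip> <leader_edit_log_port>" >&2
    exit 1
  fi
  printf './fe/bin/start_fe.sh --helper %s:%s --daemon\n' \
    "$leader_ip" "$edit_log_port"
)

# Example with illustrative values:
follower_fe_start_cmd 192.0.2.10 9010
# prints: ./fe/bin/start_fe.sh --helper 192.0.2.10:9010 --daemon
```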

<ClusterSnapshotAfterRecover />

<ClusterSnapshotAppendix />

Limitations

  • Currently, standby mode is not supported. The primary and secondary clusters must not be online at the same time; otherwise, normal operation of the secondary cluster cannot be guaranteed.
  • Currently, only one automated cluster snapshot can be retained.