Summary

This RFC proposes a redesigned export/import system (V2) for GreptimeDB that addresses fundamental issues in the current implementation. The new design leverages time-series characteristics for efficient chunking, provides clear storage semantics, and ensures data reliability through comprehensive validation mechanisms.

Motivation

Problems with V1

The current export/import implementation has several critical issues:

  1. Ambiguous path semantics: output_dir serves dual purposes (local vs remote), causing confusion
  2. No chunking strategy: Large databases cannot be exported/imported efficiently
  3. Poor reliability: No resume capability, no progress tracking
  4. Limited format support: Relies on SQL dumps, not optimized for time-series data

Goals

  1. Clear semantics: Explicit distinction between remote storage and server-local paths
  2. Scalability: Support TB-scale databases through time-based chunking
  3. Reliability: Resume capability, progress tracking, data integrity verification
  4. Performance: Streaming export/import using native COPY DATABASE

Non-Goals

  • Schema evolution/migration (out of scope)
  • Cross-version compatibility (V2 does not support V1 format)
  • Real-time replication (use a dedicated replication mechanism)

Guide-level Explanation

Core Concepts

Snapshot

A snapshot represents the complete state of a GreptimeDB catalog at a point in time, including schemas and data.

Snapshot structure:

```
snapshot-20250101/
├── manifest.json              # Snapshot metadata and chunk index
├── schema/
│   ├── schemas.json           # Schema definitions (JSON)
│   ├── tables.json            # Table definitions (JSON)
│   └── views.json             # View definitions (JSON)
└── data/
    ├── 1/
    │   ├── public.metrics.parquet
    │   └── public.logs.parquet
    ├── 2/
    │   ├── public.metrics.parquet
    │   └── public.logs.parquet
    └── 3/
        ├── public.metrics.parquet
        └── public.logs.parquet
```

Key properties:

  • Self-contained (all information needed for restore)
  • Immutable (content never changes after creation)
  • Verifiable (checksums at file, chunk, and snapshot levels)
  • Schema-only snapshots contain only manifest.json and schema/; data/ is absent, the manifest's chunks array is empty, and appending data later is rejected (use --force to recreate)

Chunk

A chunk is a time-range partition of data. Each chunk is independently exportable/importable and retryable.

Chunk properties:

  • Has explicit start_time and end_time (recorded in manifest)
  • Non-overlapping with other chunks
  • Covers a contiguous time range
  • Independent (can be exported/imported in any order)
  • Atomic (either fully succeeds or fully fails)

Chunk directory naming:

  • Sequential numbers: 1/, 2/, 3/, ...
  • Time ranges are recorded in manifest.json

Storage Types

V2 supports two storage types:

| Type           | Example               | Use Case                 |
|----------------|-----------------------|--------------------------|
| Remote Storage | s3://bucket/snapshots | Production (recommended) |
| Server Path    | file:///data/backup   | Local dev/testing        |

Important: Local paths (e.g., /tmp/export, ./backup) are not supported because schema export (CLI) and data export (server) run in different processes, which would split the snapshot across two machines.

Basic Usage

Export

```bash
# Full snapshot to S3
greptime export create \
  --to s3://my-bucket/snapshots/prod-20250101

# Incremental snapshot (time range)
greptime export create \
  --start-time 2024-12-01T00:00:00Z \
  --end-time 2024-12-31T23:59:59Z \
  --to s3://my-bucket/snapshots/prod-december

# Schema-only export
greptime export create \
  --schema-only \
  --to s3://my-bucket/snapshots/prod-schema-only
```

Schema-only snapshots cannot be resumed with data; use `--force` to recreate.

```bash
# Export with specific format (default: parquet)
greptime export create \
  --format csv \
  --to s3://my-bucket/snapshots/prod-csv

# Resume interrupted export (automatic if snapshot exists)
greptime export create \
  --to s3://my-bucket/snapshots/prod-20250101

# Force recreate (delete existing and start over)
greptime export create \
  --to s3://my-bucket/snapshots/prod-20250101 \
  --force
```

Import

```bash
# Full import
greptime import \
  --from s3://my-bucket/snapshots/prod-20250101

# Partial import (selected schemas)
greptime import \
  --from s3://my-bucket/snapshots/prod-20250101 \
  --schemas public,private

# Dry-run (verify without importing)
greptime import \
  --from s3://my-bucket/snapshots/prod-20250101 \
  --dry-run
```

Reference-level Explanation

Architecture

The export/import system consists of four main components:

  1. CLI: Command parsing, progress display, state management
  2. Coordinator: Snapshot planning, chunk scheduling, retry logic
  3. Schema Engine: DDL extraction, JSON serialization, schema validation
  4. Data Engine: Time-based chunking, streaming export/import via COPY DATABASE

All components use OpenDAL for storage abstraction, supporting S3, OSS, GCS, Azure Blob, and local filesystem.
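
To make this split concrete, the sketch below shows a coordinator driving hypothetical schema and data engines. All trait, type, and method names here are illustrative, not actual GreptimeDB interfaces, and storage I/O (via OpenDAL) is omitted.

```rust
// Illustrative only: these traits and types are assumptions, not actual
// GreptimeDB interfaces. Storage access (OpenDAL) is omitted.
struct ChunkPlan {
    id: u32,
    start: String, // RFC 3339 timestamps
    end: String,
}

trait SchemaEngine {
    /// Extract DDL and serialize it as JSON documents (schemas/tables/views).
    fn export_schemas(&self, catalog: &str) -> Result<Vec<String>, String>;
}

trait DataEngine {
    /// Stream one chunk out via COPY DATABASE with a time-range filter.
    fn export_chunk(&self, schema: &str, chunk: &ChunkPlan) -> Result<(), String>;
}

/// The coordinator plans chunks, schedules them, and drives retries.
struct Coordinator<S: SchemaEngine, D: DataEngine> {
    schema_engine: S,
    data_engine: D,
}

impl<S: SchemaEngine, D: DataEngine> Coordinator<S, D> {
    fn export(&self, catalog: &str, schema: &str, chunks: &[ChunkPlan]) -> Result<(), String> {
        self.schema_engine.export_schemas(catalog)?;
        for chunk in chunks {
            // Each chunk is independent and retryable (see "Retry and Resume").
            self.data_engine.export_chunk(schema, chunk)?;
        }
        Ok(())
    }
}
```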

Data Format

Manifest File

The manifest is a JSON file containing snapshot metadata and the chunk index.

Key fields:

  • snapshot_id: Unique identifier (UUID)
  • catalog, schemas: Catalog and schema list
  • time_range: Overall time range covered
  • schema_only: Whether the snapshot contains schema only
  • chunks[]: Array of chunk metadata
  • format: Data format for exported files
  • checksum: Snapshot-level SHA256 checksum

Chunk metadata structure:

Each chunk entry in the manifest contains:

  • id: Chunk identifier (sequential number)
  • time_range: Start and end timestamps
  • status: Export status (Pending, InProgress, Completed, Failed)
  • files: List of data files in the chunk directory
  • checksum: Chunk-level checksum for integrity verification
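
To illustrate how these fields fit together, here is a minimal Rust/serde sketch of the manifest and its chunk entries. Field names and types follow the lists above but are otherwise assumptions, not the final on-disk layout.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical manifest shape; not the final wire format.
#[derive(Serialize, Deserialize)]
struct Manifest {
    snapshot_id: String,           // UUID
    catalog: String,
    schemas: Vec<String>,
    time_range: Option<TimeRange>, // overall time range covered
    schema_only: bool,
    format: String,                // e.g. "parquet"
    chunks: Vec<ChunkMeta>,
    checksum: String,              // snapshot-level SHA256
}

#[derive(Serialize, Deserialize)]
struct TimeRange {
    start: String, // RFC 3339 timestamps
    end: String,
}

#[derive(Serialize, Deserialize)]
enum ChunkStatus {
    Pending,
    InProgress,
    Completed,
    Failed,
}

#[derive(Serialize, Deserialize)]
struct ChunkMeta {
    id: u32,
    time_range: TimeRange,
    status: ChunkStatus,
    files: Vec<FileMeta>,
    checksum: String, // chunk-level checksum
}

#[derive(Serialize, Deserialize)]
struct FileMeta {
    path: String, // e.g. "data/1/public.metrics.parquet"
    size_bytes: u64,
    checksum: String, // file-level SHA256
}
```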

Schema Files

Schema definitions are stored as JSON (not SQL) for better version compatibility and programmatic processing.

Why JSON instead of SQL?

  • Version-agnostic (can handle schema evolution)
  • Programmatically processable (direct deserialization)
  • Extensible (easy to add new fields)
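
As an example of the programmatic processing this enables, a table definition could deserialize directly into a plain struct. The shape below is hypothetical and only illustrates the idea.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical shape of one entry in tables.json; actual fields may differ.
#[derive(Serialize, Deserialize)]
struct TableDefinition {
    schema: String, // e.g. "public"
    name: String,   // e.g. "metrics"
    columns: Vec<ColumnDefinition>,
    time_index: String, // name of the time index column
    primary_keys: Vec<String>,
}

#[derive(Serialize, Deserialize)]
struct ColumnDefinition {
    name: String,
    data_type: String, // e.g. "TimestampMillisecond", "Float64"
    nullable: bool,
}
```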

Data Files

Data is exported via COPY DATABASE, supporting multiple formats:

  • Parquet (default): Columnar format, efficient compression, recommended for production use
  • CSV: Human-readable, universally compatible, useful for debugging and third-party integration
  • JSON: Structured text format, flexible schema representation
  • Other formats supported by COPY DATABASE

Format is specified via the --format flag and recorded in manifest.json. Import automatically detects the format from the manifest.

Core Design Decisions

1. Storage Path Validation

Export/import operations validate storage paths to prevent misconfigurations:

Path types:

  • s3://, oss://, gs://, azblob:// → Remote storage (recommended)
  • file:/// → Server-local path (only allowed when CLI and server are co-located)
  • /path, ./path → Rejected (would split snapshot across machines)

Validation:

  • Detect path type by URI scheme
  • For file:///, verify server endpoint resolves to localhost or local IP
  • Reject bare paths without URI scheme
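
A minimal sketch of this classification, assuming the scheme set listed above; the type and function names are hypothetical.

```rust
/// Hypothetical path classification for export/import locations.
enum PathKind {
    Remote,      // s3://, oss://, gs://, azblob://
    ServerLocal, // file:/// (CLI and server must be co-located)
}

fn classify_path(location: &str) -> Result<PathKind, String> {
    const REMOTE_SCHEMES: &[&str] = &["s3://", "oss://", "gs://", "azblob://"];
    if REMOTE_SCHEMES.iter().any(|s| location.starts_with(s)) {
        Ok(PathKind::Remote)
    } else if location.starts_with("file://") {
        // Caller must additionally verify that the server endpoint resolves
        // to localhost or a local IP.
        Ok(PathKind::ServerLocal)
    } else {
        // Bare paths like /tmp/export or ./backup are rejected: schema export
        // (CLI) and data export (server) run in different processes, so the
        // snapshot would be split across two machines.
        Err(format!("unsupported location: {location}; use s3:// or file:///"))
    }
}
```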

2. Time-based Chunking

Data is partitioned into time-range chunks for efficient parallel processing and retry.

Algorithm:

  • Generate non-overlapping half-open intervals with configurable time window (default: 1 day)
  • Chunks are numbered sequentially (1, 2, 3, ...)
  • Each chunk's time range is recorded in manifest.json
  • Empty chunks (no data in time window) are skipped and not recorded in manifest
  • Guarantees: no gaps, no overlaps, complete coverage of time range

Chunk time window selection:

The optimal chunk time window depends on data density (volume per unit time):

  • Target: 100MB - 1GB per chunk (balances parallelism and retry cost)
  • Default: 1 day (suitable for most workloads)
  • Recommendations:
    • High density (>1GB/day): Use smaller windows like 1h, 6h, or 12h
    • Low density (<100MB/day): Use larger windows like 7d or 30d
    • Time windows can be adjusted flexibly (not required to align to day boundaries)

Example: 500GB database spanning 30 days → ~16.7GB/day → use 1h chunks → ~695MB/chunk
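
A minimal sketch of the interval generation, assuming chrono types; the function name and signature are illustrative.

```rust
use chrono::{DateTime, Duration, Utc};

/// Split [start, end) into non-overlapping, half-open windows of `window`
/// length; the last window is clamped to `end`. This guarantees no gaps,
/// no overlaps, and complete coverage of the requested time range.
fn plan_chunks(
    start: DateTime<Utc>,
    end: DateTime<Utc>,
    window: Duration,
) -> Vec<(DateTime<Utc>, DateTime<Utc>)> {
    let mut chunks = Vec::new();
    let mut cursor = start;
    while cursor < end {
        let next = (cursor + window).min(end);
        chunks.push((cursor, next)); // half-open interval [cursor, next)
        cursor = next;
    }
    chunks
}

// e.g. plan_chunks(start, end, Duration::hours(1)) for high-density data.
```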

3. Data Export via COPY DATABASE

V2 leverages the existing COPY DATABASE TO statement for data export, with an additional tooling layer for chunking, resume, and metadata management.

How it works:

  1. Export tool generates chunks based on time range and chunk window
  2. For each chunk, calls COPY DATABASE with specific time range:
    ```sql
    COPY DATABASE <schema> TO '<chunk_path>' WITH (
        START_TIME = '<chunk_start>',
        END_TIME = '<chunk_end>',
        FORMAT = 'parquet'
    )
    ```
  3. Export tool records chunk metadata, calculates checksums, and updates manifest

Separation of concerns:

  • COPY DATABASE (data layer): Streaming export, format support, time filtering
  • Export tool (tooling layer): Chunking, resume, manifest management, checksum calculation, schema export
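
The sketch below shows how the tooling layer could issue one COPY DATABASE statement per chunk; the statement mirrors the example above, while the types and the SQL-execution callback are assumptions.

```rust
// Hypothetical per-chunk export step; `execute_sql` stands in for whatever
// client interface the export tool uses to talk to the server.
struct Chunk {
    id: u32,
    start: String, // RFC 3339
    end: String,
    path: String, // e.g. "s3://bucket/snapshot/data/1/"
}

fn copy_chunk(
    execute_sql: &dyn Fn(&str) -> Result<(), String>,
    schema: &str,
    chunk: &Chunk,
) -> Result<(), String> {
    let stmt = format!(
        "COPY DATABASE {} TO '{}' WITH (START_TIME = '{}', END_TIME = '{}', FORMAT = 'parquet')",
        schema, chunk.path, chunk.start, chunk.end
    );
    // The data layer streams the export; the tooling layer then records chunk
    // metadata, computes checksums, and updates manifest.json.
    execute_sql(&stmt)
}
```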

4. Data Integrity

Three-layer checksum validation ensures data integrity:

  1. File-level: SHA256 of each Parquet file
  2. Chunk-level: Aggregate checksum of all files in a chunk
  3. Snapshot-level: Aggregate checksum of all chunks

Checksums are verified during import before data is written to the database.
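
A minimal sketch of the three levels using the sha2 crate; aggregating higher-level checksums by hashing the lower-level digests is one plausible scheme, not necessarily the exact one used.

```rust
use sha2::{Digest, Sha256};

/// File-level: SHA256 over the file bytes.
fn file_checksum(bytes: &[u8]) -> String {
    hex::encode(Sha256::digest(bytes))
}

/// Chunk-level (and snapshot-level): hash of the concatenated lower-level
/// digests. One possible aggregation scheme.
fn aggregate_checksum(lower_level: &[String]) -> String {
    let mut hasher = Sha256::new();
    for digest in lower_level {
        hasher.update(digest.as_bytes());
    }
    hex::encode(hasher.finalize())
}

// Snapshot-level checksum is the same aggregation applied to chunk checksums:
// aggregate_checksum(&chunk_checksums)
```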

5. Retry and Resume

Chunk-level retry:

  • Each chunk is an independent unit of work
  • Failed chunks are retried with exponential backoff (capped at 5 minutes)
  • Successful chunks are never re-exported

Resume capability:

  • Manifest tracks chunk status (Pending, InProgress, Completed, Failed)
  • Export/import automatically resumes when executed on existing snapshot
  • Skips completed chunks, retries failed/in-progress chunks, processes pending chunks
  • Works across process restarts
  • Use --force (export only) to delete existing snapshot and start over
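
A sketch of chunk-level retry with exponential backoff capped at 5 minutes; the base delay and attempt limit are assumptions.

```rust
use std::thread::sleep;
use std::time::Duration;

const MAX_BACKOFF: Duration = Duration::from_secs(300); // capped at 5 minutes

/// Retry one chunk operation with exponential backoff. Only the 5-minute cap
/// comes from the design above; base delay and attempt limit are assumed.
fn retry_chunk<F>(mut export_chunk: F, max_attempts: u32) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut delay = Duration::from_secs(1);
    let mut last_err = String::from("no attempts were made");
    for _ in 0..max_attempts {
        match export_chunk() {
            Ok(()) => return Ok(()),
            Err(e) => {
                last_err = e;
                sleep(delay);
                delay = (delay * 2).min(MAX_BACKOFF);
            }
        }
    }
    Err(last_err)
}
```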

6. Concurrent Export Safety

Scenario 1: Export to different paths ✅

  • Multiple exports can run simultaneously to different storage locations
  • No conflicts (only reads database, writes to different paths)

Scenario 2: Export to same path ⚠️

  • Concurrent exports to the same path can corrupt the snapshot
  • With the default resume behavior, both processes will resume the same export (usually safe)
  • Race condition exists during initial manifest creation, but unlikely in practice
  • Recommendation: Include timestamp in snapshot path to avoid conflicts

CLI Interface

Export Command

```
greptime export create [OPTIONS] --to <LOCATION>

Required:
  --to <LOCATION>                 Target storage location

Optional:
  --catalog <CATALOG>             Catalog name (default: greptime)
  --schemas <SCHEMAS>             Comma-separated schema list (default: all)
  --start-time <TIMESTAMP>        Time range start (default: earliest)
  --end-time <TIMESTAMP>          Time range end (default: now)
  --chunk-time-window <DURATION>  Chunk time window (default: 1d)
  --parallelism <N>               Concurrency level (default: 1)
  --format <FORMAT>               Export format for data file: parquet (default), csv, json, or other formats supported by COPY DATABASE
  --schema-only                   Export schema only, no data
  --force                         Delete existing snapshot and recreate

Behavior:
  - If snapshot doesn't exist: create new snapshot
  - If snapshot exists: automatically resume export (skip completed chunks,
    retry failed chunks, process pending chunks)
  - If --force is specified: delete existing snapshot first, then create new one
```

Import Command

```
greptime import [OPTIONS] --from <SNAPSHOT>

Required:
  --from <SNAPSHOT>         Source snapshot location

Optional:
  --catalog <CATALOG>       Catalog name (default: greptime)
  --schemas <SCHEMAS>       Comma-separated schema list (default: all)
  --parallelism <N>         Concurrency level (default: 1)
  --dry-run                 Verify without importing
  --time-range <RANGE>      Import partial time range only

Behavior:
  - Automatically resumes if import was previously interrupted
  - Skips completed chunks, retries failed chunks, processes pending chunks
```

Management Commands

```bash
# List snapshots
greptime export list --location s3://bucket/snapshots

# Verify snapshot integrity
greptime export verify --snapshot s3://bucket/snapshots/prod-20250101

# Delete snapshot
greptime export delete --snapshot s3://bucket/snapshots/old-snapshot
```

Drawbacks

  1. Breaking change: V2 does not support V1 format (migration required)
  2. Storage dependency: Relies on object storage for production use
  3. Time-series assumption: Assumes data has time index (not suitable for non-time-series tables)
  4. Chunk granularity: Very sparse data may result in many small chunks

Rationale and Alternatives

Why time-based chunking?

Alternatives considered:

  • Size-based chunking: Requires scanning data to estimate size (double I/O overhead)
  • Adaptive chunking: Complex, requires metadata scanning, marginal benefit

Decision: Fixed time-window chunking

  • Simple and predictable
  • Aligns with time-series data characteristics
  • User can adjust based on data density
  • No pre-scanning required

Why JSON for schema, not SQL?

Alternatives: SQL dumps (V1 approach)

Decision: JSON schema format

  • Version-agnostic (can add compatibility layer)
  • Programmatically processable
  • Easier to validate and transform
  • Avoids SQL dialect issues

Unresolved Questions

  1. Cross-version restore: Should V2 support restoring to older GreptimeDB versions?
  2. Partial schema export: Should we support table-level filtering (not just schema-level)?

Future Possibilities

  1. Incremental backup: Export only data changes since last backup (requires WAL integration)
  2. Parallel chunk processing: Export/import multiple chunks simultaneously (requires careful resource management)
  3. Snapshot metadata service: Centralized snapshot registry and lifecycle management