Summary

This RFC proposes a redesigned export/import system (V2) for GreptimeDB that addresses fundamental issues in the current implementation. The new design leverages time-series characteristics for efficient chunking, provides clear storage semantics, and ensures data reliability through comprehensive validation mechanisms.

Motivation

Problems with V1

The current export/import implementation has several critical issues:

  1. Ambiguous path semantics: output_dir serves dual purposes (local vs remote), causing confusion
  2. No chunking strategy: Large databases cannot be exported/imported efficiently
  3. Poor reliability: No resume capability, no progress tracking
  4. Limited format support: Relies on SQL dumps, not optimized for time-series data

Goals

  1. Clear semantics: Explicit distinction between remote storage and server-local paths
  2. Scalability: Support TB-scale databases through time-based chunking
  3. Reliability: Resume capability, progress tracking, data integrity verification
  4. Performance: Streaming export/import using native COPY DATABASE

Non-Goals

  • Schema evolution/migration (out of scope)
  • Cross-version compatibility (V2 does not support V1 format)
  • Real-time replication (use a dedicated replication mechanism)

Guide-level Explanation

Core Concepts

Snapshot

A snapshot represents the complete state of a GreptimeDB catalog at a point in time, including schemas and data.

Snapshot structure:

```
snapshot-20250101/
├── manifest.json              # Snapshot metadata and chunk index
├── schema/
│   ├── schemas.json           # Schema definitions (JSON)
│   ├── tables.json            # Table definitions (JSON)
│   └── views.json             # View definitions (JSON)
└── data/
    ├── 1/
    │   ├── public.metrics.parquet
    │   └── public.logs.parquet
    ├── 2/
    │   ├── public.metrics.parquet
    │   └── public.logs.parquet
    └── 3/
        ├── public.metrics.parquet
        └── public.logs.parquet
```

Key properties:

  • Self-contained (all information needed for restore)
  • Immutable (content never changes after creation)
  • Verifiable (checksums at file, chunk, and snapshot levels)
  • Schema-only snapshots contain only manifest.json and schema/; data/ is absent, the manifest's chunks array is empty, and appending data later is rejected (use --force to recreate)

Chunk

A chunk is a time-range partition of data. Each chunk is independently exportable/importable and retryable.

Chunk properties:

  • Has explicit start_time and end_time (recorded in manifest)
  • Non-overlapping with other chunks
  • Covers a contiguous time range
  • Independent (can be exported/imported in any order)
  • Atomic (either fully succeeds or fully fails)

Chunk directory naming:

  • Sequential numbers: 1/, 2/, 3/, ...
  • Time ranges are recorded in manifest.json

Storage Types

V2 supports two storage types:

| Type           | Example               | Use Case                 |
|----------------|-----------------------|--------------------------|
| Remote Storage | s3://bucket/snapshots | Production (recommended) |
| Server Path    | file:///data/backup   | Local dev/testing        |

Important: Local paths (e.g., /tmp/export, ./backup) are not supported because schema export (CLI) and data export (server) run in different processes, which would split the snapshot across two machines.

Basic Usage

Export

```bash
# Full snapshot to S3
greptime export create \
  --to s3://my-bucket/snapshots/prod-20250101

# Incremental snapshot (time range)
greptime export create \
  --start-time 2024-12-01T00:00:00Z \
  --end-time 2024-12-31T23:59:59Z \
  --to s3://my-bucket/snapshots/prod-december

# Schema-only export
greptime export create \
  --schema-only \
  --to s3://my-bucket/snapshots/prod-schema-only
```

Schema-only snapshots cannot be resumed with data; use `--force` to recreate.

```bash
# Export with specific format (default: parquet)
greptime export create \
  --format csv \
  --to s3://my-bucket/snapshots/prod-csv

# Resume interrupted export (automatic if snapshot exists)
greptime export create \
  --to s3://my-bucket/snapshots/prod-20250101

# Force recreate (delete existing and start over)
greptime export create \
  --to s3://my-bucket/snapshots/prod-20250101 \
  --force
```

Import

```bash
# Full import
greptime import \
  --from s3://my-bucket/snapshots/prod-20250101

# Partial import (selected schemas)
greptime import \
  --from s3://my-bucket/snapshots/prod-20250101 \
  --schemas public,private

# Dry-run (verify without importing)
greptime import \
  --from s3://my-bucket/snapshots/prod-20250101 \
  --dry-run
```

Reference-level Explanation

Architecture

The export/import system consists of four main components:

  1. CLI: Command parsing, progress display, state management
  2. Coordinator: Snapshot planning, chunk scheduling, retry logic
  3. Schema Engine: DDL extraction, JSON serialization, schema validation
  4. Data Engine: Time-based chunking, streaming export/import via COPY DATABASE

All components use OpenDAL for storage abstraction, supporting S3, OSS, GCS, Azure Blob, and local filesystem.
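
To make this split concrete, the sketch below shows a coordinator driving hypothetical schema and data engines. All trait, type, and method names here are illustrative, not actual GreptimeDB interfaces, and storage I/O (via OpenDAL) is omitted.

```rust
// Illustrative only: these traits and types are assumptions, not actual
// GreptimeDB interfaces. Storage access (OpenDAL) is omitted.
struct ChunkPlan {
    id: u32,
    start: String, // RFC 3339 timestamps
    end: String,
}

trait SchemaEngine {
    /// Extract DDL and serialize it as JSON documents (schemas/tables/views).
    fn export_schemas(&self, catalog: &str) -> Result<Vec<String>, String>;
}

trait DataEngine {
    /// Stream one chunk out via COPY DATABASE with a time-range filter.
    fn export_chunk(&self, schema: &str, chunk: &ChunkPlan) -> Result<(), String>;
}

/// The coordinator plans chunks, schedules them, and drives retries.
struct Coordinator<S: SchemaEngine, D: DataEngine> {
    schema_engine: S,
    data_engine: D,
}

impl<S: SchemaEngine, D: DataEngine> Coordinator<S, D> {
    fn export(&self, catalog: &str, schema: &str, chunks: &[ChunkPlan]) -> Result<(), String> {
        self.schema_engine.export_schemas(catalog)?;
        for chunk in chunks {
            // Each chunk is independent and retryable (see "Retry and Resume").
            self.data_engine.export_chunk(schema, chunk)?;
        }
        Ok(())
    }
}
```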

Data Format

Manifest File

The manifest is a JSON file containing snapshot metadata and the chunk index.

Key fields:

  • snapshot_id: Unique identifier (UUID)
  • catalog, schemas: Catalog and schema list
  • time_range: Overall time range covered
  • schema_only: Whether the snapshot contains schema only
  • chunks[]: Array of chunk metadata
  • format: Data format for exported files
  • checksum: Snapshot-level SHA256 checksum

Chunk metadata structure:

Each chunk entry in the manifest contains:

  • id: Chunk identifier (sequential number)
  • time_range: Start and end timestamps
  • status: Export status (Pending, InProgress, Completed, Failed)
  • files: List of data files in the chunk directory
  • checksum: Chunk-level checksum for integrity verification
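
To illustrate how these fields fit together, here is a minimal Rust/serde sketch of the manifest and its chunk entries. Field names and types follow the lists above but are otherwise assumptions, not the final on-disk layout.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical manifest shape; not the final wire format.
#[derive(Serialize, Deserialize)]
struct Manifest {
    snapshot_id: String,           // UUID
    catalog: String,
    schemas: Vec<String>,
    time_range: Option<TimeRange>, // overall time range covered
    schema_only: bool,
    format: String,                // e.g. "parquet"
    chunks: Vec<ChunkMeta>,
    checksum: String,              // snapshot-level SHA256
}

#[derive(Serialize, Deserialize)]
struct TimeRange {
    start: String, // RFC 3339 timestamps
    end: String,
}

#[derive(Serialize, Deserialize)]
enum ChunkStatus {
    Pending,
    InProgress,
    Completed,
    Failed,
}

#[derive(Serialize, Deserialize)]
struct ChunkMeta {
    id: u32,
    time_range: TimeRange,
    status: ChunkStatus,
    files: Vec<FileMeta>,
    checksum: String, // chunk-level checksum
}

#[derive(Serialize, Deserialize)]
struct FileMeta {
    path: String, // e.g. "data/1/public.metrics.parquet"
    size_bytes: u64,
    checksum: String, // file-level SHA256
}
```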

Schema Files

Schema definitions are stored as JSON (not SQL) for better version compatibility and programmatic processing.

Why JSON instead of SQL?

  • Version-agnostic (can handle schema evolution)
  • Programmatically processable (direct deserialization)
  • Extensible (easy to add new fields)
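
As an example of the programmatic processing this enables, a table definition could deserialize directly into a plain struct. The shape below is hypothetical and only illustrates the idea.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical shape of one entry in tables.json; actual fields may differ.
#[derive(Serialize, Deserialize)]
struct TableDefinition {
    schema: String, // e.g. "public"
    name: String,   // e.g. "metrics"
    columns: Vec<ColumnDefinition>,
    time_index: String, // name of the time index column
    primary_keys: Vec<String>,
}

#[derive(Serialize, Deserialize)]
struct ColumnDefinition {
    name: String,
    data_type: String, // e.g. "TimestampMillisecond", "Float64"
    nullable: bool,
}
```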

Data Files

Data is exported via COPY DATABASE, supporting multiple formats:

  • Parquet (default): Columnar format, efficient compression, recommended for production use
  • CSV: Human-readable, universally compatible, useful for debugging and third-party integration
  • JSON: Structured text format, flexible schema representation
  • Other formats supported by COPY DATABASE

Format is specified via the --format flag and recorded in manifest.json. Import automatically detects the format from the manifest.

Core Design Decisions

1. Storage Path Validation

Export/import operations validate storage paths to prevent misconfigurations:

Path types:

  • s3://, oss://, gs://, azblob:// → Remote storage (recommended)
  • file:/// → Server-local path (only allowed when CLI and server are co-located)
  • /path, ./path → Rejected (would split snapshot across machines)

Validation:

  • Detect path type by URI scheme
  • For file:///, verify server endpoint resolves to localhost or local IP
  • Reject bare paths without URI scheme
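
A minimal sketch of this classification, assuming the scheme set listed above; the type and function names are hypothetical.

```rust
/// Hypothetical path classification for export/import locations.
enum PathKind {
    Remote,      // s3://, oss://, gs://, azblob://
    ServerLocal, // file:/// (CLI and server must be co-located)
}

fn classify_path(location: &str) -> Result<PathKind, String> {
    const REMOTE_SCHEMES: &[&str] = &["s3://", "oss://", "gs://", "azblob://"];
    if REMOTE_SCHEMES.iter().any(|s| location.starts_with(s)) {
        Ok(PathKind::Remote)
    } else if location.starts_with("file://") {
        // Caller must additionally verify that the server endpoint resolves
        // to localhost or a local IP.
        Ok(PathKind::ServerLocal)
    } else {
        // Bare paths like /tmp/export or ./backup are rejected: schema export
        // (CLI) and data export (server) run in different processes, so the
        // snapshot would be split across two machines.
        Err(format!("unsupported location: {location}; use s3:// or file:///"))
    }
}
```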

2. Time-based Chunking

Data is partitioned into time-range chunks for efficient parallel processing and retry.

Algorithm:

  • Generate non-overlapping half-open intervals with configurable time window (default: 1 day)
  • Chunks are numbered sequentially (1, 2, 3, ...)
  • Each chunk's time range is recorded in manifest.json
  • Empty chunks (no data in time window) are skipped and not recorded in manifest
  • Guarantees: no gaps, no overlaps, complete coverage of time range

Chunk time window selection:

The optimal chunk time window depends on data density (volume per unit time):

  • Target: 100MB - 1GB per chunk (balances parallelism and retry cost)
  • Default: 1 day (suitable for most workloads)
  • Recommendations:
    • High density (>1GB/day): Use smaller windows like 1h, 6h, or 12h
    • Low density (<100MB/day): Use larger windows like 7d or 30d
    • Time windows can be adjusted flexibly (not required to align to day boundaries)

Example: 500GB database spanning 30 days → ~16.7GB/day → use 1h chunks → ~695MB/chunk
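
A minimal sketch of the interval generation, assuming chrono types; the function name and signature are illustrative.

```rust
use chrono::{DateTime, Duration, Utc};

/// Split [start, end) into non-overlapping, half-open windows of `window`
/// length; the last window is clamped to `end`. This guarantees no gaps,
/// no overlaps, and complete coverage of the requested time range.
fn plan_chunks(
    start: DateTime<Utc>,
    end: DateTime<Utc>,
    window: Duration,
) -> Vec<(DateTime<Utc>, DateTime<Utc>)> {
    let mut chunks = Vec::new();
    let mut cursor = start;
    while cursor < end {
        let next = (cursor + window).min(end);
        chunks.push((cursor, next)); // half-open interval [cursor, next)
        cursor = next;
    }
    chunks
}

// e.g. plan_chunks(start, end, Duration::hours(1)) for high-density data.
```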

3. Data Export via COPY DATABASE

V2 leverages the existing COPY DATABASE TO statement for data export, with an additional tooling layer for chunking, resume, and metadata management.

How it works:

  1. Export tool generates chunks based on time range and chunk window
  2. For each chunk, calls COPY DATABASE with specific time range:
    ```sql
    COPY DATABASE <schema> TO '<chunk_path>' WITH (
        START_TIME = '<chunk_start>',
        END_TIME = '<chunk_end>',
        FORMAT = 'parquet'
    )
    ```
  3. Export tool records chunk metadata, calculates checksums, and updates manifest

Separation of concerns:

  • COPY DATABASE (data layer): Streaming export, format support, time filtering
  • Export tool (tooling layer): Chunking, resume, manifest management, checksum calculation, schema export
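
The sketch below shows how the tooling layer could issue one COPY DATABASE statement per chunk; the statement mirrors the example above, while the types and the SQL-execution callback are assumptions.

```rust
// Hypothetical per-chunk export step; `execute_sql` stands in for whatever
// client interface the export tool uses to talk to the server.
struct Chunk {
    id: u32,
    start: String, // RFC 3339
    end: String,
    path: String, // e.g. "s3://bucket/snapshot/data/1/"
}

fn copy_chunk(
    execute_sql: &dyn Fn(&str) -> Result<(), String>,
    schema: &str,
    chunk: &Chunk,
) -> Result<(), String> {
    let stmt = format!(
        "COPY DATABASE {} TO '{}' WITH (START_TIME = '{}', END_TIME = '{}', FORMAT = 'parquet')",
        schema, chunk.path, chunk.start, chunk.end
    );
    // The data layer streams the export; the tooling layer then records chunk
    // metadata, computes checksums, and updates manifest.json.
    execute_sql(&stmt)
}
```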

4. Data Integrity

Three-layer checksum validation ensures data integrity:

  1. File-level: SHA256 of each Parquet file
  2. Chunk-level: Aggregate checksum of all files in a chunk
  3. Snapshot-level: Aggregate checksum of all chunks

Checksums are verified during import before data is written to the database.
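
A minimal sketch of the three levels using the sha2 crate; aggregating higher-level checksums by hashing the lower-level digests is one plausible scheme, not necessarily the exact one used.

```rust
use sha2::{Digest, Sha256};

/// File-level: SHA256 over the file bytes.
fn file_checksum(bytes: &[u8]) -> String {
    hex::encode(Sha256::digest(bytes))
}

/// Chunk-level (and snapshot-level): hash of the concatenated lower-level
/// digests. One possible aggregation scheme.
fn aggregate_checksum(lower_level: &[String]) -> String {
    let mut hasher = Sha256::new();
    for digest in lower_level {
        hasher.update(digest.as_bytes());
    }
    hex::encode(hasher.finalize())
}

// Snapshot-level checksum is the same aggregation applied to chunk checksums:
// aggregate_checksum(&chunk_checksums)
```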

5. Retry and Resume

Chunk-level retry:

  • Each chunk is an independent unit of work
  • Failed chunks are retried with exponential backoff (capped at 5 minutes)
  • Successful chunks are never re-exported

Resume capability:

  • Manifest tracks chunk status (Pending, InProgress, Completed, Failed)
  • Export/import automatically resumes when executed on existing snapshot
  • Skips completed chunks, retries failed/in-progress chunks, processes pending chunks
  • Works across process restarts
  • Use --force (export only) to delete existing snapshot and start over
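
A sketch of chunk-level retry with exponential backoff capped at 5 minutes; the base delay and attempt limit are assumptions.

```rust
use std::thread::sleep;
use std::time::Duration;

const MAX_BACKOFF: Duration = Duration::from_secs(300); // capped at 5 minutes

/// Retry one chunk operation with exponential backoff. Only the 5-minute cap
/// comes from the design above; base delay and attempt limit are assumed.
fn retry_chunk<F>(mut export_chunk: F, max_attempts: u32) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut delay = Duration::from_secs(1);
    let mut last_err = String::from("no attempts were made");
    for _ in 0..max_attempts {
        match export_chunk() {
            Ok(()) => return Ok(()),
            Err(e) => {
                last_err = e;
                sleep(delay);
                delay = (delay * 2).min(MAX_BACKOFF);
            }
        }
    }
    Err(last_err)
}
```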

6. Concurrent Export Safety

Scenario 1: Export to different paths ✅

  • Multiple exports can run simultaneously to different storage locations
  • No conflicts (only reads database, writes to different paths)

Scenario 2: Export to same path ⚠️

  • Concurrent exports to the same path can corrupt the snapshot
  • With the default resume behavior, both processes will resume the same export (usually safe)
  • Race condition exists during initial manifest creation, but unlikely in practice
  • Recommendation: Include timestamp in snapshot path to avoid conflicts

CLI Interface

Export Command

```
greptime export create [OPTIONS] --to <LOCATION>

Required:
  --to <LOCATION>                 Target storage location

Optional:
  --catalog <CATALOG>             Catalog name (default: greptime)
  --schemas <SCHEMAS>             Comma-separated schema list (default: all)
  --start-time <TIMESTAMP>        Time range start (default: earliest)
  --end-time <TIMESTAMP>          Time range end (default: now)
  --chunk-time-window <DURATION>  Chunk time window (default: 1d)
  --parallelism <N>               Concurrency level (default: 1)
  --format <FORMAT>               Export format for data file: parquet (default), csv, json, or other formats supported by COPY DATABASE
  --schema-only                   Export schema only, no data
  --force                         Delete existing snapshot and recreate

Behavior:
  - If snapshot doesn't exist: create new snapshot
  - If snapshot exists: automatically resume export (skip completed chunks,
    retry failed chunks, process pending chunks)
  - If --force is specified: delete existing snapshot first, then create new one
```

Import Command

```
greptime import [OPTIONS] --from <SNAPSHOT>

Required:
  --from <SNAPSHOT>         Source snapshot location

Optional:
  --catalog <CATALOG>       Catalog name (default: greptime)
  --schemas <SCHEMAS>       Comma-separated schema list (default: all)
  --parallelism <N>         Concurrency level (default: 1)
  --dry-run                 Verify without importing
  --time-range <RANGE>      Import partial time range only

Behavior:
  - Automatically resumes if import was previously interrupted
  - Skips completed chunks, retries failed chunks, processes pending chunks
```

Management Commands

```bash
# List snapshots
greptime export list --location s3://bucket/snapshots

# Verify snapshot integrity
greptime export verify --snapshot s3://bucket/snapshots/prod-20250101

# Delete snapshot
greptime export delete --snapshot s3://bucket/snapshots/old-snapshot
```

Drawbacks

  1. Breaking change: V2 does not support V1 format (migration required)
  2. Storage dependency: Relies on object storage for production use
  3. Time-series assumption: Assumes data has time index (not suitable for non-time-series tables)
  4. Chunk granularity: Very sparse data may result in many small chunks

Rationale and Alternatives

Why time-based chunking?

Alternatives considered:

  • Size-based chunking: Requires scanning data to estimate size (double I/O overhead)
  • Adaptive chunking: Complex, requires metadata scanning, marginal benefit

Decision: Fixed time-window chunking

  • Simple and predictable
  • Aligns with time-series data characteristics
  • User can adjust based on data density
  • No pre-scanning required

Why JSON for schema, not SQL?

Alternatives: SQL dumps (V1 approach)

Decision: JSON schema format

  • Version-agnostic (can add compatibility layer)
  • Programmatically processable
  • Easier to validate and transform
  • Avoids SQL dialect issues

Unresolved Questions

  1. Cross-version restore: Should V2 support restoring to older GreptimeDB versions?
  2. Partial schema export: Should we support table-level filtering (not just schema-level)?

Future Possibilities

  1. Incremental backup: Export only data changes since last backup (requires WAL integration)
  2. Parallel chunk processing: Export/import multiple chunks simultaneously (requires careful resource management)
  3. Snapshot metadata service: Centralized snapshot registry and lifecycle management