AI-First Decentralized Storage secured by EigenLayer: a verifiable storage network for AI training data, machine learning models, and Web3 applications.
DataHaven is a decentralized storage and retrieval network designed for applications that need verifiable, production-scale data storage. Built on StorageHub and secured by EigenLayer's restaking protocol, DataHaven separates storage from verification: providers store data off-chain while cryptographic commitments are anchored on-chain for tamper-evident verification.
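The storage/verification split can be sketched with a toy Merkle commitment. This is illustrative only: DataHaven's actual proof scheme is defined by StorageHub's proofs-dealer pallet, and the hashing and chunking details below are assumptions made for the sketch.

```typescript
// Toy sketch of off-chain storage with an on-chain commitment.
// Assumes SHA-256 and naive chunking; not DataHaven's real scheme.
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer =>
  createHash("sha256").update(data).digest();

// Provider side: chunk the file and compute a Merkle root off-chain.
function merkleRoot(chunks: Buffer[]): Buffer {
  let level = chunks.map(sha256);
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate an odd leaf
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    level = next;
  }
  return level[0];
}

// Chain side: only the 32-byte root is anchored on-chain.
const chunks = [Buffer.from("chunk-0"), Buffer.from("chunk-1")];
const anchoredRoot = merkleRoot(chunks).toString("hex");

// Verifier side: recompute the root from the served data and compare
// with the anchored commitment; any tampering changes the root.
const tampered = [Buffer.from("chunk-0"), Buffer.from("chunk-X")];
console.log(merkleRoot(chunks).toString("hex") === anchoredRoot);   // true
console.log(merkleRoot(tampered).toString("hex") === anchoredRoot); // false
```

Because the provider can serve data at full speed while verifiers only ever need the 32-byte root, the heavy data path stays off-chain.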
Core Capabilities:
DataHaven combines EigenLayer's shared security with StorageHub's decentralized storage infrastructure:
```
┌─────────────────────────────────────────────────────────────────────┐
│                            Ethereum (L1)                            │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                   EigenLayer AVS Contracts                    │  │
│  │  • DataHavenServiceManager (validator lifecycle & slashing)   │  │
│  │  • RewardsRegistry (validator performance & rewards)          │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                  │                                  │
│                         Snowbridge Protocol                         │
│                  (trustless cross-chain messaging)                  │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        DataHaven (Substrate)                        │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  StorageHub Pallets                  DataHaven Pallets        │  │
│  │  • file-system (file operations)     • External Validators    │  │
│  │  • providers (MSP/BSP registry)      • Native Transfer        │  │
│  │  • proofs-dealer (challenge/verify)  • Rewards                │  │
│  │  • payment-streams (storage payments)• Frontier (EVM)         │  │
│  │  • bucket-nfts (bucket ownership)                             │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      Storage Provider Network                       │
│  ┌─────────────────────────────┐   ┌─────────────────────────────┐  │
│  │   Main Storage Providers    │   │  Backup Storage Providers   │  │
│  │            (MSP)            │   │            (BSP)            │  │
│  │ • User-selected             │   │ • Network-assigned          │  │
│  │ • Serve read requests       │   │ • Replicate data            │  │
│  │ • Anchor bucket roots       │   │ • Proof challenges          │  │
│  │ • MSP Backend service       │   │ • On-chain slashing         │  │
│  └─────────────────────────────┘   └─────────────────────────────┘  │
│  ┌─────────────────────────────┐   ┌─────────────────────────────┐  │
│  │           Indexer           │   │          Fisherman          │  │
│  │ • Index on-chain events     │   │ • Audit storage proofs      │  │
│  │ • Query storage metadata    │   │ • Trigger challenges        │  │
│  │ • PostgreSQL backend        │   │ • Detect misbehavior        │  │
│  └─────────────────────────────┘   └─────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
```
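The challenge/response loop between the chain, BSPs, and Fisherman nodes can be sketched as follows. The names and proof construction here are hypothetical; the real flow is defined by the proofs-dealer pallet and uses on-chain commitments rather than a trusted copy of the data.

```typescript
// Hypothetical sketch of a storage proof challenge; illustrative only.
import { createHash } from "node:crypto";

const sha256hex = (d: Buffer): string =>
  createHash("sha256").update(d).digest("hex");

// The chain issues a random challenge: a chunk index plus a fresh nonce,
// so a provider cannot replay an old proof.
interface Challenge {
  chunkIndex: number;
  nonce: string;
}

// An honest BSP proves possession by hashing the stored chunk with the nonce.
function prove(storedChunks: Buffer[], c: Challenge): string {
  return sha256hex(
    Buffer.concat([storedChunks[c.chunkIndex], Buffer.from(c.nonce)]),
  );
}

// A Fisherman (or the chain) checks the proof against the expected chunk.
function verify(expectedChunk: Buffer, c: Challenge, proof: string): boolean {
  return (
    proof === sha256hex(Buffer.concat([expectedChunk, Buffer.from(c.nonce)]))
  );
}

const chunks = [Buffer.from("a"), Buffer.from("b")];
const challenge: Challenge = { chunkIndex: 1, nonce: "n-42" };
const proof = prove(chunks, challenge);
console.log(verify(chunks[1], challenge, proof));       // true: data is held
console.log(verify(Buffer.from("x"), challenge, proof)); // false: data lost
```

A failed verification is what would ultimately feed the on-chain slashing path shown in the diagram above.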
```
datahaven/
├── contracts/            # EigenLayer AVS smart contracts
│   ├── src/              # Service Manager, Rewards Registry, Slasher
│   ├── script/           # Deployment scripts
│   └── test/             # Foundry test suites
├── operator/             # Substrate-based DataHaven node
│   ├── node/             # Node implementation & chain spec
│   ├── pallets/          # Custom pallets (validators, rewards, transfers)
│   └── runtime/          # Runtime configurations (mainnet/stagenet/testnet)
├── test/                 # E2E testing framework
│   ├── suites/           # Integration test scenarios
│   ├── framework/        # Test utilities and helpers
│   └── launcher/         # Network deployment automation
├── deploy/               # Kubernetes deployment charts
│   ├── charts/           # Helm charts for nodes and relayers
│   └── environments/     # Environment-specific configurations
├── tools/                # GitHub automation and release scripts
└── .github/              # CI/CD workflows
```
Each directory contains its own README with detailed information.
The fastest way to get started is with the interactive CLI:
```sh
cd test
bun i           # Install dependencies
bun cli launch  # Interactive launcher with prompts
```
This deploys a complete local DataHaven environment.
For more options and detailed instructions, see the test README.
```sh
cd test
bun test:e2e           # Run all integration tests
bun test:e2e:parallel  # Run with limited concurrency
```
Note: setting the environment variable INJECT_CONTRACTS=true (e.g. `INJECT_CONTRACTS=true bun test:e2e`) injects the contracts when the tests start, which speeds up setup.
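Conceptually, a launcher honoring such a flag branches between a slow full deployment and injecting pre-built contract state. The helper below is an invented illustration, not the test framework's actual code:

```typescript
// Hypothetical sketch of how a test launcher could honor INJECT_CONTRACTS.
// Only the exact string "true" enables the fast path.
function shouldInjectContracts(env: Record<string, string | undefined>): boolean {
  return env.INJECT_CONTRACTS === "true";
}

if (shouldInjectContracts(process.env)) {
  console.log("injecting pre-built contracts (fast setup)");
} else {
  console.log("deploying contracts from scratch");
}
```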
Smart Contract Development:
```sh
cd contracts
forge build  # Compile contracts
forge test   # Run contract tests
```
Node Development:
```sh
cd operator
cargo build --release --features fast-runtime
cargo test
./scripts/run-benchmarks.sh
```
After Making Changes:
```sh
cd test
bun generate:wagmi  # Regenerate contract bindings
bun generate:types  # Regenerate runtime types
```
Production-scale storage with cryptographic guarantees:
Two-tier provider model balancing performance and reliability:
DataHaven validators secured through Ethereum restaking:
- DataHavenServiceManager contract (validator lifecycle & slashing)
- RewardsRegistry (validator performance & rewards)

Full Ethereum Virtual Machine support via Frontier pallets:
Trustless bridging via Snowbridge:
DataHaven is designed for applications requiring verifiable, tamper-proof data storage:
Production images published to DockerHub.
Build optimizations:
Build locally:
```sh
cd test
bun build:docker:operator  # Creates datahavenxyz/datahaven:local
```
IDE configurations are excluded from version control for personalization, but these settings are recommended for optimal developer experience. Add to your .vscode/settings.json:
Rust Analyzer:
```json
{
  "rust-analyzer.linkedProjects": ["./operator/Cargo.toml"],
  "rust-analyzer.cargo.allTargets": true,
  "rust-analyzer.procMacro.enable": false,
  "rust-analyzer.server.extraEnv": {
    "CARGO_TARGET_DIR": "target/.rust-analyzer",
    "SKIP_WASM_BUILD": 1
  },
  "rust-analyzer.diagnostics.disabled": ["unresolved-macro-call"],
  "rust-analyzer.cargo.buildScripts.enable": false
}
```
Optimizations:
- Treats the operator/ directory as the primary Rust project

Solidity (Juan Blanco's extension):
```json
{
  "solidity.formatter": "forge",
  "solidity.compileUsingRemoteVersion": "v0.8.28+commit.7893614a",
  "[solidity]": {
    "editor.defaultFormatter": "JuanBlanco.solidity"
  }
}
```
Note: the Solidity version must match the compiler version pinned in foundry.toml.
TypeScript (Biome):
```json
{
  "biome.lsp.bin": "test/node_modules/.bin/biome",
  "[typescript]": {
    "editor.defaultFormatter": "biomejs.biome",
    "editor.codeActionsOnSave": {
      "source.organizeImports.biome": "always"
    }
  }
}
```
Run GitHub Actions workflows locally using act:
```sh
# Run E2E workflow
act -W .github/workflows/e2e.yml -s GITHUB_TOKEN="$(gh auth token)"

# Run a specific job
act -W .github/workflows/e2e.yml -j test-job-name
```
The repository includes GitHub Actions workflows for CI/CD; see .github/workflows/ for the workflow definitions.
Before submitting changes:
- Run the test suites (`forge test`, `cargo test`)
- Format code (`cargo fmt`, `forge fmt`, `bun fmt:fix`)
- Run `bun generate:types` after runtime changes
- Run `bun generate:wagmi` after contract modifications
- Use `--features fast-runtime` for shorter epochs/eras (block time stays 6s)

See CLAUDE.md for detailed development guidance.
GPL-3.0 - See LICENSE file for details