How `ipfs add` Works


This document explains what happens when you run ipfs add to import files into IPFS. Understanding this flow helps when debugging, optimizing imports, or building applications on top of IPFS.

The Big Picture

When you add a file to IPFS, three main things happen:

  1. Chunking - The file is split into smaller pieces
  2. DAG Building - Those pieces are organized into a tree structure (a Merkle DAG)
  3. Pinning - The root of the tree is pinned so it persists in your local node

The result is a Content Identifier (CID) - a hash that uniquely identifies your content and can be used to retrieve it from anywhere in the IPFS network.

```mermaid
flowchart LR
    A["Your File<br/>(bytes)"] --> B["Chunker<br/>(split data)"]
    B --> C["DAG Builder<br/>(tree)"]
    C --> D["CID<br/>(hash)"]
```

Try It Yourself

```bash
# Add a simple file
echo "Hello World" > hello.txt
ipfs add hello.txt
# added QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u hello.txt

# See what's inside
ipfs cat QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
# Hello World

# View the DAG structure
ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
```

Step by Step

Step 1: Chunking

Files are split into chunks because:

  • Smaller blocks can be transferred and verified independently, which keeps the transfer of large files efficient
  • Identical chunks across files are stored only once (deduplication)
  • You can fetch parts of a file without downloading the whole thing
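
The deduplication point can be illustrated with a minimal sketch. It assumes fixed-size chunks keyed by a plain SHA-256 digest in an in-memory map; `splitFixed` and `storeChunks` are hypothetical names for this example, not Kubo or boxo functions, and real block identifiers are CIDs rather than bare hex digests.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// splitFixed cuts data into chunks of at most size bytes.
func splitFixed(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// storeChunks keeps each distinct chunk once, keyed by its SHA-256 digest,
// and returns the ordered keys needed to rebuild the input.
func storeChunks(store map[string][]byte, chunks [][]byte) []string {
	keys := make([]string, 0, len(chunks))
	for _, c := range chunks {
		sum := sha256.Sum256(c)
		key := hex.EncodeToString(sum[:])
		store[key] = c // identical chunks land on the same key
		keys = append(keys, key)
	}
	return keys
}

func main() {
	store := map[string][]byte{}
	a := storeChunks(store, splitFixed([]byte("AAAABBBBCCCC"), 4))
	b := storeChunks(store, splitFixed([]byte("AAAAXXXXCCCC"), 4))
	// Six chunk references are recorded, but only four blocks are stored.
	fmt.Println(len(a)+len(b), "chunk references,", len(store), "stored blocks")
}
```

The two inputs share their first and last chunk, so the store holds four blocks for six references.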

Chunking strategies (set with --chunker):

| Strategy | Description | Best For |
|---|---|---|
| `size-N` | Fixed size chunks | General use |
| `rabin` | Content-defined chunks using rolling hash | Deduplication across similar files |
| `buzhash` | Alternative content-defined chunking | Similar to rabin |

See `ipfs add --help` for current defaults, or the `Import` section of the Kubo configuration for making them permanent.

Content-defined chunking (rabin/buzhash) finds natural boundaries in the data. This means if you edit the middle of a file, only the changed chunks need to be re-stored - the rest can be deduplicated.
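
To make the boundary idea concrete, here is a toy sketch. It is neither the rabin nor the buzhash algorithm: it assumes a plain rolling sum over a small window and cuts wherever the sum matches a bit mask, with no minimum or maximum chunk size. `splitContentDefined` and `countShared` are hypothetical names for this example.

```go
package main

import "fmt"

// splitContentDefined cuts after any byte where the sum of the last `window`
// bytes has all `mask` bits clear. Boundaries depend only on nearby content,
// not on absolute offsets.
func splitContentDefined(data []byte, window int, mask uint32) [][]byte {
	if window < 1 {
		window = 1
	}
	var chunks [][]byte
	var sum uint32
	start := 0
	for i, b := range data {
		sum += uint32(b)
		if i >= window {
			sum -= uint32(data[i-window]) // drop the byte leaving the window
		}
		if i+1 >= window && sum&mask == 0 {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// countShared counts the chunks of b whose bytes also occur as a chunk of a.
func countShared(a, b [][]byte) int {
	seen := map[string]bool{}
	for _, c := range a {
		seen[string(c)] = true
	}
	n := 0
	for _, c := range b {
		if seen[string(c)] {
			n++
		}
	}
	return n
}

func main() {
	orig := []byte{1, 2, 3, 10, 4, 4, 4, 4, 9, 1, 5, 1, 3, 3, 3, 7}
	// Same data with one extra byte (2) inserted at index 5.
	edited := []byte{1, 2, 3, 10, 4, 2, 4, 4, 4, 9, 1, 5, 1, 3, 3, 3, 7}

	a := splitContentDefined(orig, 4, 0x0F)
	b := splitContentDefined(edited, 4, 0x0F)
	fmt.Println(len(a), len(b), countShared(a, b)) // 4 3 2
}
```

In this hand-built input, the chunks before and after the edited region keep their boundaries, so two of the three chunks in the edited data are already stored. With 4-byte fixed-size chunks, the inserted byte would shift every later boundary and only the first chunk would match.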

Step 2: Building the DAG

Each chunk becomes a leaf node in a tree. If a file has many chunks, intermediate nodes group them together. This creates a Merkle DAG (Directed Acyclic Graph) where:

  • Each node is identified by a hash of its contents
  • Parent nodes contain links (hashes) to their children
  • The root node's hash becomes the file's CID
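
The hash-linking properties above can be sketched in a few lines. This is a simplified model, assuming SHA-256 over the raw bytes and link strings; real nodes use an IPLD encoding and CIDs, and the `node` type and `rootID` function are hypothetical names for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// node is a simplified DAG node: leaf data, links to children, or both.
type node struct {
	data  []byte
	links []string // hex digests of child nodes
}

// id hashes the node's contents, including its links, so a change in any
// descendant changes the identifier of every ancestor.
func (n node) id() string {
	h := sha256.New()
	h.Write(n.data)
	for _, l := range n.links {
		h.Write([]byte(l))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// rootID builds one parent that links to a leaf per chunk and returns the
// parent's identifier.
func rootID(chunks [][]byte) string {
	parent := node{}
	for _, c := range chunks {
		parent.links = append(parent.links, node{data: c}.id())
	}
	return parent.id()
}

func main() {
	a := rootID([][]byte{[]byte("Hello "), []byte("World")})
	b := rootID([][]byte{[]byte("Hello "), []byte("world")})
	fmt.Println(a != b) // true: editing one leaf changes the root identifier
}
```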

Layout strategies:

Balanced layout (default):

```mermaid
graph TD
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf1[Leaf]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```

All leaves at similar depth. Good for random access - you can jump to any part of the file efficiently.
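
A rough way to see why a balanced tree stays shallow is to count the nodes per level. The sketch below assumes parents are filled left to right up to a maximum number of links; `balancedLevels` and its `fanout` parameter are hypothetical, and the actual link limit used by Kubo is not stated here.

```go
package main

import "fmt"

// balancedLevels returns the number of nodes at each level of a balanced
// tree, from the leaves up to the single root. fanout is the maximum number
// of links per parent (a hypothetical parameter for this sketch).
func balancedLevels(leaves, fanout int) []int {
	if leaves < 1 || fanout < 2 {
		return nil
	}
	levels := []int{leaves}
	for n := leaves; n > 1; {
		n = (n + fanout - 1) / fanout // ceil(n / fanout) parents
		levels = append(levels, n)
	}
	return levels
}

func main() {
	fmt.Println(balancedLevels(3, 2))     // [3 2 1]
	fmt.Println(balancedLevels(1000, 10)) // [1000 100 10 1]
}
```

Depth grows with the logarithm of the chunk count, which is what keeps seeking to an arbitrary offset cheap.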

Trickle layout (--trickle):

```mermaid
graph TD
    Root --> Leaf1[Leaf]
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```

Leaves added progressively. Good for streaming - you can start reading before the whole file is added.

Step 3: Storing Blocks

As the DAG is built, each node is stored in the blockstore:

  • Normal mode: Data is copied into IPFS's internal storage (~/.ipfs/blocks/)
  • Filestore mode (--nocopy): Only references to the original file are stored (saves disk space but the original file must remain in place)
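
The difference between the two modes can be modelled with a toy store. This sketch is not the Kubo blockstore or filestore API: `toyStore`, `fileRef`, and `run` are hypothetical names, and keys are plain SHA-256 digests. It assumes referenced blocks are re-read and re-hashed on access, which is one way to detect that the original file has changed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

// fileRef points at a byte range inside a file the user still owns.
type fileRef struct {
	path         string
	offset, size int64
}

// toyStore models both modes: copied bytes and references to external files.
type toyStore struct {
	copied map[string][]byte
	refs   map[string]fileRef
}

func keyOf(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// get returns a block's bytes. Referenced blocks are re-read from the
// original file and re-hashed, so a moved or edited file is reported.
func (s *toyStore) get(key string) ([]byte, error) {
	if b, ok := s.copied[key]; ok {
		return b, nil
	}
	ref, ok := s.refs[key]
	if !ok {
		return nil, errors.New("block not found")
	}
	f, err := os.Open(ref.path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	buf := make([]byte, ref.size)
	if _, err := f.ReadAt(buf, ref.offset); err != nil {
		return nil, err
	}
	if keyOf(buf) != key {
		return nil, errors.New("referenced file changed on disk")
	}
	return buf, nil
}

// run stores one block by reference, reads it back, then edits the file to
// show that the reference stops resolving.
func run() error {
	f, err := os.CreateTemp("", "nocopy-*.txt")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("Hello World\n"); err != nil {
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}

	s := &toyStore{copied: map[string][]byte{}, refs: map[string]fileRef{}}
	key := keyOf([]byte("World"))
	s.refs[key] = fileRef{path: f.Name(), offset: 6, size: 5}

	got, err := s.get(key)
	if err != nil || string(got) != "World" {
		return fmt.Errorf("unexpected read: %q, %v", got, err)
	}
	if err := os.WriteFile(f.Name(), []byte("Hello Earth\n"), 0o600); err != nil {
		return err
	}
	if _, err := s.get(key); err == nil {
		return errors.New("expected an error after the file changed")
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ok")
}
```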

Step 4: Pinning

By default, added content is pinned (ipfs add --pin=true). This tells your IPFS node to keep this data - without pinning, content may eventually be removed to free up space.
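
What "keep this data" means can be sketched as a mark-and-sweep pass: everything reachable from a pinned root survives, everything else is eligible for removal. This is a toy model that treats every pin as recursive and stores blocks in a map; the `gc` function is hypothetical, not Kubo's garbage collector.

```go
package main

import "fmt"

// gc deletes every block that is not reachable from a pinned root and
// returns how many blocks were removed. blocks maps a block key to the keys
// it links to.
func gc(blocks map[string][]string, pins map[string]bool) int {
	keep := map[string]bool{}
	var mark func(key string)
	mark = func(key string) {
		if keep[key] {
			return
		}
		children, ok := blocks[key]
		if !ok {
			return // block is not stored locally
		}
		keep[key] = true
		for _, child := range children {
			mark(child)
		}
	}
	for root := range pins {
		mark(root)
	}

	removed := 0
	for key := range blocks {
		if !keep[key] {
			delete(blocks, key)
			removed++
		}
	}
	return removed
}

func main() {
	blocks := map[string][]string{
		"root":   {"a", "b"},
		"a":      nil,
		"b":      nil,
		"orphan": {"a"},
	}
	pins := map[string]bool{"root": true}
	fmt.Println(gc(blocks, pins), len(blocks)) // 1 3
}
```

Only the unpinned `orphan` block is removed. The shared leaf `a` stays because the pinned root still links to it.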

Alternative: Organizing with MFS

Instead of pinning, you can use the Mutable File System (MFS) to organize content using familiar paths like /photos/vacation.jpg instead of raw CIDs:

```bash
# Add directly to MFS path
ipfs add --to-files=/backups/ myfile.txt

# Or copy an existing CID into MFS
ipfs files cp /ipfs/QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u /docs/hello.txt
```

Content in MFS is implicitly pinned and stays organized across node restarts.

Options

Run ipfs add --help to see all available options for controlling chunking, DAG layout, CID format, pinning behavior, and more.

UnixFS Format

IPFS uses UnixFS to represent files and directories. UnixFS is an abstraction layer that:

  • Gives names to raw data blobs (so you can have /foo/bar.txt instead of just hashes)
  • Represents directories as lists of named links to other nodes
  • Organizes large files as trees of smaller chunks
  • Makes these structures cryptographically verifiable - any tampering is detectable because it would change the hashes

With --raw-leaves, leaf nodes store raw data without the UnixFS wrapper. This is more efficient and is the default when using CIDv1.
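
The "directories as lists of named links" idea can be sketched as a path walk. This is a simplified model, assuming nodes live in a map and are identified by plain strings instead of CIDs; `entry` and `resolve` are hypothetical names and do not reflect the UnixFS wire format.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// entry is a simplified UnixFS-like node: a file carries data, a directory
// carries named links to other nodes.
type entry struct {
	data  []byte
	links map[string]string // name -> child ID
}

// resolve walks a slash-separated path from root, following one named link
// per path segment.
func resolve(nodes map[string]entry, root, path string) (entry, error) {
	cur, ok := nodes[root]
	if !ok {
		return entry{}, errors.New("root not found")
	}
	for _, name := range strings.Split(strings.Trim(path, "/"), "/") {
		if name == "" {
			continue
		}
		id, ok := cur.links[name]
		if !ok {
			return entry{}, fmt.Errorf("no link named %q", name)
		}
		if cur, ok = nodes[id]; !ok {
			return entry{}, fmt.Errorf("missing node %q", id)
		}
	}
	return cur, nil
}

func main() {
	nodes := map[string]entry{
		"dirA":  {links: map[string]string{"foo": "dirB"}},
		"dirB":  {links: map[string]string{"bar.txt": "file1"}},
		"file1": {data: []byte("hello")},
	}
	e, err := resolve(nodes, "dirA", "/foo/bar.txt")
	fmt.Println(string(e.data), err) // hello <nil>
}
```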

Code Architecture

The add flow spans several layers:

```mermaid
flowchart TD
    subgraph CLI ["CLI Layer (kubo)"]
        A["core/commands/add.go<br/>parses flags, shows progress"]
    end
    subgraph API ["CoreAPI Layer (kubo)"]
        B["core/coreapi/unixfs.go<br/>UnixfsAPI.Add() entry point"]
    end
    subgraph Adder ["Adder (kubo)"]
        C["core/coreunix/add.go<br/>orchestrates chunking, DAG building, MFS, pinning"]
    end
    subgraph Boxo ["boxo libraries"]
        D["chunker/ - splits data into chunks"]
        E["ipld/unixfs/ - DAG layout and UnixFS format"]
        F["mfs/ - mutable filesystem abstraction"]
        G["pinning/ - pin management"]
        H["blockstore/ - block storage"]
    end
    A --> B --> C --> Boxo
```

Key Files

| Component | Location |
|---|---|
| CLI command | `core/commands/add.go` |
| API implementation | `core/coreapi/unixfs.go` |
| Adder logic | `core/coreunix/add.go` |
| Chunking | `boxo/chunker` |
| DAG layouts | `boxo/ipld/unixfs/importer` |
| MFS | `boxo/mfs` |
| Pinning | `boxo/pinning/pinner` |

The Adder

The Adder type in core/coreunix/add.go is the workhorse. It:

  1. Creates an MFS root - temporary in-memory filesystem for building the DAG
  2. Processes files recursively - chunks each file and builds DAG nodes
  3. Commits to blockstore - persists all blocks
  4. Pins the result - keeps content from being removed
  5. Returns the root CID

Key methods:

  • AddAllAndPin() - main entry point
  • addFileNode() - handles a single file or directory
  • add() - chunks data and builds the DAG using boxo's layout builders
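
The overall sequence (chunk, build, store, pin, return the root) can be condensed into a toy model. This does not use or mirror the real `Adder` API: `toyAdder`, `put`, and `addAndPin` are hypothetical names, there is no MFS root, the tree has a single level, and keys are hex SHA-256 digests rather than CIDs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// toyAdder is a hypothetical stand-in for the real Adder; it keeps blocks
// and pins in memory.
type toyAdder struct {
	blocks map[string][]byte
	pins   map[string]bool
}

// put stores a block under the hex SHA-256 of its bytes and returns the key.
func (a *toyAdder) put(b []byte) string {
	sum := sha256.Sum256(b)
	key := hex.EncodeToString(sum[:])
	a.blocks[key] = b
	return key
}

// addAndPin chunks data, stores one leaf per chunk, stores a root block that
// lists the leaf keys, pins the root, and returns the root key.
func (a *toyAdder) addAndPin(data []byte, chunkSize int) string {
	if chunkSize <= 0 {
		chunkSize = len(data) + 1 // treat the input as a single chunk
	}
	var root []byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		leafKey := a.put(data[start:end])
		root = append(root, leafKey...)
		root = append(root, '\n')
	}
	rootKey := a.put(root)
	a.pins[rootKey] = true
	return rootKey
}

func main() {
	a := &toyAdder{blocks: map[string][]byte{}, pins: map[string]bool{}}
	root := a.addAndPin([]byte("Hello World"), 4)
	// Three leaf blocks plus one root block are stored; the root is pinned.
	fmt.Println(len(a.blocks), a.pins[root]) // 4 true
}
```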

Further Reading