Cog Architecture Overview

architecture/00-overview.md

Cog packages machine learning models into production-ready OCI images.

The Big Picture

```mermaid
flowchart LR
    subgraph input["What you write"]
        model["Model Code<br/>+ cog.yaml"]
    end

    subgraph cog["Cog"]
        cli["CLI"]
        sdk["Python SDK"]
        coglet["Coglet (Rust)"]
    end

    subgraph output["What you get"]
        image["Container Image"]
        api["HTTP API"]
    end

    model -->|"imports"| sdk
    model --> cli
    cli -->|"builds"| image
    sdk -.->|"packaged into"| image
    image -->|"runs"| coglet
    coglet -->|"serves"| api
```

Components

Model Source

What the model author provides: cog.yaml for environment config, a Predictor class with setup() and predict() methods, and optionally model weights.
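
To make the shape of a predictor concrete, here is a minimal sketch. `BasePredictor` and `Input` are the SDK names mentioned in this overview; the echo logic, the `prefix` attribute, and the fallback stand-ins (used only when the `cog` package is not installed) are assumptions made for illustration.

```python
try:
    from cog import BasePredictor, Input
except ImportError:
    # Stand-ins (assumption) so the sketch runs where cog is not installed.
    class BasePredictor:
        def setup(self) -> None:
            pass

    def Input(default=None, **kwargs):
        return default


class Predictor(BasePredictor):
    def setup(self) -> None:
        # One-time initialization at container start.
        # A real model would load its weights here.
        self.prefix = "echo: "

    def predict(
        self,
        text: str = Input(description="Text to echo back"),
        repeat: int = Input(description="How many times to repeat", default=1),
    ) -> str:
        # Runs once per prediction request.
        return self.prefix + " ".join([text] * repeat)


predictor = Predictor()
predictor.setup()
print(predictor.predict(text="hello", repeat=2))
```

Calling `predict()` directly like this bypasses the HTTP server; it only shows the two methods a model author writes.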

Deep dive: Model Source


Python SDK

The cog Python package that model authors import. Provides BasePredictor, the type system (Input, Path, Secret, ConcatenateIterator), and the thin server entry point that launches coglet. Installed inside every Cog container as a wheel.

Deep dive: Model Source (covers the SDK's public API)


Schema

An OpenAPI specification generated from the predictor's type hints. Describes what inputs the model accepts and what outputs it produces.
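
The idea of deriving a schema from type hints can be sketched with the standard library alone. This is not Cog's schema generator: `describe_signature`, the `OPENAPI_TYPES` mapping, and the simplified output layout are all hypothetical and cover only a few scalar types.

```python
import inspect
import json
from typing import get_type_hints

# Hypothetical mapping from Python types to OpenAPI type names.
OPENAPI_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_signature(func) -> dict:
    """Build a simplified input/output schema from a function's type hints."""
    hints = get_type_hints(func)
    output_type = hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        if name == "self":
            continue
        field = {"type": OPENAPI_TYPES.get(hints.get(name), "object")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
        else:
            field["default"] = param.default
        properties[name] = field
    return {
        "Input": {"type": "object", "properties": properties, "required": required},
        "Output": {"type": OPENAPI_TYPES.get(output_type, "object")},
    }


def predict(text: str, repeat: int = 1) -> str:
    return " ".join([text] * repeat)


schema = describe_signature(predict)
print(json.dumps(schema, indent=2))
```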

Deep dive: Schema


Prediction API

The HTTP interface for running predictions. A fixed envelope format (PredictionRequest/PredictionResponse) wraps model-specific inputs and outputs.
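
A rough sketch of the envelope idea follows. The class names come from this overview, but the specific fields (`id`, `status`, `error`), the status strings, and the `handle` helper are assumptions for illustration rather than the definitive wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class PredictionRequest:
    # Model-specific inputs; a real server validates them against the schema.
    input: dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None  # assumed optional identifier


@dataclass
class PredictionResponse:
    status: str  # assumed values: "succeeded" or "failed"
    input: dict[str, Any] = field(default_factory=dict)
    output: Any = None  # model-specific output
    error: Optional[str] = None
    id: Optional[str] = None


def handle(request: PredictionRequest, predict) -> PredictionResponse:
    """Wrap a model-specific predict call in the fixed envelope."""
    try:
        output = predict(**request.input)
    except Exception as exc:
        return PredictionResponse(
            status="failed", input=request.input, error=str(exc), id=request.id
        )
    return PredictionResponse(
        status="succeeded", input=request.input, output=output, id=request.id
    )


request = PredictionRequest(**json.loads('{"input": {"text": "hi"}, "id": "abc"}'))
response = handle(request, lambda text: text.upper())
print(json.dumps(asdict(response)))
```

The envelope stays the same for every model; only the contents of `input` and `output` change.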

Deep dive: Prediction API


Container Runtime

What runs inside the container: a Rust HTTP server (Axum), a worker subprocess that isolates user code, and prediction execution through PyO3 bindings.
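
The value of worker isolation can be illustrated in pure Python. This is a simplified analogy, not the Rust/PyO3 implementation: the parent talks to a child process over stdin/stdout pipes using JSON lines, and the `Worker` class, `WORKER_SOURCE`, and the message layout are hypothetical.

```python
import json
import subprocess
import sys

# Code run inside the child process: read one JSON request per line,
# run the "prediction", and write one JSON reply per line.
WORKER_SOURCE = r"""
import json, sys

def predict(text):
    return text[::-1]

for line in sys.stdin:
    request = json.loads(line)
    try:
        reply = {"status": "succeeded", "output": predict(**request["input"])}
    except Exception as exc:
        reply = {"status": "failed", "error": str(exc)}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


class Worker:
    """Parent-side handle for a child process that runs user code."""

    def __init__(self) -> None:
        self.process = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def predict(self, **inputs) -> dict:
        self.process.stdin.write(json.dumps({"input": inputs}) + "\n")
        self.process.stdin.flush()
        return json.loads(self.process.stdout.readline())

    def close(self) -> None:
        self.process.stdin.close()  # end of input stops the child's loop
        self.process.wait(timeout=5)
        self.process.stdout.close()


worker = Worker()
ok = worker.predict(text="abc")
bad = worker.predict(wrong="abc")  # fails inside the child, not the parent
worker.close()
print(ok, bad)
```

An error in user code comes back as a failed reply instead of taking down the process that serves HTTP.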

Deep dive: Container Runtime


Build System

Transforms cog.yaml and user code into a Docker image with the right Python version, CUDA libraries, and dependencies.
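
As a sketch of the config-to-Dockerfile step, the function below renders a small Dockerfile from an already-parsed config dictionary. `render_dockerfile`, the `python_version` and `python_packages` keys, the default version, and the `python:<version>-slim` base image are assumptions for illustration; CUDA base image selection is left out.

```python
def render_dockerfile(config: dict) -> str:
    """Render a minimal Dockerfile from a parsed, cog.yaml-like config dict."""
    build = config.get("build", {})
    python_version = build.get("python_version", "3.12")  # assumed default
    lines = [
        f"FROM python:{python_version}-slim",  # assumed base image
        "WORKDIR /src",
    ]
    packages = build.get("python_packages", [])
    if packages:
        # Install dependencies before copying code so this layer caches well.
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    lines.append("COPY . /src")
    return "\n".join(lines) + "\n"


config = {"build": {"python_version": "3.11", "python_packages": ["torch==2.3.0"]}}
print(render_dockerfile(config))
```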

Deep dive: Build System


CLI

The command-line tool for building, testing, and deploying models.

Deep dive: CLI


How It Fits Together

```mermaid
flowchart TB
    subgraph source["Model Source"]
        yaml["cog.yaml"]
        code["predict.py"]
        weights["weights"]
    end

    subgraph build["Build Time"]
        config["Config Parser"]
        generator["Dockerfile Generator"]
        schema_gen["Schema Generator"]
    end

    subgraph image["Container Image"]
        layers["Base + Deps + Code"]
        schema["OpenAPI Schema<br/>(label)"]
    end

    subgraph runtime["Runtime"]
        server["HTTP Server<br/>(Rust/Axum)"]
        worker["Worker Subprocess<br/>(Python)"]
        predictor["Predictor"]
    end

    yaml --> config
    config --> generator
    generator --> layers
    code --> layers
    weights --> layers

    layers --> schema_gen
    schema_gen --> schema

    image --> server
    server --> worker
    worker --> predictor
```

Terminology

| Term | Meaning |
| --- | --- |
| SDK | The cog Python package -- the framework users build models on |
| Predictor | User's model class with setup() and predict() methods |
| Schema | OpenAPI spec describing the model's input/output interface |
| Envelope | Fixed request/response structure wrapping model-specific data |
| Worker | Isolated subprocess running user code |
| Setup | One-time model initialization at container start |
| Coglet | Rust-based prediction server that runs inside containers |
| Slot | A concurrency unit -- one Unix socket connection to the worker subprocess |
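
The slot concept can be sketched as a fixed pool that caps how many predictions run at once. In this simplified model a slot is just an integer, whereas a real slot corresponds to a socket connection to the worker; `SlotPool` and its methods are hypothetical names.

```python
import asyncio


class SlotPool:
    """Hand out a fixed number of slots; each slot serves one prediction at a time."""

    def __init__(self, size: int) -> None:
        self._free: asyncio.Queue = asyncio.Queue()
        for slot_id in range(size):
            self._free.put_nowait(slot_id)

    async def run(self, job):
        slot_id = await self._free.get()  # wait until a slot is free
        try:
            return await job(slot_id)
        finally:
            self._free.put_nowait(slot_id)  # release the slot for the next job


async def main() -> int:
    pool = SlotPool(size=2)
    in_flight = 0
    peak = 0

    async def job(slot_id: int) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for a prediction
        in_flight -= 1
        return slot_id

    await asyncio.gather(*(pool.run(job) for _ in range(5)))
    return peak


peak = asyncio.run(main())
print(peak)
```

With two slots and five queued jobs, the peak number of concurrent jobs never exceeds the pool size.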

Reading Order

To understand Cog's architecture, we recommend reading the deep dives in this order:

  1. Model Source -- What users write
  2. Schema -- How the interface is described
  3. Prediction API -- The HTTP contract
  4. Container Runtime -- What runs inside the container
  5. Build System -- How images are built
  6. CLI -- How users interact with it all