Workspace & Media File Architecture

docs/platform/workspace-media-architecture.md

This document describes the architecture for handling user files in the AutoGPT Platform. It covers persistent user storage (Workspace) and ephemeral media processing pipelines.

Overview

The platform has two distinct file-handling layers:

| Layer | Purpose | Persistence | Scope |
|---|---|---|---|
| Workspace | Long-term user file storage | Persistent (DB + GCS/local) | Per-user, session-scoped access |
| Media Pipeline | Ephemeral file processing for blocks | Temporary (local disk) | Per-execution |

Database Models

UserWorkspace

Represents a user's file storage space. Created on-demand (one per user).

```prisma
model UserWorkspace {
  id        String   @id @default(uuid())
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  userId    String   @unique
  Files     UserWorkspaceFile[]
}
```

Key points:

  • One workspace per user (enforced by @unique on userId)
  • Created lazily via get_or_create_workspace()
  • Uses an upsert, so concurrent first requests for the same user don't fail on the unique constraint
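
The lazy, race-safe creation can be modelled in a few lines. This is a sketch, not the real get_or_create_workspace(). InMemoryWorkspaceTable and its upsert() are hypothetical stand-ins for the database. In the real system, the unique constraint on userId plus a single atomic upsert do the job of the lock below.

```python
import asyncio
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    id: str
    user_id: str


class InMemoryWorkspaceTable:
    """Hypothetical stand-in for the UserWorkspace table (unique on user_id)."""

    def __init__(self) -> None:
        self._by_user: dict[str, Workspace] = {}
        # The lock models the atomicity that a database upsert provides.
        self._lock = asyncio.Lock()

    async def upsert(self, user_id: str) -> Workspace:
        async with self._lock:
            existing = self._by_user.get(user_id)
            if existing is not None:
                return existing  # "update" branch: nothing to change
            created = Workspace(id=str(uuid.uuid4()), user_id=user_id)
            self._by_user[user_id] = created  # "create" branch
            return created

    def count(self) -> int:
        return len(self._by_user)


async def get_or_create_workspace(
    table: InMemoryWorkspaceTable, user_id: str
) -> Workspace:
    # One atomic upsert instead of "find, then create if missing".
    return await table.upsert(user_id)


async def demo() -> tuple[InMemoryWorkspaceTable, list[Workspace]]:
    table = InMemoryWorkspaceTable()
    # Ten concurrent first requests for the same user.
    results = await asyncio.gather(
        *(get_or_create_workspace(table, "user-123") for _ in range(10))
    )
    return table, list(results)


table, results = asyncio.run(demo())
print(table.count(), len({w.id for w in results}))  # 1 1
```

Every caller receives the same workspace. A check-then-create version would let two callers both attempt the insert against a real database, and the second one would hit the unique constraint on userId.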

UserWorkspaceFile

Represents a file stored in a user's workspace.

```prisma
model UserWorkspaceFile {
  id          String    @id @default(uuid())
  workspaceId String
  name        String    // User-visible filename
  path        String    // Virtual path (e.g., "/sessions/abc123/image.png")
  storagePath String    // Actual storage path (gcs://... or local://...)
  mimeType    String
  sizeBytes   BigInt
  checksum    String?   // SHA256 for integrity
  isDeleted   Boolean   @default(false)
  deletedAt   DateTime?
  metadata    Json      @default("{}")

  @@unique([workspaceId, path])  // Enforce unique paths within workspace
}
```

Key points:

  • path is a virtual path for organizing files (not an actual filesystem path)
  • storagePath contains the actual GCS or local storage location
  • Soft-delete pattern: isDeleted flag with deletedAt timestamp
  • On delete, path is rewritten so the @@unique([workspaceId, path]) constraint no longer blocks the original virtual path, which a new file can then reuse
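
The delete behaviour can be illustrated with a toy model of the @@unique([workspaceId, path]) constraint. This document does not describe the exact path rewrite the platform uses. The .deleted-{id} suffix in soft_delete() is therefore a hypothetical scheme, and FileRecord and FileTable are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileRecord:
    id: str
    workspace_id: str
    path: str
    is_deleted: bool = False
    deleted_at: Optional[datetime] = None


class FileTable:
    """Toy table enforcing unique (workspace_id, path), like the @@unique rule."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], FileRecord] = {}

    def insert(self, record: FileRecord) -> None:
        key = (record.workspace_id, record.path)
        if key in self._by_key:
            raise ValueError(f"path already in use: {record.path}")
        self._by_key[key] = record

    def soft_delete(self, record: FileRecord) -> None:
        del self._by_key[(record.workspace_id, record.path)]
        record.is_deleted = True
        record.deleted_at = datetime.now(timezone.utc)
        # Hypothetical rewrite: suffix the file ID so the row keeps a unique
        # path while the original virtual path becomes free again.
        record.path = f"{record.path}.deleted-{record.id}"
        self._by_key[(record.workspace_id, record.path)] = record


table = FileTable()
old = FileRecord(id="f1", workspace_id="ws-456", path="/sessions/abc123/image.png")
table.insert(old)
table.soft_delete(old)
# The virtual path can now be reused by a new file.
table.insert(FileRecord(id="f2", workspace_id="ws-456", path="/sessions/abc123/image.png"))
print(old.path)  # /sessions/abc123/image.png.deleted-f1
```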

WorkspaceManager

Location: backend/util/workspace.py

High-level API for workspace file operations. Combines storage backend operations with database record management.

Initialization

```python
from backend.util.workspace import WorkspaceManager

# Basic usage
manager = WorkspaceManager(user_id="user-123", workspace_id="ws-456")

# With session scoping (CoPilot sessions)
manager = WorkspaceManager(
    user_id="user-123",
    workspace_id="ws-456",
    session_id="session-789"
)
```

Session Scoping

When session_id is provided, files are isolated to /sessions/{session_id}/:

```python
# With session_id="abc123" (write_file and read_file are coroutines):
await manager.write_file(content, "image.png")
# → stored at /sessions/abc123/image.png

# Cross-session access is explicit:
await manager.read_file("/sessions/other-session/file.txt")  # Works
```
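
Session scoping comes down to a path-resolution rule. The sketch below assumes two things. Absolute paths are used as given, which covers the explicit cross-session case. Relative names are placed under /sessions/{session_id}/. resolve_virtual_path is a hypothetical helper, not a WorkspaceManager method.

```python
import posixpath
from typing import Optional


def resolve_virtual_path(path: str, session_id: Optional[str] = None) -> str:
    """Map a caller-supplied path to a workspace virtual path (hypothetical)."""
    if path.startswith("/"):
        # Absolute path: explicit access, used as given (normalized).
        return posixpath.normpath(path)
    prefix = f"/sessions/{session_id}/" if session_id else "/"
    resolved = posixpath.normpath(prefix + path)
    # normpath collapses "..", so make sure the result stayed under the prefix.
    if not (resolved + "/").startswith(prefix):
        raise ValueError(f"path escapes {prefix}: {path!r}")
    return resolved


print(resolve_virtual_path("image.png", "abc123"))
# /sessions/abc123/image.png
print(resolve_virtual_path("/sessions/other-session/file.txt", "abc123"))
# /sessions/other-session/file.txt
```

The prefix check keeps a relative name such as ../other/file.txt from silently escaping the session folder. This document does not say whether WorkspaceManager performs the same check.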

Why session scoping?

  • CoPilot conversations need file isolation
  • Prevents file collisions between concurrent sessions
  • Allows session cleanup without affecting other sessions

Core Methods

| Method | Description |
|---|---|
| write_file(content, filename, path?, mime_type?, overwrite?) | Write file to workspace |
| read_file(path) | Read file by virtual path |
| read_file_by_id(file_id) | Read file by ID |
| list_files(path?, limit?, offset?, include_all_sessions?) | List files |
| delete_file(file_id) | Soft-delete a file |
| get_download_url(file_id, expires_in?) | Get signed download URL |
| get_file_info(file_id) | Get file metadata |
| get_file_info_by_path(path) | Get file metadata by path |
| get_file_count(path?, include_all_sessions?) | Count files |
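
list_files() takes limit and offset, so walking a large workspace means paging through it. The sketch below rests on two assumptions: list_files() returns a list, and a page shorter than limit is the last one. iter_all_files is a hypothetical helper. FakeManager is a hypothetical stand-in for a real WorkspaceManager.

```python
import asyncio
from typing import Any, AsyncIterator, Optional


async def iter_all_files(
    manager: Any, path: Optional[str] = None, page_size: int = 100
) -> AsyncIterator[Any]:
    """Yield every file under `path` by paging through list_files()."""
    offset = 0
    while True:
        page = await manager.list_files(path=path, limit=page_size, offset=offset)
        for item in page:
            yield item
        if len(page) < page_size:  # assumed: a short page is the last page
            return
        offset += page_size


class FakeManager:
    """Hypothetical stand-in for WorkspaceManager, backed by a plain list."""

    def __init__(self, names: list[str]) -> None:
        self.names = names

    async def list_files(self, path=None, limit=50, offset=0):
        return self.names[offset : offset + limit]


async def demo() -> list[str]:
    manager = FakeManager([f"file-{i}.txt" for i in range(250)])
    return [name async for name in iter_all_files(manager, page_size=100)]


names = asyncio.run(demo())
print(len(names))  # 250
```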

Storage Backends

WorkspaceManager delegates to WorkspaceStorageBackend:

| Backend | When Used | Storage Path Format |
|---|---|---|
| GCSWorkspaceStorage | media_gcs_bucket_name is configured | gcs://bucket/workspaces/{ws_id}/{file_id}/{filename} |
| LocalWorkspaceStorage | No GCS bucket configured | local://{ws_id}/{file_id}/{filename} |
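
The two storage path formats can be expressed as one function. build_storage_path is a hypothetical helper that only reproduces the documented formats. Its gcs_bucket argument plays the role of media_gcs_bucket_name.

```python
from typing import Optional


def build_storage_path(
    workspace_id: str,
    file_id: str,
    filename: str,
    gcs_bucket: Optional[str] = None,
) -> str:
    """Return a storagePath in the documented format for the active backend.

    A falsy gcs_bucket (no bucket configured) selects the local backend.
    """
    if gcs_bucket:
        return f"gcs://{gcs_bucket}/workspaces/{workspace_id}/{file_id}/{filename}"
    return f"local://{workspace_id}/{file_id}/{filename}"


print(build_storage_path("ws-456", "file-1", "image.png", "my-bucket"))
# gcs://my-bucket/workspaces/ws-456/file-1/image.png
print(build_storage_path("ws-456", "file-1", "image.png"))
# local://ws-456/file-1/image.png
```

The file ID is part of the storage path. Two files with the same filename, including a new file that reuses a freed virtual path, therefore presumably never share stored bytes.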

store_media_file()

Location: backend/util/file.py

The media normalization pipeline. Handles various input types and normalizes them for processing or output.

Purpose

Blocks receive files in many formats (URLs, data URIs, workspace references, local paths). store_media_file() normalizes these to a consistent format based on what the block needs.

Input Types Handled

| Input Format | Example | How It's Processed |
|---|---|---|
| Data URI | data:image/png;base64,iVBOR... | Decoded, virus scanned, written locally |
| HTTP(S) URL | https://example.com/image.png | Downloaded, virus scanned, written locally |
| Workspace URI | workspace://abc123 or workspace:///path/to/file | Read from workspace, virus scanned, written locally |
| Cloud path | gcs://bucket/path | Downloaded, virus scanned, written locally |
| Local path | image.png | Verified to exist in exec_file directory |
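
The sketch below shows one way inputs could be routed to the rows of this table. It also parses the workspace URI forms that appear in this document: workspace://abc123, workspace:///path/to/file, and workspace://file-id#image/png from the return-format example. The function names are hypothetical. Reading the host part as a file ID and the triple-slash form as a virtual path is an assumption drawn from those examples.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class InputKind(Enum):
    DATA_URI = "data_uri"
    HTTP_URL = "http_url"
    WORKSPACE_URI = "workspace_uri"
    CLOUD_PATH = "cloud_path"
    LOCAL_PATH = "local_path"


def classify_media_input(file: str) -> InputKind:
    """Route an input string to one row of the input-format table."""
    if file.startswith("data:"):
        return InputKind.DATA_URI
    scheme = urlparse(file).scheme  # urlparse lower-cases the scheme
    if scheme in ("http", "https"):
        return InputKind.HTTP_URL
    if scheme == "workspace":
        return InputKind.WORKSPACE_URI
    if scheme == "gcs":
        return InputKind.CLOUD_PATH
    return InputKind.LOCAL_PATH


@dataclass(frozen=True)
class WorkspaceRef:
    file_id: Optional[str]    # workspace://<file-id>
    path: Optional[str]       # workspace:///<virtual/path>
    mime_type: Optional[str]  # optional "#<mime/type>" fragment


def parse_workspace_uri(uri: str) -> WorkspaceRef:
    parsed = urlparse(uri)
    if parsed.scheme != "workspace":
        raise ValueError(f"not a workspace URI: {uri!r}")
    mime_type = parsed.fragment or None
    if parsed.netloc:  # a host part means "reference by file ID"
        return WorkspaceRef(parsed.netloc, None, mime_type)
    if parsed.path:  # empty host (three slashes) means "reference by path"
        return WorkspaceRef(None, parsed.path, mime_type)
    raise ValueError(f"workspace URI names neither a file ID nor a path: {uri!r}")


print(classify_media_input("https://example.com/image.png"))  # InputKind.HTTP_URL
print(parse_workspace_uri("workspace://file-id#image/png"))
# WorkspaceRef(file_id='file-id', path=None, mime_type='image/png')
```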

Return Formats

The return_format parameter determines what you get back:

```python
from backend.util.file import store_media_file

# For local processing (ffmpeg, MoviePy, PIL)
local_path = await store_media_file(
    file=input_file,
    execution_context=ctx,
    return_format="for_local_processing"
)
# Returns: "image.png" (relative path in exec_file dir)

# For external APIs (Replicate, OpenAI, etc.)
data_uri = await store_media_file(
    file=input_file,
    execution_context=ctx,
    return_format="for_external_api"
)
# Returns: "data:image/png;base64,iVBOR..."

# For block output (adapts to execution context)
output = await store_media_file(
    file=input_file,
    execution_context=ctx,
    return_format="for_block_output"
)
# In CoPilot: Returns "workspace://file-id#image/png"
# In graphs:  Returns "data:image/png;base64,..."
```
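
The for_external_api result is a standard base64 data URI (RFC 2397). Below is a sketch of the encoding and its inverse. It is independent of store_media_file()'s actual implementation, and to_data_uri and from_data_uri are hypothetical names.

```python
import base64
import binascii


def to_data_uri(content: bytes, mime_type: str) -> str:
    """Encode bytes in the data:<mime>;base64,<payload> form."""
    payload = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Decode a base64 data URI back into (mime_type, bytes).

    Only the base64 variant is handled; parameters such as charset stay
    attached to the returned MIME string.
    """
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime_type = header[len("data:") : -len(";base64")] or "text/plain"
    try:
        return mime_type, base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 payload") from exc


png_magic = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
uri = to_data_uri(png_magic, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

Base64 inflates the payload by about a third. Keep this in mind when passing a large file to an external API in this form.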

Execution Context

store_media_file() requires an ExecutionContext with:

  • graph_exec_id - Required for temp file location
  • user_id - Required for workspace access
  • workspace_id - Optional; enables workspace features
  • session_id - Optional; for session scoping in CoPilot
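
A hypothetical dataclass can capture these requirements. It does not reproduce the real ExecutionContext class. MediaExecutionContext and its workspace_enabled property are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MediaExecutionContext:
    """Hypothetical stand-in for the ExecutionContext fields listed above.

    The real ExecutionContext may carry more fields; only the four that
    store_media_file() is documented to need are modelled here.
    """

    graph_exec_id: str                  # required: selects the temp file location
    user_id: str                        # required: workspace access
    workspace_id: Optional[str] = None  # optional: enables workspace features
    session_id: Optional[str] = None    # optional: CoPilot session scoping

    def __post_init__(self) -> None:
        if not self.graph_exec_id or not self.user_id:
            raise ValueError("graph_exec_id and user_id are required")

    @property
    def workspace_enabled(self) -> bool:
        return self.workspace_id is not None


ctx = MediaExecutionContext(graph_exec_id="exec-1", user_id="user-123")
print(ctx.workspace_enabled)  # False
```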

Responsibility Boundaries

Virus Scanning

| Component | Scans? | Notes |
|---|---|---|
| store_media_file() | ✅ Yes | Scans all content before writing to local disk |
| WorkspaceManager.write_file() | ✅ Yes | Scans content before persisting |

Scanning happens at:

  1. store_media_file() — scans everything it downloads/decodes
  2. WorkspaceManager.write_file() — scans before persistence

Tools like WriteWorkspaceFileTool don't need to scan because WorkspaceManager.write_file() handles it.

Persistence

| Component | Persists To | Lifecycle |
|---|---|---|
| store_media_file() | Temp dir (/tmp/exec_file/{exec_id}/) | Cleaned after execution |
| WorkspaceManager | GCS or local storage + DB | Persistent until deleted |

Automatic cleanup: clean_exec_files(graph_exec_id) removes temp files after execution completes.
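
Below is a sketch of the per-execution temp directory lifecycle, using only the standard library. prepare_exec_dir and clean_exec_files_sketch are hypothetical helpers, and the real clean_exec_files() may differ. The demo uses a throwaway base directory instead of /tmp/exec_file.

```python
import shutil
import tempfile
from pathlib import Path


def _exec_dir(graph_exec_id: str, base: Path) -> Path:
    # Reject IDs that could point outside `base` (e.g. "..") before touching disk.
    if not graph_exec_id or graph_exec_id in (".", "..") or any(
        sep in graph_exec_id for sep in ("/", "\\")
    ):
        raise ValueError(f"unsafe graph_exec_id: {graph_exec_id!r}")
    return base / graph_exec_id


def prepare_exec_dir(graph_exec_id: str, base: Path) -> Path:
    """Create (if needed) the scratch directory for one graph execution."""
    path = _exec_dir(graph_exec_id, base)
    path.mkdir(parents=True, exist_ok=True)
    return path


def clean_exec_files_sketch(graph_exec_id: str, base: Path) -> None:
    """Delete everything written for one execution; a no-op if nothing exists."""
    shutil.rmtree(_exec_dir(graph_exec_id, base), ignore_errors=True)


base = Path(tempfile.mkdtemp()) / "exec_file"
work = prepare_exec_dir("exec-1", base)
(work / "image.png").write_bytes(b"\x89PNG\r\n\x1a\n")
clean_exec_files_sketch("exec-1", base)
print((base / "exec-1").exists())  # False
```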


Decision Tree: WorkspaceManager vs store_media_file

```text
┌─────────────────────────────────────────────────────┐
│ What do you need to do with the file?               │
└─────────────────────────────────────────────────────┘
                         │
           ┌─────────────┴─────────────┐
           ▼                           ▼
    Process in a block          Store for user access
    (ffmpeg, PIL, etc.)         (CoPilot files, uploads)
           │                           │
           ▼                           ▼
    store_media_file()           WorkspaceManager
    with appropriate
    return_format
           │
           │
    ┌──────┴──────┐
    ▼             ▼
 "for_local_   "for_block_
 processing"   output"
    │             │
    ▼             ▼
 Get local    Auto-saves to
 path for     workspace in
 tools        CoPilot context

Store for user access
    │
    ├── write_file() ─── Upload + persist (scans internally)
    ├── read_file() / get_download_url() ─── Retrieve
    └── list_files() / delete_file() ─── Manage
```

Quick Reference

| Scenario | Use |
|---|---|
| Block needs to process a file with ffmpeg | store_media_file(..., return_format="for_local_processing") |
| Block needs to send file to external API | store_media_file(..., return_format="for_external_api") |
| Block returning a generated file | store_media_file(..., return_format="for_block_output") |
| API endpoint handling file upload | WorkspaceManager.write_file() (handles virus scanning internally) |
| API endpoint serving file download | WorkspaceManager.get_download_url() |
| Listing user's files | WorkspaceManager.list_files() |

Key Files Reference

| File | Purpose |
|---|---|
| backend/data/workspace.py | Database CRUD operations for UserWorkspace and UserWorkspaceFile |
| backend/util/workspace.py | WorkspaceManager class - high-level workspace API |
| backend/util/workspace_storage.py | Storage backends (GCS, local) and WorkspaceStorageBackend interface |
| backend/util/file.py | store_media_file() and media processing utilities |
| backend/util/virus_scanner.py | VirusScannerService and scan_content_safe() |
| schema.prisma | Database model definitions |

Common Patterns

Block Processing a User's File

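```python
from backend.util.file import store_media_file


# Inside a Block subclass:
async def run(self, input_data, *, execution_context, **kwargs):
    # Normalize input (URL, data URI, workspace URI, ...) to a local path
    local_path = await store_media_file(
        file=input_data.video,
        execution_context=execution_context,
        return_format="for_local_processing",
    )

    # Process with local tools. process_video is a placeholder for your own
    # ffmpeg/MoviePy logic; its output must also live in the exec_file
    # directory, because local paths are verified to exist there.
    output_path = process_video(local_path)

    # Return (auto-saves to workspace in CoPilot; data URI in graphs)
    result = await store_media_file(
        file=output_path,
        execution_context=execution_context,
        return_format="for_block_output",
    )
    yield "output", result
```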

API Upload Endpoint

```python
from fastapi import HTTPException, UploadFile

from backend.util.virus_scanner import VirusDetectedError, VirusScanError
from backend.util.workspace import WorkspaceManager


async def upload_file(file: UploadFile, user_id: str, workspace_id: str):
    content = await file.read()

    # write_file handles virus scanning internally
    manager = WorkspaceManager(user_id=user_id, workspace_id=workspace_id)
    try:
        workspace_file = await manager.write_file(
            content=content,
            # UploadFile.filename may be None; "upload" is an illustrative fallback
            filename=file.filename or "upload",
        )
    except VirusDetectedError as e:
        raise HTTPException(status_code=400, detail="File rejected: virus detected") from e
    except VirusScanError as e:
        raise HTTPException(status_code=503, detail="Virus scanning unavailable") from e
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e)) from e

    return {"file_id": workspace_file.id}
```

Configuration

| Setting | Purpose | Default |
|---|---|---|
| media_gcs_bucket_name | GCS bucket for workspace storage | None (uses local) |
| workspace_storage_dir | Local storage directory | {app_data}/workspaces |
| max_file_size_mb | Maximum file size in MB | 100 |
| clamav_service_enabled | Enable virus scanning | true |
| clamav_service_host | ClamAV daemon host | localhost |
| clamav_service_port | ClamAV daemon port | 3310 |
| clamav_max_concurrency | Max concurrent scans to ClamAV daemon | 5 |
| clamav_mark_failed_scans_as_clean | If true, scan failures pass content through instead of rejecting (⚠️ security risk if ClamAV is unreachable) | false |
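
clamav_max_concurrency caps how many scans run against the ClamAV daemon at once. A common way to enforce such a cap in asyncio is a semaphore. BoundedScanner and fake_clamav are hypothetical, and whether the platform actually uses a semaphore is an assumption.

```python
import asyncio
from typing import Awaitable, Callable


class BoundedScanner:
    """Hypothetical wrapper allowing at most `max_concurrency` scans at once."""

    def __init__(
        self, scan: Callable[[bytes], Awaitable[bool]], max_concurrency: int = 5
    ) -> None:
        self._scan = scan
        # Created inside a running event loop (see demo) for older Pythons.
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous scans observed

    async def scan(self, content: bytes) -> bool:
        async with self._semaphore:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await self._scan(content)
            finally:
                self.in_flight -= 1


async def fake_clamav(content: bytes) -> bool:
    await asyncio.sleep(0.01)  # pretend round-trip to the ClamAV daemon
    return True


async def demo() -> BoundedScanner:
    scanner = BoundedScanner(fake_clamav, max_concurrency=5)  # documented default
    results = await asyncio.gather(*(scanner.scan(b"data") for _ in range(20)))
    assert all(results)
    return scanner


scanner = asyncio.run(demo())
print(scanner.peak)  # 5
```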