Back to Bytebase

SaaS Multi-Tenancy: Challenges and Roadmap

docs/plans/saas/00.challenges-and-roadmap.md

3.17.111.7 KB
Original Source

SaaS Multi-Tenancy: Challenges and Roadmap

Goal

Refactor Bytebase from per-user isolation (one container per workspace) to a true SaaS model (single container, shared endpoint at console.bytebase.com).

Chosen Approach: Row-Level Isolation

Add a workspace column to root-level tables. Project-child tables reuse their existing project FK for workspace scoping -- the project table has workspace, so any query that filters by project is implicitly workspace-scoped.


Key Challenges

Challenge 1: Workspace Isolation

Separate different workspaces so users only see their own data.

Challenge 2: Database Connection Routing

Single connection pool; workspace resolved from JWT, applied as query filter (WHERE workspace = $1 on root tables, WHERE project = $1 on project children).

Challenge 3: Background Runners

Runners poll for pending tasks across ALL workspaces with a single query. Each task carries workspace for context when processing.

Challenge 4: API Resource Naming

Current API uses flat resource names with no workspace prefix:

  • projects/{project}, instances/{instance}, environments/{environment}
  • projects/{project}/issues/{issue}, instances/{instance}/databases/{database}

Two options:

Option A: Keep flat names, resolve workspace from JWT (recommended)

The middleware resolves workspace from the JWT and injects it as a query parameter. Resource IDs like projects/default are unique per-workspace because project has UNIQUE(workspace, resource_id). Project-child queries use the project FK which is already workspace-scoped.

  • Zero API breaking changes
  • Zero proto changes
  • Resource ID uniqueness scoped by workspace in unique indexes on root tables

Option B: Explicit workspace prefix (workspaces/{workspace}/projects/{project})

AIP-compliant but requires changing every endpoint, proto definition, resource name parser, frontend API call, and external integration (Terraform provider, API clients, webhooks).

Recommendation: Option A. Workspace scoping is enforced at the DB level (root tables via workspace, child tables via project FK), making URL-level scoping redundant.

Challenge 5: Multi-Workspace User Identity

Current model: single workspace per installation. principal table holds END_USER, SERVICE_ACCOUNT, and WORKLOAD_IDENTITY in one table.

SaaS model requires:

  • A user can belong to multiple workspaces
  • Auth (login, password, MFA, OAuth) is workspace-independent
  • After login, user picks a workspace (or auto-redirects if only one)
  • JWT includes both user_id and workspace

Solution: Split the principal table into three separate tables:

  • principal -- keeps only END_USER rows. This IS the global account/identity table. No workspace needed (a user can belong to multiple workspaces). No data migration required -- just remove SERVICE_ACCOUNT/WORKLOAD_IDENTITY rows.
  • service_account -- new table for SERVICE_ACCOUNT rows. Workspace-scoped via project FK (project-level) or workspace (workspace-level).
  • workload_identity -- new table for WORKLOAD_IDENTITY rows. Same scoping as service_account.

The store layer already treats these as separate entities (principal.go, principal_service_account.go, principal_workload_identity.go), all querying the same principal table with WHERE type = ?. The split makes the schema match the existing code structure.

Workspace membership is tracked by the policy table's IAM policy (resource_type = 'WORKSPACE'), so no separate membership table is needed.

Why RLS Was Rejected

RLS policies are per-table: each table must have its own workspace column for the policy USING (workspace = current_setting('app.workspace')) to evaluate. This means:

  • All 30+ workspace-scoped tables would need a workspace column
  • Cannot leverage existing project FK for workspace scoping on child tables
  • Conflicts with the goal of minimizing schema changes

Row-level isolation with project-FK scoping requires workspace on only 12 root tables, while child tables use their existing FK relationships.


Table Classification

Tier 1: Root Tables That Need workspace (12 tables)

These are top-level entities with no parent FK that provides workspace scoping. They get a workspace column directly.

TableNotes
projectTop-level resource, anchor for project-child scoping
instanceTop-level resource
service_accountNew table (split from principal), workspace-scoped
workload_identityNew table (split from principal), workspace-scoped
settingWorkspace configuration
policyWorkspace/env/project policies
roleCustom roles
idpIdentity providers
review_configReview configurations
user_groupGroups
export_archiveData exports
audit_logAudit trail

Tier 2: Project Children -- Scoped via project FK (9 tables)

These already have a project FK referencing project(resource_id). Since project has workspace, queries on these tables filter by WHERE project = $1 where the project is already validated as belonging to the current workspace. No workspace column needed.

TableFK
planproject text NOT NULL REFERENCES project(resource_id)
issueproject text NOT NULL REFERENCES project(resource_id)
dbproject text NOT NULL REFERENCES project(resource_id)
project_webhookproject text NOT NULL REFERENCES project(resource_id)
worksheetproject text NOT NULL REFERENCES project(resource_id)
db_groupproject text NOT NULL REFERENCES project(resource_id)
releaseproject text NOT NULL REFERENCES project(resource_id)
access_grantproject text NOT NULL REFERENCES project(resource_id)
query_historyproject_id text NOT NULL (resource id, no FK constraint)

Tier 3: Grandchild Tables -- Scoped via Parent Chain (11 tables)

These are children of Tier 2 tables. They have no project FK and no workspace. Workspace scoping is inherited through the FK chain -- they are always accessed via their parent, which is already workspace-scoped.

Plan/pipeline grandchildren (5):

TableFK chain to workspace
plan_check_run-> plan(id) -> plan.project -> project.workspace
plan_webhook_delivery-> plan(id) -> plan.project -> project.workspace
task-> plan(id) -> plan.project -> project.workspace
task_run-> task(id) -> plan(id) -> ...
task_run_log-> task_run(id) -> task(id) -> ...

Issue grandchildren (1):

TableFK chain to workspace
issue_comment-> issue(id) -> issue.project -> project.workspace

DB grandchildren (4):

TableFK chain to workspace
db_schema-> db (via instance+db_name) -> db.project -> project.workspace
revision-> db (via instance+db_name) -> db.project -> project.workspace
sync_history-> db (via instance+db_name) -> db.project -> project.workspace
changelog-> db (via instance+db_name) -> db.project -> project.workspace

Worksheet grandchildren (1):

TableFK chain to workspace
worksheet_organizer-> worksheet(id) -> worksheet.project -> project.workspace

Tables That Do NOT Need workspace (5 tables)

TableReason
sheet_blobContent-addressed by SHA256, shared/deduped across workspaces
replica_heartbeatInfrastructure, not workspace-scoped
instance_change_historyBytebase internal migration version tracker
workspaceIt IS the workspace
principalGlobal identity layer (END_USER only, cross-workspace)

Auth Token Tables -- Tied to Identity Layer (no workspace)

TableReason
web_refresh_tokenTied to principal, not workspace
oauth2_clientGlobal OAuth2 app registration
oauth2_authorization_codeLogin flow
oauth2_refresh_tokenLogin flow

Summary

TierTablesworkspace columnWorkspace scoping mechanism
Root12Yes (10 existing + 2 new)Direct WHERE workspace = $1
Project children9NoWHERE project = $1 (project already validated)
Grandchildren11NoParent FK chain (parent validated by caller)
Global5NoNot workspace-scoped
Auth4NoTied to principal (identity) layer
Total4112 have column

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                      Request Flow                           │
│                                                             │
│  Client → Auth Interceptor → ACL Interceptor → Service      │
│              │                    │                │         │
│         Set workspace        Validate          Use          │
│         in context          workspace       workspace       │
│         from JWT            ownership       for queries     │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                      Store Layer                            │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Root Tables  │  │ Child Tables │  │ Global Tables    │  │
│  │ workspace FK │  │ project/inst │  │ No workspace     │  │
│  │ (required)   │  │ FK (indirect)│  │ (principal,      │  │
│  │              │  │              │  │  sheet_blob, etc) │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                    Background Runners                       │
│                                                             │
│  Cross-workspace polling → Resolve workspace from entity    │
│  → Use workspace for settings/policies                      │
└─────────────────────────────────────────────────────────────┘

Roadmap

StepFileDescription
0101.prepare-for-saas.mdPreparatory refactors that can land independently: split principal table, fix workspace IAM policy resource field, migrate serial to bigserial
0202.database-migration.mdCreate workspace table, add workspace to root tables, update unique indexes
0303.application-layer.mdMiddleware, store/service changes to enforce workspace scoping on API requests
0404.runners-and-auth.mdBackground runner workspace resolution, auth flow changes (login, workspace picker, JWT)