docs/plans/saas/01.prepare-for-saas.md
Preparatory refactors that can land independently on the current single-workspace codebase. Each section is a self-contained change with no dependency on the workspace table.
## Split `principal` into Three Tables

The `principal` table currently stores three entity types (`END_USER`, `SERVICE_ACCOUNT`, `WORKLOAD_IDENTITY`) differentiated by a `type` column. In SaaS mode:
- `principal` (`END_USER`) becomes the global identity table -- no `workspace` column, since a user can belong to multiple workspaces.
- `service_account` and `workload_identity` become workspace-scoped tables with a `workspace` column.

The store layer already treats them as separate entities (`principal.go`, `principal_service_account.go`, `principal_workload_identity.go`), all querying the same table with `WHERE type = ?`. Splitting the table makes the schema match the code.

Current schema:
```sql
CREATE TABLE principal (
    id serial PRIMARY KEY,
    deleted boolean NOT NULL DEFAULT FALSE,
    created_at timestamptz NOT NULL DEFAULT now(),
    type text NOT NULL CHECK (type IN ('END_USER', 'SERVICE_ACCOUNT', 'WORKLOAD_IDENTITY')),
    name text NOT NULL,
    email text NOT NULL,
    password_hash text NOT NULL,
    phone text NOT NULL DEFAULT '',
    mfa_config jsonb NOT NULL DEFAULT '{}',
    profile jsonb NOT NULL DEFAULT '{}',
    project text REFERENCES project(resource_id),
    CONSTRAINT principal_project_type_check CHECK (
        (type = 'END_USER' AND project IS NULL) OR
        (type IN ('SERVICE_ACCOUNT', 'WORKLOAD_IDENTITY'))
    )
);
```
## `creator` / `deleter` Columns

13 columns across the schema have `REFERENCES principal(email) ON UPDATE CASCADE`. Splitting the table requires dropping these FKs, since creator emails can belong to any of the three types.
All 13 FK references to principal(email):
| # | Table | Column | Constraint | Can be SA/WI? |
|---|---|---|---|---|
| 1 | plan | creator | NOT NULL, ON UPDATE CASCADE | Yes |
| 2 | task_run | creator | nullable, ON UPDATE CASCADE ON DELETE SET NULL | Yes |
| 3 | issue | creator | NOT NULL, ON UPDATE CASCADE | Yes |
| 4 | issue_comment | creator | NOT NULL, ON UPDATE CASCADE | Yes |
| 5 | query_history | creator | NOT NULL, ON UPDATE CASCADE | Yes |
| 6 | worksheet | creator | NOT NULL, ON UPDATE CASCADE | Yes |
| 7 | worksheet_organizer | principal | NOT NULL, ON UPDATE CASCADE | Yes |
| 8 | revision | deleter | nullable, ON UPDATE CASCADE | Yes |
| 9 | release | creator | NOT NULL, ON UPDATE CASCADE | Yes |
| 10 | access_grant | creator | NOT NULL, ON UPDATE CASCADE | Yes |
| 11 | oauth2_authorization_code | user_email | NOT NULL, ON UPDATE CASCADE | No (END_USER only) |
| 12 | oauth2_refresh_token | user_email | NOT NULL, ON UPDATE CASCADE | No (END_USER only) |
| 13 | web_refresh_token | user_email | NOT NULL, ON UPDATE CASCADE | No (END_USER only) |
Drop FKs on #1-10 (creator/deleter can be SA/WI). Keep FKs on #11-13 (END_USER only, auth tokens).
Where creator/deleter is queried in code (6 places):
| Store file | Query | Has index? |
|---|---|---|
| issue.go:250 | `WHERE issue.creator = ?` | Yes (`idx_issue_creator`) |
| plan.go:301 | `WHERE plan.creator = ?` | No -- needs adding |
| query_history.go:126 | `WHERE query_history.creator = ?` | Yes (`idx_query_history_creator_created_at_project_id`) |
| worksheet.go:423,479 | `WHERE worksheet.creator = ?` / `!= ?` | Yes (`idx_worksheet_creator_project`) |
| access_grant.go:350 | `WHERE access_grant.creator = ?` | Yes (`idx_access_grant_project_creator_expire_time`) |
| project.go:500-533 | Cascading delete by `SELECT email FROM principal WHERE project = ?` | N/A (will query new tables) |
Other creator/deleter columns (task_run.creator, issue_comment.creator, revision.deleter, release.creator) are only SELECTed, never used in WHERE filters.
Performance impact: none. All creator queries use WHERE creator = 'email' with the stored email value -- they do NOT JOIN to principal. Splitting the table has zero impact on these queries. The only lookup that resolves a creator email to its entity is GetPrincipalByEmail() in the application layer, which already dispatches by email format (IsServiceAccountEmail / IsWorkloadIdentityEmail).
ON UPDATE CASCADE replacement: Only needed for END_USER email changes. UpdateUserEmail() already manually updates policies, issue payloads, groups, and audit logs. After dropping FKs, add explicit UPDATE statements for the 10 creator/deleter columns to that function.
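A sketch of those explicit statements, using the column list from the FK table above (the parameter placeholders are illustrative; the real function presumably runs these inside its existing transaction):

```sql
-- Illustrative sketch: the explicit renames UpdateUserEmail() gains once the
-- ON UPDATE CASCADE FKs are dropped. $1 = old email, $2 = new email.
UPDATE plan SET creator = $2 WHERE creator = $1;
UPDATE task_run SET creator = $2 WHERE creator = $1;
UPDATE issue SET creator = $2 WHERE creator = $1;
UPDATE issue_comment SET creator = $2 WHERE creator = $1;
UPDATE query_history SET creator = $2 WHERE creator = $1;
UPDATE worksheet SET creator = $2 WHERE creator = $1;
UPDATE worksheet_organizer SET principal = $2 WHERE principal = $1;
UPDATE revision SET deleter = $2 WHERE deleter = $1;
UPDATE release SET creator = $2 WHERE creator = $1;
UPDATE access_grant SET creator = $2 WHERE creator = $1;
```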
```sql
-- Step 1: Create service_account table
CREATE TABLE service_account (
    id bigserial PRIMARY KEY,
    deleted boolean NOT NULL DEFAULT FALSE,
    created_at timestamptz NOT NULL DEFAULT now(),
    name text NOT NULL,
    email text NOT NULL,
    password_hash text NOT NULL,
    project text REFERENCES project(resource_id)
);
CREATE UNIQUE INDEX idx_service_account_unique_email ON service_account(email);
CREATE INDEX idx_service_account_project ON service_account(project) WHERE project IS NOT NULL;

-- Migrate SERVICE_ACCOUNT rows
INSERT INTO service_account (id, deleted, created_at, name, email, password_hash, project)
SELECT id, deleted, created_at, name, email, password_hash, project
FROM principal WHERE type = 'SERVICE_ACCOUNT';

-- Reset sequence to max id
SELECT setval('service_account_id_seq', GREATEST(COALESCE((SELECT MAX(id) FROM service_account), 0), 100) + 1, false);

-- Step 2: Create workload_identity table
CREATE TABLE workload_identity (
    id bigserial PRIMARY KEY,
    deleted boolean NOT NULL DEFAULT FALSE,
    created_at timestamptz NOT NULL DEFAULT now(),
    name text NOT NULL,
    email text NOT NULL,
    project text REFERENCES project(resource_id),
    -- Extracted from principal.profile->'workloadIdentityConfig'
    config jsonb NOT NULL DEFAULT '{}'
);
CREATE UNIQUE INDEX idx_workload_identity_unique_email ON workload_identity(email);
CREATE INDEX idx_workload_identity_project ON workload_identity(project) WHERE project IS NOT NULL;

-- Migrate WORKLOAD_IDENTITY rows (extract config from profile JSON)
INSERT INTO workload_identity (id, deleted, created_at, name, email, project, config)
SELECT id, deleted, created_at, name, email, project,
       COALESCE(profile->'workloadIdentityConfig', '{}')
FROM principal WHERE type = 'WORKLOAD_IDENTITY';

-- Reset sequence to max id
SELECT setval('workload_identity_id_seq', GREATEST(COALESCE((SELECT MAX(id) FROM workload_identity), 0), 100) + 1, false);

-- Step 3: Drop FK constraints on creator/deleter columns that can reference SA/WI emails.
-- Keep FKs on oauth2_authorization_code, oauth2_refresh_token, web_refresh_token (END_USER only).
ALTER TABLE plan DROP CONSTRAINT IF EXISTS plan_creator_fkey;
ALTER TABLE task_run DROP CONSTRAINT IF EXISTS task_run_creator_fkey;
ALTER TABLE issue DROP CONSTRAINT IF EXISTS issue_creator_fkey;
ALTER TABLE issue_comment DROP CONSTRAINT IF EXISTS issue_comment_creator_fkey;
ALTER TABLE query_history DROP CONSTRAINT IF EXISTS query_history_creator_fkey;
ALTER TABLE worksheet DROP CONSTRAINT IF EXISTS worksheet_creator_fkey;
ALTER TABLE worksheet_organizer DROP CONSTRAINT IF EXISTS worksheet_organizer_principal_fkey;
ALTER TABLE revision DROP CONSTRAINT IF EXISTS revision_deleter_fkey;
ALTER TABLE release DROP CONSTRAINT IF EXISTS release_creator_fkey;
ALTER TABLE access_grant DROP CONSTRAINT IF EXISTS access_grant_creator_fkey;

-- Step 4: Add missing index for plan.creator (queried in plan filter)
CREATE INDEX idx_plan_creator ON plan(creator);

-- Step 5: Clean up principal table -- remove non-END_USER rows, drop type and project columns
DELETE FROM principal WHERE type != 'END_USER';
ALTER TABLE principal DROP CONSTRAINT IF EXISTS principal_project_type_check;
ALTER TABLE principal DROP CONSTRAINT IF EXISTS principal_type_check;
ALTER TABLE principal DROP COLUMN type;
ALTER TABLE principal DROP COLUMN project;
DROP INDEX IF EXISTS idx_principal_project;
```
### Store layer

`principal_service_account.go` and `principal_workload_identity.go` already have separate message types and query functions. The only change is updating the SQL to query the `service_account` / `workload_identity` tables instead of `principal WHERE type = ?`.
Key files to update:
- `backend/store/principal.go` -- remove `type` from SELECT/INSERT/UPDATE, remove the `UserTypes` filter from `FindUserMessage`
- `backend/store/principal.go` `CreateUser()` -- remove `type` from INSERT (all rows are `END_USER`)
- `backend/store/principal.go` `GetPrincipalByEmail()` -- already dispatches by email format, just update to query the new tables
- `backend/store/principal.go` `BatchGetUsersByEmails()` -- needs to handle cross-table lookup (query `principal` for END_USER emails, `service_account` for SA emails, `workload_identity` for WI emails)
- `backend/store/principal.go` `StatUsers()` -- query all three tables and aggregate
- `backend/store/principal.go` `UpdateUserEmail()` -- add explicit UPDATE statements for the 10 creator/deleter columns that lost `ON UPDATE CASCADE`
- `backend/store/principal_service_account.go` -- change `FROM principal WHERE type = 'SERVICE_ACCOUNT'` to `FROM service_account`
- `backend/store/principal_workload_identity.go` -- change `FROM principal WHERE type = 'WORKLOAD_IDENTITY'` to `FROM workload_identity`
- `backend/store/principal_filter.go` -- update `GetAccountListFilter` to use the table name instead of the `principal.` prefix
- `backend/store/project.go` -- update the cascading delete (lines 498-540) to query `service_account` / `workload_identity` instead of `principal WHERE project = ?`
- `backend/store/stats.go` -- simplify `CountActiveEndUsers` (no type filter needed)
- `backend/store/principal_end_user.go` -- simplify (no type checks needed)
- Migration test -- update `TestLatestVersion` in `backend/migrator/migrator_test.go`
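For `BatchGetUsersByEmails()`, one option is to partition the input by email format (via `IsServiceAccountEmail` / `IsWorkloadIdentityEmail`) and query each table separately. A single-round-trip alternative could look like this sketch (the selected columns are assumptions, not the actual message fields):

```sql
-- Hypothetical single-query variant of BatchGetUsersByEmails:
-- probe all three tables for a mixed batch of emails ($1 = text[] of emails).
SELECT id, email, name, 'END_USER' AS type
FROM principal WHERE email = ANY($1)
UNION ALL
SELECT id, email, name, 'SERVICE_ACCOUNT'
FROM service_account WHERE email = ANY($1)
UNION ALL
SELECT id, email, name, 'WORKLOAD_IDENTITY'
FROM workload_identity WHERE email = ANY($1);
```

Partitioning by email format first avoids probing tables that cannot match, at the cost of up to three round trips.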
## Scope Workspace IAM Policies by Resource

Currently, workspace-level IAM policies in the `policy` table have `resource_type = 'WORKSPACE'` but `resource = ''` (empty string). For SaaS mode, each workspace needs its own IAM policy, so the `resource` field must contain the workspace identifier (`workspaces/{id}`) to distinguish them.
```sql
-- Current: resource is empty for workspace policies
SELECT resource_type, resource, type FROM policy
WHERE resource_type = 'WORKSPACE' AND type = 'IAM';
-- Returns: ('WORKSPACE', '', 'IAM')
```

The workspace identifier is stored in the setting table:

```sql
SELECT value->>'workspaceId' FROM setting WHERE name = 'WORKSPACE_PROFILE';
-- Returns the workspace ID (e.g., 'abc123')
```

The migration:

```sql
-- Set resource = 'workspaces/{workspace}' for all workspace-level policies
UPDATE policy
SET resource = 'workspaces/' || (
    SELECT value->>'workspaceId'
    FROM setting
    WHERE name = 'WORKSPACE_PROFILE'
)
WHERE resource_type = 'WORKSPACE' AND resource = '';
```
Update the store/service layer where workspace policies are queried or created:
- `backend/store/policy.go` `GetWorkspaceIamPolicy()` -- currently queries with just `ResourceType: WORKSPACE` and no resource filter. Update to include the workspace resource name.
- Policy creation paths -- ensure `resource = 'workspaces/{id}'` is set.

The existing unique index on `policy` is `idx_policy_unique_resource_type_resource_type ON policy(resource_type, resource, type)`. This already works -- two workspace policies with different `resource` values are distinct.
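After the backfill, the `GetWorkspaceIamPolicy()` lookup could filter like this sketch (the `payload` column name and the parameter shape are assumptions):

```sql
-- Sketch: workspace IAM policy lookup now filters on the resource name.
-- $1 = workspace ID read from the WORKSPACE_PROFILE setting.
SELECT payload
FROM policy
WHERE resource_type = 'WORKSPACE'
  AND type = 'IAM'
  AND resource = 'workspaces/' || $1;
```

This hits the existing `(resource_type, resource, type)` unique index, so no new index is needed.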
## ID Strategy by Scope

For SaaS, tables need different ID strategies depending on their scope:

- Project-scoped: keep integer `id` (convert to `bigserial`), auto-increment per project so IDs are human-readable within each project (e.g., project A: issue #1, #2, #3; project B: issue #1, #2, #3).
- Instance-scoped: keep integer `id`, auto-increment per instance.
- Cross-workspace: change `id` to `TEXT PRIMARY KEY` (uuid), since these have no natural scope boundary.
- Workspace-scoped: add a `workspace` column for multi-tenancy.

| Table | Current PK | Scope | Migration |
|---|---|---|---|
| **Top-level (no workspace)** | | | |
| principal | serial | Global | Deprecate id, use email as unique key |
| web_refresh_token | token_hash text | User-scoped | None |
| sheet_blob | sha256 bytea | Content-addressed | None |
| replica_heartbeat | replica_id text | Infra | None |
| **Instance-scoped (globally unique instance)** | | | |
| instance | resource_id text | Global | None |
| db | (instance, name) | Instance | None |
| db_schema | (instance, db_name) | Instance | None |
| revision | bigserial | Instance | Per-instance auto-increment id, add FK to instance(resource_id) |
| sync_history | bigserial | Instance | Per-instance auto-increment id |
| changelog | bigserial | Instance | Per-instance auto-increment id |
| **Workspace-scoped (need workspace)** | | | |
| idp | resource_id text | Workspace | Add workspace |
| project | resource_id text | Workspace | Add workspace |
| service_account | email text | Workspace | Add workspace |
| workload_identity | email text | Workspace | Add workspace |
| setting | name text | Workspace | Add workspace |
| role | resource_id text | Workspace | Add workspace |
| policy | (resource_type, resource, type) | Workspace | Add workspace |
| user_group | id text | Workspace | Add workspace |
| review_config | id text | Workspace | Add workspace |
| oauth2_client | client_id text | Workspace | Add workspace |
| oauth2_authorization_code | code text | Workspace (via client) | Add workspace |
| oauth2_refresh_token | token_hash text | Workspace (via client) | Add workspace |
| **Project-scoped (per-project auto-increment id)** | | | |
| project_webhook | serial | Project | Drop id, use resource_id as PK (no composite PK needed — id is unused, all lookups use resource_id) |
| plan | bigserial | Project | Per-project auto-increment id |
| issue | serial | Project | Per-project auto-increment id |
| worksheet | serial | Project | Composite PK (project, id), keep resource_id as external identifier |
| plan_check_run | serial | Project (via plan) | Add project ref |
| plan_webhook_delivery | plan_id bigint | Project (via plan) | Add project ref |
| task | serial | Project (via plan) | Add project ref |
| task_run | serial | Project (via task) | Add project ref |
| task_run_log | (task_run_id, created_at) | Project (via task_run) | Add project ref |
| issue_comment | bigserial | Project (via issue) | Drop id, use resource_id as PK (no composite PK needed — id is unused, all lookups use resource_id) |
| worksheet_organizer | (worksheet_id, principal) | Project (via worksheet) | Rename worksheet_id to worksheet text, FK to worksheet(resource_id) |
| db_group | (project, resource_id) | Project | None (already has project) |
| release | (project, train, iteration) | Project | None (already has project) |
| **Cross-workspace (TEXT PK with uuid)** | | | |
| audit_log | bigserial | Cross-workspace | Change to id text PK (uuid) |
| query_history | bigserial | Cross-workspace | Change to id text PK (uuid), fix project_id FK reference |
| instance_change_history | bigserial | Cross-workspace | Change to id text PK (uuid) |
| export_archive | serial | Cross-workspace | Change to id text PK (uuid) |
| **Skip** | | | |
| access_grant | id text | Project | None (already text PK) |
## Step 1: Drop Unused `id` Columns

Drop `id` columns from tables that never use them — all lookups, JOINs, and FK references use natural keys.
Done in PR #19582 for: idp, project, setting, role, policy, instance, db, db_schema, task_run_log, db_group, release.
Done in Step 2c: worksheet_organizer — renamed worksheet_id to worksheet text, FK to worksheet(resource_id), PK (worksheet, principal). No project column needed.
## Step 2: Composite Primary Keys

Add the scope column (`project` or `instance`) to child tables, change PKs from `(id)` to `(project, id)` or `(instance, id)`, and update all FK references to be composite. This ensures every query includes the scope, which is required for per-scope auto-increment in Step 3 and for tenant isolation in SaaS.
Split by FK chain:
### Step 2a: `plan` chain
Tables: plan, task, task_run, task_run_log, plan_check_run, plan_webhook_delivery
Schema changes:
```
plan(id) → plan(project, id)
  -- plan already has `project` column, just change PK

plan_check_run(id) → plan_check_run(project, id)
  -- add `project` column, backfill from plan
  -- plan_check_run.plan_id: update UNIQUE index to (project, plan_id)

plan_webhook_delivery(plan_id) → plan_webhook_delivery(project, plan_id)
  -- add `project` column, backfill from plan
  -- PK becomes (project, plan_id)

task(id) → task(project, id)
  -- add `project` column, backfill from plan
  -- task.plan_id FK → (project, plan_id) REFERENCES plan(project, id)

task_run(id) → task_run(project, id)
  -- add `project` column, backfill from task
  -- task_run.task_id FK → (project, task_id) REFERENCES task(project, id)
  -- UNIQUE(task_id, attempt) → UNIQUE(project, task_id, attempt)

task_run_log(task_run_id, created_at) → task_run_log(project, task_run_id, created_at)
  -- add `project` column, backfill from task_run
  -- task_run_log.task_run_id FK → (project, task_run_id) REFERENCES task_run(project, id)
```
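As a concrete sketch, the `task` portion of Step 2a might look like the following DDL (constraint names such as `task_pkey` and `task_plan_id_fkey` are assumptions, and `plan` must already carry its composite PK):

```sql
-- Sketch for task in Step 2a; verify constraint names against the live schema.
ALTER TABLE task ADD COLUMN project text;
UPDATE task SET project = plan.project FROM plan WHERE task.plan_id = plan.id;
ALTER TABLE task ALTER COLUMN project SET NOT NULL;
ALTER TABLE task DROP CONSTRAINT IF EXISTS task_plan_id_fkey;
ALTER TABLE task DROP CONSTRAINT task_pkey;
ALTER TABLE task ADD PRIMARY KEY (project, id);
ALTER TABLE task ADD FOREIGN KEY (project, plan_id) REFERENCES plan (project, id);
```

The other tables in the chain follow the same add-column, backfill, re-key, re-wire sequence.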
Code changes: all SQL queries AND application logic must use `(project, id)` together to identify a row — not `id` alone. This includes the PG NOTIFY signal (`Signal.uid`), which must also carry the project.

### Step 2b: `issue` chain
Tables: issue, issue_comment
Schema changes:
```
issue(id) → issue(project, id)
  -- issue already has `project` column, just change PK
  -- issue.plan_id is nullable and references plan(id); after Step 2a it needs a
  --   composite FK (project, plan_id) against the UNIQUE index on plan(project, plan_id)
  -- UNIQUE(plan_id) → UNIQUE(project, plan_id)

issue_comment: drop `id` column, use `resource_id` as PK
  -- `id` is unused: no FK references it, all lookups use `resource_id`
  -- add `project` column, backfill from issue
  -- issue_comment.issue_id FK → (project, issue_id) REFERENCES issue(project, id)
```
### Step 2c: `worksheet` chain
Tables: worksheet, worksheet_organizer
Schema changes:
```
worksheet(id) → worksheet(project, id)
  -- worksheet already has `project` and `resource_id` columns, just change PK
  -- keep `resource_id` (globally unique UUID) — used as the external identifier
  --   in API resource names and as the FK target for worksheet_organizer
  -- API resource name: projects/{project}/worksheets/{resource_id}

worksheet_organizer(worksheet_id integer, principal)
  → worksheet_organizer(worksheet text, principal)
  -- rename `worksheet_id` to `worksheet`, change type from integer to text
  -- backfill with worksheet.resource_id
  -- FK: worksheet_organizer.worksheet REFERENCES worksheet(resource_id) ON DELETE CASCADE
  -- no `project` column needed — resource_id is globally unique
```
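A possible migration for the `worksheet_organizer` change (constraint names are assumptions, and `worksheet.resource_id` must already be backed by a unique index to serve as an FK target):

```sql
-- Sketch for worksheet_organizer in Step 2c.
ALTER TABLE worksheet_organizer ADD COLUMN worksheet text;
UPDATE worksheet_organizer wo
SET worksheet = w.resource_id
FROM worksheet w WHERE wo.worksheet_id = w.id;
ALTER TABLE worksheet_organizer ALTER COLUMN worksheet SET NOT NULL;
ALTER TABLE worksheet_organizer DROP CONSTRAINT worksheet_organizer_pkey;
ALTER TABLE worksheet_organizer DROP COLUMN worksheet_id;
ALTER TABLE worksheet_organizer ADD PRIMARY KEY (worksheet, principal);
ALTER TABLE worksheet_organizer ADD FOREIGN KEY (worksheet)
  REFERENCES worksheet (resource_id) ON DELETE CASCADE;
```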
Code changes:
- `resource_name.go`: `GetProjectIDWorksheetID` returns `(projectID, worksheetID string)` — worksheetID is the UUID
- `FormatWorksheet(projectID, worksheetID string)` — both args are strings
- Use `resource_id` for all single-record operations (get, patch, delete)
- `WorkSheetMessage.UID` removed; `ResourceID` is the primary identifier
- `WorksheetOrganizerMessage` uses `WorksheetResourceID string` — no project needed
- `extractWorksheetID` extracts the UUID from the resource name (no `Number()` conversion)

### Step 2d: `project_webhook`
```
project_webhook: drop `id` column, use `resource_id` as PK
  -- `id` is unused: no FK references it, all lookups use `resource_id`
  -- already has `project` column, no other schema changes needed
```
### Step 2e: Instance-scoped tables
Tables: revision, sync_history, changelog
Instead of composite PKs (instance, id), promote resource_id to PK for all three tables. This is simpler — avoids composite FK rewiring and matches the pattern used for project_webhook and issue_comment.
Schema changes:
```
revision: drop `id` as PK, use `resource_id text PRIMARY KEY DEFAULT gen_random_uuid()::text`
  -- `id` is kept as `bigserial` (no PK) for backward compat but not used as PK
  -- drop unique index on resource_id (now covered by PK)

sync_history: add `resource_id text` column, promote to PK
  -- `id` is kept as `bigserial` (no PK) for backward compat
  -- add `resource_id text PRIMARY KEY DEFAULT gen_random_uuid()::text`
  -- backfill existing rows with gen_random_uuid()::text

changelog: drop `id` as PK, use `resource_id text PRIMARY KEY DEFAULT gen_random_uuid()::text`
  -- `id` is kept as `bigserial` (no PK) for backward compat
  -- drop unique index on resource_id (now covered by PK)
  -- replace `sync_history_id bigint` with `sync_history text REFERENCES sync_history(resource_id)`
  -- backfill: UPDATE changelog SET sync_history = sync_history.resource_id
  --   FROM sync_history WHERE changelog.sync_history_id = sync_history.id
  -- drop `sync_history_id` column after backfill
```
Store layer changes:
- `sync_history.go`: `GetSyncHistoryByUID(ctx, uid)` → `GetSyncHistory(ctx, resourceID)`; `CreateSyncHistory` returns `(string, error)` (the resource_id)
- `changelog.go`: `SyncHistoryUID *int64` → `SyncHistory *string` (text FK); `FindChangelogMessage.InstanceID` becomes required (`string`, not `*string`)
- `revision.go`: `FindRevisionMessage.InstanceID` becomes required (`string`, not `*string`)
- `schemasync/syncer.go`: `SyncDatabaseSchemaToHistory` returns `(string, error)` instead of `(int64, error)`

## Step 3: Per-Scope Auto-Increment

After Step 2, project-scoped tables have composite PKs `(project, id)`. Instance-scoped tables (`revision`, `sync_history`, `changelog`) use `resource_id` (uuid) as PK instead. Now switch project-scoped tables from global sequences to per-project auto-increment.
Convert serial/bigserial to bigint (drop the global sequence default). Implement per-scope ID generation — options:
- A counter table `id_sequence(scope_type, scope_id, table_name, next_val)` with atomic `UPDATE ... RETURNING`.
- `SELECT MAX(id) + 1` within the insert transaction, with row-level locking on the scope.
- `SELECT COALESCE(MAX(id), 0) + 1` with an advisory lock on `(project, table_name)`.

Tables affected:
`plan`, `issue`, `plan_check_run`, `task`, `task_run`
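The `id_sequence` counter-table option can be sketched as follows. A single UPSERT both initializes and advances the counter, and the row lock serializes concurrent inserts within a scope (table and column names follow the option as described; the exact shape is illustrative):

```sql
-- Sketch of the counter-table option for per-scope auto-increment.
CREATE TABLE id_sequence (
    scope_type text NOT NULL,   -- 'PROJECT' or 'INSTANCE'
    scope_id text NOT NULL,
    table_name text NOT NULL,
    next_val bigint NOT NULL DEFAULT 1,
    PRIMARY KEY (scope_type, scope_id, table_name)
);

-- Claim the next issue id for a project ($1 = project resource_id).
-- First call inserts next_val = 2 and returns 1; later calls increment and
-- return the previous value.
INSERT INTO id_sequence (scope_type, scope_id, table_name, next_val)
VALUES ('PROJECT', $1, 'issue', 2)
ON CONFLICT (scope_type, scope_id, table_name)
DO UPDATE SET next_val = id_sequence.next_val + 1
RETURNING next_val - 1 AS id;
```

Run the claim inside the same transaction as the insert so a rollback does not leave a gap larger than one id.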
Not affected:

- `project_webhook` and `issue_comment` use `resource_id` as PK — no per-project auto-increment needed
- `worksheet` uses `resource_id` (UUID) as the external identifier — per-project auto-increment not needed for the API, but `id` still auto-increments via a sequence for the composite PK
- `revision`, `sync_history`, `changelog` now use `resource_id` (uuid) as PK (Step 2e) — no per-instance auto-increment needed

## Step 4: UUID PKs for Cross-Workspace Tables

For tables with no natural scope, add a `resource_id text` column and promote it to PK. Skip `instance_change_history` — it's only used for Bytebase's own metadata migration and doesn't need the change.
| Table | Current PK | After PK | Notes |
|---|---|---|---|
| audit_log | id bigserial | resource_id text | Old id kept as non-PK |
| query_history | id bigserial | resource_id text | Old id kept as non-PK; change resource name from queryHistories/{id} to projects/{project}/queryHistories/{id} |
| export_archive | id serial | resource_id text | Old id kept as non-PK until JSONB migration completes |
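As a sketch, the promotion for `audit_log` (the same shape applies to `query_history` and `export_archive`; the `audit_log_pkey` constraint name is an assumption):

```sql
-- Sketch: promote audit_log to a uuid text PK while keeping the old id column.
ALTER TABLE audit_log ADD COLUMN resource_id text DEFAULT gen_random_uuid()::text;
UPDATE audit_log SET resource_id = gen_random_uuid()::text WHERE resource_id IS NULL;
ALTER TABLE audit_log ALTER COLUMN resource_id SET NOT NULL;
ALTER TABLE audit_log DROP CONSTRAINT audit_log_pkey;
ALTER TABLE audit_log ADD PRIMARY KEY (resource_id);
```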
Migration approach (same for all three tables):
1. `ALTER TABLE <table> ADD COLUMN resource_id text DEFAULT gen_random_uuid()::text`
2. `UPDATE <table> SET resource_id = gen_random_uuid()::text WHERE resource_id IS NULL`
3. `ALTER TABLE <table> ALTER COLUMN resource_id SET NOT NULL`
4. Drop the PK on `id`, add PK on `resource_id`
5. The `id` column stays as non-PK (drop later in cleanup)

`export_archive` JSONB migration: `TaskRunResult.export_archive_uid` (int32) → `TaskRunResult.export_archive_id` (string). Requires:
1. Proto field rename `export_archive_uid` → `export_archive_id`, change type int32 → string
2. Backfill DML: `UPDATE task_run SET result = jsonb_set(result - 'exportArchiveUid', '{exportArchiveId}', to_jsonb(ea.resource_id)) FROM export_archive ea WHERE (result->>'exportArchiveUid')::int = ea.id` (the old `id` column is still present for the join)
3. Code rename `ExportArchiveUid` → `ExportArchiveId`
4. Afterwards, the `id` column on `export_archive` can be dropped
`query_history` resource name change: `queryHistories/{id}` → `projects/{project}/queryHistories/{id}`

- Update the `QueryHistory.name` format comment
- Update `SearchQueryHistories` to format/parse with the project prefix
- `query_history` already has a `project_id` column; no schema change needed for this

## JSONB References to IDs

| JSONB Location | Proto Field | Type | References | Needs migration? |
|---|---|---|---|---|
| task_run.result | TaskRunResult.export_archive_uid | int32 | export_archive.id | Yes (Step 4) — rename to export_archive_id, change to string, DML to backfill with export_archive.resource_id |
| access_grant.payload | AccessGrantPayload.issue_id | int64 | issue.id | No — issue.id stays integer (per-project bigint) |
| PG NOTIFY channel | Signal.uid | int32 | task_run.id or plan_check_run.id | Yes (Step 3) — IDs are only unique per project after Step 3, so Signal must include project alongside uid to uniquely identify the resource |
Risks / gotchas:

- Changing a PK from `(id)` to `(project, id)` requires dropping and recreating all dependent FKs, unique indexes, and constraints (e.g., `UNIQUE(task_id, attempt)` → `UNIQUE(project, task_id, attempt)`).
- `plan_check_run` is a singleton per plan — the unique index on `plan_id` becomes `(project, plan_id)`.
- `issue.plan_id` is nullable — a composite FK `(project, plan_id)` referencing `plan(project, id)` needs special handling for NULL `plan_id` (with PostgreSQL's default `MATCH SIMPLE`, the constraint is not enforced when any FK column is NULL).