docs/plans/2025-12-19-sheet-content-addressed-storage-design.md
Migrate sheet storage from ID-based to content-addressed (SHA256-based) storage. The store.SheetMessage will reference sheet_blob table directly instead of the sheet table. The sheet table will be dropped entirely.
projects/{project}/sheets/{id} to projects/{project}/sheets/{sha256}Before:
type SheetMessage struct {
ProjectID string
Creator string
Title string
Statement string
Sha256 []byte
UID int
Size int64
CreatedAt time.Time
}
After:
type SheetMessage struct {
// SHA256 hash of the statement (hex-encoded)
Sha256 string
// SQL statement content
Statement string
// Size of the statement in bytes
Size int64
}
sheet_blob table - no changes:
CREATE TABLE sheet_blob (
sha256 bytea NOT NULL PRIMARY KEY,
content text NOT NULL
);
task_run table - replace integer foreign key:
-- Before:
CREATE TABLE task_run (
...
sheet_id integer REFERENCES sheet(id),
...
);
-- After:
CREATE TABLE task_run (
...
sheet_sha256 bytea REFERENCES sheet_blob(sha256),
...
);
sheet table - will be dropped entirely after migration.
proto/store/store/task.proto:
message Task {
// Before:
// int32 sheet_id = 4;
// After:
// SHA256 hash of the sheet content (hex-encoded)
string sheet_sha256 = 4;
}
The proto field stores the hex-encoded SHA256 string. When exposed via v1 API, it gets embedded in the full resource name projects/{project}/sheets/{sha256}.
All changes happen in a single migration. No intermediate state.
Migration SQL:
-- 1. Add new column with sha256 reference
ALTER TABLE task_run ADD COLUMN sheet_sha256 bytea REFERENCES sheet_blob(sha256);
-- 2. Backfill task_run.sheet_sha256 from sheet table
UPDATE task_run tr
SET sheet_sha256 = s.sha256
FROM sheet s
WHERE tr.sheet_id = s.id;
-- 3. Backfill task.payload JSONB (Task proto)
-- Converts sheetId (int) to sheetSha256 (hex string)
UPDATE task t
SET payload = jsonb_set(
payload - 'sheetId',
'{sheetSha256}',
to_jsonb(encode(s.sha256, 'hex'))
)
FROM sheet s
WHERE (t.payload->>'sheetId')::int = s.id
AND t.payload ? 'sheetId';
-- 4. Drop old column
ALTER TABLE task_run DROP COLUMN sheet_id;
-- 5. Drop sheet table entirely
DROP TABLE sheet;
All code changes deploy atomically with the database migration:
buf generate)store.SheetMessage structNo feature flags or compatibility layers needed.
SheetMessage struct:
type SheetMessage struct {
Sha256 string
Statement string
Size int64
}
// Remove GetSha256Hex() method - Sha256 is already hex string
GetSheetMetadata / GetSheetFull:
// Before:
func (s *Store) GetSheetMetadata(ctx context.Context, id int) (*SheetMessage, error)
func (s *Store) GetSheetFull(ctx context.Context, id int) (*SheetMessage, error)
// After:
func (s *Store) GetSheetMetadata(ctx context.Context, sha256Hex string) (*SheetMessage, error)
func (s *Store) GetSheetFull(ctx context.Context, sha256Hex string) (*SheetMessage, error)
Query changes:
// Before:
// SELECT ... FROM sheet
// LEFT JOIN sheet_blob ON sheet.sha256 = sheet_blob.sha256
// WHERE sheet.id = ?
// After:
// SELECT ... FROM sheet_blob
// WHERE sha256 = decode(?, 'hex')
CreateSheets:
// No longer creates rows in sheet table
// Just calls batchCreateSheetBlob
// Returns SheetMessage with sha256, statement, size
func (s *Store) CreateSheets(ctx context.Context, creates ...*SheetMessage) ([]*SheetMessage, error)
Run after updating task.proto:
cd proto && buf generate
Any code calling store.GetSheetMetadata(ctx, sheetID) changes to pass sha256 hex string instead of integer ID. Similarly for creating/updating tasks that reference sheets.
Before:
projects/{project}/sheets/{id}
Example: projects/my-project/sheets/123
After:
projects/{project}/sheets/{sha256}
Example: projects/my-project/sheets/a1b2c3d4e5f67890abcdef...
No change. Authorization works the same way:
Store → API:
// Store has: sheet_sha256 = "a1b2c3d4..."
// API returns: "projects/my-project/sheets/a1b2c3d4..."
sheetResourceName := fmt.Sprintf("projects/%s/sheets/%s", project, sheetSha256)
API → Store:
// Parse resource name to extract sha256
// Call store.GetSheetMetadata(ctx, sha256Hex)
Clients see sheet IDs change from integers (123) to SHA256 hashes (a1b2c3d4...). The API endpoints and request/response structure remain the same.