docs/craft/features/user-library-sync.md
How user-uploaded files (PDFs, spreadsheets, slides) get from the user's library into the agent's sandbox. Replaces the symbiotic-S3 sync that was removed in PR #11042.
Three layers, mirroring the skills pipeline:
HTTP layer backend/onyx/server/features/build/api/user_library.py
│ thin: validate input, call db helpers, return response
▼
DB layer backend/onyx/server/features/build/db/user_library.py
│ owns: file_store I/O, Document upserts, ownership checks, quota
▼
Sandbox sync backend/onyx/server/features/build/sandbox/user_library.py
builds FileSet from DB + file_store, pushes via push daemon
Files are stored in the default file store (get_default_file_store()) — same backend skills use. There is no Craft-specific S3 bucket for user files.
| Field | Where | Purpose |
|---|---|---|
| File contents | file_store (S3/local) | Raw bytes |
Document.id | CRAFT_FILE__{user_id}__{sha256(path)[:16]} | Deterministic, ownership encoded in prefix |
Document.link | file_store file_id | Pointer for retrieval |
Document.file_id | file_store file_id (mirror) | Schema-level pointer |
Document.doc_metadata | {file_path, file_size, mime_type, is_directory, sync_disabled, file_store_id} | Display + filtering |
Connector | shared "User Library" connector, DocumentSource.CRAFT_FILE | Reuses indexing infrastructure |
Credential | one per user, empty credential_json={} | Pairs with connector for ownership |
FileOrigin.USER_FILE is the file-store origin tag.
/workspace/managed/user_library/ in the sandbox pod. Same atomic symlink swap as skills — the agent sees a stable path while the daemon swaps versioned directories underneath.
| Trigger | Calls |
|---|---|
| Session creation (cold start) | hydrate_user_library(sandbox_id, user_id, db_session) |
| Existing session resume | hydrate_user_library(...) (same — re-hydrate to get latest state) |
| File upload (single or zip) | sync_user_library_to_active_sandboxes(user_id, db_session) |
| File delete | sync_user_library_to_active_sandboxes(...) |
| Toggle sync_disabled | sync_user_library_to_active_sandboxes(...) |
All synchronous — no Celery, matching the skills pattern.
build_user_library_fileset(user_id, db_session)
1. list_user_files() → CRAFT_FILE Document rows for user
2. Filter: skip is_directory=True, skip sync_disabled=True
3. For each remaining doc: file_store.read_file(doc.link)
4. Strip leading "/" from file_path (push daemon rejects absolute paths)
5. Return FileSet { relative_path: bytes }
│
▼
sandbox_manager.push_to_sandbox / push_to_sandboxes
mount_path="/workspace/managed/user_library"
→ tar.gz, Ed25519-sign, POST to push daemon, atomic swap
Empty filesets push too (clears stale files via the swap).
backend/onyx/server/features/build/db/user_library.py:
| Function | Purpose |
|---|---|
get_or_create_craft_connector(db, user) | Returns (connector_id, credential_id). Idempotent. |
get_user_storage_bytes(db, user_id) | SQL aggregation for quota |
build_document_id(user_id, path) | Deterministic doc_id |
list_user_files(db, user_id) | All CRAFT_FILE docs for the user |
fetch_user_file_for_user(db, doc_id, user_id) | Lookup + ownership check; raises OnyxError(NOT_FOUND) on miss |
store_user_file(db, ..., file_path, content, mime_type) | Save to file store + upsert Document. Returns (doc_id, file_id, old_blob_id_to_delete). The new blob is saved first; the caller must pass old_blob_id_to_delete to cleanup_old_blobs after their final commit. |
cleanup_old_blobs(blob_ids) | Delete superseded blobs. Must be called after the final DB commit — if called before and the commit fails, the document rolls back to point at the now-deleted blob. |
create_directory_record(db, user_id, connector_id, credential_id, dir_path) | Virtual directory document (no file store object) |
set_sync_disabled(db, user_id, doc, sync_disabled) | Toggle file or directory (recursive into children) |
delete_user_file(db, doc) | Delete blob from file store + Document row |
The HTTP layer (api/user_library.py) calls these directly — no business logic in the endpoints.
Encoded in the document_id prefix: CRAFT_FILE__{user_id}__{hash}. fetch_user_file_for_user checks the prefix matches the calling user before any DB lookup. A foreign user's doc_id returns NOT_FOUND (not 403), so existence is never leaked across users.
New:
backend/onyx/server/features/build/sandbox/user_library.py — sync modulebackend/tests/external_dependency_unit/craft/test_user_library_sync.py — ext-dep testsModified:
backend/onyx/server/features/build/db/user_library.py — added all the CRUD/storage helpersbackend/onyx/server/features/build/api/user_library.py — thinned to call db helpers; PersistentDocumentWriter usage removed; HTTPException → OnyxErrorbackend/onyx/server/features/build/session/manager.py — _hydrate_user_library called in both create_session__no_commit and get_or_create_empty_sessionbackend/onyx/skills/push.py — per-failure logging (consistent with user_library push logging)backend/tests/external_dependency_unit/craft/conftest.py — SandboxHandle.provision_for now returns (Sandbox, Path) tuple, eliminating duplicated _provision_with_status helpersDeleted:
backend/onyx/server/features/build/indexing/persistent_document_writer.pybackend/tests/external_dependency_unit/craft/test_persistent_document_writer.pyTests follow the layered paradigm in docs/craft/tests/coverage-and-overview.md.
test_user_library_sync.py)Real Postgres + real LocalSandboxManager on tmp_path. Seeds via the same DB helpers production uses (no DocumentMetadata construction in test code).
test_hydrate_pushes_files_to_sandbox — happy pathtest_sync_disabled_files_excluded — filter contract via set_sync_disabledtest_directories_excluded_from_fileset — directory exclusion via create_directory_recordtest_sync_after_delete_removes_file — atomic swap cleanup after delete_user_filetest_user_library_api.py, pre-existing)HTTP-level coverage including cross-user 404, toggle, delete, upload caps. Already on the test-overhaul branch.
No new K8s tests — push daemon contract (signed tarball, atomic swap) is already covered by existing K8s tests. User library sync reuses write_files_to_sandbox() unchanged.