doc/plans/2026-05-06-llm-wiki-paperclip-asset-security-gate.md
- Status: accepted Phase 5 policy
- Date: 2026-05-06
- Owner: Security engineering
- Scope: Paperclip-derived ingestion into the LLM Wiki before any asset or work-product content indexing ships
Phase 5 remains fail-closed for Paperclip assets and work products.
Paperclip-derived ingestion must not fetch `/api/assets/:id/content`, dereference a work-product `url`, scrape preview pages, or embed binary/blob content into source bundles or source snapshots.

This keeps the secure path easier than the insecure one and avoids broadening the wiki into a second content-distribution channel.
These source kinds may contribute body text to Paperclip-derived source bundles:
| Source kind | Allowed body fields | Reason |
|---|---|---|
| Issue | title, description, identifier/status metadata | First-party Paperclip text under company ACL |
| Comment | body | First-party Paperclip text under company ACL |
| Document | body, title, key, revision metadata | First-party Paperclip text under company ACL |
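
The table above can be read as a per-kind field allowlist. The following is a minimal sketch of that idea, not the actual ingestion code: the function name, the lowercase kind keys, and the concrete metadata field names (`identifier`, `status`, `key`, `revision`) are assumptions, since the table names them only loosely.

```python
# Allowlist of body-contributing fields per source kind, mirroring the table.
# The metadata field names ("identifier", "status", "key", "revision") are
# assumed for illustration; the policy does not spell them out.
ALLOWED_BODY_FIELDS = {
    "issue": ("title", "description", "identifier", "status"),
    "comment": ("body",),
    "document": ("title", "body", "key", "revision"),
}


def extract_body_fields(source_kind: str, record: dict) -> dict:
    """Return only allowlisted fields for a known source kind.

    Unknown source kinds (assets, work products, anything new) yield an
    empty dict, which keeps the behaviour fail-closed.
    """
    allowed = ALLOWED_BODY_FIELDS.get(source_kind.lower(), ())
    return {name: record[name] for name in allowed if name in record}
```

Because the lookup defaults to an empty tuple, adding a new source kind to Paperclip contributes no body text until someone explicitly extends the allowlist.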
Allowed in Phase 5 for assets:

- `issueId`, `issueCommentId`, `attachmentId`, `assetId`, `originalFilename`, `contentType`, `byteSize`, `sha256`, `createdAt`, `createdByAgentId`, `createdByUserId`

Disallowed in Phase 5 for assets:

- fetching `/api/assets/:id/content`
- extracting content from `text/plain`, `text/markdown`, `application/json`, images, SVG, PDFs, archives, or office formats
- storing `contentPath` in wiki source bundles or source snapshots

Allowed in Phase 5 for work products:

- `issueId`, `workProductId`, `type`, `provider`, `title`, `status`, `reviewState`, `healthStatus`, `externalId`, `isPrimary`, `createdAt`, `updatedAt`
- a presence flag such as `hasUrl: true`

Disallowed in Phase 5 for work products:

- dereferencing `url`
- storing `url` values in wiki source bundles or source snapshots

No MIME allowlist is approved for asset content extraction in Phase 5 because no asset body extraction is approved at all.
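
One way to picture the allowed and disallowed lists is as a projection that copies allowlisted fields and silently drops everything else. The sketch below assumes records arrive as plain dictionaries and that `hasUrl` is set only when the source record carries a non-empty `url`; both function names are hypothetical.

```python
# Allowlisted metadata fields, copied from the Phase 5 lists.
ASSET_FIELDS = (
    "issueId", "issueCommentId", "attachmentId", "assetId", "originalFilename",
    "contentType", "byteSize", "sha256", "createdAt", "createdByAgentId",
    "createdByUserId",
)
WORK_PRODUCT_FIELDS = (
    "issueId", "workProductId", "type", "provider", "title", "status",
    "reviewState", "healthStatus", "externalId", "isPrimary", "createdAt",
    "updatedAt",
)


def asset_reference(asset: dict) -> dict:
    """Project an asset record onto its allowlisted metadata fields."""
    return {key: asset[key] for key in ASSET_FIELDS if key in asset}


def work_product_reference(work_product: dict) -> dict:
    """Project a work-product record; record URL presence, never the URL."""
    ref = {key: work_product[key] for key in WORK_PRODUCT_FIELDS if key in work_product}
    if work_product.get("url"):
        # Assumption: the flag is emitted only when a URL exists.
        ref["hasUrl"] = True
    return ref
```

Building references by copying from an allowlist, rather than deleting known-bad keys, means a newly added field such as a second link attribute stays out of the wiki by default.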
Any future issue that wants blob parsing must define:
Metadata-only means structured facts only, not capability-bearing links.
- Do not store `contentPath` for assets.
- Do not store `url` values.
- Prefer stable identifiers (`assetId`, `workProductId`, `externalId`) over links.

This addresses Sensitive Information Disclosure, Unsafe Consumption of APIs, and Insecure Output Handling risks.
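
A lint-style check can catch capability-bearing links that slip into a reference under an otherwise allowed field. This is a hedged sketch: `find_link_violations` and the link heuristic (any `scheme://` substring or the asset content route) are illustrative assumptions, not part of the policy.

```python
import re

# Fields that must never appear in a metadata-only reference.
FORBIDDEN_KEYS = {"contentPath", "url"}

# Assumed heuristic: any "scheme://" substring, or the asset content route.
LINK_PATTERN = re.compile(
    r"[a-z][a-z0-9+.-]*://|/api/assets/[^/]+/content", re.IGNORECASE
)


def find_link_violations(reference: dict) -> list[str]:
    """Return a description of each link-like field found in a reference."""
    violations = []
    for key, value in reference.items():
        if key in FORBIDDEN_KEYS:
            violations.append(f"forbidden field: {key}")
        elif isinstance(value, str) and LINK_PATTERN.search(value):
            violations.append(f"link-like value in field: {key}")
    return violations
```

An empty result means the reference contains only structured facts under this heuristic; a stricter deployment might reject the whole bundle on the first violation.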
Every metadata-only reference must preserve enough provenance to explain where it came from without reading the underlying content:
- `companyId`
- `issueId`
- a `metadata_only` marker in any future reference/snapshot schema

Human review is not required for plain metadata-only references that stay inside the allowlisted fields above.
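
The provenance and review rules can be sketched as two small checks. The boolean shape of the `metadata_only` marker, the function names, and the treatment of any non-allowlisted field as a review trigger are assumptions made for illustration; the policy only requires that the marker and the provenance identifiers exist.

```python
REQUIRED_PROVENANCE = ("companyId", "issueId")

# Union of the Phase 5 allowlisted fields plus provenance fields.
ALLOWED_FIELDS = {
    "companyId", "issueId", "metadata_only",
    # asset metadata
    "issueCommentId", "attachmentId", "assetId", "originalFilename",
    "contentType", "byteSize", "sha256", "createdAt", "createdByAgentId",
    "createdByUserId",
    # work-product metadata
    "workProductId", "type", "provider", "title", "status", "reviewState",
    "healthStatus", "externalId", "isPrimary", "updatedAt", "hasUrl",
}


def missing_provenance(reference: dict) -> list[str]:
    """List provenance fields that are absent or empty."""
    missing = [name for name in REQUIRED_PROVENANCE if not reference.get(name)]
    # Assumption: the marker is a boolean flag set to True.
    if reference.get("metadata_only") is not True:
        missing.append("metadata_only")
    return missing


def needs_human_review(reference: dict) -> bool:
    """True when a reference carries any field outside the allowlist."""
    return any(key not in ALLOWED_FIELDS for key in reference)
```

For example, `needs_human_review({"companyId": "c1", "issueId": "i1", "assetId": "a1", "metadata_only": True})` is `False`, while adding a `contentPath` key flips it to `True`.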
Human review is required, with a separate security sign-off issue, before enabling any of the following:
This gate exists because the current host surfaces have different trust properties:
- Asset content (`/api/assets/:id/content`) can carry prompt-injection or parser-risk payloads
- Work products carry a `url`, which reintroduces SSRF, token leakage, and prompt-injection risk if dereferenced automatically

Relevant threat classes:
A follow-up implementation issue is justified only for metadata-only references.
That implementation must:
- mark every reference as `metadata_only`
- never emit `contentPath` or raw work-product `url` fields