LLM Wiki Paperclip Asset And Work-Product Security Gate

Status: accepted Phase 5 policy
Date: 2026-05-06
Owner: Security engineering
Scope: Paperclip-derived ingestion into the LLM Wiki before any asset or work-product content indexing ships

Decision

Phase 5 remains fail-closed for Paperclip assets and work products.

Paperclip-derived text extraction is allowed only for issue titles/descriptions, issue comments, and issue documents.
Paperclip assets/attachments and issue work products are metadata-only in Phase 5.
Linked summaries and content extraction for assets/work products are not approved in Phase 5.
No implementation may fetch /api/assets/:id/content, dereference a work-product url, scrape preview pages, or embed binary/blob content into source bundles or source snapshots.

This keeps the secure path easier than the insecure one and avoids broadening the wiki into a second content-distribution channel.

Allowed Source Kinds

These source kinds may contribute body text to Paperclip-derived source bundles:

Source kind	Allowed body fields	Reason
Issue	`title`, `description`, identifier/status metadata	First-party Paperclip text under company ACL
Comment	`body`	First-party Paperclip text under company ACL
Document	`body`, `title`, `key`, revision metadata	First-party Paperclip text under company ACL
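
One way to enforce the table above is a fail-closed allowlist keyed by source kind. This is a minimal sketch, not Paperclip code: the names ALLOWED_BODY_FIELDS and extract_body_text and the lowercase kind strings are assumptions made for illustration. Identifier, status, and revision metadata are left out of the body text here and would travel as structured fields.

```python
# Hypothetical allowlist mirroring the table above. Kind strings and
# helper names are illustrative assumptions, not Paperclip identifiers.
ALLOWED_BODY_FIELDS = {
    "issue": ("title", "description"),
    "comment": ("body",),
    "document": ("title", "key", "body"),
}


def extract_body_text(kind: str, record: dict) -> str:
    """Return body text for allowlisted source kinds only.

    Any kind missing from the allowlist (assets, work products, or a
    kind added later) contributes no body text: refusal is the default.
    """
    fields = ALLOWED_BODY_FIELDS.get(kind, ())
    parts = [
        record[name]
        for name in fields
        if isinstance(record.get(name), str) and record[name]
    ]
    return "\n\n".join(parts)
```

Because the lookup defaults to an empty tuple, adding a new source kind requires an explicit allowlist change rather than a code path that happens to accept it.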

Assets And Work Products

Assets / attachments

Allowed in Phase 5:

metadata-only references built from allowlisted structured fields already stored in Paperclip
recommended fields: issueId, issueCommentId, attachmentId, assetId, originalFilename, contentType, byteSize, sha256, createdAt, createdByAgentId, createdByUserId

Disallowed in Phase 5:

fetching asset bytes from /api/assets/:id/content
parsing any blob body, including text/plain, text/markdown, application/json, images, SVG, PDFs, archives, or office formats
storing contentPath in wiki source bundles or source snapshots
model summarization of attachment bodies
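
The asset rules above can be sketched as a projection that copies only the recommended fields. The function name build_asset_reference and the shape of the input row are assumptions; the field names come from the recommended list.

```python
# Field names are taken from the recommended list above; the helper
# itself is a hypothetical sketch.
ASSET_REFERENCE_FIELDS = frozenset({
    "issueId", "issueCommentId", "attachmentId", "assetId",
    "originalFilename", "contentType", "byteSize", "sha256",
    "createdAt", "createdByAgentId", "createdByUserId",
})


def build_asset_reference(row: dict) -> dict:
    """Project a stored asset row onto the allowlisted fields.

    Anything not allowlisted, including contentPath, is dropped. The
    function never reads or fetches the asset bytes.
    """
    ref = {key: value for key, value in row.items() if key in ASSET_REFERENCE_FIELDS}
    ref["metadata_only"] = True
    return ref
```

Copying by allowlist rather than deleting known-bad keys means a new column added to the asset row stays out of the wiki until someone approves it.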

Work products

Allowed in Phase 5:

metadata-only references built from allowlisted structured fields already stored in Paperclip
recommended fields: issueId, workProductId, type, provider, title, status, reviewState, healthStatus, externalId, isPrimary, createdAt, updatedAt
optional boolean/derived metadata such as hasUrl: true

Disallowed in Phase 5:

fetching or crawling the work-product url
scraping preview pages, artifacts, pull requests, branches, commits, or custom provider targets through the wiki ingestion path
storing raw url values in wiki source bundles or source snapshots
model-authored linked summaries derived from off-record content
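
For work products, the main difference from assets is that url is reduced to a boolean and never stored or dereferenced. A minimal sketch, assuming a stored row shaped like the recommended field list; build_work_product_reference is a hypothetical name.

```python
# Field names come from the recommended list above; the helper name
# and row shape are assumptions for illustration.
WORK_PRODUCT_REFERENCE_FIELDS = (
    "issueId", "workProductId", "type", "provider", "title", "status",
    "reviewState", "healthStatus", "externalId", "isPrimary",
    "createdAt", "updatedAt",
)


def build_work_product_reference(row: dict) -> dict:
    """Build a metadata-only reference for a work product.

    The url value is only tested for presence. It is not parsed,
    fetched, or copied into the reference.
    """
    ref = {key: row[key] for key in WORK_PRODUCT_REFERENCE_FIELDS if key in row}
    ref["hasUrl"] = bool(row.get("url"))
    ref["metadata_only"] = True
    return ref
```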

MIME Allowlists And Size Caps

No MIME allowlist is approved for asset content extraction in Phase 5 because no asset body extraction is approved at all.

Every asset MIME type is treated as opaque for Paperclip-derived indexing.
Existing upload limits remain storage concerns, not ingestion approvals.
Work-product destinations are also opaque regardless of MIME type or size.
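
Expressed as code, "every MIME type is opaque" amounts to an extraction allowlist that is deliberately empty. The sketch below is illustrative only; EXTRACTABLE_MIME_TYPES and may_extract_asset_body are hypothetical names.

```python
# Phase 5: the allowlist is intentionally empty, so every content type
# is refused. A later approved phase would change this data, not the check.
EXTRACTABLE_MIME_TYPES: frozenset = frozenset()


def may_extract_asset_body(content_type: str) -> bool:
    """Return True only for content types on the positive allowlist."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    base_type = content_type.split(";")[0].strip().lower()
    return base_type in EXTRACTABLE_MIME_TYPES
```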

Any future issue that wants blob parsing must define:

a positive MIME allowlist
per-type parser strategy
per-source size caps
sandbox/isolation requirements
prompt-injection handling
regression tests for refusal paths

Redaction Rules

Metadata-only means structured facts only, not capability-bearing links.

Do not persist contentPath for assets.
Do not persist raw work-product url values.
Do not persist query strings, fragments, signed URL tokens, or userinfo.
Prefer stable identifiers (assetId, workProductId, externalId) over links.

This addresses Sensitive Information Disclosure, Unsafe Consumption of APIs, and Insecure Output Handling risks.
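
A value-level check for the third rule could look like the sketch below, which uses the standard library's urllib.parse.urlsplit. The helper name carries_capability is an assumption. It errs toward refusal, so a plain identifier that happens to contain "#" or "?" would also be flagged.

```python
from urllib.parse import urlsplit


def carries_capability(value: str) -> bool:
    """Return True if a string has a query, fragment, or userinfo part.

    Signed URL tokens normally travel in the query string, so they are
    covered by the query check. Unparseable input is treated as unsafe.
    """
    try:
        parts = urlsplit(value)
        return bool(
            parts.query or parts.fragment or parts.username or parts.password
        )
    except ValueError:
        return True
```

A caller would reject or drop any field whose value trips this check, and would still exclude contentPath and url by name regardless of their contents.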

Provenance Rules

Every metadata-only reference must preserve enough provenance to explain where it came from without reading the underlying content:

companyId
issueId
attachment/work-product id
producer identity when available
timestamps
an explicit metadata_only marker in any future reference/snapshot schema
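
The provenance list above can be checked mechanically before a reference is written. This is a sketch under assumptions: the function name missing_provenance is hypothetical, createdAt stands in for "timestamps", and producer identity is not checked because the rule only asks for it when available.

```python
# Assumed field names: createdAt represents the timestamp requirement,
# and any one of the subject id fields satisfies "attachment/work-product id".
REQUIRED_PROVENANCE = ("companyId", "issueId", "createdAt")
SUBJECT_ID_FIELDS = ("attachmentId", "assetId", "workProductId")


def missing_provenance(ref: dict) -> list:
    """List the provenance requirements a reference fails to meet."""
    missing = [name for name in REQUIRED_PROVENANCE if not ref.get(name)]
    if not any(ref.get(name) for name in SUBJECT_ID_FIELDS):
        missing.append("attachment/work-product id")
    # The marker must be the literal boolean True, not merely truthy.
    if ref.get("metadata_only") is not True:
        missing.append("metadata_only")
    return missing
```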

Review-Required Behavior

Human review is not required for plain metadata-only references that stay inside the allowlisted fields above.

Human review is required, with a separate security sign-off issue, before enabling any of the following:

asset body extraction
work-product URL fetching
linked summaries generated from asset/work-product content
storing raw blob links or raw remote URLs in wiki source material
non-default-space routing for Paperclip-derived asset/work-product references
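
One possible enforcement shape is a guard that refuses to enable any of these capabilities unless a sign-off issue is recorded for it. Everything in this sketch is hypothetical: the capability strings, the signoffs mapping, and the function name are illustrative and do not describe an existing Paperclip mechanism.

```python
# Hypothetical capability names, one per item in the list above.
REVIEW_REQUIRED_CAPABILITIES = frozenset({
    "asset_body_extraction",
    "work_product_url_fetch",
    "linked_summaries",
    "raw_link_storage",
    "non_default_space_routing",
})


def require_signoff(capability: str, signoffs: dict) -> None:
    """Raise unless the capability has a recorded security sign-off issue.

    signoffs maps capability name to a sign-off issue identifier.
    Unknown capability names are rejected rather than silently allowed.
    """
    if capability not in REVIEW_REQUIRED_CAPABILITIES:
        raise ValueError(f"unknown capability: {capability}")
    if not signoffs.get(capability):
        raise PermissionError(f"{capability} requires a security sign-off issue")
```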

Security Rationale

This gate exists because the current host surfaces have different trust properties:

issue/comment/document text is first-party Paperclip content already exposed through company-scoped issue/document APIs
asset content is a blob download surface (/api/assets/:id/content) and can carry prompt-injection or parser-risk payloads
work products can point at arbitrary destinations through url, which reintroduces SSRF, token leakage, and prompt-injection risk if dereferenced automatically

Relevant threat classes:

OWASP Top 10 for LLM Applications: Prompt Injection, Sensitive Information Disclosure, Insecure Output Handling, Excessive Agency
OWASP API Security Top 10: Server Side Request Forgery (SSRF), Unsafe Consumption of APIs, Broken Object Property Level Authorization
Saltzer & Schroeder design principles: Least Privilege, Fail-Safe Defaults, Complete Mediation

Follow-Up Implementation Scope

A follow-up implementation issue is justified only for metadata-only references.

That implementation must:

keep assets/work products out of source-bundle body text
never fetch blob bytes or remote URLs
redact capability-bearing link fields
mark references as metadata_only
ship tests proving source bundles/snapshots never contain contentPath or raw work-product url fields
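
The last requirement could be tested with a recursive key scan over a serialized bundle or snapshot. A minimal sketch, assuming bundles are plain nested dicts and lists; find_forbidden_keys is a hypothetical helper name.

```python
FORBIDDEN_KEYS = frozenset({"contentPath", "url"})


def find_forbidden_keys(node, path="$"):
    """Return the paths of every forbidden key in a nested structure."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                hits.append(child)
            # Keep descending so nested violations are reported too.
            hits.extend(find_forbidden_keys(value, child))
    elif isinstance(node, (list, tuple)):
        for index, item in enumerate(node):
            hits.extend(find_forbidden_keys(item, f"{path}[{index}]"))
    return hits
```

A regression test would build bundles from fixture rows that do contain contentPath and url, then assert that the scan of the resulting bundle returns an empty list.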