# OpenAI Responses Adapter (`internal/ai/vision/openai`)

*Last Updated: November 14, 2025*
This package contains PhotoPrism’s adapter for the OpenAI Responses API. It lets the existing caption and label workflows (`GenerateCaption`, `GenerateLabels`, and the `photoprism vision run` CLI) call OpenAI models alongside TensorFlow and Ollama without changing worker or API code. The implementation focuses on predictable results, structured outputs, and clear observability so operators can opt in gradually.
## Scope and Requirements

- **Transport:** The adapter reuses the shared vision client (`internal/ai/vision/api_client.go`) and must honour PhotoPrism’s timeout, logging, and ACL rules.
- **Parsing:** `output_text` responses are parsed both as JSON and as plain captions (a parsing sketch follows this section).
- **Cost control:** Low-detail image input (`detail=low`) and capped token budgets (512 tokens for captions, 1024 for labels).
- **Credentials:** Keys come from the model configuration (`Service.Key`) with fallbacks to `OPENAI_API_KEY` / `_FILE`. Logs must redact sensitive data.
- **Opt-in:** Models are enabled explicitly in `vision.yml`.
- **Out of scope:** The `generate` model type and a combined caption/label endpoint (reserved for a later phase).

## Supported Models

OpenAI’s GPT-5 family (`gpt-5-nano`, `gpt-5-mini`). These models support image inputs, structured outputs, and deterministic settings. Set `Name` to the exact provider identifier so defaults are applied correctly. Caption models share the same configuration surface and run through the same adapter.

## Prompts

Default prompts live in `defaults.go`. Captions use a single-sentence instruction; labels use `LabelPromptDefault` (or `LabelPromptNSFW` when PhotoPrism requests NSFW metadata). Custom prompts should retain schema reminders so structured outputs stay valid.

## Schemas

Label requests attach `schema.LabelsJsonSchema(nsfw)`; the response format name is derived via `schema.JsonSchemaName` (e.g. `photoprism_vision_labels_v1`). Captions omit schemas unless operators explicitly request a structured format.

Leaving `System`, `Prompt`, `Schema`, and `Options` unset yields stable output with minimal configuration. Override them only when domain-specific language or custom scoring is necessary, and add regression tests alongside. Budget-conscious operators can experiment with lighter prompts or lower-resolution thumbnails, but should keep token limits and determinism settings intact to avoid unexpected bills and UI churn.
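To illustrate the dual parsing rule noted under Scope and Requirements, here is a minimal sketch of decoding an `output_text` payload as JSON first and falling back to a plain caption otherwise. The `captionResult` type and `parseOutputText` helper are hypothetical illustrations, not part of the actual package API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// captionResult is a hypothetical shape for structured caption output.
type captionResult struct {
	Caption string `json:"caption"`
}

// parseOutputText sketches the dual parsing rule: try JSON first,
// then treat the payload as a plain caption.
func parseOutputText(payload string) string {
	trimmed := strings.TrimSpace(payload)
	if strings.HasPrefix(trimmed, "{") {
		var result captionResult
		if err := json.Unmarshal([]byte(trimmed), &result); err == nil && result.Caption != "" {
			return result.Caption
		}
	}
	// Not valid JSON (or no caption field): use the raw text as the caption.
	return trimmed
}

func main() {
	fmt.Println(parseOutputText(`{"caption": "A dog on a beach."}`))
	fmt.Println(parseOutputText("A dog on a beach."))
}
```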
## Observability

Requesting low reasoning effort (`reasoning.effort=low`) has negligible impact but improves traceability.

## Environment Variables

- `OPENAI_API_KEY` / `OPENAI_API_KEY_FILE`: fallback credentials when a model’s `Service.Key` is unset.
- `PHOTOPRISM_VISION_*` variables remain authoritative (see the Getting Started Guide for full lists).

## `vision.yml` Examples

```yaml
Models:
  - Type: caption
    Name: gpt-5-nano
    Engine: openai
    Disabled: false # opt in manually
    Resolution: 720 # optional; default is 720
    Options:
      Detail: low # optional; defaults to low
      MaxOutputTokens: 512
    Service:
      Uri: https://api.openai.com/v1/responses
      FileScheme: data
      Key: ${OPENAI_API_KEY}
  - Type: labels
    Name: gpt-5-mini
    Engine: openai
    Disabled: false
    Resolution: 720
    Options:
      Detail: low
      MaxOutputTokens: 1024
      ForceJson: true # redundant but explicit
    Service:
      Uri: https://api.openai.com/v1/responses
      FileScheme: data
      Key: ${OPENAI_API_KEY}
```
Keep TensorFlow entries in place so PhotoPrism falls back when the external service is unavailable.
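Since the example above reads `${OPENAI_API_KEY}` from the environment, the following sketch shows one way the documented fallback order (the model’s `Service.Key`, then `OPENAI_API_KEY`, then `OPENAI_API_KEY_FILE`) could be implemented. The `resolveAPIKey` helper is hypothetical and only illustrates the precedence described in this README.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveAPIKey is a hypothetical sketch of the documented fallback order:
// the model's Service.Key first, then OPENAI_API_KEY, then OPENAI_API_KEY_FILE.
func resolveAPIKey(serviceKey string) (string, error) {
	if serviceKey != "" {
		return serviceKey, nil
	}
	if key := os.Getenv("OPENAI_API_KEY"); key != "" {
		return key, nil
	}
	if file := os.Getenv("OPENAI_API_KEY_FILE"); file != "" {
		data, err := os.ReadFile(file)
		if err != nil {
			return "", fmt.Errorf("read key file: %w", err)
		}
		return strings.TrimSpace(string(data)), nil
	}
	return "", fmt.Errorf("no OpenAI API key configured")
}

func main() {
	key, err := resolveAPIKey("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Never print the key itself; logs must redact sensitive data.
	fmt.Printf("resolved key of length %d\n", len(key))
}
```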
## Defaults

- **File scheme:** `data:` URLs (base64) for all OpenAI models.
- **Thumbnails:** resolved through the shared vision pipeline (`vision.Thumb(ModelTypeCaption|Labels)`).
- **Token budgets:** `MaxOutputTokens` raised to 512 (caption) / 1024 (labels); `ForceJson=false` for captions, `true` for labels; `reasoning.effort="low"`.
- **Determinism:** `Temperature` and `TopP` set to 0 for `gpt-5*` models; inherited values (0.1/0.9) remain for other engines. `openaiBuilder.Build` performs this override while preserving the struct defaults for non-OpenAI adapters.
- **Schema naming:** derived via `schema.JsonSchemaName`, so operators may omit `SchemaVersion`.

## Implementation Notes

- **Structured outputs:** Requests set `text.format` with `type: "json_schema"` and a schema name derived from the content (see the request sketch after this list). The parser then prefers `output_json`, but also attempts to decode `output_text` payloads that contain JSON objects.
- **Determinism:** Requests use `temperature=0` and `top_p=0` to minimise variance, while still allowing developers to override values in `vision.yml` if needed.
- **Reasoning:** Requests set `reasoning.effort="low"` so OpenAI returns structured reasoning usage counters, helping operators track token consumption.
- **Rate limiting and errors:** OpenAI calls respect the existing `limiter.Auth` configuration used by the vision service. Failed requests surface standard HTTP errors and are not automatically retried; operators should ensure they have adequate account limits and consider external rate limiting when sharing credentials.
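The sketch below assembles a Responses request body with the defaults listed above (`text.format`, zeroed `temperature`/`top_p`, low reasoning effort, capped output tokens). The struct shapes are simplified illustrations of the wire format, not the adapter’s actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// textFormat is a simplified illustration of the Responses structured-output
// format block, not the adapter's actual type.
type textFormat struct {
	Type   string          `json:"type"`   // "json_schema" for structured label output
	Name   string          `json:"name"`   // e.g. "photoprism_vision_labels_v1"
	Schema json.RawMessage `json:"schema"` // JSON schema for the labels payload
}

type request struct {
	Model       string  `json:"model"`
	Temperature float64 `json:"temperature"` // 0 to minimise variance
	TopP        float64 `json:"top_p"`       // 0 to minimise variance
	Reasoning   struct {
		Effort string `json:"effort"` // "low" enables reasoning usage counters
	} `json:"reasoning"`
	Text struct {
		Format textFormat `json:"format"`
	} `json:"text"`
	MaxOutputTokens int `json:"max_output_tokens"` // 512 captions / 1024 labels
}

func main() {
	req := request{
		Model:           "gpt-5-mini",
		MaxOutputTokens: 1024,
	}
	req.Reasoning.Effort = "low"
	req.Text.Format = textFormat{
		Type:   "json_schema",
		Name:   "photoprism_vision_labels_v1",
		Schema: json.RawMessage(`{"type":"object"}`), // placeholder schema
	}

	body, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(body))
}
```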
## Testing

- Run `go test ./internal/ai/vision/openai ./internal/ai/vision -run OpenAI -count=1`. Fixtures under `internal/ai/vision/openai/testdata/` replay real Responses payloads for captions and labels (a fixture-replay sketch appears at the end of this README).
- Run `photoprism vision run -m labels --count 1 --force` with trace logging enabled to inspect sanitised Responses.
- Confirm models report the expected engine (`openai`) in the UI or via `photoprism vision ls`.

## Related Code

- `internal/ai/vision/openai` (defaults, schema helpers, transport, tests).
- `internal/ai/vision/api_request.go`, `api_client.go`, `engine_openai.go`, `engine_openai_test.go`.
- `internal/workers/vision.go`, `internal/commands/vision_run.go`.
- `internal/ai/vision/schema`, `pkg/clean`, `pkg/media`.

## Future Work

- A `generate` model type that combines captions, labels, and optional markers.
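As a hedged illustration of the fixture-driven testing mentioned above, the sketch below serves a recorded Responses payload through `httptest`. The fixture filename and assertions are hypothetical and only outline the approach; the real tests exercise the adapter itself.

```go
package openai_test

import (
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"testing"
)

// TestReplayFixture is a hypothetical sketch: it serves a recorded
// Responses payload from testdata/ and checks that a client receives it.
// The fixture name "labels_response.json" is illustrative.
func TestReplayFixture(t *testing.T) {
	fixture, err := os.ReadFile("testdata/labels_response.json")
	if err != nil {
		t.Skip("fixture not available in this sketch")
	}

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write(fixture)
	}))
	defer srv.Close()

	// A real test would point the adapter at srv.URL via Service.Uri.
	resp, err := http.Get(srv.URL)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		t.Fatal(err)
	}
	if len(body) == 0 {
		t.Fatal("expected non-empty fixture payload")
	}
}
```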