docs/superpowers/plans/2026-05-02-transcription-settings.md
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Replace the implicit "first AI provider with an API key wins" transcription flow with an explicit, instance-level TranscriptionConfig that names a provider, model, default language, and prompt hint — enabling Whisper / Groq / self-hosted Whisper-compatible endpoints, restoring multi-provider flexibility, and exposing the Whisper API's prompt field for proper-noun spelling hints.
Architecture: Schema-additive. Add TranscriptionConfig (provider_id, model, language, prompt) to both proto/store/instance_setting.proto and proto/api/v1/instance_service.proto, under InstanceAISetting and InstanceSetting.AISetting respectively. Server-side, Transcribe resolves provider, language, and prompt from the persisted config when the request does not override them; the model comes from the persisted config and falls through to the existing DefaultTranscriptionModel when empty. UpdateInstanceSetting validates that transcription.provider_id references an existing provider and that the persisted config's strings respect length caps. The frontend splits the existing AISection into two groups: "AI Integrations" (the existing providers list, renamed in copy) and "Transcription" (a new four-field form: Provider / Model / Language / Prompt). The home MemoEditor reads aiSetting.transcription instead of scanning providers.
Tech Stack: Backend Go 1.26, Connect RPC + protobuf via buf (remote plugins), github.com/pkg/errors, gRPC status.Errorf. Frontend React 18 + TypeScript 6, @bufbuild/protobuf v2, Connect-ES, Tailwind v4, Radix UI primitives via @/components/ui/*. Tests: Go testing + testify/require; frontend manual verification (no component tests in Settings/).
Spec: docs/superpowers/specs/2026-05-02-transcription-settings-design.md
Branch note: This work is intended for a fresh worktree off main (e.g. feat/transcription-settings). The spec was committed on feat/calendar-date-prefill because that branch was active at brainstorm time; before starting Task 1, create a new worktree:
```bash
git worktree add -b feat/transcription-settings ../memos-transcription main
cd ../memos-transcription
git cherry-pick <spec-commit-sha> # bring the spec doc onto the new branch
```
Created
Modified: protobuf source (changes regenerate Go + TS + OpenAPI via buf generate)
- proto/store/instance_setting.proto: add TranscriptionConfig message and transcription field on InstanceAISetting.
- proto/api/v1/instance_service.proto: add parallel TranscriptionConfig message and transcription field inside the nested AISetting message.

Modified: backend Go
- server/router/api/v1/instance_service.go: extend convertInstanceAISettingFromStore / convertInstanceAISettingToStore to round-trip transcription; extend prepareInstanceAISettingForUpdate to validate that transcription.provider_id exists in providers[] (when set) and to length-cap model / language / prompt; preserve the stored transcription config when an UpdateInstanceSetting request omits it.
- server/router/api/v1/ai_service.go: read InstanceAISetting.transcription at the start of Transcribe; resolve provider_id / language / prompt via "request override → persisted setting", and model via "persisted setting → engine default"; return FailedPrecondition when no provider can be resolved; remove the now-redundant provider_id REQUIRED gate (it becomes optional in the proto).
- proto/api/v1/ai_service.proto: relax TranscribeRequest.provider_id from REQUIRED to OPTIONAL.

Modified: backend tests
- server/router/api/v1/test/ai_service_test.go: add cases: persisted transcription.provider_id resolves when the request omits it; persisted transcription.model overrides the default; per-call Config.prompt wins over the persisted prompt; FailedPrecondition when neither request nor setting names a provider.
- server/router/api/v1/test/instance_service_test.go: add cases for the new validation: unknown transcription.provider_id rejected; oversized model / language / prompt rejected; existing transcription preserved when the field is omitted on update.

Modified: frontend
- web/src/components/Settings/AISection.tsx: restructure into two SettingGroup blocks: "AI Integrations" (existing provider table) and "Transcription" (new). Add a TranscriptionForm component, co-located in the same file or split out if it grows past ~120 LOC. Wire local state and change tracking via lodash-es/isEqual, and save to the same InstanceSetting_Key.AI setting.
- web/src/components/MemoEditor/index.tsx: replace the transcriptionProvider lookup with a canTranscribe boolean derived from aiSetting.transcription.providerId plus the referenced provider's existence and apiKeySet.
- web/src/components/MemoEditor/services/transcriptionService.ts: drop the provider parameter; call transcribe() with no providerId (the server resolves it from the setting).
- web/src/locales/en.json: add new strings for the Transcription form. Other locale files are left for the maintainer's translation pass (consistent with how byok-* strings were originally added).

TranscriptionConfig to the store proto
Files:
- proto/store/instance_setting.proto

The store-level TranscriptionConfig is the persistent shape written to disk. Its field numbers are fresh (1–4), and the new transcription field on InstanceAISetting takes the next free slot (2).

proto/store/instance_setting.proto

In the file, locate message InstanceAISetting { ... } (around lines 149–152) and replace it with the version below. Then append the new TranscriptionConfig message at the bottom of the file, immediately after the AIProviderType enum that follows the existing AIProviderConfig block.
```protobuf
message InstanceAISetting {
  // providers is the list of AI provider configurations available instance-wide.
  repeated AIProviderConfig providers = 1;
  // transcription is the speech-to-text feature configuration.
  // When unset or transcription.provider_id is empty, transcription is disabled.
  TranscriptionConfig transcription = 2;
}
```
After the existing enum AIProviderType { ... } block, append:
```protobuf
// TranscriptionConfig configures the speech-to-text feature.
message TranscriptionConfig {
  // provider_id references an entry in InstanceAISetting.providers[].id.
  // Empty string means transcription is disabled.
  string provider_id = 1;
  // model is the provider-specific model identifier.
  // Empty string falls back to the engine default
  // (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
  string model = 2;
  // language is the default ISO 639-1 language hint sent to the provider.
  // Empty string lets the provider auto-detect.
  string language = 3;
  // prompt is a default spelling/vocabulary hint passed to the provider.
  // Used as the OpenAI Whisper "prompt" parameter and folded into the Gemini
  // generation prompt as a "Context and spelling hints" block.
  string prompt = 4;
}
```
Run from the repo root (the command changes into proto/ itself):

```bash
cd proto && buf format -w && buf generate
```

Expected: the command exits 0, and proto/gen/store/instance_setting.pb.go and web/src/types/proto/store/instance_setting_pb.ts are updated to include TranscriptionConfig and the new Transcription field.
Back at the repo root, run:

```bash
go build ./...
```

Expected: PASS. (Backend code does not yet reference the new field, so this just confirms the generated code is well-formed.)
```bash
git add proto/store/instance_setting.proto proto/gen/store/ web/src/types/proto/store/
git commit -m "feat(proto/store): add TranscriptionConfig to InstanceAISetting

Adds provider_id / model / language / prompt fields for the new
explicit transcription configuration. Schema-additive (field 2 on
InstanceAISetting); existing instances default to provider_id=\"\"
which means transcription is disabled until the operator selects
a provider in settings."
```
TranscriptionConfig into the API proto
Files:
- proto/api/v1/instance_service.proto

The API-level message mirrors the store version. They live in different proto packages (memos.api.v1 vs memos.store), matching the existing parallel-message pattern (User, Memo, AIProviderConfig, etc.).

proto/api/v1/instance_service.proto

Locate the nested message AISetting { ... } block (around lines 226–230) and replace it with:
```protobuf
  // AI provider configuration settings.
  message AISetting {
    // providers is the list of AI provider configurations available instance-wide.
    repeated AIProviderConfig providers = 1;
    // transcription is the speech-to-text feature configuration.
    // When unset or transcription.provider_id is empty, transcription is disabled.
    TranscriptionConfig transcription = 2;
  }
```
Immediately after the existing enum AIProviderType { ... } block (currently the last child of InstanceSetting, around lines 247–251), append the new nested message — keep the indentation: it lives inside message InstanceSetting { ... }:
```protobuf
  // TranscriptionConfig configures the speech-to-text feature.
  message TranscriptionConfig {
    // provider_id references an entry in AISetting.providers[].id.
    // Empty string means transcription is disabled.
    string provider_id = 1;
    // model is the provider-specific model identifier.
    // Empty string falls back to the engine default
    // (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
    string model = 2;
    // language is the default ISO 639-1 language hint sent to the provider.
    // Empty string lets the provider auto-detect.
    string language = 3;
    // prompt is a default spelling/vocabulary hint passed to the provider.
    string prompt = 4;
  }
```
```bash
cd proto && buf format -w && buf generate
```

Expected: PASS. Updates proto/gen/api/v1/instance_service.pb.go, proto/gen/openapi.yaml, and web/src/types/proto/api/v1/instance_service_pb.ts.

```bash
go build ./...
```

Expected: PASS. The existing convertInstanceAISetting* functions still compile because they never reference the new field; until the converters are updated in the next task, transcription is silently dropped on round-trip.
```bash
git add proto/api/v1/instance_service.proto proto/gen/ web/src/types/proto/api/
git commit -m "feat(proto/api): add TranscriptionConfig to AISetting

Mirrors the store-level TranscriptionConfig. Both messages live in
their own packages (memos.api.v1 vs memos.store) following the
existing parallel-message pattern used for AIProviderConfig."
```
transcription through convertInstanceAISetting{From,To}Store
Files:
- server/router/api/v1/instance_service.go:505-551

The existing converters copy only the fields they name, so any field they do not mention is silently dropped. Without explicit handling, transcription would be lost on every round-trip. This task is purely plumbing, with no validation yet.
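To make the failure mode concrete, here is a minimal sketch using plain Go structs as hypothetical stand-ins for the generated storepb / v1pb types (none of these type or function names exist in the codebase). A converter that copies fields by name leaves every unnamed field at its zero value, which for a message field means nil:

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated storepb / v1pb messages.
type storeTranscription struct{ ProviderID string }
type apiTranscription struct{ ProviderID string }

type storeAISetting struct {
	Providers     []string
	Transcription *storeTranscription
}

type apiAISetting struct {
	Providers     []string
	Transcription *apiTranscription
}

// convertNamedFieldsOnly mirrors the current converters: it copies Providers
// but never mentions Transcription, so the result's Transcription stays nil.
func convertNamedFieldsOnly(s *storeAISetting) *apiAISetting {
	return &apiAISetting{Providers: append([]string(nil), s.Providers...)}
}

// convertWithTranscription also copies the new field, keeping nil as nil.
func convertWithTranscription(s *storeAISetting) *apiAISetting {
	out := convertNamedFieldsOnly(s)
	if s.Transcription != nil {
		out.Transcription = &apiTranscription{ProviderID: s.Transcription.ProviderID}
	}
	return out
}

func main() {
	in := &storeAISetting{
		Providers:     []string{"openai-main"},
		Transcription: &storeTranscription{ProviderID: "openai-main"},
	}
	fmt.Println(convertNamedFieldsOnly(in).Transcription == nil)      // true
	fmt.Println(convertWithTranscription(in).Transcription.ProviderID) // openai-main
}
```

The real converters below follow the second shape, delegating the nil check to convertTranscriptionConfig{From,To}Store.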
convertInstanceAISettingFromStore

Replace the whole function (currently lines 505–528) so the returned aiSetting carries the new field:
```go
func convertInstanceAISettingFromStore(setting *storepb.InstanceAISetting) *v1pb.InstanceSetting_AISetting {
	if setting == nil {
		return nil
	}
	aiSetting := &v1pb.InstanceSetting_AISetting{
		Providers:     make([]*v1pb.InstanceSetting_AIProviderConfig, 0, len(setting.Providers)),
		Transcription: convertTranscriptionConfigFromStore(setting.GetTranscription()),
	}
	for _, provider := range setting.Providers {
		if provider == nil {
			continue
		}
		apiKey := provider.GetApiKey()
		aiSetting.Providers = append(aiSetting.Providers, &v1pb.InstanceSetting_AIProviderConfig{
			Id:         provider.GetId(),
			Title:      provider.GetTitle(),
			Type:       v1pb.InstanceSetting_AIProviderType(provider.GetType()),
			Endpoint:   provider.GetEndpoint(),
			ApiKeySet:  apiKey != "",
			ApiKeyHint: maskAPIKey(apiKey),
		})
	}
	return aiSetting
}
```
convertInstanceAISettingToStore

Replace the whole function (currently lines 530–551):
```go
func convertInstanceAISettingToStore(setting *v1pb.InstanceSetting_AISetting) *storepb.InstanceAISetting {
	if setting == nil {
		return nil
	}
	aiSetting := &storepb.InstanceAISetting{
		Providers:     make([]*storepb.AIProviderConfig, 0, len(setting.Providers)),
		Transcription: convertTranscriptionConfigToStore(setting.GetTranscription()),
	}
	for _, provider := range setting.Providers {
		if provider == nil {
			continue
		}
		aiSetting.Providers = append(aiSetting.Providers, &storepb.AIProviderConfig{
			Id:       provider.GetId(),
			Title:    provider.GetTitle(),
			Type:     storepb.AIProviderType(provider.GetType()),
			Endpoint: provider.GetEndpoint(),
			ApiKey:   provider.GetApiKey(),
		})
	}
	return aiSetting
}
```
Append immediately after convertInstanceAISettingToStore:
```go
func convertTranscriptionConfigFromStore(setting *storepb.TranscriptionConfig) *v1pb.InstanceSetting_TranscriptionConfig {
	if setting == nil {
		return nil
	}
	return &v1pb.InstanceSetting_TranscriptionConfig{
		ProviderId: setting.GetProviderId(),
		Model:      setting.GetModel(),
		Language:   setting.GetLanguage(),
		Prompt:     setting.GetPrompt(),
	}
}

func convertTranscriptionConfigToStore(setting *v1pb.InstanceSetting_TranscriptionConfig) *storepb.TranscriptionConfig {
	if setting == nil {
		return nil
	}
	return &storepb.TranscriptionConfig{
		ProviderId: setting.GetProviderId(),
		Model:      setting.GetModel(),
		Language:   setting.GetLanguage(),
		Prompt:     setting.GetPrompt(),
	}
}
```
```bash
go build ./...
```

Expected: PASS.

```bash
git add server/router/api/v1/instance_service.go
git commit -m "feat(api/instance): round-trip transcription through AI setting converters"
```
transcription in prepareInstanceAISettingForUpdate
Files:
- server/router/api/v1/instance_service.go:564-623

The spec lists four validation rules: provider_id must reference an existing entry in providers[] (when set), plus length caps on model (256), language (32), and prompt (4096). On top of those comes the "preserve previous on omit" rule, which mirrors how API keys are preserved when a request omits them.
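The preserve-on-omit rule depends on the difference between an absent message and an empty one. In Go protobuf code an omitted message field decodes to a nil pointer, while an explicitly sent empty TranscriptionConfig{} is non-nil with empty strings, which the proto comments define as "transcription disabled". A minimal sketch of that decision, with a hypothetical plain struct standing in for storepb.TranscriptionConfig and a hypothetical mergeTranscription helper:

```go
package main

import "fmt"

// transcriptionConfig is a hypothetical stand-in for storepb.TranscriptionConfig.
type transcriptionConfig struct {
	ProviderID string
	Model      string
}

// mergeTranscription applies "absence == keep": a nil incoming config keeps the
// stored one, while any non-nil config (even an empty one) replaces it.
func mergeTranscription(incoming, stored *transcriptionConfig) *transcriptionConfig {
	if incoming == nil {
		return stored
	}
	return incoming
}

func main() {
	stored := &transcriptionConfig{ProviderID: "openai-main", Model: "whisper-1"}

	kept := mergeTranscription(nil, stored)
	fmt.Println(kept.ProviderID) // openai-main

	cleared := mergeTranscription(&transcriptionConfig{}, stored)
	fmt.Println(cleared.ProviderID == "") // true: transcription disabled
}
```

One consequence: a client that wants to disable transcription has to send an empty config rather than leaving the field out.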
Open server/router/api/v1/test/instance_service_test.go and find the existing top-level test function that exercises UpdateInstanceSetting (the one containing the sub-test "UpdateInstanceSetting - AI provider keys are write-only and preserved on empty", near line 670). Insert the three sub-tests below immediately after that sub-test's closing }), so they become its siblings rather than being nested inside it:
```go
t.Run("UpdateInstanceSetting - transcription provider_id must reference an existing provider", func(t *testing.T) {
	ts := NewTestService(t)
	defer ts.Cleanup()
	hostUser, err := ts.CreateHostUser(ctx, "admin")
	require.NoError(t, err)
	adminCtx := ts.CreateUserContext(ctx, hostUser.ID)

	_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
		Setting: &v1pb.InstanceSetting{
			Name: "instance/settings/AI",
			Value: &v1pb.InstanceSetting_AiSetting{
				AiSetting: &v1pb.InstanceSetting_AISetting{
					Providers: []*v1pb.InstanceSetting_AIProviderConfig{
						{
							Id:     "openai-main",
							Title:  "OpenAI",
							Type:   v1pb.InstanceSetting_OPENAI,
							ApiKey: "sk-test",
						},
					},
					Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
						ProviderId: "does-not-exist",
					},
				},
			},
		},
	})
	require.Error(t, err)
	require.Contains(t, err.Error(), "transcription provider_id")
})

t.Run("UpdateInstanceSetting - transcription strings are length-capped", func(t *testing.T) {
	ts := NewTestService(t)
	defer ts.Cleanup()
	hostUser, err := ts.CreateHostUser(ctx, "admin")
	require.NoError(t, err)
	adminCtx := ts.CreateUserContext(ctx, hostUser.ID)

	base := &v1pb.InstanceSetting{
		Name: "instance/settings/AI",
		Value: &v1pb.InstanceSetting_AiSetting{
			AiSetting: &v1pb.InstanceSetting_AISetting{
				Providers: []*v1pb.InstanceSetting_AIProviderConfig{
					{
						Id:     "openai-main",
						Title:  "OpenAI",
						Type:   v1pb.InstanceSetting_OPENAI,
						ApiKey: "sk-test",
					},
				},
			},
		},
	}

	// Each case uses cap+1 bytes to hit the boundary.
	oversizedModel := strings.Repeat("a", 257)
	base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
		ProviderId: "openai-main",
		Model:      oversizedModel,
	}
	_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
	require.Error(t, err)
	require.Contains(t, err.Error(), "transcription model")

	oversizedLanguage := strings.Repeat("a", 33)
	base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
		ProviderId: "openai-main",
		Language:   oversizedLanguage,
	}
	_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
	require.Error(t, err)
	require.Contains(t, err.Error(), "transcription language")

	oversizedPrompt := strings.Repeat("a", 4097)
	base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
		ProviderId: "openai-main",
		Prompt:     oversizedPrompt,
	}
	_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
	require.Error(t, err)
	require.Contains(t, err.Error(), "transcription prompt")
})

t.Run("UpdateInstanceSetting - transcription is preserved when omitted on update", func(t *testing.T) {
	ts := NewTestService(t)
	defer ts.Cleanup()
	hostUser, err := ts.CreateHostUser(ctx, "admin")
	require.NoError(t, err)
	adminCtx := ts.CreateUserContext(ctx, hostUser.ID)

	// First update: store a full transcription config.
	_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
		Setting: &v1pb.InstanceSetting{
			Name: "instance/settings/AI",
			Value: &v1pb.InstanceSetting_AiSetting{
				AiSetting: &v1pb.InstanceSetting_AISetting{
					Providers: []*v1pb.InstanceSetting_AIProviderConfig{
						{
							Id:     "openai-main",
							Title:  "OpenAI",
							Type:   v1pb.InstanceSetting_OPENAI,
							ApiKey: "sk-test",
						},
					},
					Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
						ProviderId: "openai-main",
						Model:      "whisper-1",
						Language:   "en",
						Prompt:     "names: Alice",
					},
				},
			},
		},
	})
	require.NoError(t, err)

	// Second update: omit Transcription entirely.
	_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
		Setting: &v1pb.InstanceSetting{
			Name: "instance/settings/AI",
			Value: &v1pb.InstanceSetting_AiSetting{
				AiSetting: &v1pb.InstanceSetting_AISetting{
					Providers: []*v1pb.InstanceSetting_AIProviderConfig{
						{
							Id:     "openai-main",
							Title:  "OpenAI",
							Type:   v1pb.InstanceSetting_OPENAI,
							ApiKey: "",
						},
					},
				},
			},
		},
	})
	require.NoError(t, err)

	stored, err := ts.Store.GetInstanceAISetting(ctx)
	require.NoError(t, err)
	require.NotNil(t, stored.GetTranscription())
	require.Equal(t, "openai-main", stored.GetTranscription().GetProviderId())
	require.Equal(t, "whisper-1", stored.GetTranscription().GetModel())
	require.Equal(t, "en", stored.GetTranscription().GetLanguage())
	require.Equal(t, "names: Alice", stored.GetTranscription().GetPrompt())
})
```
Confirm strings is already imported in this test file. If not, add "strings" to its import block.
```bash
go test -run TestUpdateInstanceSetting -v ./server/router/api/v1/test/... 2>&1 | tail -40
```

Expected: the three new sub-tests FAIL because prepareInstanceAISettingForUpdate does not yet validate or preserve transcription.
prepareInstanceAISettingForUpdate

Open server/router/api/v1/instance_service.go. Add these constants near the top of the file, or next to any existing length-cap constants for instance settings if there are some:

```go
const (
	maxTranscriptionConfigModelLength    = 256
	maxTranscriptionConfigLanguageLength = 32
	maxTranscriptionConfigPromptLength   = 4096
)
```
Then, at the very end of the existing prepareInstanceAISettingForUpdate function (immediately before its closing return nil), insert:
```go
if err := preparePersistedTranscriptionConfig(setting, existing); err != nil {
	return err
}
```
And add this new function next to prepareInstanceAISettingForUpdate:
```go
func preparePersistedTranscriptionConfig(setting *storepb.InstanceAISetting, existing *storepb.InstanceAISetting) error {
	// Preserve the previously stored transcription config when the request omits it,
	// matching the same "absence == keep" semantics used for API keys.
	if setting.Transcription == nil {
		if existing != nil {
			setting.Transcription = existing.GetTranscription()
		}
		return nil
	}

	cfg := setting.Transcription
	cfg.ProviderId = strings.TrimSpace(cfg.ProviderId)
	cfg.Model = strings.TrimSpace(cfg.Model)
	cfg.Language = strings.TrimSpace(cfg.Language)
	cfg.Prompt = strings.TrimSpace(cfg.Prompt)

	// An empty provider_id is valid (transcription disabled); a non-empty one
	// must match a provider in the same update.
	if cfg.ProviderId != "" {
		referenced := false
		for _, provider := range setting.Providers {
			if provider != nil && provider.Id == cfg.ProviderId {
				referenced = true
				break
			}
		}
		if !referenced {
			return errors.Errorf("transcription provider_id %q does not reference any configured provider", cfg.ProviderId)
		}
	}

	if len(cfg.Model) > maxTranscriptionConfigModelLength {
		return errors.Errorf("transcription model is too long; maximum length is %d characters", maxTranscriptionConfigModelLength)
	}
	if len(cfg.Language) > maxTranscriptionConfigLanguageLength {
		return errors.Errorf("transcription language is too long; maximum length is %d characters", maxTranscriptionConfigLanguageLength)
	}
	if len(cfg.Prompt) > maxTranscriptionConfigPromptLength {
		return errors.Errorf("transcription prompt is too long; maximum length is %d characters", maxTranscriptionConfigPromptLength)
	}
	return nil
}
```
```bash
go test -run TestUpdateInstanceSetting -v ./server/router/api/v1/test/... 2>&1 | tail -40
```

Expected: PASS for all three new sub-tests plus all existing sub-tests.
```bash
git add server/router/api/v1/instance_service.go server/router/api/v1/test/instance_service_test.go
git commit -m "feat(api/instance): validate and preserve transcription config

Validates transcription.provider_id references an existing provider
and length-caps model (256), language (32), and prompt (4096). When
an update omits transcription, the previously stored config is
preserved, with the same semantics as the existing API-key preservation."
```
TranscribeRequest.provider_id optional
Files:
- proto/api/v1/ai_service.proto:24

The persisted setting becomes the source of truth; the request field is now an override for advanced callers.

proto/api/v1/ai_service.proto

Change the field at line 24, together with its comment, from:
```protobuf
// Required. The instance AI provider ID to use.
string provider_id = 1 [(google.api.field_behavior) = REQUIRED];
```

to:

```protobuf
// Optional. The instance AI provider ID to use. When empty, the server
// resolves the provider from InstanceAISetting.transcription.provider_id.
string provider_id = 1 [(google.api.field_behavior) = OPTIONAL];
```
```bash
cd proto && buf format -w && buf generate
```

Expected: PASS.

```bash
go build ./...
```

Expected: PASS. (The Connect/gRPC stub regenerates with the same Go field shape; field_behavior is metadata only.)
```bash
git add proto/api/v1/ai_service.proto proto/gen/
git commit -m "feat(proto/api): make TranscribeRequest.provider_id optional

When omitted, the server resolves the provider from the persisted
InstanceAISetting.transcription configuration."
```
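Transcribe RPC
Files:
- server/router/api/v1/ai_service.go

The current implementation requires provider_id and always uses DefaultTranscriptionModel for the model. The new flow resolves each value as "per-call request → persisted transcription → engine default", with two gaps: provider ID has no engine default (an unresolved provider returns FailedPrecondition), and the model has no per-call override because the API-level TranscriptionConfig has no model field. Per-call Config.prompt and Config.language already exist; when empty, they now fall through to the persisted defaults.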
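For provider ID, language, and prompt the chain reduces to "first non-empty value wins". A minimal sketch, where firstNonEmpty is a hypothetical helper rather than existing code (the plan's Transcribe inlines the same checks). One difference to note: this helper trims every candidate, whereas the plan trims only the request value because persisted strings were already trimmed on save.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first value that is non-empty after trimming
// whitespace, or "" if every value is empty. Hypothetical helper.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v = strings.TrimSpace(v); v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Per-call language wins over the persisted default.
	fmt.Println(firstNonEmpty("de", "fr")) // de
	// A blank per-call prompt falls through to the persisted prompt.
	fmt.Println(firstNonEmpty("  ", "names: Alice")) // names: Alice
	// Nothing configured anywhere: the caller must return FailedPrecondition.
	fmt.Println(firstNonEmpty("", "") == "") // true
}
```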
Open server/router/api/v1/test/ai_service_test.go and append three new sub-tests inside the existing TestTranscribe function, just before the function's closing brace (currently line 280):
```go
t.Run("resolves provider from persisted transcription setting when request omits provider_id", func(t *testing.T) {
	ts := NewTestService(t)
	defer ts.Cleanup()
	user, err := ts.CreateRegularUser(ctx, "alice-fallthrough")
	require.NoError(t, err)
	userCtx := ts.CreateUserContext(ctx, user.ID)

	// Fake Whisper endpoint: asserts the persisted model, language, and prompt arrive.
	openAIServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		require.NoError(t, r.ParseMultipartForm(10<<20))
		require.Equal(t, "whisper-1", r.FormValue("model"))
		require.Equal(t, "fr", r.FormValue("language"))
		require.Equal(t, "names: Alice", r.FormValue("prompt"))
		w.Header().Set("Content-Type", "application/json")
		require.NoError(t, json.NewEncoder(w).Encode(map[string]string{"text": "ok"}))
	}))
	defer openAIServer.Close()

	_, err = ts.Store.UpsertInstanceSetting(ctx, &storepb.InstanceSetting{
		Key: storepb.InstanceSettingKey_AI,
		Value: &storepb.InstanceSetting_AiSetting{
			AiSetting: &storepb.InstanceAISetting{
				Providers: []*storepb.AIProviderConfig{
					{
						Id:       "openai-main",
						Title:    "OpenAI",
						Type:     storepb.AIProviderType_OPENAI,
						Endpoint: openAIServer.URL,
						ApiKey:   "sk-test",
					},
				},
				Transcription: &storepb.TranscriptionConfig{
					ProviderId: "openai-main",
					Model:      "whisper-1",
					Language:   "fr",
					Prompt:     "names: Alice",
				},
			},
		},
	})
	require.NoError(t, err)

	// No ProviderId and an empty Config: everything comes from the persisted setting.
	resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
		Config: &v1pb.TranscriptionConfig{},
		Audio: &v1pb.TranscriptionAudio{
			Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
			Filename:    "voice.wav",
			ContentType: "audio/wav",
		},
	})
	require.NoError(t, err)
	require.Equal(t, "ok", resp.Text)
})

t.Run("per-call config overrides persisted prompt and language", func(t *testing.T) {
	ts := NewTestService(t)
	defer ts.Cleanup()
	user, err := ts.CreateRegularUser(ctx, "alice-override")
	require.NoError(t, err)
	userCtx := ts.CreateUserContext(ctx, user.ID)

	// Fake Whisper endpoint: asserts the per-call values, not the persisted ones.
	openAIServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		require.NoError(t, r.ParseMultipartForm(10<<20))
		require.Equal(t, "de", r.FormValue("language"))
		require.Equal(t, "override prompt", r.FormValue("prompt"))
		w.Header().Set("Content-Type", "application/json")
		require.NoError(t, json.NewEncoder(w).Encode(map[string]string{"text": "ok"}))
	}))
	defer openAIServer.Close()

	_, err = ts.Store.UpsertInstanceSetting(ctx, &storepb.InstanceSetting{
		Key: storepb.InstanceSettingKey_AI,
		Value: &storepb.InstanceSetting_AiSetting{
			AiSetting: &storepb.InstanceAISetting{
				Providers: []*storepb.AIProviderConfig{
					{
						Id:       "openai-main",
						Title:    "OpenAI",
						Type:     storepb.AIProviderType_OPENAI,
						Endpoint: openAIServer.URL,
						ApiKey:   "sk-test",
					},
				},
				Transcription: &storepb.TranscriptionConfig{
					ProviderId: "openai-main",
					Language:   "fr",
					Prompt:     "names: Alice",
				},
			},
		},
	})
	require.NoError(t, err)

	_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
		Config: &v1pb.TranscriptionConfig{
			Language: "de",
			Prompt:   "override prompt",
		},
		Audio: &v1pb.TranscriptionAudio{
			Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
			Filename:    "voice.wav",
			ContentType: "audio/wav",
		},
	})
	require.NoError(t, err)
})

t.Run("returns FailedPrecondition when no provider configured", func(t *testing.T) {
	ts := NewTestService(t)
	defer ts.Cleanup()
	user, err := ts.CreateRegularUser(ctx, "alice-empty")
	require.NoError(t, err)
	userCtx := ts.CreateUserContext(ctx, user.ID)

	_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
		Config: &v1pb.TranscriptionConfig{},
		Audio: &v1pb.TranscriptionAudio{
			Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
			Filename:    "voice.wav",
			ContentType: "audio/wav",
		},
	})
	require.Error(t, err)
	require.Contains(t, err.Error(), "transcription is not configured")
})
```
```bash
go test -run TestTranscribe -v ./server/router/api/v1/test/... 2>&1 | tail -60
```

Expected: the three new sub-tests FAIL. The first two omit provider_id, which the current code still requires, so both are rejected with InvalidArgument: provider_id is required before the persisted prompt/language (which the current code never reads) can come into play. The third fails because the current code returns InvalidArgument: provider_id is required rather than FailedPrecondition.
Transcribe to resolve from the persisted setting

In server/router/api/v1/ai_service.go, replace the block at the start of the Transcribe method that validates provider_id, validates the audio, and resolves the provider (currently lines 54–101) with the version below. The audio validation from the existing block (currently lines 68–91) is carried over unchanged, and it now runs BEFORE provider resolution so that malformed audio still surfaces as InvalidArgument regardless of the transcription config.

The API-level TranscriptionConfig in ai_service.proto only has prompt and language fields, so there is no per-call model override and this plan does not add one. Model resolution is therefore "persisted setting → engine default", while provider ID, prompt, and language use "request → persisted setting".

```go
// Validate the request shape and audio first, so malformed audio is reported
// as InvalidArgument even when transcription is not configured.
if request.Config == nil {
	return nil, status.Errorf(codes.InvalidArgument, "config is required")
}
if request.Audio == nil {
	return nil, status.Errorf(codes.InvalidArgument, "audio is required")
}
if request.Audio.GetUri() != "" {
	return nil, status.Errorf(codes.InvalidArgument, "audio uri is not supported")
}
content := request.Audio.GetContent()
if len(content) == 0 {
	return nil, status.Errorf(codes.InvalidArgument, "audio content is required")
}
if len(content) > maxTranscriptionAudioSizeBytes {
	return nil, status.Errorf(codes.InvalidArgument, "audio file is too large; maximum size is 25 MiB")
}
filename := strings.TrimSpace(request.Audio.GetFilename())
if len(filename) > maxTranscriptionFilenameLength {
	return nil, status.Errorf(codes.InvalidArgument, "filename is too long; maximum length is %d characters", maxTranscriptionFilenameLength)
}
contentType := strings.TrimSpace(request.Audio.GetContentType())
if contentType == "" {
	contentType = http.DetectContentType(content)
}
if !isSupportedTranscriptionContentType(contentType) {
	return nil, status.Errorf(codes.InvalidArgument, "audio content type %q is not supported", contentType)
}

// Resolve the provider: request override → persisted setting.
aiSetting, err := s.Store.GetInstanceAISetting(ctx)
if err != nil {
	return nil, status.Errorf(codes.Internal, "failed to get AI setting: %v", err)
}
// persisted may be nil; the generated getters used below are nil-safe.
persisted := aiSetting.GetTranscription()
providerID := strings.TrimSpace(request.GetProviderId())
if providerID == "" {
	providerID = persisted.GetProviderId()
}
if providerID == "" {
	return nil, status.Errorf(codes.FailedPrecondition, "transcription is not configured")
}
provider, err := s.resolveAIProvider(aiSetting, providerID)
if err != nil {
	return nil, err
}

// Resolve the model: persisted setting → engine default (no per-call override).
model := persisted.GetModel()
if model == "" {
	defaultModel, err := ai.DefaultTranscriptionModel(provider.Type)
	if err != nil {
		return nil, status.Errorf(codes.InvalidArgument, "%v", err)
	}
	model = defaultModel
}

// Resolve prompt and language: request override → persisted setting.
prompt := strings.TrimSpace(request.GetConfig().GetPrompt())
if prompt == "" {
	prompt = persisted.GetPrompt()
}
if len(prompt) > maxTranscriptionPromptLength {
	return nil, status.Errorf(codes.InvalidArgument, "prompt is too long; maximum length is %d characters", maxTranscriptionPromptLength)
}
language := strings.TrimSpace(request.GetConfig().GetLanguage())
if language == "" {
	language = persisted.GetLanguage()
}
if len(language) > maxTranscriptionLanguageLength {
	return nil, status.Errorf(codes.InvalidArgument, "language is too long; maximum length is %d characters", maxTranscriptionLanguageLength)
}
```
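The resolution code calls persisted.GetModel(), GetLanguage(), and GetPrompt() even when aiSetting.GetTranscription() returns nil, which is the case on any instance that has never saved a transcription config. That is safe because protoc-gen-go generates getters that check for a nil receiver and return the zero value. The sketch below reproduces that getter pattern on a hand-written struct; defaultModelFor is a hypothetical stand-in for ai.DefaultTranscriptionModel that hard-codes only the two defaults named in the proto comments, and the provider-type strings are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// transcriptionConfig mimics a generated message; GetModel follows the
// nil-receiver-safe getter pattern that protoc-gen-go emits.
type transcriptionConfig struct{ Model string }

func (x *transcriptionConfig) GetModel() string {
	if x != nil {
		return x.Model
	}
	return ""
}

// defaultModelFor is a hypothetical stand-in for ai.DefaultTranscriptionModel.
func defaultModelFor(providerType string) (string, error) {
	switch providerType {
	case "OPENAI":
		return "whisper-1", nil
	case "GEMINI":
		return "gemini-2.5-flash", nil
	default:
		return "", errors.New("unsupported provider type: " + providerType)
	}
}

// resolveModel applies "persisted setting → engine default".
func resolveModel(persisted *transcriptionConfig, providerType string) (string, error) {
	if model := persisted.GetModel(); model != "" {
		return model, nil
	}
	return defaultModelFor(providerType)
}

func main() {
	var unset *transcriptionConfig // never saved: nil pointer, no panic
	m, _ := resolveModel(unset, "OPENAI")
	fmt.Println(m) // whisper-1

	m, _ = resolveModel(&transcriptionConfig{Model: "whisper-large-v3-turbo"}, "OPENAI")
	fmt.Println(m) // whisper-large-v3-turbo
}
```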
Also delete the existing helper resolveAIProviderForTranscription (currently lines 119–142) and replace it with this slimmer one that takes a pre-fetched setting:
```go
func (s *APIV1Service) resolveAIProvider(setting *storepb.InstanceAISetting, providerID string) (ai.ProviderConfig, error) {
	providers := make([]ai.ProviderConfig, 0, len(setting.GetProviders()))
	for _, provider := range setting.GetProviders() {
		if provider == nil {
			continue
		}
		providers = append(providers, convertAIProviderConfigFromStore(provider))
	}
	provider, err := ai.FindProvider(providers, providerID)
	if err != nil {
		return ai.ProviderConfig{}, status.Errorf(codes.NotFound, "AI provider not found")
	}
	return *provider, nil
}
```
The remainder of Transcribe (the call to ai.NewTranscriber, the transcriber.Transcribe(...) call, the response construction) is unchanged — prompt, language, model, provider are all already in scope.
```bash
go test -run TestTranscribe -v ./server/router/api/v1/test/... 2>&1 | tail -60
```

Expected: PASS for all old sub-tests plus the three new ones.

```bash
go test -race ./server/... ./internal/...
```

Expected: PASS.
```bash
git add server/router/api/v1/ai_service.go server/router/api/v1/test/ai_service_test.go
git commit -m "feat(api/ai): resolve transcription from persisted setting

Transcribe now resolves provider, model, language, and prompt with
this precedence: per-call request → persisted transcription config
→ engine default. provider_id may be omitted from the request when
the operator has selected a provider in settings. Returns
FailedPrecondition when no provider can be resolved."
```
AISection into Integrations + Transcription
Files:
- web/src/components/Settings/AISection.tsx
- web/src/locales/en.json
The existing provider list stays as-is in a renamed group. A new TranscriptionForm group is added below it. Both groups share a single Save action that writes the entire AISetting. This matches the existing pattern, since the protobuf save is already whole-message.
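Because the save is whole-message, every request must carry both the provider list and the transcription config, even when only one group changed. Below is a minimal TypeScript sketch of that payload assembly. ProviderDraft, TranscriptionDraft, AISettingPayload, and buildAISettingPayload are simplified hypothetical stand-ins for the generated protobuf messages. The trimming mirrors what toProviderConfig and toTranscriptionConfig do in the AISection.tsx rewrite.

```typescript
// Simplified, hypothetical stand-ins for the generated protobuf types.
interface ProviderDraft {
  id: string;
  title: string;
  endpoint: string;
  apiKey: string;
}

interface TranscriptionDraft {
  providerId: string;
  model: string;
  language: string;
  prompt: string;
}

interface AISettingPayload {
  providers: ProviderDraft[];
  transcription: TranscriptionDraft;
}

// Assembles one payload from both form groups. Per the plan the save is
// whole-message, so both parts are always sent even if only one changed.
function buildAISettingPayload(providers: ProviderDraft[], transcription: TranscriptionDraft): AISettingPayload {
  return {
    providers: providers.map((p) => ({ ...p, title: p.title.trim(), endpoint: p.endpoint.trim() })),
    transcription: {
      ...transcription,
      model: transcription.model.trim(),
      language: transcription.language.trim(),
      // prompt is passed through verbatim, matching toTranscriptionConfig.
    },
  };
}
```

Under whole-message semantics, a payload that carried only the transcription config would presumably overwrite the stored provider list with an empty one. That is why both groups share a single Save action instead of saving independently.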
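Open web/src/locales/en.json and locate the "ai": { ... } block (starting around line 411). Add the following keys inside that block, each at its alphabetical position to match the file's convention. The integrations-* keys sort before keep-api-key, and the transcription-* keys sort after the provider-* keys. If the file orders a key and its longer variants (e.g. transcription-language and transcription-language-help) differently from the list below, follow the file: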
"integrations-description": "Provider keys are supplied by the instance owner and used by server-side AI features.",
"integrations-title": "AI integrations",
"transcription-description": "Speech-to-text settings used when recording audio in the memo composer.",
"transcription-empty-providers": "Add an AI integration first to enable transcription.",
"transcription-language-help": "ISO 639-1 short code (e.g. en, de, zh). Leave empty to auto-detect.",
"transcription-language-placeholder": "auto-detect",
"transcription-language": "Default language",
"transcription-model-help": "Free text. Use the provider's model identifier — e.g. whisper-1, gpt-4o-transcribe, whisper-large-v3-turbo.",
"transcription-model-placeholder-gemini": "gemini-2.5-flash",
"transcription-model-placeholder-openai": "whisper-1",
"transcription-model": "Model",
"transcription-no-provider": "None — transcription disabled",
"transcription-prompt-help": "Improves spelling of proper nouns and jargon. Whisper limit is roughly 224 tokens.",
"transcription-prompt-placeholder": "Names: Alice, Bob. Glossary: kubernetes, OAuth.",
"transcription-prompt": "Prompt hints",
"transcription-provider": "Provider",
"transcription-title": "Transcription",
"transcription-warning-gemini-webm": "Gemini does not accept browser-recorded audio/webm. For in-editor recording, use an OpenAI-compatible provider.",
"transcription-warning-no-key": "The selected provider has no API key set. Edit the integration above to add one.",
Leave the existing "providers": "Providers" key in place. AISection.tsx no longer uses it, but the other locale files still contain it, and removing an unused string is not worth churning every translation.
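Before the manual check, it helps to confirm that every setting.ai.* key the new component passes to t(...) actually exists in en.json. The sketch below walks dotted keys through the parsed JSON. LocaleTree, hasLocaleKey, and findMissingKeys are hypothetical helpers, not project tooling. The setting → ai nesting is inferred from keys such as setting.ai.transcription-title.

```typescript
// Parsed locale JSON: leaves are strings, inner nodes are nested objects.
type LocaleTree = { [key: string]: string | LocaleTree };

// Walks a dotted key such as "setting.ai.transcription-title" through the tree
// and reports whether it ends on a string leaf.
function hasLocaleKey(tree: LocaleTree, dottedKey: string): boolean {
  let node: string | LocaleTree | undefined = tree;
  for (const segment of dottedKey.split(".")) {
    if (typeof node !== "object" || node === null) return false;
    node = node[segment];
  }
  return typeof node === "string";
}

// Returns the keys that are absent or that point at an object instead of a string.
function findMissingKeys(tree: LocaleTree, keys: readonly string[]): string[] {
  return keys.filter((key) => !hasLocaleKey(tree, key));
}
```

Feed it JSON.parse of en.json plus the setting.ai.* keys listed above to catch typos before opening the settings page.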
AISection.tsx
Open web/src/components/Settings/AISection.tsx. The strategy: keep the existing provider table inside a new SettingGroup titled with setting.ai.integrations-title, and add a sibling SettingGroup for transcription. Reuse the useState/isEqual change tracking, but apply it to providers and transcription combined.
Replace the file contents with the structure below. (This is a full rewrite of the file; the dialog component is unchanged from the existing implementation and is included verbatim at the bottom.)
import { create } from "@bufbuild/protobuf";
import { isEqual } from "lodash-es";
import { MoreVerticalIcon, PlusIcon } from "lucide-react";
import { useEffect, useMemo, useState } from "react";
import { toast } from "react-hot-toast";
import ConfirmDialog from "@/components/ConfirmDialog";
import { Button } from "@/components/ui/button";
import { Dialog, DialogContent, DialogDescription, DialogFooter, DialogHeader, DialogTitle } from "@/components/ui/dialog";
import { DropdownMenu, DropdownMenuContent, DropdownMenuItem, DropdownMenuTrigger } from "@/components/ui/dropdown-menu";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select";
import { Textarea } from "@/components/ui/textarea";
import { useInstance } from "@/contexts/InstanceContext";
import {
InstanceSetting_AIProviderConfig,
InstanceSetting_AIProviderConfigSchema,
InstanceSetting_AIProviderType,
InstanceSetting_AISettingSchema,
InstanceSetting_Key,
InstanceSetting_TranscriptionConfig,
InstanceSetting_TranscriptionConfigSchema,
InstanceSettingSchema,
} from "@/types/proto/api/v1/instance_service_pb";
import { useTranslate } from "@/utils/i18n";
import SettingGroup from "./SettingGroup";
import { SettingPanel } from "./SettingList";
import SettingSection from "./SettingSection";
import SettingTable from "./SettingTable";
import useInstanceSettingUpdater, { buildInstanceSettingName } from "./useInstanceSettingUpdater";
type LocalAIProvider = {
id: string;
title: string;
type: InstanceSetting_AIProviderType;
endpoint: string;
apiKey: string;
apiKeySet: boolean;
apiKeyHint: string;
};
type LocalTranscription = {
providerId: string;
model: string;
language: string;
prompt: string;
};
const providerTypeOptions = [InstanceSetting_AIProviderType.OPENAI, InstanceSetting_AIProviderType.GEMINI];
const byokNotes = ["setting.ai.byok-key-note", "setting.ai.byok-storage-note", "setting.ai.byok-model-note"] as const;
const createProviderID = () => {
if (typeof crypto !== "undefined" && "randomUUID" in crypto) {
return crypto.randomUUID();
}
return `ai-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
};
const getProviderTypeLabel = (type: InstanceSetting_AIProviderType) => {
return InstanceSetting_AIProviderType[type] ?? "UNKNOWN";
};
const toLocalProvider = (provider: InstanceSetting_AIProviderConfig): LocalAIProvider => ({
id: provider.id,
title: provider.title,
type: provider.type,
endpoint: provider.endpoint,
apiKey: "",
apiKeySet: provider.apiKeySet,
apiKeyHint: provider.apiKeyHint,
});
const toLocalTranscription = (config: InstanceSetting_TranscriptionConfig | undefined): LocalTranscription => ({
providerId: config?.providerId ?? "",
model: config?.model ?? "",
language: config?.language ?? "",
prompt: config?.prompt ?? "",
});
const newProvider = (): LocalAIProvider => ({
id: createProviderID(),
title: "",
type: InstanceSetting_AIProviderType.OPENAI,
endpoint: "",
apiKey: "",
apiKeySet: false,
apiKeyHint: "",
});
const toProviderConfig = (provider: LocalAIProvider) =>
create(InstanceSetting_AIProviderConfigSchema, {
id: provider.id,
title: provider.title.trim(),
type: provider.type,
endpoint: provider.endpoint.trim(),
apiKey: provider.apiKey,
});
const toTranscriptionConfig = (transcription: LocalTranscription) =>
create(InstanceSetting_TranscriptionConfigSchema, {
providerId: transcription.providerId,
model: transcription.model.trim(),
language: transcription.language.trim(),
prompt: transcription.prompt,
});
const AISection = () => {
const t = useTranslate();
const saveInstanceSetting = useInstanceSettingUpdater();
const { aiSetting: originalSetting } = useInstance();
const [providers, setProviders] = useState<LocalAIProvider[]>(() => originalSetting.providers.map(toLocalProvider));
const [transcription, setTranscription] = useState<LocalTranscription>(() => toLocalTranscription(originalSetting.transcription));
const [editingProvider, setEditingProvider] = useState<LocalAIProvider | undefined>();
const [deleteTarget, setDeleteTarget] = useState<LocalAIProvider | undefined>();
useEffect(() => {
setProviders(originalSetting.providers.map(toLocalProvider));
setTranscription(toLocalTranscription(originalSetting.transcription));
}, [originalSetting.providers, originalSetting.transcription]);
const originalProviders = useMemo(() => originalSetting.providers.map(toLocalProvider), [originalSetting.providers]);
const originalTranscription = useMemo(() => toLocalTranscription(originalSetting.transcription), [originalSetting.transcription]);
const hasChanges = !isEqual(providers, originalProviders) || !isEqual(transcription, originalTranscription);
const transcriptionProviderRef = useMemo(
() => providers.find((provider) => provider.id === transcription.providerId),
[providers, transcription.providerId],
);
const handleCreateProvider = () => {
setEditingProvider(newProvider());
};
const handleEditProvider = (provider: LocalAIProvider) => {
setEditingProvider({ ...provider, apiKey: "" });
};
const handleSaveProvider = (provider: LocalAIProvider) => {
const title = provider.title.trim();
const endpoint = provider.endpoint.trim();
if (!title) {
toast.error(t("setting.ai.provider-title-required"));
return;
}
if (!provider.apiKeySet && !provider.apiKey.trim()) {
toast.error(t("setting.ai.api-key-required"));
return;
}
const normalizedProvider = { ...provider, title, endpoint };
setProviders((prev) => {
const exists = prev.some((item) => item.id === normalizedProvider.id);
if (!exists) {
return [...prev, normalizedProvider];
}
return prev.map((item) => (item.id === normalizedProvider.id ? normalizedProvider : item));
});
setEditingProvider(undefined);
};
const handleDeleteProvider = () => {
if (!deleteTarget) return;
setProviders((prev) => prev.filter((provider) => provider.id !== deleteTarget.id));
if (transcription.providerId === deleteTarget.id) {
setTranscription((prev) => ({ ...prev, providerId: "" }));
}
setDeleteTarget(undefined);
};
const handleSaveSetting = async () => {
if (transcription.providerId && !transcriptionProviderRef) {
toast.error(t("setting.ai.transcription-empty-providers"));
return;
}
await saveInstanceSetting({
key: InstanceSetting_Key.AI,
setting: create(InstanceSettingSchema, {
name: buildInstanceSettingName(InstanceSetting_Key.AI),
value: {
case: "aiSetting",
value: create(InstanceSetting_AISettingSchema, {
providers: providers.map(toProviderConfig),
transcription: toTranscriptionConfig(transcription),
}),
},
}),
errorContext: "Update AI setting",
});
};
return (
<SettingSection
title={t("setting.ai.label")}
actions={
<Button onClick={handleCreateProvider}>
<PlusIcon className="w-4 h-4 mr-2" />
{t("setting.ai.add-provider")}
</Button>
}
>
<SettingPanel className="bg-muted/30 px-4 py-3">
<div className="flex max-w-3xl flex-col gap-2">
<div className="flex flex-wrap items-center gap-2">
<span className="rounded-md border border-border bg-background px-2 py-0.5 text-xs font-medium text-foreground">
{t("setting.ai.byok-label")}
</span>
<h4 className="text-sm font-semibold text-foreground">{t("setting.ai.byok-title")}</h4>
</div>
<p className="text-sm text-muted-foreground">{t("setting.ai.byok-description")}</p>
<ul className="space-y-1 text-sm text-muted-foreground">
{byokNotes.map((note) => (
<li key={note} className="flex gap-2">
<span className="mt-2 size-1 rounded-full bg-muted-foreground/60" aria-hidden />
<span>{t(note)}</span>
</li>
))}
</ul>
</div>
</SettingPanel>
<SettingGroup title={t("setting.ai.integrations-title")} description={t("setting.ai.integrations-description")}>
<SettingTable
columns={[
{
key: "title",
header: t("common.name"),
render: (_, provider: LocalAIProvider) => (
<div className="flex flex-col gap-0.5">
<span className="text-foreground">{provider.title}</span>
<span className="font-mono text-xs text-muted-foreground">{provider.id}</span>
</div>
),
},
{
key: "type",
header: t("setting.ai.provider-type"),
render: (_, provider: LocalAIProvider) => <span>{getProviderTypeLabel(provider.type)}</span>,
},
{
key: "endpoint",
header: t("setting.ai.endpoint"),
render: (_, provider: LocalAIProvider) => (
<span className="font-mono text-xs">{provider.endpoint || t("setting.ai.default-endpoint")}</span>
),
},
{
key: "apiKeySet",
header: t("setting.ai.api-key"),
render: (_, provider: LocalAIProvider) => (
<span className="font-mono text-xs">{provider.apiKeySet ? provider.apiKeyHint || t("setting.ai.configured") : "-"}</span>
),
},
{
key: "actions",
header: "",
className: "text-right",
render: (_, provider: LocalAIProvider) => (
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button variant="outline" size="sm">
<MoreVerticalIcon className="w-4 h-auto" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" sideOffset={2}>
<DropdownMenuItem onClick={() => handleEditProvider(provider)}>{t("common.edit")}</DropdownMenuItem>
<DropdownMenuItem onClick={() => setDeleteTarget(provider)} className="text-destructive focus:text-destructive">
{t("common.delete")}
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
),
},
]}
data={providers}
emptyMessage={t("setting.ai.no-providers")}
getRowKey={(provider) => provider.id}
/>
</SettingGroup>
<SettingGroup
title={t("setting.ai.transcription-title")}
description={t("setting.ai.transcription-description")}
showSeparator
>
<TranscriptionForm
providers={providers}
transcription={transcription}
onChange={setTranscription}
referencedProvider={transcriptionProviderRef}
/>
</SettingGroup>
<div className="w-full flex justify-end">
<Button disabled={!hasChanges} onClick={handleSaveSetting}>
{t("common.save")}
</Button>
</div>
<AIProviderDialog
provider={editingProvider}
onOpenChange={(open) => !open && setEditingProvider(undefined)}
onSave={handleSaveProvider}
/>
<ConfirmDialog
open={!!deleteTarget}
onOpenChange={(open) => !open && setDeleteTarget(undefined)}
title={deleteTarget ? t("setting.ai.delete-provider", { title: deleteTarget.title }) : ""}
confirmLabel={t("common.delete")}
cancelLabel={t("common.cancel")}
onConfirm={handleDeleteProvider}
confirmVariant="destructive"
/>
</SettingSection>
);
};
interface TranscriptionFormProps {
providers: LocalAIProvider[];
transcription: LocalTranscription;
referencedProvider: LocalAIProvider | undefined;
onChange: (next: LocalTranscription) => void;
}
const TranscriptionForm = ({ providers, transcription, referencedProvider, onChange }: TranscriptionFormProps) => {
const t = useTranslate();
const noProviders = providers.length === 0;
const update = (partial: Partial<LocalTranscription>) => {
onChange({ ...transcription, ...partial });
};
const placeholderForProvider = (provider: LocalAIProvider | undefined) => {
if (!provider) return "";
return provider.type === InstanceSetting_AIProviderType.GEMINI
? t("setting.ai.transcription-model-placeholder-gemini")
: t("setting.ai.transcription-model-placeholder-openai");
};
return (
<div className="grid grid-cols-1 sm:grid-cols-2 gap-3 max-w-3xl">
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-provider")}</Label>
<Select
value={transcription.providerId || "__none__"}
onValueChange={(value) => update({ providerId: value === "__none__" ? "" : value })}
disabled={noProviders}
>
<SelectTrigger className="w-full">
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectItem value="__none__">{t("setting.ai.transcription-no-provider")}</SelectItem>
{providers.map((provider) => (
<SelectItem key={provider.id} value={provider.id}>
{provider.title || provider.id}
</SelectItem>
))}
</SelectContent>
</Select>
{noProviders && <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-empty-providers")}</p>}
{referencedProvider && !referencedProvider.apiKeySet && (
<p className="text-xs text-destructive">{t("setting.ai.transcription-warning-no-key")}</p>
)}
{referencedProvider?.type === InstanceSetting_AIProviderType.GEMINI && (
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-warning-gemini-webm")}</p>
)}
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-model")}</Label>
<Input
value={transcription.model}
onChange={(e) => update({ model: e.target.value })}
placeholder={placeholderForProvider(referencedProvider)}
disabled={!transcription.providerId}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-model-help")}</p>
</div>
<div className="flex flex-col gap-1.5">
<Label>{t("setting.ai.transcription-language")}</Label>
<Input
value={transcription.language}
onChange={(e) => update({ language: e.target.value })}
placeholder={t("setting.ai.transcription-language-placeholder")}
disabled={!transcription.providerId}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-language-help")}</p>
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-prompt")}</Label>
<Textarea
value={transcription.prompt}
onChange={(e) => update({ prompt: e.target.value })}
placeholder={t("setting.ai.transcription-prompt-placeholder")}
rows={3}
disabled={!transcription.providerId}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-prompt-help")}</p>
</div>
</div>
);
};
interface AIProviderDialogProps {
provider?: LocalAIProvider;
onOpenChange: (open: boolean) => void;
onSave: (provider: LocalAIProvider) => void;
}
const AIProviderDialog = ({ provider, onOpenChange, onSave }: AIProviderDialogProps) => {
const t = useTranslate();
const [draft, setDraft] = useState<LocalAIProvider>(() => provider ?? newProvider());
useEffect(() => {
const next = provider ?? newProvider();
setDraft(next);
}, [provider]);
const updateDraft = (partial: Partial<LocalAIProvider>) => {
setDraft((prev) => ({ ...prev, ...partial }));
};
const handleSave = () => {
onSave(draft);
};
return (
<Dialog open={!!provider} onOpenChange={onOpenChange}>
<DialogContent size="2xl">
<DialogHeader>
<DialogTitle>{provider?.apiKeySet ? t("setting.ai.edit-provider") : t("setting.ai.add-provider")}</DialogTitle>
<DialogDescription>{t("setting.ai.dialog-description")}</DialogDescription>
</DialogHeader>
<div className="grid grid-cols-1 sm:grid-cols-2 gap-3">
<div className="flex flex-col gap-1.5">
<Label>{t("setting.ai.provider-title")}</Label>
<Input value={draft.title} onChange={(e) => updateDraft({ title: e.target.value })} placeholder="OpenAI" />
</div>
<div className="flex flex-col gap-1.5">
<Label>{t("setting.ai.provider-type")}</Label>
<Select
value={String(draft.type)}
onValueChange={(value) => updateDraft({ type: Number(value) as InstanceSetting_AIProviderType })}
>
<SelectTrigger className="w-full">
<SelectValue />
</SelectTrigger>
<SelectContent>
{providerTypeOptions.map((type) => (
<SelectItem key={type} value={String(type)}>
{getProviderTypeLabel(type)}
</SelectItem>
))}
</SelectContent>
</Select>
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.endpoint")}</Label>
<Input
value={draft.endpoint}
onChange={(e) => updateDraft({ endpoint: e.target.value })}
placeholder={getDefaultEndpointPlaceholder(draft.type)}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.endpoint-hint")}</p>
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.api-key")}</Label>
<Input
type="password"
value={draft.apiKey}
onChange={(e) => updateDraft({ apiKey: e.target.value })}
placeholder={draft.apiKeySet ? t("setting.ai.keep-api-key") : ""}
/>
{draft.apiKeySet && (
<p className="text-xs text-muted-foreground">{t("setting.ai.current-key", { key: draft.apiKeyHint || "-" })}</p>
)}
</div>
</div>
<DialogFooter>
<Button variant="ghost" onClick={() => onOpenChange(false)}>
{t("common.cancel")}
</Button>
<Button onClick={handleSave}>{t("common.save")}</Button>
</DialogFooter>
</DialogContent>
</Dialog>
);
};
const getDefaultEndpointPlaceholder = (type: InstanceSetting_AIProviderType) => {
switch (type) {
case InstanceSetting_AIProviderType.OPENAI:
return "https://api.openai.com/v1";
case InstanceSetting_AIProviderType.GEMINI:
return "https://generativelanguage.googleapis.com/v1beta";
default:
return "";
}
};
export default AISection;
Note: this references Textarea from @/components/ui/textarea. Verify that component exists by running:
ls web/src/components/ui/textarea.tsx
If the file is missing, the project doesn't have a Textarea primitive yet — fall back to the native <textarea> element with the same classes used by the project's Input for visual consistency. Keep the same props (value, onChange, placeholder, rows, disabled).
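Two state rules in the rewrite drive the behavior checked manually below:

- Deleting a provider clears any transcription reference to it (handleDeleteProvider).
- Save is enabled when either group differs from the persisted setting (hasChanges).

Settings/ has no component tests, so one option is to keep these rules as pure functions. The sketch below uses simplified hypothetical types. A flat field-by-field comparison stands in for lodash isEqual, which works only because every field here is a primitive.

```typescript
// Simplified, hypothetical versions of LocalAIProvider / LocalTranscription.
interface ProviderRow {
  id: string;
  title: string;
  apiKeySet: boolean;
}

interface TranscriptionState {
  providerId: string;
  model: string;
  language: string;
  prompt: string;
}

interface AIFormState {
  providers: ProviderRow[];
  transcription: TranscriptionState;
}

// Like handleDeleteProvider: drop the provider and clear a transcription
// reference that pointed at it, so no dangling providerId is saved.
function deleteProvider(state: AIFormState, providerId: string): AIFormState {
  return {
    providers: state.providers.filter((p) => p.id !== providerId),
    transcription:
      state.transcription.providerId === providerId ? { ...state.transcription, providerId: "" } : state.transcription,
  };
}

// Field-by-field equality for flat records; stands in for lodash isEqual here.
function sameRecord<T extends object>(a: T, b: T): boolean {
  const keys = Object.keys(a) as (keyof T)[];
  return keys.length === Object.keys(b).length && keys.every((key) => a[key] === b[key]);
}

// Like hasChanges: dirty when either the provider list or the transcription config differs.
function hasChanges(current: AIFormState, original: AIFormState): boolean {
  const providersEqual =
    current.providers.length === original.providers.length &&
    current.providers.every((p, i) => {
      const other = original.providers[i];
      return other !== undefined && sameRecord(p, other);
    });
  return !providersEqual || !sameRecord(current.transcription, original.transcription);
}
```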
cd web && pnpm lint 2>&1 | tail -40
Expected: PASS. Common failures are a missing import, or a mismatch between the schema name used here (InstanceSetting_TranscriptionConfigSchema) and what buf generate produced. Check the exact export in web/src/types/proto/api/v1/instance_service_pb.ts. Following the AISetting pattern, it should be the type InstanceSetting_TranscriptionConfig paired with the schema constant InstanceSetting_TranscriptionConfigSchema.
Start backend and frontend:
go run ./cmd/memos --port 8081 &
cd web && pnpm dev
Open http://localhost:3001/, sign in as the host user, navigate to Settings → AI:
Verify the AI integrations group shows the existing provider table (or empty state).
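Verify the Transcription group renders. With no integrations, all four fields are disabled and the "Add an AI integration first" hint is shown. With integrations but no provider selected, only the Provider dropdown is enabled; Model, Language, and Prompt stay disabled.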
Add a provider with type OPENAI and a key. The transcription section now lets you select it. Pick the provider; the model placeholder shows whisper-1. Type whisper-1 in the model field, leave language empty, leave prompt empty, save.
Refresh the page. The transcription section retains the saved provider and model.
Change the provider title → save. The transcription section still references the same provider by id (title in dropdown updates).
Delete the provider → the transcription section's providerId is cleared (model field becomes disabled).
Step 5: Commit
git add web/src/components/Settings/AISection.tsx web/src/locales/en.json
git commit -m "feat(settings): add Transcription configuration section
Splits the AI settings page into 'AI integrations' (existing
provider list) and 'Transcription' (new). The transcription form
chooses a provider, model, default language, and prompt hint. Save
writes the entire AI setting in one request."
MemoEditor to the persisted transcription config
Files:
- web/src/components/MemoEditor/index.tsx:31-67,130-159
- web/src/components/MemoEditor/services/transcriptionService.ts
The editor previously scanned aiSetting.providers for the first one with an API key. It now reads aiSetting.transcription.providerId, validates the reference, and calls the service without a provider argument.
transcriptionService.ts
Replace the entire contents of web/src/components/MemoEditor/services/transcriptionService.ts with:
import { create } from "@bufbuild/protobuf";
import { aiServiceClient } from "@/connect";
import { TranscribeRequestSchema, TranscriptionAudioSchema, TranscriptionConfigSchema } from "@/types/proto/api/v1/ai_service_pb";
export const transcriptionService = {
async transcribeFile(file: File): Promise<string> {
const content = new Uint8Array(await file.arrayBuffer());
const response = await aiServiceClient.transcribe(
create(TranscribeRequestSchema, {
config: create(TranscriptionConfigSchema, {}),
audio: create(TranscriptionAudioSchema, {
source: {
case: "content",
value: content,
},
filename: file.name,
contentType: file.type,
}),
}),
);
return response.text;
},
};
Note: providerId is intentionally omitted from the TranscribeRequest. The server resolves the provider from the persisted InstanceAISetting.transcription.provider_id.
MemoEditor/index.tsx
Open the file. Two regions change:
Region A (lines 31–67) holds the TRANSCRIPTION_PROVIDER_TYPES constant and the transcriptionProvider lookup, and is the only user of the InstanceSetting_AIProviderType import. Delete the TRANSCRIPTION_PROVIDER_TYPES constant entirely; it is no longer needed.
And inside MemoEditorImpl, replace:
const transcriptionProvider = useMemo(
() => aiSetting.providers.find((provider) => provider.apiKeySet && TRANSCRIPTION_PROVIDER_TYPES.includes(provider.type)),
[aiSetting.providers],
);
with:
const canTranscribe = useMemo(() => {
const providerId = aiSetting.transcription?.providerId ?? "";
if (!providerId) return false;
const provider = aiSetting.providers.find((p) => p.id === providerId);
return Boolean(provider?.apiKeySet);
}, [aiSetting.providers, aiSetting.transcription?.providerId]);
Then update the import line at the top of the file. Currently:
import { InstanceSetting_AIProviderType, InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
Becomes (drop the InstanceSetting_AIProviderType import — it's no longer referenced in this file):
import { InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
Region B — replace lines 130–159 (the handleTranscribeRecordedAudio callback and any guards) so it consults canTranscribe and calls the service without a provider arg:
const handleTranscribeRecordedAudio = useCallback(
async (localFile: LocalFile) => {
if (!canTranscribe) {
dispatch(actions.addLocalFile(localFile));
setIsTranscribingAudio(false);
setIsAudioRecorderOpen(false);
return;
}
try {
const text = (await transcriptionService.transcribeFile(localFile.file)).trim();
if (!text) {
dispatch(actions.addLocalFile(localFile));
toast.error(t("editor.audio-recorder.transcribe-empty"));
return;
}
insertTranscribedText(text);
toast.success(t("editor.audio-recorder.transcribe-success"));
} catch (error) {
console.error(error);
toast.error(errorService.getErrorMessage(error) || t("editor.audio-recorder.transcribe-error"));
dispatch(actions.addLocalFile(localFile));
} finally {
setIsTranscribingAudio(false);
setIsAudioRecorderOpen(false);
}
},
[actions, canTranscribe, dispatch, insertTranscribedText, t],
);
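handleTranscribeRecordedAudio ends in one of four ways:

- Transcription is not configured.
- The transcript is empty.
- The request fails.
- Text is inserted.

In the first three cases the recording is kept as an attachment. To test that branching without rendering the editor, you could lift it into a pure function. This is an optional, hypothetical refactor, not something the plan requires: decideRecordingOutcome and the outcome names are invented.

```typescript
// Result of the transcription call, if one was made.
type TranscribeResult = { ok: true; text: string } | { ok: false; error: unknown };

type RecordingOutcome =
  | { kind: "attach"; reason: "disabled" | "empty" | "error" }
  | { kind: "insert"; text: string };

function decideRecordingOutcome(canTranscribe: boolean, result: TranscribeResult | undefined): RecordingOutcome {
  // No request is made when transcription is disabled, so there is no result.
  if (!canTranscribe || result === undefined) return { kind: "attach", reason: "disabled" };
  if (!result.ok) return { kind: "attach", reason: "error" };
  const text = result.text.trim();
  return text === "" ? { kind: "attach", reason: "empty" } : { kind: "insert", text };
}
```

In the component, every "attach" outcome maps to dispatch(actions.addLocalFile(localFile)), with an error toast for the "empty" and "error" reasons only. "insert" maps to insertTranscribedText(text).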
Then update handleTranscribeAudioRecording (currently around line 225) so its guard uses canTranscribe:
const handleTranscribeAudioRecording = () => {
if (!canTranscribe || isTranscribingAudio) {
return;
}
setIsTranscribingAudio(true);
const didStop = audioRecorder.stopRecording("transcribe");
if (!didStop) {
setIsTranscribingAudio(false);
}
};
Finally, search the file for any remaining references to transcriptionProvider. Replace truthiness checks such as !!transcriptionProvider with canTranscribe, including the expression passed to <AudioRecorderPanel canTranscribe={...}>.
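The canTranscribe rule can also be kept as a pure helper, so it is testable without rendering MemoEditor. In the sketch below, AISettingLike and computeCanTranscribe are hypothetical simplified stand-ins for the generated InstanceSetting_AISetting type and the useMemo body.

```typescript
interface ProviderLike {
  id: string;
  apiKeySet: boolean;
}

interface AISettingLike {
  providers: ProviderLike[];
  transcription?: { providerId: string };
}

// True only when a transcription provider is selected, still exists, and has a key.
function computeCanTranscribe(setting: AISettingLike): boolean {
  const providerId = setting.transcription?.providerId ?? "";
  if (providerId === "") return false;
  const provider = setting.providers.find((p) => p.id === providerId);
  return Boolean(provider?.apiKeySet);
}
```

This encodes the behavior change: a provider with an API key no longer enables transcription by itself. It must also be the provider selected in the Transcription settings.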
cd web && pnpm lint 2>&1 | tail -20
Expected: PASS.
With the dev server running:
whisper-1, save. Open the recorder; the Transcribe button is now visible. Record a short clip in English and click Transcribe; the text appears in the editor.
git add web/src/components/MemoEditor/index.tsx web/src/components/MemoEditor/services/transcriptionService.ts
git commit -m "feat(memo-editor): use persisted transcription config
The editor's transcribe button now reflects InstanceAISetting.
transcription.providerId rather than an implicit \"first provider
with apiKey\" pick. The transcribeFile service no longer takes a
provider argument — the server resolves it from settings."
provider_id in the server
Files:
- server/router/api/v1/ai_service.go (clean-up only)
Task 6 already deleted the "provider_id is required" InvalidArgument branch. This task confirms that no orphaned helper or constant is left behind.
grep -n "resolveAIProviderForTranscription\|provider_id is required" server/router/api/v1/ai_service.go
Expected: no matches (Task 6 should have removed both). If anything matches, delete it.
golangci-lint run ./server/router/api/v1/...
Expected: PASS.
go test -race ./server/... ./internal/...
Expected: PASS.
If Step 1 returned matches and you removed code, commit:
git add server/router/api/v1/ai_service.go
git commit -m "chore(api/ai): remove orphaned helpers from old transcribe flow"
Otherwise, skip the commit — the task is a no-op verification.
Files: none — verification only.
go test -race ./...
Expected: PASS.
cd web && pnpm lint && pnpm build
Expected: PASS for both.
With backend on :8081 and frontend on :3001:
https://api.openai.com/v1), api-key sk-... (real or fake).
whisper-1, leave language and prompt empty. Save. Toast confirms.
https://api.groq.com/openai/v1, api-key gsk_.... Switch the transcription provider dropdown to Groq, set the model to whisper-large-v3-turbo, and save.
If verification surfaced any minor fix (e.g., a string typo in the locale), commit it now:
git add -A
git commit -m "chore(settings): polish transcription section copy"
Spec coverage:
- TranscriptionConfig (store + api): Tasks 1, 2 ✓
- Transcribe start, fall-through resolution, FailedPrecondition when unconfigured: Task 6 ✓
- provider_id references existing provider; length caps on model/language/prompt: Task 4 ✓
- TranscribeRequest.provider_id becomes optional: Task 5 ✓
- aiSetting.transcription.providerId, service drops provider arg: Task 8 ✓
- provider_id and fall into "transcription disabled" branch ✓
- transcription-warning-gemini-webm string in Task 7 ✓
Placeholder scan: No "TBD", "TODO", or "implement later" markers in this plan. Each step contains exact file paths, exact code, exact commands, and expected output.
Type / name consistency:
- InstanceAISetting.transcription → TranscriptionConfig (Task 1).
- storepb.TranscriptionConfig (referenced in Tasks 3, 4, 6).
- InstanceSetting.AISetting.transcription → InstanceSetting.TranscriptionConfig (Task 2). Generated Go type: v1pb.InstanceSetting_TranscriptionConfig (Tasks 3, 4). Generated TS type: InstanceSetting_TranscriptionConfig and InstanceSetting_TranscriptionConfigSchema (Task 7).
Things to watch during execution:
- buf generate produces TS type names that differ slightly across generators. If InstanceSetting_TranscriptionConfigSchema doesn't exist after Task 2's regeneration, check the actual export name in web/src/types/proto/api/v1/instance_service_pb.ts and adjust Task 7's imports accordingly.
- The existence of the Textarea UI primitive is verified mid-Task 7. If it is absent, fall back to <textarea> with project styling.
- prepareInstanceAISettingForUpdate runs only when the AI setting key is being updated. The existing lookup uses s.Store.GetInstanceAISetting(ctx), which already returns the current state with transcription populated (after Task 1).