Model Management

This page answers the most practical first-time question:

what do you need to configure first so Prompt Optimizer can actually run?

If you are still going through first-time setup, read this together with Quick Start.

The entry point is the Model Management button in the top-right corner.

!!! note
    For your first run, do not configure too many providers at once. One working text model is more useful than a long unfinished provider list.

First-time users: only do these 3 steps

  1. Add one text model
  2. Run one optimize / test / evaluate flow in a text workspace
  3. Only then decide whether you need a second text model or an image model

Most first-time users do not need a large model list.

Minimum working setup

| Your goal | Minimum setup |
| --- | --- |
| Start using text workspaces | 1 text model |
| Compare results | 2 text models |
| Use image workspaces | 1 text model + 1 image model |
| Use reference-image replication or style learning in text-to-image | 1 text model + 1 image model + 1 image recognition model |

These two distinctions are enough to understand at first

Text model vs image model

  • Text models handle left-side analysis, optimization, iteration, and text-side testing/evaluation
  • Image models only handle actual image generation on the right side

Left-side model vs right-side model

In text workspaces:

  • left-side model: analyzes and improves prompts
  • right-side model: executes prompts and produces evidence

They can be the same model, but they do not have to be.

How to configure models for the first run

Case A: you just want the app to work

Configure one text model.

That one model is enough to start:

  • left-side analysis / optimization
  • right-side testing
  • right-side Result Evaluation
  • right-side Compare Evaluation

Case B: you want real result comparison

Configure two text models:

  • one main model
  • one comparison model

This makes it easier to tell whether the difference comes from the prompt or from the model.

Case C: you want image workspaces

Configure at least:

  • one text model
  • one image model

Because:

  • the left side still uses a text model to improve image prompts
  • the right side uses an image model to generate the actual image

Case D: you want reference-image actions inside text-to-image

If you want to use:

  • reference-image replication
  • style learning
  • prompt-variable extraction from images

you also need an image recognition model.

Those actions are not normal image generation. They first require a model that can understand the image and turn it into prompt clues or variables.

Step 1: add one text model

Choose the provider you know best and can connect with the least friction.

Step 2: make sure connection testing succeeds

After you add the model, run Test Connection.
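Under the hood, a connection test against an OpenAI-compatible provider typically just calls the model-listing endpoint and checks for a valid response. The sketch below imitates that against a throwaway local stub; the stub server, its port, and the `demo-model` id are all hypothetical, and Prompt Optimizer's own test logic may differ:

```python
# Minimal sketch of an OpenAI-compatible connection test.
# Everything here (stub server, port, model id) is hypothetical.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # OpenAI-compatible services expose GET /v1/models.
        if self.path == "/v1/models":
            body = json.dumps({"data": [{"id": "demo-model"}]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Test Connection" boils down to: can we list models from the base URL?
base_url = f"http://127.0.0.1:{server.server_port}/v1"
with urllib.request.urlopen(f"{base_url}/models") as resp:
    models = [m["id"] for m in json.load(resp)["data"]]

print("connection ok, models:", models)
server.shutdown()
```

If an equivalent request fails outside the app too, the problem is the endpoint or key, not Prompt Optimizer.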

Step 3: run one text workspace

Pick whichever text workspace looks simplest and run one full flow.

If you can complete:

  • left-side optimization
  • right-side testing
  • one evaluation

then your minimum setup is already good enough.

Step 4: add more models only when needed

Add a second text model only if you want comparison. Add an image model only if you are entering image workspaces.

Three common connection patterns

1. Public model platforms

Examples:

  • OpenAI
  • Gemini
  • DeepSeek
  • SiliconFlow

In most cases you only need:

  1. choose the provider
  2. paste the API key
  3. select the model
  4. run connection testing
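As a concrete illustration, a minimal public-provider entry usually looks like this (the key and model are placeholders; pick whatever your provider actually offers):

```text
Provider: OpenAI
API Key: sk-...your key...
Model: (select from the provider's model list)
```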

Some providers have provider-specific request details:

  • OpenAI-compatible text models may use either Chat Completions or Responses request style, depending on the configured provider and model capability.
  • DeepSeek configurations can expose thinking or reasoning parameters in advanced settings. If output behavior looks different from what you expect, check whether those parameters are enabled.

2. Ollama

If you run Ollama locally, use the built-in Ollama provider.

Typical behavior:

  • default endpoint: http://localhost:11434/v1
  • API key often not required
  • model list can refresh from your installed local models
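Following the same pattern as the Custom example later on this page, a typical local Ollama entry looks like:

```text
Provider: Ollama
Base URL: http://localhost:11434/v1
Model: (refresh to pick from your installed local models)
API Key: (leave empty unless your setup requires one)
```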

3. Custom

If your service is OpenAI-compatible, use Custom.

Typical cases:

  • LM Studio
  • internal company gateway
  • self-hosted OpenAI-compatible service
  • any service that needs a custom base URL

Example:

```text
Provider: Custom
Base URL: https://your-api.example.com/v1
Model: your-model-name
API Key: fill based on your service
```

If connection fails, check deployment and environment

Web / hosted version

The browser sends requests directly to your model service, so you may hit:

  • CORS
  • mixed content when HTTPS pages call local HTTP endpoints

Desktop app

Usually better for:

  • Ollama
  • LM Studio
  • local network services
  • internal APIs
  • custom gateways with browser restrictions

Docker

Docker packages the web UI and MCP together, but the page still runs in the browser, so browser restrictions still matter.


Supported text providers

The current codebase includes:

  • OpenAI
  • Gemini
  • Anthropic
  • DeepSeek
  • SiliconFlow
  • Zhipu AI
  • DashScope
  • OpenRouter
  • ModelScope
  • MiniMax
  • Ollama
  • Custom (OpenAI-compatible endpoints)

What the model manager can do

In addition to add / edit / delete, the text-side manager supports:

  • connection testing
  • cloning configs
  • refreshing model lists
  • advanced parameters
  • provider-specific API-key links for some providers

The image-side manager supports:

  • add / edit / clone / delete
  • enable / disable
  • connection testing
  • preview test image
  • provider / model / capability tags

Built-in image presets may expose capability differences between model versions. For example, Seedream 4.5 supports multi-image scenarios, while Seedream 5.0 Lite has its own default settings. Prefer checking the capability tags in the model manager instead of assuming from the model name alone.

There is also a function-model area for image recognition.

If you want image extraction, reference-image replication, or style learning, do not stop at text and image generation models. Make sure the image recognition model is configured too.

How to tell whether setup is already good enough

You can stop tuning model setup for now if all three are true:

  1. at least one text model passes connection testing
  2. you can produce one real result in a text workspace
  3. you can run one evaluation on that result

Where configuration is stored

  • web / hosted version: current browser storage
  • desktop app: local application data
  • extension: extension-local storage

If you need backup or migration, use Data Management.

Common questions

Connection test passes, but real runs still fail

Common reasons:

  • quota or billing limits
  • wrong model name
  • browser-side CORS / mixed-content blocking
  • left-side model and right-side model are not what you thought they were
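A wrong model name is a classic instance of this split: the model list loads, so the connection test passes, but the actual completion call is rejected. The sketch below reproduces that against a hypothetical local stub (the stub, the `demo-model` id, and the mistyped `demo-modle` are all made up for illustration):

```python
# Hypothetical stub: connection test passes, but a bad model name fails.
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

KNOWN_MODELS = {"demo-model"}

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/v1/models":
            self._reply(200, {"data": [{"id": m} for m in sorted(KNOWN_MODELS)]})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/v1/chat/completions" and req.get("model") in KNOWN_MODELS:
            self._reply(200, {"choices": [{"message": {"content": "ok"}}]})
        else:
            # What a provider typically returns for an unknown model id.
            self._reply(404, {"error": "model not found"})

    def _reply(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}/v1"

# "Connection test": listing models succeeds.
with urllib.request.urlopen(f"{base}/models") as resp:
    test_ok = resp.status == 200

# "Real run" with a mistyped model name: rejected with an HTTP error.
req = urllib.request.Request(
    f"{base}/chat/completions",
    data=json.dumps({"model": "demo-modle", "messages": []}).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    urllib.request.urlopen(req)
    run_error = None
except urllib.error.HTTPError as e:
    run_error = e.code

print("test passed:", test_ok, "| real run HTTP error:", run_error)
server.shutdown()
```

Reproducing the failing call outside the browser like this also rules out CORS and mixed-content blocking, since no browser is involved.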

Do I need many models on day one?

No. In most cases:

  • one text model is enough for text workspaces
  • add a second text model only for comparison
  • add image models only for image workspaces

I configured a model, but the app still won’t run

Check these first:

  1. did connection testing actually succeed?
  2. is this a text model when the page expects text?
  3. are you in a browser trying to call a local HTTP endpoint?
  4. does this workspace also need an image model or additional inputs?