This page answers the most practical first-time question:
what do you need to configure first so Prompt Optimizer can actually run?
If you are still going through first-time setup, read this together with Quick Start.
The entry point is the Model Management button in the top-right corner.
!!! note
    For your first run, do not configure too many providers at once. One working text model is more useful than a long unfinished provider list.
Most first-time users do not need a large model list.
| Your goal | Minimum setup |
|---|---|
| Start using text workspaces | 1 text model |
| Compare results | 2 text models |
| Use image workspaces | 1 text model + 1 image model |
| Use reference-image replication or style learning in text-to-image | 1 text model + 1 image model + 1 image recognition model |
For text workspaces, configure one text model; that is enough to start. The different model roles in a text workspace can all use the same model, but they do not have to.
To compare results, configure two text models and run the same prompt against both. This makes it easier to tell whether a difference comes from the prompt or from the model.
For image workspaces, configure at least one text model and one image model: the text side handles prompt work, and the image side handles generation.
If you want to use reference-image replication or style learning in text-to-image, you also need an image recognition model. These features are not ordinary image generation: they first require a model that can understand the image and turn it into prompt clues or variables.
Choose the provider you know best and can connect with the least friction; that is the simplest starting point. After you add the model, run Test Connection. If the connection test passes and a basic run works, your minimum setup is already good enough.
Add a second text model only if you want comparison. Add an image model only if you are entering image workspaces.
In most cases you only need a provider, a model name, and an API key. Some providers, however, have provider-specific request details:
If you run Ollama locally, use the built-in Ollama provider. The typical base URL is `http://localhost:11434/v1`.

If your service is OpenAI-compatible, use the Custom provider.
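If you are unsure whether your local Ollama service is reachable, a quick check from the terminal is to query its OpenAI-compatible model list (this assumes a default Ollama install on port 11434):

```shell
# List installed models through Ollama's OpenAI-compatible endpoint.
# A JSON response here means the base URL http://localhost:11434/v1 will work.
curl http://localhost:11434/v1/models
```

If this fails, start the service first (`ollama serve`) before configuring the provider in the app.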
Typical cases are services that expose an OpenAI-compatible API. Example configuration:

- Provider: Custom
- Base URL: `https://your-api.example.com/v1`
- Model: `your-model-name`
- API Key: fill in based on your service
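To sanity-check such an endpoint outside the app, you can send a minimal chat request with curl. The URL, model name, and key below are the placeholders from the example above, not real values:

```shell
# Minimal OpenAI-compatible chat request; replace the placeholders
# with your own Base URL, model name, and API key.
curl https://your-api.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```

If this command returns a normal completion, the same values should work in the Custom provider form.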
The browser sends requests directly to your model service, so you may hit browser-side restrictions such as CORS. This applies to the Docker deployment too: Docker packages the web UI and MCP together, but the page still runs in the browser, so browser restrictions still matter.
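One common browser restriction is CORS: the model service must allow requests from the web app's origin. For local Ollama, for example, allowed origins are read from the `OLLAMA_ORIGINS` environment variable (a sketch, assuming a default Ollama install):

```shell
# Allow cross-origin browser requests to Ollama from any page origin.
# Prefer listing the specific origin you use instead of "*" when possible.
OLLAMA_ORIGINS="*" ollama serve
```

Other OpenAI-compatible services have their own CORS settings; check your service's documentation if browser requests are blocked.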
Related pages: Quick Start and Data Management.
The current codebase includes a text-side model manager and an image-side model manager. Beyond add / edit / delete, each manager supports additional operations of its own; check the manager UI for the full list.
Built-in image presets may expose capability differences between model versions. For example, Seedream 4.5 supports multi-image scenarios, while Seedream 5.0 Lite has its own default settings. Prefer checking the capability tags in the model manager instead of assuming from the model name alone.
There is also a function-model area for image recognition.
If you want image extraction, reference-image replication, or style learning, do not stop at text and image generation models. Make sure the image recognition model is configured too.
You can stop tuning model setup for now once the minimum setup for your goal (see the table above) works end to end.
If you need backup or migration, use Data Management.
Common troubleshooting points:

- If Test Connection fails, check these first: the Base URL, the API Key, and network reachability.
- Do you need a long model list? No. In most cases, the minimum setup in the table above is enough.