PrivateGPT connects to any OpenAI-compatible LLM server and exposes a private, self-hosted AI API. This guide gets you from zero to a running server in four steps.
<Note>
**Prerequisites:** You need an OpenAI-compatible LLM server running locally. Pick one from the [Providers](/providers/overview) page; [Ollama](/providers/ollama) is the easiest way to start.
</Note>

<Steps>
<Step title="Install PrivateGPT">
<Tabs>
<Tab title="Linux">
```bash
# Install uv first
curl -LsSf https://astral.sh/uv/install.sh | sh
# Then install PrivateGPT
uv tool install --python 3.11 \
--find-links https://wheels.privategpt.dev/packages/ \
"private-gpt[core]"
```
</Tab>
<Tab title="macOS">
```bash
brew tap zylon-ai/tap
brew install private-gpt
```
</Tab>
<Tab title="Windows">
```powershell
# Install uv first
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Then install PrivateGPT
uv tool install --python 3.11 `
--find-links https://wheels.privategpt.dev/packages/ `
"private-gpt[core]"
```
</Tab>
</Tabs>
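
Before moving on, it can help to confirm that the install put the CLI on your `PATH`. The sketch below assumes the executable is named `private-gpt`, as in the `private-gpt serve` command used later in this guide. The `require_cmd` helper is a hypothetical name introduced here for illustration.

```bash
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Usage (commented out so the snippet has no side effects when sourced):
# require_cmd uv && require_cmd private-gpt
```

If `private-gpt` is reported missing after a `uv tool install`, the uv tool directory is probably not on your `PATH`. Running `uv tool update-shell` and opening a new terminal usually fixes that.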
<Tabs>
<Tab title="Ollama">
```bash
# Example: pull an LLM and an embedding model, then start the server
ollama pull qwen3.5:35b # LLM (~24 GB)
ollama pull mxbai-embed-large # Embeddings (~670 MB)
# Start the server (runs on port 11434)
ollama serve
```
<Warning>
Ollama does not expose a tokenizer endpoint. PrivateGPT falls back to approximate token counting, which may affect context-window management. See [Ollama limitations](/providers/ollama#limitations).
</Warning>
</Tab>
<Tab title="LM Studio">
1. Download and install [LM Studio](https://lmstudio.ai).
2. Load a model (e.g. `Qwen3-35B-A3B`).
3. In **Developer → Local Server**, set the chat model to your LLM and the **Embedding model** to something like `mxbai-embed-large`.
4. Click **Start Server** (default port: 1234).
See the [LM Studio provider guide](/providers/lmstudio) for full setup.
</Tab>
<Tab title="LlamaCPP Server">
```bash
# Start the LLM server
llama-server \
--model qwen3-35b-a3b.gguf \
--port 8000
# Start a second server for embeddings
llama-server \
--model mxbai-embed-large-v1-f16.gguf \
--port 8001 \
--embeddings
```
See the [LlamaCPP provider guide](/providers/llamacpp) for download and build instructions.
</Tab>
<Tab title="vLLM">
```bash
# Start the LLM server
docker run --gpus all \
-p 8000:8000 \
vllm/vllm-openai:latest \
--model Qwen/Qwen3.5-35B-A3B-GPTQ-Int4
# Start a second server for embeddings
docker run --gpus all \
-p 8001:8000 \
vllm/vllm-openai:latest \
--model mixedbread-ai/mxbai-embed-large-v1 \
--task embed
```
See the [vLLM provider guide](/providers/vllm) for full setup. This example requires an NVIDIA GPU.
</Tab>
</Tabs>
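
Whichever provider you pick, you can confirm that it is answering before pointing PrivateGPT at it. The sketch below assumes your server implements the OpenAI-style `GET /v1/models` route, which is common among OpenAI-compatible servers but not guaranteed for every build. `models_url` and `check_llm_server` are hypothetical helper names, and `curl` must be installed.

```bash
# Build the model-listing URL for a local OpenAI-compatible server.
models_url() {
  echo "http://localhost:$1/v1/models"
}

# Return 0 if the server on the given port answers with a non-error HTTP status.
check_llm_server() {
  curl --silent --fail --max-time 5 --output /dev/null "$(models_url "$1")"
}

# Usage (not run automatically):
# check_llm_server 11434 && echo "LLM server is up"
```

Pass the port from the tab you followed, for example `11434` for Ollama or `8000` and `8001` for the two-server setups.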
<Tabs>
<Tab title="macOS / Linux">
```bash
OPENAI_API_BASE=http://localhost:<llm-port>/v1 \
OPENAI_EMBEDDING_API_BASE=http://localhost:<embedding-port>/v1 \
private-gpt serve
```
</Tab>
<Tab title="Windows (PowerShell)">
```powershell
$env:OPENAI_API_BASE = "http://localhost:<llm-port>/v1"
$env:OPENAI_EMBEDDING_API_BASE = "http://localhost:<embedding-port>/v1"
private-gpt serve
```
</Tab>
<Tab title="Windows (CMD)">
```cmd
set OPENAI_API_BASE=http://localhost:<llm-port>/v1
set OPENAI_EMBEDDING_API_BASE=http://localhost:<embedding-port>/v1
private-gpt serve
```
</Tab>
</Tabs>
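
`<llm-port>` and `<embedding-port>` are placeholders for the ports your own servers listen on. As a sketch, the helper below maps the provider tabs in this guide to their example ports and prints the two variables as `export` lines. It assumes that Ollama and LM Studio serve chat and embeddings from the same port (11434 and 1234 respectively), which this guide does not state outright. `pgpt_env` is a hypothetical helper name.

```bash
# Hypothetical helper: print export lines for a provider from this guide.
pgpt_env() {
  local llm_port embed_port
  case "$1" in
    ollama)        llm_port=11434; embed_port=11434 ;;  # assumed: one server for both
    lmstudio)      llm_port=1234;  embed_port=1234  ;;  # assumed: one server for both
    llamacpp|vllm) llm_port=8000;  embed_port=8001  ;;  # two servers, as in the tabs
    *) echo "unknown provider: $1" >&2; return 1 ;;
  esac
  echo "export OPENAI_API_BASE=http://localhost:${llm_port}/v1"
  echo "export OPENAI_EMBEDDING_API_BASE=http://localhost:${embed_port}/v1"
}

# Usage: load the variables into the current shell, then start the server.
# eval "$(pgpt_env llamacpp)" && private-gpt serve
```

If you changed any default port, set the two variables by hand instead.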
Once startup succeeds, the API is available at `http://localhost:8080` and follows the Anthropic API spec. See the [API Reference](/api-reference/api-reference) for all endpoints.
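
Startup can take a few seconds, so a script that launches PrivateGPT may want to wait until the port answers before sending requests. This is a minimal sketch, assuming `curl` is installed and that any HTTP response from `http://localhost:8080` (even an error status) means the server is listening. `wait_for_http` is a hypothetical helper, not part of PrivateGPT.

```bash
# Poll a URL once per second until it answers or the attempts run out.
wait_for_http() {
  local url="$1" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # Without --fail, curl exits 0 for any HTTP response, including 404.
    if curl --silent --max-time 2 --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (not run automatically):
# wait_for_http http://localhost:8080 60 && echo "PrivateGPT is listening"
```

The helper only checks that something is listening. It does not validate any particular endpoint, so consult the API Reference for real requests.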