Back to Docsgpt

PostgreSQL for User Data

docs/content/Deploying/Postgres-Migration.mdx

0.17.15.0 KB
Original Source

import { Callout } from 'nextra/components'

PostgreSQL for User Data

DocsGPT stores conversations, agents, prompts, sources, attachments, workflows, logs, and token usage in PostgreSQL. MongoDB is no longer required.

<Callout type="info" emoji="ℹ️"> Vector stores are independent — `VECTOR_STORE` can still be `pgvector`, `faiss`, `qdrant`, `milvus`, `elasticsearch`, or `mongodb`. </Callout>

Quickstart

Three common paths. Each assumes Postgres 13+ and the default env vars AUTO_MIGRATE=true / AUTO_CREATE_DB=true (both ship enabled).

Docker Compose

The bundled compose file ships a postgres service. App boot handles the rest — no sidecar, no init job.

bash
cd deployment && docker compose up

Managed Postgres (Neon, RDS, Supabase, Cloud SQL)

Point POSTGRES_URI at the provider-given URI. The app applies the schema on first boot.

bash
export POSTGRES_URI="postgresql://user:pass@host/docsgpt?sslmode=require"
flask --app application/app.py run --host=0.0.0.0 --port=7091

Bare-metal Postgres

Run Postgres locally and point POSTGRES_URI at the default superuser. First boot creates both the database and the schema.

bash
export POSTGRES_URI="postgresql://postgres@localhost/docsgpt"
flask --app application/app.py run --host=0.0.0.0 --port=7091

Prefer a dedicated non-superuser role? Create it once as superuser — the app never creates roles.

sql
CREATE ROLE docsgpt LOGIN PASSWORD 'docsgpt' CREATEDB;
-- Then: POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost/docsgpt

How auto-bootstrap works

Two env vars control startup behavior. Both default to true in the app and are idempotent.

SettingEffectRequires
AUTO_CREATE_DBIf the target database is missing, connects to the server's postgres maintenance DB and issues CREATE DATABASE.CREATEDB privilege (or superuser)
AUTO_MIGRATERuns alembic upgrade head against the target database.Table-owner or superuser on the target DB

Concurrent workers serialize through alembic_version, so rolling restarts are safe. If the role lacks the required privilege, startup fails fast with a clear error rather than silently skipping.

<Callout type="info" emoji="ℹ️"> Convenient in dev. In production, disable both and run migrations as an explicit step — see [Production hardening](#production-hardening). </Callout>

Production hardening

Set both flags to false in prod and run migrations as a gated, auditable step before rolling out the app.

env
AUTO_MIGRATE=false
AUTO_CREATE_DB=false

Run migrations from your CI/CD pipeline, a Kubernetes Job, or an init-container ahead of the app rollout:

bash
python scripts/db/init_postgres.py
# equivalently:
alembic -c application/alembic.ini upgrade head

The reasoning: the app's runtime role shouldn't carry DDL privileges, migrations should gate each rollout, and an explicit step is auditable — implicit first-boot bootstrap is fine for dev but muddies prod deploys.

<Callout type="warning" emoji="⚠️"> Migrations are not reversible by the app. Always back up production Postgres before running `alembic upgrade head` on a new release. </Callout>

Migrating from MongoDB

One-shot, offline, app stopped. The app itself will create the Postgres schema when it boots — you only need to run the data copy.

bash
pip install -r application/requirements.txt
pip install 'pymongo>=4.6'

export POSTGRES_URI="postgresql://docsgpt:docsgpt@localhost:5432/docsgpt"
export MONGO_URI="mongodb://user:pass@host:27017/docsgpt"

python scripts/db/backfill.py --dry-run    # preview
python scripts/db/backfill.py              # real run
# or: python scripts/db/backfill.py --tables users,agents

Then unset MONGO_URI and start the backend — nothing consults Mongo in the default path anymore. The backfill is idempotent (per-table ON CONFLICT upserts, event-log tables deduped via mongo_id), so re-running is safe and re-syncs any drifted rows. Keep Mongo online until you've verified Postgres is complete; decommission afterwards unless you still use it as a vector store.

<Callout type="warning" emoji="⚠️"> No dual-write window and no runtime flag — on the current version, Postgres is the only user-data store the backend reads or writes. </Callout>

Troubleshooting

  • relation "..." does not exist — schema not applied. Either let the app bootstrap it (AUTO_MIGRATE=true) or run python scripts/db/init_postgres.py.
  • permission denied to create database — the role lacks CREATEDB. As superuser: ALTER ROLE <name> CREATEDB;. Or create the database manually and set AUTO_CREATE_DB=false.
  • role "docsgpt" does not exist — roles are never auto-created. As superuser: CREATE ROLE docsgpt LOGIN PASSWORD '...';.
  • SSL errors on a managed provider — append ?sslmode=require to POSTGRES_URI.
  • ModuleNotFoundError: pymongopip install 'pymongo>=4.6' (only needed for the one-shot Mongo backfill).