packages/transcribe/README.md
The transcribe server embeds the llama.cpp binary directly in the Docker image. The AI models must be downloaded separately and mounted as a volume.
mkdir -p ./data/models
chmod 755 ./data
wget -O ./data/models/Model-7.6B-Q4_K_M.gguf https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf/resolve/main/Model-7.6B-Q4_K_M.gguf
wget -O ./data/models/mmproj-model-f16.gguf https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf/resolve/main/mmproj-model-f16.gguf
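Because the models are large downloads, it can help to confirm that both files arrived before starting the container. The function below is a minimal sketch, not part of the project: `check_models` is a hypothetical helper name, and it only checks that the two files named in the commands above exist and are non-empty. It does not verify checksums.

```shell
# Check that both model files from the download step exist and are non-empty.
# Usage: check_models ./data/models
check_models() {
  models_dir="$1"
  for name in Model-7.6B-Q4_K_M.gguf mmproj-model-f16.gguf; do
    if [ ! -s "$models_dir/$name" ]; then
      echo "missing or empty: $models_dir/$name" >&2
      return 1
    fi
  done
  echo "models ok"
}
```

Calling `check_models ./data/models` after the downloads should print `models ok` when both files are present.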
Copy .env-transcribe-sample to your Docker configuration directory.
Rename it to .env-transcribe.
Set API_KEY to a secure value.
Start the container:
docker run --rm --env-file .env-transcribe -p 4567:4567 \
-v ./data:/data \
joplin/transcribe:amd64-latest
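The setup asks for API_KEY to be set to a secure value but does not say how to produce one. One possible approach, sketched here as an assumption rather than a project requirement, is to read 32 random bytes from `/dev/urandom` and print them as hex. `generate_api_key` is a hypothetical helper name.

```shell
# Print a random 64-character hex string (32 bytes read from /dev/urandom).
generate_api_key() {
  od -An -tx1 -N32 /dev/urandom | tr -d '[:space:]'
  echo
}
```

The printed value can then be pasted as the API_KEY entry in .env-transcribe.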
The container automatically creates the following inside /data:
images/ - uploaded images
models/ - AI models (you provide these)
queue.sqlite3 - job queue database
The minimal configuration is provided in .env-sample and docker-compose.server.yml.
Run cp .env-sample .env
Update any options you need in .env
Start the server:
docker compose -f docker-compose.server.yml --profile full up --detached
For advanced configuration, refer to .env-sample-transcribe.
The transcribe container runs with these security measures:
Runs as the transcribe user, not root
Only /app/packages/transcribe/images and /tmp are writable
Integration tests that require the full model do not run by default (including on CI). Be cautious when modifying the model or prompts.
The disabled test is located at: workers/JobProcessor.test.ts.
Run all tests with:
yarn test-all
From packages/transcribe, run:
yarn start
Required:
API_KEY: Authentication key for API requests
DATA_DIR: Base directory for all data (images, models, database)
HTR_CLI_BINARY_PATH: Path to the llama-mtmd-cli binary
Optional:
QUEUE_DRIVER: sqlite (default in Docker) or pg for PostgreSQL
The following paths are automatically derived from DATA_DIR:
$DATA_DIR/images - uploaded images
$DATA_DIR/models - AI models
$DATA_DIR/queue.sqlite3 - SQLite database (when using sqlite driver)
All requests must include the Authorization header with the value set to your API_KEY.
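Before running the server locally, the required variables and the paths derived from DATA_DIR can be checked in the current shell. This is a rough sketch only: `check_transcribe_env` is a hypothetical helper, and treating an unset QUEUE_DRIVER as `sqlite` is an assumption made for this example (the text above only states that sqlite is the default in Docker).

```shell
# Verify the required variables are set, then print the paths derived from DATA_DIR.
check_transcribe_env() {
  for name in API_KEY DATA_DIR HTR_CLI_BINARY_PATH; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
  echo "images: $DATA_DIR/images"
  echo "models: $DATA_DIR/models"
  # Assumption for this sketch: an unset QUEUE_DRIVER is treated as sqlite.
  if [ "${QUEUE_DRIVER:-sqlite}" = "sqlite" ]; then
    echo "queue: $DATA_DIR/queue.sqlite3"
  fi
}
```

The function returns a non-zero status and names the first missing variable, so it can be used as a guard in front of `yarn start`.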
POST /transcribe
Creates a transcription job. The uploaded image is resized, stored on disk, and assigned to a job record in the database.
Request Body:
multipart/form-data
file (required) – the image file to process
Response:
{
"jobId": "bcd2e633-eb10-44cb-a280-bf723238c12e"
}
Example (cURL):
curl --request POST \
--url http://localhost:4567/transcribe \
--header 'Authorization: api-key' \
--header 'Content-Type: multipart/form-data' \
--form file=@/home/js/Pictures/2025-07-24_17-42_1.png
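In a script, the jobId from the response usually needs to be captured for the follow-up request. The sketch below wraps the same upload in a function and extracts the ID. The names `extract_job_id`, `submit_image`, and `BASE_URL` are hypothetical, and the `sed` expression is a shortcut that fits the flat response shown above, not a general JSON parser.

```shell
# Pull the jobId value out of a POST /transcribe response read from stdin.
# sed-based shortcut for the flat response format; not a full JSON parser.
extract_job_id() {
  sed -n 's/.*"jobId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Upload an image and print the resulting job ID.
# BASE_URL (hypothetical) defaults to the address used in the example above;
# API_KEY must be set in the environment.
submit_image() {
  curl --silent --fail --request POST \
    --url "${BASE_URL:-http://localhost:4567}/transcribe" \
    --header "Authorization: $API_KEY" \
    --form "file=@$1" | extract_job_id
}
```

With the server running, `job_id="$(submit_image ./page.png)"` would store the ID for later use, where `./page.png` stands in for any local image.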
GET /transcribe/{jobId}
Fetches the result of a transcription job created with POST /transcribe.
Request:
jobId: the job ID returned by POST /transcribe, passed in the URL path.
Example Responses:
{
"id": "57ebd2e2-b496-40ab-9008-5f861bcb7858",
"state": "created"
}
{
"id": "07f09553-f5e9-467e-b98d-406778e61969",
"state": "active"
}
{
"id": "57ebd2e2-b496-40ab-9008-5f861bcb7858",
"completedOn": "2025-06-11T18:20:22.000Z",
"output": {
"result": "markdown\r\n# Main title\r\n\r\nSome text here. This should take more than one line.\r\n\r\n## Sub title\r\n\r\n- One kind\r\n - of list\r\n - sub-item\r\n\r\n## Conclusion\r\n\r\nLet's finish here."
},
"state": "completed"
}
Example (cURL):
curl --request GET \
--url http://localhost:4567/transcribe/57ebd2e2-b496-40ab-9008-5f861bcb7858 \
--header 'Authorization: api-key'
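Since a job moves through the created and active states before it is completed, a client typically has to poll this endpoint. The following is a minimal polling sketch under stated assumptions: `extract_state`, `wait_for_job`, and `BASE_URL` are hypothetical names, the attempt count and delay are arbitrary defaults, and only the three states shown in the example responses are considered. Any other state simply keeps the loop running until the attempts are used up.

```shell
# Read a job response from stdin and print the value of its "state" field.
# sed-based shortcut for the response format shown above; not a full JSON parser.
extract_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | tail -n 1
}

# Poll GET /transcribe/{jobId} until the job is completed or attempts run out.
# Usage: wait_for_job <jobId> [max_attempts] [delay_seconds]
wait_for_job() {
  job_id="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  state=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    body="$(curl --silent --fail \
      --url "${BASE_URL:-http://localhost:4567}/transcribe/$job_id" \
      --header "Authorization: $API_KEY")" || return 1
    state="$(printf '%s\n' "$body" | extract_state)"
    if [ "$state" = "completed" ]; then
      # Print the full response so the caller can read output.result.
      printf '%s\n' "$body"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "job $job_id not completed after $attempts attempts (last state: $state)" >&2
  return 1
}
```

The attempt limit keeps the loop from running forever if a job never reaches the completed state.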