plugins/plugin-vision/native/doctr.cpp/README.md
C++ port of Mindee docTR built directly on ggml. Two stages:
1. db_mobilenet_v3_large backbone + DBNet head → probability map.
2. crnn_mobilenet_v3_small backbone + BiLSTM + CTC head → per-crop logits.

The post-processing (DBNet contour → bbox, CTC greedy decode) stays in TypeScript because both steps are trivial and runtime-portable. This C++ library runs only the forward pass; the JS caller orchestrates det → crop → rec → decode.
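For orientation, the CTC greedy decode step mentioned above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the plugin's actual decoder. The function name, the row-major [T, C] logits layout, and the convention that the blank class is the last index are all assumptions that this README does not confirm.

```typescript
// Greedy CTC decode: per-step argmax, collapse consecutive repeats, drop blanks.
// Assumptions (not confirmed by this README): logits are row-major [T, C],
// and the blank class is the last index (C - 1), so charset has C - 1 symbols.
export function ctcGreedyDecode(
  logits: Float32Array,
  T: number,
  C: number,
  charset: string,
): string {
  const symbols = Array.from(charset); // split by code point, not UTF-16 unit
  const blank = C - 1;
  let prev = -1;
  let out = "";
  for (let t = 0; t < T; t++) {
    // argmax over the C classes at time step t
    let best = 0;
    for (let c = 1; c < C; c++) {
      if (logits[t * C + c] > logits[t * C + best]) best = c;
    }
    // emit only when the class is not blank and differs from the previous step
    if (best !== blank && best !== prev) out += symbols[best] ?? "";
    prev = best;
  }
  return out;
}
```

A blank between two identical symbols is what lets a doubled letter survive the repeat-collapsing rule.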
Phase 1 (current): FFI surface scaffolded; weight conversion script
authored; build glue authored. GGUF weight files are not yet built. The TS
binding throws a clear error until vision/doctr-det.gguf and
vision/doctr-rec.gguf are present on disk.
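The presence check described above might look roughly like the sketch below. The two file names come from this README; the function names and the `modelRoot` parameter are hypothetical, and the real binding may resolve paths and word its error differently.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// File names are taken from this README; everything else here is illustrative.
const REQUIRED_WEIGHTS = ["vision/doctr-det.gguf", "vision/doctr-rec.gguf"];

// Returns the relative paths of weight files that are not on disk.
export function missingWeights(modelRoot: string): string[] {
  return REQUIRED_WEIGHTS.filter((rel) => !existsSync(join(modelRoot, rel)));
}

// Throws a descriptive error if any required weight file is absent.
export function assertWeightsPresent(modelRoot: string): void {
  const missing = missingWeights(modelRoot);
  if (missing.length > 0) {
    throw new Error(
      `doctr weights not found under ${modelRoot}: ${missing.join(", ")}. ` +
        "Run scripts/convert.py to produce them.",
    );
  }
}
```

Failing before the shared library is loaded keeps the error message about the missing files rather than about a failed native call.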
cd plugins/plugin-vision/native/doctr.cpp
cmake -B build -S . -DGGML_METAL=ON # macOS arm64
cmake --build build --config Release
Produces a single shared library libdoctr.dylib / .so / .dll consumed
via bun:ffi from plugin-vision/src/native/doctr-ffi.ts.
python scripts/convert.py \
--variant db_mobilenet_v3_large \
--out vision/doctr-det.gguf
python scripts/convert.py \
--variant crnn_mobilenet_v3_small \
--out vision/doctr-rec.gguf
The detection variant writes a single tensor graph + mean/std metadata.
The recognition variant additionally writes the character vocabulary as a
doctr.charset KV entry inside the GGUF file.
See include/doctr.h. The ABI is intentionally minimal:
doctr_det_ctx * doctr_det_init(const char * gguf_path);
int doctr_det_run(doctr_det_ctx *, const float * rgb_chw, int h, int w,
float * out_prob, int * out_h, int * out_w);
void doctr_det_free(doctr_det_ctx *);
doctr_rec_ctx * doctr_rec_init(const char * gguf_path);
int doctr_rec_run(doctr_rec_ctx *, const float * rgb_chw, int h, int w,
float * out_logits, int * out_T, int * out_C);
const char * doctr_rec_charset(doctr_rec_ctx *);
void doctr_rec_free(doctr_rec_ctx *);
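Both run functions take a `rgb_chw` float buffer, so a caller holding interleaved 8-bit RGB pixels has to repack them first. The sketch below shows one way to do that in TypeScript. It assumes a planar channel-first layout and scaling to [0, 1]; the helper name is hypothetical, and this README does not say whether the mean/std normalization is applied by the caller or inside the library, so none is applied here.

```typescript
// Repack interleaved 8-bit RGB (HWC) into planar float32 (CHW) scaled to [0, 1].
// Assumptions: `rgb_chw` expects this planar layout and this value range;
// mean/std normalization is intentionally left out because its owner is not stated.
export function hwcToChw(rgb: Uint8Array, h: number, w: number): Float32Array {
  if (rgb.length !== h * w * 3) {
    throw new RangeError(`expected ${h * w * 3} bytes, got ${rgb.length}`);
  }
  const plane = h * w; // number of pixels per channel plane
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      out[c * plane + i] = rgb[i * 3 + c] / 255;
    }
  }
  return out;
}
```

The returned `Float32Array` has length `3 * h * w`, with all red values first, then green, then blue.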