examples/patient_intake_extraction_dspy/README.md
Intake forms are visual: checkboxes, hand-filled fields, tables of medications — so we extract from the rendered page, not the text.
</p> <p align="center"> <strong>Star us ❤️ →</strong> <a href="https://github.com/cocoindex-io/cocoindex" title="Star CocoIndex on GitHub"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/star-btn-small-light.svg"></picture></a> · <a href="https://cocoindex.io/docs/examples/patient-intake-dspy/" title="Read the full walkthrough"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/docs-inline-light.svg"></picture></a> · <a href="https://discord.com/invite/zpA9S2DR7s" title="Join the CocoIndex Discord"><picture><source media="(prefers-color-scheme: dark)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://cocoindex.io/blobs/github/homepage/discord-inline-light.svg"></picture></a> </p> <div align="center"> </div>Intake forms are visual — checkboxes, hand-filled fields, tables of medications — so instead of extracting text, this pipeline renders each PDF page to an image and lets a vision model read the form the way a person would. DSPy handles the prompting: you declare a typed Signature and it produces a Pydantic Patient, no prompt strings to hand-tune. You declare the transformation in native Python and your own types — target_state = transformation(source_state) — and the heavy lifting (incremental processing, change tracking, managed targets) runs in a Rust engine underneath, so only changed forms get re-extracted and each one becomes exactly one JSON file on disk.
The output is just Python types: Patient (in models.py) is a Pydantic model with nested models for address, insurance, medications, and the rest. That model is the contract — DSPy fills it in, Pydantic validates it, it serializes straight to JSON. Rather than write a prompt, you declare a typed Signature and wrap it in ChainOfThought so the model reasons before it answers; extract_patient rasterizes every page to a PNG and runs the extractor. Read it in main.py:
class PatientExtractionSignature(dspy.Signature):
"""Extract structured patient information from a medical intake form image."""
form_images: list[dspy.Image] = dspy.InputField(desc="Images of the intake form pages")
patient: Patient = dspy.OutputField(desc="Extracted patient information")
class PatientExtractor(dspy.Module):
def __init__(self) -> None:
super().__init__()
self.extract = dspy.ChainOfThought(PatientExtractionSignature)
def forward(self, form_images: list[dspy.Image]) -> Patient:
return self.extract(form_images=form_images).patient
@coco.fn
def extract_patient(pdf_content: bytes) -> Patient:
pdf_doc = pymupdf.open(stream=pdf_content, filetype="pdf")
form_images = []
for page in pdf_doc:
pix = page.get_pixmap(matrix=pymupdf.Matrix(2, 2)) # 2x scale: small print stays legible
form_images.append(dspy.Image(pix.tobytes("png")))
pdf_doc.close()
return PatientExtractor()(form_images=form_images)
Because the OutputField is typed as Patient, DSPy asks the model for that exact shape and hands back a validated object, not a string to parse. The LM is configured once at module load — dspy.configure(lm=dspy.LM("gemini/gemini-2.5-flash")). process_patient_form (decorated @coco.fn(memo=True)) reads each PDF, extracts the Patient, and declares one JSON file named after the source form; app_main runs one component per file with mount_each.
Step-by-step walkthrough with the Pydantic schema, the DSPy signature and module, the PDF rasterization, and what happens on each kind of change.
</p>Matrix(2, 2) so small hand-entered text stays legible — the difference between reading a zip code and guessing one. No OCR, no Markdown step.Signature plus ChainOfThought is the whole spec; DSPy compiles the typed in/out into the actual prompt and reasons before answering on dense, checkbox-heavy forms.OutputField is Patient, so DSPy returns a validated Pydantic object; optional fields and default_factory=list mean a form that omits medications yields an empty list, not a failure.@coco.fn(memo=True) skips a form entirely when its bytes and the function's code are unchanged, so you never re-run the slow, paid vision extraction on a PDF you've already processed.1. Install — this pulls in DSPy, PyMuPDF (to rasterize PDFs), and Pillow (for the image bytes DSPy passes along):
pip install -e .
2. Configure — the extraction runs on a Gemini vision model:
cp .env.example .env # set GEMINI_API_KEY
3. Run the pipeline — a catch-up run scans the forms, extracts, writes, and exits:
cocoindex update main.py
Each PDF in data/patient_forms/ becomes a JSON file in output_patients/, named after the source form:
ls output_patients/
# Patient_Intake_Form_David_Artificial.json
# Patient_Intake_Form_Emily_Artificial.json
# ...one .json per intake PDF
Open one and you'll see the full Patient record — name, date of birth, address, insurance, the medication and allergy lists, consent — extracted straight from the rendered form and validated against the schema. Add a field to the Patient model or switch the LM, and the next run re-extracts only the affected forms.
<a href="https://cocoindex.io/docs">Docs</a> · <a href="https://cocoindex.io/docs/examples/patient-intake-dspy/">Walkthrough</a> · <a href="https://discord.com/invite/zpA9S2DR7s">Discord</a> · <a href="https://github.com/cocoindex-io/cocoindex/tree/main/examples"><b>See all examples →</b></a>
</p>