If this example is helpful, we'd appreciate a star ⭐ on the CocoIndex GitHub repository.
This example shows how to use DSPy with Gemini 2.5 Flash (vision model) to extract structured data from patient intake PDFs using CocoIndex v1.
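The extracted data is described by a Pydantic model in main.py. The sketch below shows what such a model could look like and how one record might be written to JSON. The field names, the Contact class, and the write_patient_json helper are hypothetical and may not match main.py. The sketch assumes Pydantic v2 (model_validate and model_dump_json).

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, Field


# Hypothetical schema: field names are illustrative, not the ones in main.py.
class Contact(BaseModel):
    name: str
    phone: Optional[str] = None
    relationship: Optional[str] = None


class Patient(BaseModel):
    name: str  # required: validation fails if the model cannot find it
    dob: Optional[str] = Field(default=None, description="Date of birth as written on the form")
    phone: Optional[str] = None
    allergies: list[str] = Field(default_factory=list)
    emergency_contact: Optional[Contact] = None


def write_patient_json(patient: Patient, out_dir: Path, stem: str) -> Path:
    """Write one validated record to <out_dir>/<stem>.json (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}.json"
    path.write_text(patient.model_dump_json(indent=2), encoding="utf-8")
    return path
```

Validating through a typed model means a missing required field raises pydantic.ValidationError instead of silently producing an incomplete JSON file.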
The example has three parts, all defined in main.py:
- A Pydantic model defines the data structure for type safety.
- A DSPy signature and module define the extraction, using DSPy's ChainOfThought with vision support.
- A CocoIndex flow wraps the DSPy module in a custom function, processes files incrementally, and writes the results to JSON files.
The extraction uses DSPy's Image type with ChainOfThought for visual document understanding, and the extracted records are written to output_patients/.
Install the dependencies from the project's pyproject.toml:
pip install -e .
Among the other dependencies, this installs Pillow, which DSPy uses to process image bytes.
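To confirm that Pillow is installed in the active environment and can decode image bytes, a quick smoke test like the one below can be run. It is only a sanity check; it does not show how DSPy itself calls Pillow, and both function names are hypothetical.

```python
import io

from PIL import Image


def decode_image_bytes(data: bytes) -> Image.Image:
    """Decode raw image bytes (PNG, JPEG, ...) into a PIL image."""
    image = Image.open(io.BytesIO(data))
    image.load()  # force full decoding now instead of lazily on first access
    return image


def make_test_png(width: int = 8, height: int = 4) -> bytes:
    """Build a small solid red PNG in memory to feed the decoder."""
    buf = io.BytesIO()
    Image.new("RGB", (width, height), color=(255, 0, 0)).save(buf, format="PNG")
    return buf.getvalue()
```

If decode_image_bytes(make_test_png()) returns an image of the expected size, Pillow's PNG support is working.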
Create a .env file in the example directory:
echo "GEMINI_API_KEY=your_api_key_here" > .env
Replace your_api_key_here with your actual Gemini API key.
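Before running, it can be useful to check that the key was actually replaced. The sketch below is a minimal, hypothetical helper that reads simple KEY=VALUE lines from .env; it is not a full dotenv parser (no export prefixes, escapes, or multi-line values).

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, as in KEY="value".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def check_gemini_key(path: Path) -> bool:
    """True if GEMINI_API_KEY is set and is not the README's placeholder."""
    key = read_env_file(path).get("GEMINI_API_KEY", "")
    return bool(key) and key != "your_api_key_here"
```

For example, check_gemini_key(Path(".env")) returns False if the file still contains the placeholder written by the echo command.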
Then run the flow:
cocoindex update main.py
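This will:
- Read the patient intake PDFs from data/patient_forms/.
- Extract structured patient data from each form with DSPy and Gemini.
- Write one JSON file per form to output_patients/.
After running, check the output_patients/ directory: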
ls -la output_patients/
You should see JSON files such as:
- Patient_Intake_Form_David_Artificial.json
- Patient_Intake_Form_Emily_Artificial.json
- Patient_Intake_Form_Joe_Artificial.json
- Patient_Intake_Form_Jane_Artificial.json
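To inspect the results programmatically, or to see which forms have no output yet, a small stdlib script like the following may help. It assumes each JSON file is named after its source PDF's stem (for example, Patient_Intake_Form_Jane_Artificial.pdf producing Patient_Intake_Form_Jane_Artificial.json); the file names above suggest this, but this README does not state it. Both function names are hypothetical.

```python
import json
from pathlib import Path


def summarize_outputs(out_dir: Path) -> dict[str, list[str]]:
    """Map each JSON file name in out_dir to its sorted top-level keys."""
    summary: dict[str, list[str]] = {}
    for path in sorted(out_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


def missing_outputs(input_dir: Path, out_dir: Path) -> list[str]:
    """List input PDFs that have no <stem>.json in out_dir.

    Assumes output files are named after the source PDF's stem.
    Note: glob("*.pdf") is case-sensitive on most Linux file systems.
    """
    done = {p.stem for p in out_dir.glob("*.json")}
    return sorted(p.name for p in input_dir.glob("*.pdf") if p.stem not in done)
```

Run from the example directory, missing_outputs(Path("data/patient_forms"), Path("output_patients")) should return an empty list once every form has been processed.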