examples/patient_intake_extraction_baml/README.md
If this is helpful, we'd appreciate a star ⭐ on the CocoIndex GitHub repository.
This example shows how to use BAML to extract structured data from patient intake PDFs using CocoIndex v1. BAML provides type-safe structured data extraction with native PDF support.
- baml_src/patient.baml - Defines the data structure and extraction function
- main.py - Wraps BAML in a custom function, processes files incrementally, and writes results to JSON files

Install from the project's pyproject.toml:
pip install -e .
Generate the Python client code from your BAML schema. This step is required:
baml generate
This will create a baml_client/ directory with the generated Python code.
Create a .env file in the example directory:
echo "GEMINI_API_KEY=your_api_key_here" > .env
Replace your_api_key_here with your actual Gemini API key.
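
Before running the pipeline, it can help to confirm that the setup steps above took effect. The following is a minimal sketch and not part of the example itself: `read_env_value` and `preflight` are hypothetical helper names, and the `.env` parsing only handles simple `KEY=value` lines.

```python
from pathlib import Path
from typing import List, Optional


def read_env_value(env_path: Path, key: str) -> Optional[str]:
    """Return the value for `key` from a simple KEY=value .env file, or None."""
    if not env_path.is_file():
        return None
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("\"'")
    return None


def preflight(example_dir: Path) -> List[str]:
    """Collect readable setup problems; an empty list means nothing was found."""
    problems = []
    if not (example_dir / "baml_client").is_dir():
        problems.append("baml_client/ not found: run `baml generate` first")
    api_key = read_env_value(example_dir / ".env", "GEMINI_API_KEY")
    if not api_key:
        problems.append("GEMINI_API_KEY is missing from .env")
    elif api_key == "your_api_key_here":
        problems.append("GEMINI_API_KEY still holds the placeholder value")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path(".")):
        print("setup issue:", problem)
```

Run from the example directory, the script prints nothing when both the generated client and the API key are in place.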
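Run the pipeline:
cocoindex update main.py
This will:
- Read the patient intake PDFs from data/patient_forms/
- Write the extracted data as JSON files to output_patients/

After running, check the output_patients/ directory: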
ls -la output_patients/
You should see JSON files such as:
- Patient_Intake_Form_David_Artificial.json
- Patient_Intake_Form_Emily_Artificial.json
- Patient_Intake_Form_Joe_Artificial.json
- Patient_Intake_Form_Jane_Artificial.json
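
To inspect the results programmatically, a short script along these lines may help. It is a sketch that assumes only that each output file holds valid JSON. The actual field names depend on the schema in baml_src/patient.baml, so the script lists whatever top-level keys it finds instead of assuming any. `summarize_outputs` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Dict, List


def summarize_outputs(output_dir: Path) -> Dict[str, List[str]]:
    """Map each JSON file name to the sorted top-level keys it contains."""
    summary = {}
    for path in sorted(output_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            record = json.load(f)
        # Non-object payloads (lists, strings) have no keys to report.
        summary[path.name] = sorted(record) if isinstance(record, dict) else []
    return summary


if __name__ == "__main__":
    for name, keys in summarize_outputs(Path("output_patients")).items():
        print(f"{name}: {', '.join(keys) or '(no top-level keys)'}")
```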