Pydantic is a popular data validation library for Python, used by most -- if not all -- LLM frameworks, such as Instructor.
BAML also uses Pydantic. The BAML Rust compiler can generate Pydantic models from your .baml files. But that's not all the compiler does -- it also takes care of fixing common LLM parsing issues, supports more data types, handles retries, and reduces the amount of boilerplate code you have to write.
Let's dive into how Pydantic is used and its limitations.
At first glance, Pydantic makes it easy to get structured output from an LLM:
```python
from typing import List, Union

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Resume(BaseModel):
    name: str
    skills: List[str]

def create_prompt(input_text: str) -> str:
    PROMPT_TEMPLATE = f"""Parse the following resume and return a structured representation of the data in the schema below.

Resume:
---
{input_text}
---

Schema:
{Resume.model_json_schema()['properties']}

Output JSON:
"""
    return PROMPT_TEMPLATE

def extract_resume(input_text: str) -> Union[Resume, None]:
    prompt = create_prompt(input_text)
    chat_completion = client.chat.completions.create(
        model="gpt-5", messages=[{"role": "system", "content": prompt}]
    )
    output = chat_completion.choices[0].message.content
    if output:
        return Resume.model_validate_json(output)
    return None
```
That's pretty good, but now we want to add an Education model to the Resume model. We add the following code:
```diff
 ...
+class Education(BaseModel):
+    school: str
+    degree: str
+    year: int

 class Resume(BaseModel):
     name: str
     skills: List[str]
+    education: List[Education]

 def create_prompt(input_text: str) -> str:
     additional_models = ""
+    if "$defs" in Resume.model_json_schema():
+        additional_models += f"\nUse these other schema definitions as well:\n{Resume.model_json_schema()['$defs']}"
     PROMPT_TEMPLATE = f"""Parse the following resume and return a structured representation of the data in the schema below.

 Resume:
 ---
 {input_text}
 ---

 Schema:
 {Resume.model_json_schema()['properties']}
+{additional_models}

 Output JSON:
 """.strip()
     return PROMPT_TEMPLATE
 ...
```
A little ugly, but still readable... But managing all these prompt strings can make your codebase disorganized very quickly.
Then you realize the LLM sometimes outputs some text before giving you the JSON, like this:

```text
The output is:
{
    "name": "John Doe",
    ... // truncated for brevity
}
```
So you add a regex that extracts everything between braces:
```python
import re

def extract_resume(input_text: str) -> Union[Resume, None]:
    prompt = create_prompt(input_text)
    chat_completion = client.chat.completions.create(
        model="gpt-5", messages=[{"role": "system", "content": prompt}]
    )
    output = chat_completion.choices[0].message.content
    if output:
        # Extract the JSON block using a regex (greedy, so nested
        # objects are captured up to the last closing brace)
        json_match = re.search(r"\{.*\}", output, re.DOTALL)
        if json_match:
            json_output = json_match.group(0)
            return Resume.model_validate_json(json_output)
    return None
```
Next you realize you actually want an array of resumes, but `Resume.model_validate_json` only parses a single model, so you have to add another wrapper:
```python
class ResumeArray(BaseModel):
    resumes: List[Resume]
```
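As an aside, Pydantic v2's `TypeAdapter` can validate a bare top-level list without a wrapper model, though you still end up maintaining two code paths. A minimal sketch:

```python
from typing import List

from pydantic import BaseModel, TypeAdapter

class Resume(BaseModel):
    name: str
    skills: List[str]

# TypeAdapter validates a bare top-level list, no wrapper model needed
adapter = TypeAdapter(List[Resume])
resumes = adapter.validate_json('[{"name": "John Doe", "skills": ["Python"]}]')
```

The wrapper class is still useful when you want a named schema to show the LLM in the prompt.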
Now you need to change the rest of your code to handle different models. That's good long-term, but it is more boilerplate you have to write, test, and maintain.
Next, you notice the LLM sometimes outputs a single resume {...}, and sometimes an array [{...}]...
You must now change your parser to handle both cases:
```python
import json

def extract_resume(input_text: str) -> Union[List[Resume], None]:
    prompt = create_prompt(input_text)  # Also requires changes
    chat_completion = client.chat.completions.create(
        model="gpt-5", messages=[{"role": "system", "content": prompt}]
    )
    output = chat_completion.choices[0].message.content
    if output:
        # Extract a JSON object *or* array using a regex
        json_match = re.search(r"[\[{].*[\]}]", output, re.DOTALL)
        if json_match:
            parsed = json.loads(json_match.group(0))
            if isinstance(parsed, list):
                # Bare array of resumes: [{...}, {...}]
                return [Resume.model_validate(item) for item in parsed]
            if "resumes" in parsed:
                # The wrapper shape we asked for: {"resumes": [...]}
                return ResumeArray.model_validate(parsed).resumes
            # A single bare resume: {...}
            return [Resume.model_validate(parsed)]
    return None
```
You could retry the call against the LLM to fix the issue, but that costs you precious seconds and tokens, so handling this corner case manually is often the only practical option.
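If you do go the retry route, that's yet more boilerplate. A minimal sketch of a retry wrapper (the `with_retries` helper and the backoff numbers are hypothetical, not part of any library here):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def with_retries(
    call: Callable[[], Optional[T]],
    max_attempts: int = 3,
    backoff_s: float = 1.0,
) -> Optional[T]:
    """Re-invoke `call` until it returns a value, with exponential
    backoff. Every retry is another LLM round trip: more latency,
    more tokens billed."""
    for attempt in range(max_attempts):
        try:
            result = call()
            if result is not None:
                return result
        except Exception:
            if attempt == max_attempts - 1:
                raise
        time.sleep(backoff_s * 2 ** attempt)
    return None
```

Usage would look like `with_retries(lambda: extract_resume(text))`.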
Sidenote: at this point your prompt looks like this:

```text
JSON Schema:
{'name': {'title': 'Name', 'type': 'string'}, 'skills': {'items': {'type': 'string'}, 'title': 'Skills', 'type': 'array'}, 'education': {'anyOf': [{'$ref': '#/$defs/Education'}, {'type': 'null'}]}}

Use these other JSON schema definitions as well:
{'Education': {'properties': {'degree': {'title': 'Degree', 'type': 'string'}, 'major': {'title': 'Major', 'type': 'string'}, 'school': {'title': 'School', 'type': 'string'}, 'year': {'title': 'Year', 'type': 'integer'}}, 'required': ['degree', 'major', 'school', 'year'], 'title': 'Education', 'type': 'object'}}
```
Sometimes even GPT-4 outputs a structure like the one below -- technically valid JSON, so OpenAI's "JSON mode" won't save you, but not the shape you asked for:
```text
{
    "name": {
        "title": "Name",
        "type": "string",
        "value": "John Doe"
    },
    "skills": {
        "items": {
            "type": "string",
            "values": [
                "Python",
                "JavaScript",
                "React"
            ]
    ... // truncated for brevity
```
(this is an actual result from GPT-4 before some more prompt engineering)
All you really want is a prompt like the one below -- far fewer tokens, and far less room for confusion:
```text
Parse the following resume and return a structured representation of the data in the schema below.

Resume:
---
John Doe
Python, Rust
University of California, Berkeley, B.S. in Computer Science, 2020
---

JSON Schema:
{
  "name": string,
  "skills": string[],
  "education": {
    "school": string,
    "degree": string,
    "year": integer
  }[]
}

Output JSON:
```
Ahh, much better. That's roughly 80% fewer tokens and a simpler prompt, for the same results. (See also Microsoft's TypeChat, which uses a similar schema format based on TypeScript types.)
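Nothing stops you from generating that compact format yourself, but it's yet another helper to write and maintain. A rough sketch of what such a renderer might look like (`compact_schema` is a hypothetical helper covering only scalars, lists, and nested models):

```python
from typing import List, Type, get_args, get_origin

from pydantic import BaseModel

def compact_schema(model: Type[BaseModel], indent: int = 0) -> str:
    """Render a Pydantic model as a compact, TypeScript-like schema.
    Hypothetical helper: handles scalars, lists, and nested models only."""
    pad = "  " * indent
    lines = [pad + "{"]
    for name, field in model.model_fields.items():
        lines.append(f'{pad}  "{name}": {_render(field.annotation, indent + 1)},')
    lines[-1] = lines[-1].rstrip(",")  # no trailing comma on the last field
    lines.append(pad + "}")
    return "\n".join(lines)

def _render(tp, indent: int) -> str:
    scalars = {str: "string", int: "integer", float: "number", bool: "boolean"}
    if tp in scalars:
        return scalars[tp]
    if get_origin(tp) is list:
        return _render(get_args(tp)[0], indent) + "[]"
    if isinstance(tp, type) and issubclass(tp, BaseModel):
        return compact_schema(tp, indent).lstrip()
    return "any"
```

For the `Resume` model above, `compact_schema(Resume)` produces the compact schema shown in the prompt, but unions, optionals, enums, and field descriptions would all need more code.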
But we digress, let's get back to the point. You can see how this can get out of hand quickly, and how Pydantic wasn't really made with LLMs in mind. We haven't gotten around to adding resilience like retries, or falling back to a different model in the event of an outage. There's still a lot of wrapper code to write.
There are other core limitations. Say you want to do a classification task using Pydantic. An Enum is a great fit for modelling this.
Assume this is our prompt:
```text
Classify the company described in this text into the best
of the following categories:

Text:
---
{some_text}
---

Categories:
- Technology: Companies involved in the development and production of technology products or services
- Healthcare: Includes companies in pharmaceuticals, biotechnology, medical devices.
- Real estate: Includes real estate investment trusts (REITs) and companies involved in real estate development.

The best category is:
```
Since we have descriptions, we need to generate a custom enum we can use to build the prompt:
```python
from enum import Enum

class FinancialCategory(Enum):
    technology = (
        "Technology",
        "Companies involved in the development and production of technology products or services.",
    )
    ...
    real_estate = (
        "Real Estate",
        "Includes real estate investment trusts (REITs) and companies involved in real estate development.",
    )

    def __init__(self, category, description):
        self._category = category
        self._description = description

    @property
    def category(self):
        return self._category

    @property
    def description(self):
        return self._description
```
We add a class method to load the right enum from the LLM output string:
```python
    @classmethod
    def from_string(cls, category: str) -> "FinancialCategory":
        for c in cls:
            if c.category == category:
                return c
        raise ValueError(f"Invalid category: {category}")
```
Update the prompt to use the enum descriptions:
```python
def categories_and_descriptions() -> str:
    # Return (rather than print) the lines, so they can be
    # interpolated into the prompt
    return "\n".join(
        f"{category.category}: {category.description}"
        for category in FinancialCategory
    )

def create_prompt(text: str) -> str:
    PROMPT_TEMPLATE = f"""Classify the company described in this text into the best
of the following categories:

Text:
---
{text}
---

Categories:
{categories_and_descriptions()}

The best category is:
"""
    return PROMPT_TEMPLATE
```
And then we use it in our AI function:
```python
def classify_company(text: str) -> Union[FinancialCategory, None]:
    prompt = create_prompt(text)
    chat_completion = client.chat.completions.create(
        model="gpt-5", messages=[{"role": "system", "content": prompt}]
    )
    output = chat_completion.choices[0].message.content
    if output:
        # Use our helper function!
        return FinancialCategory.from_string(output)
    return None
```
Things get hairy when you want to change your types.

`str(category)` saves `FinancialCategory.healthcare` into your DB, but your parser only recognizes "Healthcare", so you'll need even more boilerplate if you ever want to programmatically analyze your data. Libraries like Instructor remove a good amount of this boilerplate, but you're still hand-maintaining the prompts, parsers, and glue code around them.
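One common workaround (an illustration, not the only approach) is a `str`-valued enum, at the cost of moving the descriptions into a separate lookup table:

```python
from enum import Enum

class FinancialCategory(str, Enum):
    # The enum value is the same human-readable string the LLM
    # returns, so parsing and DB serialization agree
    technology = "Technology"
    healthcare = "Healthcare"
    real_estate = "Real Estate"

# Descriptions now have to live outside the enum
CATEGORY_DESCRIPTIONS = {
    FinancialCategory.healthcare: "Includes companies in pharmaceuticals, biotechnology, medical devices.",
}

parsed = FinancialCategory("Healthcare")  # round-trips cleanly
```

That keeps storage and parsing consistent, but you've now split one concept across two definitions that must stay in sync.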
The Boundary toolkit helps you iterate seamlessly compared to Pydantic.
Here's all the BAML code you need to solve the Extract Resume problem from earlier (VSCode prompt preview is shown on the right):
<Note> Here we use a "GPT4" client, but you can use any model. See [client docs](/ref/llm-client-providers/open-ai) </Note>

```baml
class Education {
  school string
  degree string
  year int
}

class Resume {
  name string
  skills string[]
  education Education[]
}

function ExtractResume(resume_text: string) -> Resume {
  client GPT4
  prompt #"
    Parse the following resume and return a structured representation of the data in the schema below.

    Resume:
    ---
    {{ input.resume_text }}
    ---

    Output in this JSON format:
    {{ ctx.output_format }}

    Output JSON:
  "#
}
```
The BAML compiler generates a Python client that imports and calls the function:

```python
from baml_client import baml as b

async def main():
    resume = await b.ExtractResume(resume_text="""John Doe
Python, Rust
University of California, Berkeley, B.S. in Computer Science, 2020""")
    assert resume.name == "John Doe"
```
That's it! No need to write any more code. Since the compiler knows your function signature, it generates a custom deserializer for your unique use case that just works.
Converting the Resume into an array of resumes requires a single line change in BAML (vs having to create array wrapper classes and parsing logic).
In this image we change the types and BAML automatically updates the prompt, parser, and the Python types you get back.
Adding retries or resilience requires just a couple of modifications. And best of all, you can test things instantly, without leaving your VSCode.
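For example, a retry policy is declared next to the client in BAML. The sketch below follows the general shape documented for BAML retry policies; treat the exact field names as an approximation and confirm against the current reference docs:

```baml
retry_policy Exponential {
  max_retries 2
  strategy {
    type exponential_backoff
  }
}

client<llm> GPT4 {
  provider openai
  retry_policy Exponential
  options {
    model gpt-4
  }
}
```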
Pydantic is excellent for data validation, but LLM applications need more than validation -- they need a complete structured extraction solution.
We built BAML because writing a Python library wasn't powerful enough to solve the real challenges of LLM structured extraction.
Get started today with Python, TypeScript, Go, Ruby or other languages.
Our mission is to build the best developer experience for AI engineers working with LLMs. Contact us at [email protected] or join us on Discord to stay in touch with the community and influence the roadmap.