Structured Input for LLMs

docs/examples/prompts/structured_input.ipynb


It has been observed that most LLMs perform better when prompted with XML-like content (you can see this in Anthropic's prompting guide, for instance).

We refer to this kind of prompting as structured input, and LlamaIndex lets you chat with LLMs using exactly this technique - let's go through an example in this notebook!
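Before diving into LlamaIndex, here is a minimal, stdlib-only sketch of the idea: the same data rendered as a flat string versus as XML-like markup with explicit field boundaries. This only illustrates the general technique, not the exact output of LlamaIndex's `to_xml` filter.

```python
import xml.etree.ElementTree as ET

user = {"name": "John", "phone": "123-456-7890"}

# Flat prompt: the model must infer where one field ends and the next begins
flat_prompt = f"Extract the contact details: name={user['name']}, phone={user['phone']}"

# XML-structured prompt: each field is wrapped in an explicit tag
root = ET.Element("user")
for key, value in user.items():
    ET.SubElement(root, key).text = value
xml_prompt = (
    "Extract the contact details:\n" + ET.tostring(root, encoding="unicode")
)

print(xml_prompt)
```

The tags make the field boundaries unambiguous, which is the property structured input relies on.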

1. Install Needed Dependencies

Make sure you have llama-index>=0.12.34 installed if you wish to follow along with this tutorial without any problems 😄

python
! pip install -q llama-index
python
! pip show llama-index | grep "Version"

2. Create a Prompt Template

To use structured input, we need to create a prompt template containing a Jinja expression (recognizable by the {{ }}) with a specific filter (to_xml) that turns inputs such as Pydantic BaseModel subclasses, dictionaries, or JSON-like strings into XML representations.

python
from llama_index.core.prompts import RichPromptTemplate

template_str = "Please extract from the following XML code the contact details of the user:\n\n```xml\n{{ data | to_xml }}\n```\n\n"
prompt = RichPromptTemplate(template_str)

Let's now try to format the input as a string, using different objects as data.

python
# Using a BaseModel

from pydantic import BaseModel
from typing import Dict
from IPython.display import Markdown, display


class User(BaseModel):
    name: str
    surname: str
    age: int
    email: str
    phone: str
    social_accounts: Dict[str, str]


user = User(
    name="John",
    surname="Doe",
    age=30,
    email="[email protected]",
    phone="123-456-7890",
    social_accounts={"bluesky": "john.doe", "instagram": "johndoe1234"},
)

display(Markdown(prompt.format(data=user)))
python
# Using a dictionary

user_dict = {
    "name": "John",
    "surname": "Doe",
    "age": 30,
    "email": "[email protected]",
    "phone": "123-456-7890",
    "social_accounts": {"bluesky": "john.doe", "instagram": "johndoe1234"},
}

display(Markdown(prompt.format(data=user_dict)))
python
# Using a JSON-like string

user_str = '{"name":"John","surname":"Doe","age":30,"email":"[email protected]","phone":"123-456-7890","social_accounts":{"bluesky":"john.doe","instagram":"johndoe1234"}}'

display(Markdown(prompt.format(data=user_str)))
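All three calls above should render the same prompt, since the BaseModel, the dictionary, and the JSON-like string describe identical data. A quick stdlib check of that equivalence (re-declaring the data here so the snippet stands alone):

```python
import json

user_dict = {
    "name": "John",
    "surname": "Doe",
    "age": 30,
    "email": "[email protected]",
    "phone": "123-456-7890",
    "social_accounts": {"bluesky": "john.doe", "instagram": "johndoe1234"},
}
user_str = '{"name":"John","surname":"Doe","age":30,"email":"[email protected]","phone":"123-456-7890","social_accounts":{"bluesky":"john.doe","instagram":"johndoe1234"}}'

# Parsing the JSON string yields the same mapping as the dictionary,
# so both should produce the same XML once rendered through the template.
assert json.loads(user_str) == user_dict
```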

3. Chat With an LLM

Now that we know how to produce structured input, let's employ it to chat with an LLM!

python
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass()
python
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4.1-mini")

response = await llm.achat(prompt.format_messages(data=user))
python
print(response.message.content)

4. Use Structured Input and Structured Output

Combining structured input and structured output can really boost the reliability of your LLM's outputs - so let's give it a go!

python
from pydantic import Field
from typing import Optional


class SocialAccounts(BaseModel):
    instagram: Optional[str] = Field(default=None)
    bluesky: Optional[str] = Field(default=None)
    x: Optional[str] = Field(default=None)
    mastodon: Optional[str] = Field(default=None)


class ContactDetails(BaseModel):
    email: str
    phone: str
    social_accounts: SocialAccounts
python
sllm = llm.as_structured_llm(ContactDetails)
python
structured_response = await sllm.achat(prompt.format_messages(data=user))
python
print(structured_response.raw.email)
print(structured_response.raw.phone)
print(structured_response.raw.social_accounts.instagram)
print(structured_response.raw.social_accounts.bluesky)
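As a sketch of what `.raw` holds: it is an instance of the `ContactDetails` model, so the usual Pydantic methods apply. Here we validate a sample payload directly, without calling the LLM, to show the shape of the object you get back (the payload values are illustrative).

```python
from typing import Optional

from pydantic import BaseModel, Field


class SocialAccounts(BaseModel):
    instagram: Optional[str] = Field(default=None)
    bluesky: Optional[str] = Field(default=None)


class ContactDetails(BaseModel):
    email: str
    phone: str
    social_accounts: SocialAccounts


# structured_response.raw is a validated model instance like this one
details = ContactDetails.model_validate(
    {
        "email": "[email protected]",
        "phone": "123-456-7890",
        "social_accounts": {"bluesky": "john.doe"},
    }
)

# Fields the LLM did not fill fall back to their defaults (None here)
print(details.model_dump())
```

Because the result is a plain Pydantic model, you can serialize it with `model_dump()` or `model_dump_json()` and pass it straight into downstream code.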