Structured Output from LLMs with Python: Reliable JSON Using Pydantic and Instructor (2026)
Getting an LLM to respond in plain prose is easy. Getting it to respond with a predictable, validated, machine-readable structure — reliably, across thousands of calls in production — is a different problem. This guide covers instructor, the library that makes structured LLM output practical, and shows you how to use it with OpenAI, Claude, and local Ollama models.
The Problem: LLMs Don't Always Return Valid JSON
When you ask an LLM for JSON, you often get:
- Markdown code blocks around the JSON (```json ... ```)
- Missing fields
- Wrong data types
- Hallucinated extra fields
- Broken JSON syntax
You can work around these one by one — strip the markdown fences, catch JSON parse errors, write validation logic — but the workaround code quickly becomes the majority of your codebase.
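For a sense of what that workaround code looks like, here is a hedged sketch of a hand-rolled parser. The parse_user helper and its fixed field list are illustrative, not from any library — every branch is a failure mode instructor handles for you:

```python
import json
import re

def parse_user(raw: str) -> dict:
    """Hypothetical manual cleanup: strip fences, parse, check fields by hand."""
    # Strip markdown code fences if the model wrapped its answer in them
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", raw, re.DOTALL)
    if match:
        raw = match.group(1)
    # Catch broken JSON syntax
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"model returned invalid JSON: {e}")
    # Hand-rolled checks for missing fields and wrong types
    for field, typ in (("name", str), ("age", int), ("email", str)):
        if not isinstance(data.get(field), typ):
            raise ValueError(f"missing or mistyped field: {field}")
    return data
```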
The instructor library solves this by using Pydantic models as schemas, patching the LLM client to enforce the schema, and automatically retrying on validation failures. You define the shape of the data you want, and instructor handles the rest.
Installation
```bash
pip install instructor pydantic anthropic openai
```
For local models via Ollama:
```bash
pip install instructor pydantic openai  # Ollama uses the OpenAI-compatible API
```
Basic Usage with OpenAI
Define a Pydantic model describing the structure you want, patch the OpenAI client with instructor.from_openai(), then pass response_model=YourModel to the completion call.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "Extract: John Smith is 34 years old. His email is [email protected]"}
    ],
)

print(user.name)   # "John Smith"
print(user.age)    # 34
print(user.email)  # "[email protected]"
```
The return value is a fully validated UserInfo instance, not a dict, not a JSON string. You get Python objects with correct types that your IDE understands and your type checker can verify.
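Since the result is an ordinary Pydantic instance, everything Pydantic provides is available on it. A quick sketch (using Pydantic v2 methods, which instructor builds on):

```python
# user is a UserInfo instance, not a dict or a string
assert isinstance(user, UserInfo)

print(user.model_dump())       # {'name': 'John Smith', 'age': 34, ...}
print(user.model_dump_json())  # JSON string, ready to store or send
print(user.age + 1)            # ints are real ints; no casting needed
```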
With Claude (Anthropic)
instructor works with Claude through the from_anthropic() wrapper. The call shape is nearly identical: swap the client, call messages.create instead of chat.completions.create, and set max_tokens, which the Anthropic API requires.
```python
import instructor
import anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())

class ProductReview(BaseModel):
    sentiment: str  # "positive", "negative", "neutral"
    score: int      # 1-5
    summary: str

review = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    response_model=ProductReview,
    messages=[
        {"role": "user", "content": "Review: 'Amazing product! Works perfectly, very fast shipping. 5 stars!'"}
    ],
)

print(review.sentiment)  # "positive"
print(review.score)      # 5
```
Claude tends to follow structured output instructions very accurately, which makes it a good choice for extraction tasks where precision matters.
With Local Ollama
Ollama exposes an OpenAI-compatible API, so you use the OpenAI client pointed at localhost. Set mode=instructor.Mode.JSON because Ollama's function-calling support varies by model, but JSON mode works reliably across all of them.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Ollama exposes an OpenAI-compatible API on localhost
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # required by the client but ignored by Ollama
    ),
    mode=instructor.Mode.JSON,  # Ollama works best with JSON mode
)

class Article(BaseModel):
    title: str
    keywords: list[str]
    summary: str

article = client.chat.completions.create(
    model="llama3.2",
    response_model=Article,
    messages=[{"role": "user", "content": "Analyze this article about Python async..."}],
)
```
This lets you run structured extraction entirely locally, with no API costs and no data leaving your machine. Smaller models like llama3.2 handle simple extraction well; for complex nested schemas, use a larger model.
Nested Models
Pydantic models can reference other models, and instructor handles the nesting automatically.
```python
from pydantic import BaseModel
from typing import Optional

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    founded: int
    headquarters: Address
    ceo: Optional[str] = None

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[
        {"role": "user", "content": "Extract: Anthropic was founded in 2021. They are based at 548 Market St, San Francisco, USA. Dario Amodei is the CEO."}
    ],
)

print(company.headquarters.city)  # "San Francisco"
print(company.founded)            # 2021
```
The LLM must produce a single JSON object matching the full schema, including all nested fields. Instructor validates the entire structure and retries if any part is missing or mistyped.
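instructor derives a JSON schema from the Pydantic model and hands it to the provider; you can inspect what the model will be asked to satisfy with Pydantic's own schema export:

```python
import json

# Nested models like Address appear under "$defs" and are referenced
# from the headquarters field
print(json.dumps(Company.model_json_schema(), indent=2))
```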
Validation with Pydantic
Pydantic validators run automatically before instructor considers the output valid. If a validator raises a ValueError, instructor catches it, sends the error back to the model with a message asking it to correct the output, and retries.
```python
from pydantic import BaseModel, Field, field_validator

class Temperature(BaseModel):
    celsius: float = Field(ge=-273.15, le=1000)  # physical limits

    @field_validator("celsius")
    @classmethod
    def not_absolute_zero(cls, v: float) -> float:
        if v == -273.15:
            raise ValueError("Cannot be absolute zero")
        return v

# instructor retries automatically if validation fails
result = client.chat.completions.create(
    model="gpt-4o",
    response_model=Temperature,
    max_retries=3,  # retry up to 3 times on validation error
    messages=[{"role": "user", "content": "What is normal body temperature?"}],
)
```
Field(ge=-273.15, le=1000) uses Pydantic's built-in numeric constraints. These trigger validation errors without any custom code. Add max_retries to give the model a few attempts to get it right.
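These constraints fire with no LLM in the loop, so you can unit-test a schema before wiring it into a pipeline. A minimal check:

```python
from pydantic import ValidationError

try:
    Temperature(celsius=-300)  # below absolute zero
except ValidationError as e:
    print(e)  # pydantic reports the violated ge=-273.15 constraint
```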
Extract Lists of Objects
Wrap a list of models inside a container model to extract multiple records from a single piece of text.
```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    role: str

class Team(BaseModel):
    people: list[Person]

team = client.chat.completions.create(
    model="gpt-4o",
    response_model=Team,
    messages=[
        {"role": "user", "content": "The team: Alice is the CTO, Bob is lead developer, Carol handles DevOps."}
    ],
)

for person in team.people:
    print(f"{person.name}: {person.role}")
# Alice: CTO
# Bob: lead developer
# Carol: DevOps
```
This pattern is useful for extracting tables from documents, parsing structured data from free-form reports, or pulling multiple entities out of a block of text in one API call.
Streaming Partial Objects
For better UX in interactive applications, instructor can stream partial objects as the model generates them. Fields fill in progressively rather than waiting for the complete response.
```python
# Stream partial objects as the model fills them in; create_partial
# wraps the response model in instructor's Partial type internally,
# so you pass the plain model.
for partial_user in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=UserInfo,  # UserInfo as defined in the basic example
    messages=[{"role": "user", "content": "Extract: Jane Doe, 28, [email protected]"}],
):
    print(partial_user)  # prints as fields fill in
```
Each iteration gives you a partially-filled UserInfo instance. Fields that have not been generated yet are None. This is useful for showing progress in a UI while extraction is in flight.
Parallel Extraction
For batch processing, use the async client with asyncio.gather to run multiple extractions concurrently. This dramatically reduces total wall-clock time compared to sequential calls.
```python
import asyncio
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

async_client = instructor.from_openai(AsyncOpenAI())

async def extract(text: str) -> UserInfo:
    return await async_client.chat.completions.create(
        model="gpt-4o",
        response_model=UserInfo,
        messages=[{"role": "user", "content": text}],
    )

async def main():
    texts = [
        "John, 30, [email protected]",
        "Mary, 25, [email protected]",
        "Bob, 45, [email protected]",
    ]
    results = await asyncio.gather(*[extract(t) for t in texts])
    for r in results:
        print(r)

asyncio.run(main())
```
All three extractions run concurrently. The total time is roughly the time of the slowest single call rather than the sum of all three.
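One caveat: an unbounded gather over a large batch can hit provider rate limits. A common mitigation, sketched here rather than built into instructor, is to cap concurrency with an asyncio.Semaphore:

```python
semaphore = asyncio.Semaphore(10)  # at most 10 requests in flight

async def extract_limited(text: str) -> UserInfo:
    # Wraps the extract() coroutine from above with a concurrency cap
    async with semaphore:
        return await extract(text)

# results = await asyncio.gather(*[extract_limited(t) for t in texts])
```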
Using Enums for Constrained Fields
When a field should only take one of a fixed set of values, use a Python Enum. This makes the schema explicit and gives Pydantic a clear constraint to validate against.
```python
from enum import Enum
from pydantic import BaseModel

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class Ticket(BaseModel):
    title: str
    sentiment: Sentiment
    priority: Priority
    summary: str

ticket = client.chat.completions.create(
    model="gpt-4o",
    response_model=Ticket,
    messages=[
        {"role": "user", "content": "Ticket: 'Production database is down, all users affected, been down 20 minutes'"}
    ],
)

print(ticket.priority)   # Priority.critical
print(ticket.sentiment)  # Sentiment.negative
```
Using str, Enum (string enum) means the values serialize naturally to strings, which makes them easy to store and compare.
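Concretely, string enums compare against plain strings and serialize to them, so downstream code needs no special handling:

```python
print(ticket.priority == "critical")  # True: str-backed enums compare to strings
print(ticket.priority.value)          # "critical"
print(ticket.model_dump_json())       # enum fields serialize as plain strings
```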
When to Use instructor vs Native JSON Mode
| Feature | instructor | Native JSON mode |
|---|---|---|
| Schema validation | Full Pydantic | None |
| Auto-retry on failure | Yes | No |
| Nested models | Yes | Manual |
| Works with all providers | Yes | OpenAI only |
| Streaming partial | Yes | No |
| Setup complexity | Low | Minimal |
Use instructor for production pipelines where reliability matters. Native JSON mode (passing response_format={"type": "json_object"}) is fine for simple one-off scripts where you control the prompt carefully and can tolerate occasional failures.
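For contrast, here is roughly what the native JSON mode path looks like without instructor (a sketch; the prompt and field names are illustrative). Schema validation and retries are left entirely to you:

```python
import json
from openai import OpenAI

raw_client = OpenAI()  # unpatched client

response = raw_client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # native JSON mode
    messages=[
        {"role": "system", "content": "Respond with JSON: {name, age, email}"},
        {"role": "user", "content": "Extract: John Smith is 34 years old."},
    ],
)

data = json.loads(response.choices[0].message.content)  # hope it parses
# No schema validation, no typed object, no retry: all of that is on you.
```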
The retry behavior alone is usually enough to justify instructor in production. Even a model that gets it right 95% of the time will fail thousands of times per day in a high-volume system — automatic retry turns those failures into invisible latency bumps instead of broken records.
Common Patterns
Document parsing: extract structured records from invoices, contracts, or reports by passing the document text as the user message and a detailed Pydantic model as response_model.
Classification pipelines: classify support tickets, emails, or feedback into categories using an Enum field. Combine with asyncio.gather for batch processing.
Data normalization: take inconsistently formatted input (dates in different formats, phone numbers with or without dashes) and let the model normalize it into your schema.
Multi-step extraction: extract a summary model first, then use its fields to drive follow-up extractions. Instructor composes naturally with standard Python control flow.
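A sketch of that last pattern, assuming hypothetical DocSummary and PersonDetail models and a document_text variable standing in for your input:

```python
from pydantic import BaseModel

class DocSummary(BaseModel):
    topic: str
    mentioned_people: list[str]

class PersonDetail(BaseModel):
    name: str
    role: str

# Step 1: broad pass over the document
summary = client.chat.completions.create(
    model="gpt-4o",
    response_model=DocSummary,
    messages=[{"role": "user", "content": document_text}],
)

# Step 2: targeted follow-ups driven by step 1's output
details = [
    client.chat.completions.create(
        model="gpt-4o",
        response_model=PersonDetail,
        messages=[{"role": "user", "content": f"Describe {name}'s role in: {document_text}"}],
    )
    for name in summary.mentioned_people
]
```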