Structured Output from LLMs with Python: Reliable JSON Using Pydantic and Instructor (2026)

Getting an LLM to respond in plain prose is easy. Getting it to respond with a predictable, validated, machine-readable structure — reliably, across thousands of calls in production — is a different problem. This guide covers instructor, the library that makes structured LLM output practical, and shows you how to use it with OpenAI, Claude, and local Ollama models.

The Problem: LLMs Don't Always Return Valid JSON

When you ask an LLM for JSON, you often get:

  • Markdown code blocks around the JSON (```json ... ```)
  • Missing fields
  • Wrong data types
  • Hallucinated extra fields
  • Broken JSON syntax

You can work around these one by one: strip the markdown fences, catch JSON parse errors, write validation logic. But that glue code accumulates fast, and before long the workarounds dwarf the extraction logic you actually care about.
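To see what that manual path looks like, here is a minimal sketch of the hand-rolled workaround (the function name and the checked fields are illustrative):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Best-effort parse of an LLM reply that is supposed to contain JSON."""
    # Strip a Markdown code fence if the model wrapped its answer in one
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    text = match.group(1) if match else raw.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned invalid JSON: {exc}") from exc
    # Field and type checks still have to be written by hand, one per field
    if not isinstance(data.get("name"), str):
        raise ValueError("Missing or invalid 'name' field")
    return data

print(parse_llm_json('```json\n{"name": "John Smith", "age": 34}\n```'))
```

And this still handles only one failure mode per line of code: every new field, type, and constraint needs another check.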

The instructor library solves this by using Pydantic models as schemas, patching the LLM client to enforce the schema, and automatically retrying on validation failures. You define the shape of the data you want, and instructor handles the rest.

Installation

pip install instructor pydantic anthropic openai

For local models via Ollama:

pip install instructor pydantic openai  # Ollama uses the OpenAI-compatible API

Basic Usage with OpenAI

Define a Pydantic model describing the structure you want, patch the OpenAI client with instructor.from_openai(), then pass response_model=YourModel to the completion call.

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "Extract: John Smith is 34 years old. His email is [email protected]"}
    ]
)

print(user.name)   # "John Smith"
print(user.age)    # 34
print(user.email)  # "[email protected]"

The return value is a fully validated UserInfo instance, not a dict, not a JSON string. You get Python objects with correct types that your IDE understands and your type checker can verify.
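Because the result is a plain Pydantic model, everything Pydantic offers is available on it. A standalone sketch (no API call needed, constructing the instance directly the way instructor would return it):

```python
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

user = UserInfo(name="John Smith", age=34, email="john@example.com")

print(user.model_dump())       # plain dict for downstream code
print(user.model_dump_json())  # JSON string, ready to store or send
```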

With Claude (Anthropic)

instructor works with Claude through the from_anthropic() wrapper. The pattern is the same; the differences are that you call messages.create and must pass max_tokens, which the Anthropic API requires.

import instructor
import anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())

class ProductReview(BaseModel):
    sentiment: str  # "positive", "negative", "neutral"
    score: int      # 1-5
    summary: str

review = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    response_model=ProductReview,
    messages=[
        {"role": "user", "content": "Review: 'Amazing product! Works perfectly, very fast shipping. 5 stars!'"}
    ]
)

print(review.sentiment)  # "positive"
print(review.score)      # 5

Claude tends to follow structured output instructions very accurately, which makes it a good choice for extraction tasks where precision matters.

With Local Ollama

Ollama exposes an OpenAI-compatible API, so you use the OpenAI client pointed at localhost. Set mode=instructor.Mode.JSON because Ollama's function-calling support varies by model; JSON mode works consistently across most of them.

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Ollama has an OpenAI-compatible API
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # required but ignored
    ),
    mode=instructor.Mode.JSON,  # Ollama works best with JSON mode
)

class Article(BaseModel):
    title: str
    keywords: list[str]
    summary: str

article = client.chat.completions.create(
    model="llama3.2",
    response_model=Article,
    messages=[{"role": "user", "content": "Analyze this article about Python async..."}]
)

This lets you run structured extraction entirely locally, with no API costs and no data leaving your machine. Smaller models like llama3.2 handle simple extraction well; for complex nested schemas, use a larger model.

Nested Models

Pydantic models can reference other models, and instructor handles the nesting automatically.

from pydantic import BaseModel
from typing import Optional

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    founded: int
    headquarters: Address
    ceo: Optional[str] = None

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[
        {"role": "user", "content": "Extract: Anthropic was founded in 2021. They are based at 548 Market St, San Francisco, USA. Dario Amodei is the CEO."}
    ]
)

print(company.headquarters.city)  # "San Francisco"
print(company.founded)            # 2021

The LLM must produce a single JSON object matching the full schema, including all nested fields. Instructor validates the entire structure and retries if any part is missing or mistyped.

Validation with Pydantic

Pydantic validators run automatically before instructor considers the output valid. If a validator raises a ValueError, instructor catches it, sends the error back to the model with a message asking it to correct the output, and retries.

from pydantic import BaseModel, field_validator, Field

class Temperature(BaseModel):
    celsius: float = Field(ge=-273.15, le=1000)  # physical limits

    @field_validator("celsius")
    @classmethod
    def not_absolute_zero(cls, v):
        if v == -273.15:
            raise ValueError("Cannot be absolute zero")
        return v

# instructor will retry if validation fails
result = client.chat.completions.create(
    model="gpt-4o",
    response_model=Temperature,
    max_retries=3,  # retry up to 3 times on validation error
    messages=[{"role": "user", "content": "What is normal body temperature?"}]
)

Field(ge=-273.15, le=1000) uses Pydantic's built-in numeric constraints. These trigger validation errors without any custom code. Add max_retries to give the model a few attempts to get it right.

Extract Lists of Objects

Wrap a list of models inside a container model to extract multiple records from a single piece of text.

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    role: str

class Team(BaseModel):
    people: list[Person]

team = client.chat.completions.create(
    model="gpt-4o",
    response_model=Team,
    messages=[
        {"role": "user", "content": "The team: Alice is the CTO, Bob is lead developer, Carol handles DevOps."}
    ]
)

for person in team.people:
    print(f"{person.name}: {person.role}")
# Alice: CTO
# Bob: lead developer
# Carol: DevOps

This pattern is useful for extracting tables from documents, parsing structured data from free-form reports, or pulling multiple entities out of a block of text in one API call.

Streaming Partial Objects

For better UX in interactive applications, instructor can stream partial objects as the model generates them. Fields fill in progressively rather than waiting for the complete response.

# Stream the object as the model fills it in
for partial_user in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Extract: Jane Doe, 28, jane@example.com"}]
):
    print(partial_user)  # prints as fields fill in

Each iteration gives you a partially-filled UserInfo instance. Fields that have not been generated yet are None. This is useful for showing progress in a UI while extraction is in flight.

Parallel Extraction

For batch processing, use the async client with asyncio.gather to run multiple extractions concurrently. This dramatically reduces total wall-clock time compared to sequential calls.

import asyncio
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

async_client = instructor.from_openai(AsyncOpenAI())

async def extract(text: str) -> UserInfo:
    return await async_client.chat.completions.create(
        model="gpt-4o",
        response_model=UserInfo,
        messages=[{"role": "user", "content": text}]
    )

async def main():
    texts = ["John, 30, [email protected]", "Mary, 25, [email protected]", "Bob, 45, [email protected]"]
    results = await asyncio.gather(*[extract(t) for t in texts])
    for r in results:
        print(r)

asyncio.run(main())

All three extractions run concurrently. The total time is roughly the time of the slowest single call rather than the sum of all three.

Using Enums for Constrained Fields

When a field should only take one of a fixed set of values, use a Python Enum. This makes the schema explicit and gives Pydantic a clear constraint to validate against.

from pydantic import BaseModel
from enum import Enum

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class Ticket(BaseModel):
    title: str
    sentiment: Sentiment
    priority: Priority
    summary: str

ticket = client.chat.completions.create(
    model="gpt-4o",
    response_model=Ticket,
    messages=[
        {"role": "user", "content": "Ticket: 'Production database is down, all users affected, been down 20 minutes'"}
    ]
)

print(ticket.priority)   # Priority.critical
print(ticket.sentiment)  # Sentiment.negative

Using str, Enum (string enum) means the values serialize naturally to strings, which makes them easy to store and compare.
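The practical effect of inheriting from both str and Enum, in a standalone sketch:

```python
import json
from enum import Enum

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

p = Priority.critical
print(p == "critical")              # True: compares equal to the plain string
print(json.dumps({"priority": p}))  # serializes as "critical", no custom encoder
```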

When to Use instructor vs Native JSON Mode

                           instructor       Native JSON mode
Schema validation          Full Pydantic    None
Auto-retry on failure      Yes              No
Nested models              Yes              Manual
Works with all providers   Yes              OpenAI only
Streaming partial          Yes              No
Setup complexity           Low              Minimal

Use instructor for production pipelines where reliability matters. Native JSON mode (passing response_format={"type": "json_object"}) is fine for simple one-off scripts where you control the prompt carefully and can tolerate occasional failures.

The retry behavior alone is usually enough to justify instructor in production. Even a model that gets it right 95% of the time will fail thousands of times per day in a high-volume system — automatic retry turns those failures into invisible latency bumps instead of broken records.
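Some back-of-the-envelope arithmetic makes the point, assuming independent failures and the illustrative volume below:

```python
# Assume 95% per-call success and 100,000 calls per day (illustrative numbers)
calls_per_day = 100_000
p_fail = 0.05

no_retry_failures = calls_per_day * p_fail
# With max_retries=3, a record only fails if all 4 attempts fail
with_retry_failures = calls_per_day * p_fail ** 4

print(f"{no_retry_failures:.0f}")    # 5000 failed records/day without retries
print(f"{with_retry_failures:.2f}")  # well under 1 failed record/day with retries
```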

Common Patterns

Document parsing: extract structured records from invoices, contracts, or reports by passing the document text as the user message and a detailed Pydantic model as response_model.

Classification pipelines: classify support tickets, emails, or feedback into categories using an Enum field. Combine with asyncio.gather for batch processing.

Data normalization: take inconsistently formatted input (dates in different formats, phone numbers with or without dashes) and let the model normalize it into your schema.
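As a concrete target schema for the normalization case (field names and formats here are illustrative), Pydantic validators can canonicalize whatever string the model hands back:

```python
from datetime import datetime, date
from pydantic import BaseModel, field_validator

class Contact(BaseModel):
    name: str
    phone: str          # normalized to digits only
    signup_date: date

    @field_validator("phone")
    @classmethod
    def digits_only(cls, v: str) -> str:
        return "".join(ch for ch in v if ch.isdigit())

    @field_validator("signup_date", mode="before")
    @classmethod
    def parse_date(cls, v):
        # Accept a few common date formats in addition to ISO
        if isinstance(v, str):
            for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"):
                try:
                    return datetime.strptime(v, fmt).date()
                except ValueError:
                    continue
        return v

c = Contact(name="Ann", phone="555-123-4567", signup_date="03/15/2024")
print(c.phone)        # "5551234567"
print(c.signup_date)  # 2024-03-15
```

The model does the fuzzy extraction; the validators guarantee that whatever lands in your database is in exactly one canonical shape.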

Multi-step extraction: extract a summary model first, then use its fields to drive follow-up extractions. Instructor composes naturally with standard Python control flow.

Related Guides