Python Dataclasses vs Pydantic v2: Which to Use in 2026 (With Benchmarks)
Python developers in 2026 have two mature, well-supported options for structured data: the standard library's @dataclass decorator and Pydantic v2. Both model data with typed fields. Both reduce boilerplate. The difference lies in what each does with that data at runtime, and understanding that difference is what lets you pick the right tool for each situation.
This article covers the full picture: how each tool works, where they perform, where they fail you, how to migrate from Pydantic v1 to v2, and a concrete decision guide so you can stop second-guessing yourself.
Python Dataclasses: Standard Library, Zero Dependencies
Introduced in Python 3.7 (PEP 557), dataclasses give you auto-generated __init__, __repr__, and __eq__ methods from a class with annotated fields. They ship with the standard library. No install required.
Basic Usage
from dataclasses import dataclass, field
from typing import List
@dataclass
class User:
    name: str
    age: int
    tags: List[str] = field(default_factory=list)
The field() function is how you configure individual fields. Using default_factory=list is required here: dataclasses reject a bare mutable default (tags: List[str] = [] raises ValueError at class definition time), precisely because such a default object would be shared across every instance.
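A quick sketch of what default_factory buys you: each instance gets its own fresh list, so mutating one instance never leaks into another.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class User:
    name: str
    tags: List[str] = field(default_factory=list)  # fresh list per instance

a = User("alice")
b = User("bob")
a.tags.append("admin")

print(a.tags)  # ['admin']
print(b.tags)  # [] — b has its own, untouched list
```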
__post_init__ for Post-Construction Logic
If you need to run logic after the generated __init__ has set the fields, use __post_init__:
@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("Dimensions must be positive")
        self.area = self.width * self.height
Note that __post_init__ is the only place dataclasses give you for validation. It is manual — you write it from scratch. There is no framework for required formats, regex, range checks, or type coercion.
frozen=True for Immutable Dataclasses
Setting frozen=True makes instances immutable by generating __setattr__ and __delattr__ that raise FrozenInstanceError. It also makes the class hashable, which is useful for use in sets and as dict keys.
@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
p.x = 3.0  # raises FrozenInstanceError
slots=True for Memory Efficiency
Added in Python 3.10, slots=True generates __slots__ on the class, which eliminates the per-instance __dict__. This reduces memory consumption and speeds up attribute access:
@dataclass(slots=True)
class Sensor:
    id: int
    value: float
    timestamp: float
For data-heavy applications processing millions of small objects, slots=True can meaningfully reduce memory usage.
Advanced Field Options: repr=False, init=False, ClassVar
from dataclasses import dataclass, field
from typing import ClassVar
import hashlib
@dataclass
class Document:
    # ClassVar fields are not treated as dataclass fields at all
    schema_version: ClassVar[str] = "1.0"
    title: str
    body: str
    # excluded from __repr__ (useful for secrets or large fields)
    _internal_hash: str = field(init=False, repr=False)

    def __post_init__(self):
        self._internal_hash = hashlib.md5(self.body.encode()).hexdigest()
Key options for field():
| Option | Default | Effect |
|---|---|---|
| default | MISSING | Static default value |
| default_factory | MISSING | Callable that returns the default |
| init | True | Include in __init__ |
| repr | True | Include in __repr__ |
| compare | True | Include in __eq__ and ordering |
| hash | None | Include in __hash__ (follows compare by default) |
| metadata | None | Arbitrary read-only mapping for third-party tools |
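Two of these options are easy to overlook in practice: compare=False removes a field from equality, and metadata travels with the field for tools to inspect. A small illustrative sketch (the Measurement class is hypothetical):

```python
from dataclasses import dataclass, field, fields

@dataclass
class Measurement:
    sensor_id: int
    # compare=False: two readings are "equal" if sensor and value match,
    # regardless of when they were taken
    taken_at: float = field(compare=False)
    value: float = field(metadata={"unit": "celsius"})

a = Measurement(1, taken_at=100.0, value=21.5)
b = Measurement(1, taken_at=200.0, value=21.5)
print(a == b)  # True — taken_at is ignored by the generated __eq__

# metadata is a read-only mapping, retrievable via fields()
unit = next(f for f in fields(a) if f.name == "value").metadata["unit"]
print(unit)  # celsius
```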
What Dataclasses Do NOT Do
This is the critical point: dataclasses do not validate types at runtime. The type annotations are purely for static analysis tools (mypy, pyright) and documentation. You can pass a string where an int is expected and Python will not complain:
@dataclass
class Config:
    port: int
    debug: bool

c = Config(port="not-a-port", debug="yes")
print(c.port)   # "not-a-port" — no error
print(c.debug)  # "yes" — no error
If your data comes from an untrusted source — an HTTP request, a config file, environment variables — dataclasses alone are not enough.
Pydantic v2: Validation by Default, Rust Speed
Pydantic v2 was released in June 2023 and represented a complete rewrite. The core validation engine (pydantic-core) is now implemented in Rust via PyO3. The result is validation that is 5 to 50 times faster than Pydantic v1, with a largely compatible but cleaned-up Python API.
Basic Usage
from pydantic import BaseModel
from typing import List

class User(BaseModel):
    name: str
    age: int
    tags: List[str] = []
Every field is validated on construction. Passing the wrong type raises a ValidationError with structured, detailed error messages:
User(name="Alice", age="not-a-number", tags=[])
# pydantic_core._pydantic_core.ValidationError: 1 validation error for User
# age
#   Input should be a valid integer, unable to parse string as an integer
#   [type=int_parsing, input_value='not-a-number', input_type=str]
Pydantic also coerces compatible types by default. age="30" becomes 30. This is lax mode — the default. You can disable it.
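Here is the coercion behavior, plus one way to disable it per call with model_validate(..., strict=True):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    age: int

# Lax mode (the default): compatible input is coerced
u = User(age="30")
print(u.age)        # 30
print(type(u.age))  # <class 'int'>

# Strict mode can be requested per call; the same string is now rejected
try:
    User.model_validate({"age": "30"}, strict=True)
except ValidationError:
    print("strict mode rejects the string")
```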
Field() for Constraints and Metadata
from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0, description="Price in USD")
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")
    quantity: int = Field(default=0, ge=0)
Constraints are enforced at runtime. A price of -5.0 raises a ValidationError immediately, before the object is used anywhere.
Validators: field_validator and model_validator
Field-level validation with @field_validator:
from pydantic import BaseModel, field_validator

class Registration(BaseModel):
    username: str
    password: str
    confirm_password: str

    @field_validator("username")
    @classmethod
    def username_alphanumeric(cls, v: str) -> str:
        if not v.isalnum():
            raise ValueError("Username must be alphanumeric")
        return v.lower()  # validators can transform the value
Model-level validation (cross-field) with @model_validator:
from pydantic import BaseModel, model_validator

class Registration(BaseModel):
    password: str
    confirm_password: str

    @model_validator(mode="after")
    def passwords_match(self) -> "Registration":
        if self.password != self.confirm_password:
            raise ValueError("Passwords do not match")
        return self
mode="after" means the validator runs after all field validation passes. mode="before" lets you inspect the raw input dict before any parsing.
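A sketch of the mode="before" case, using a hypothetical Point model: the validator sees the raw input before any field parsing, so it can normalize alternate shapes into a dict.

```python
from typing import Any
from pydantic import BaseModel, model_validator

class Point(BaseModel):
    x: float
    y: float

    @model_validator(mode="before")
    @classmethod
    def accept_pair(cls, data: Any) -> Any:
        # Runs on the raw input: also accept an (x, y) pair
        if isinstance(data, (tuple, list)) and len(data) == 2:
            return {"x": data[0], "y": data[1]}
        return data

p = Point.model_validate((1, 2))
print(p.x, p.y)  # 1.0 2.0
```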
model_config: Controlling Behavior
model_config replaces the inner Config class from Pydantic v1:
from pydantic import BaseModel, ConfigDict

class StrictUser(BaseModel):
    model_config = ConfigDict(
        strict=True,            # no type coercion; "30" won't become 30
        frozen=True,            # instances are immutable
        populate_by_name=True,  # accept field names in addition to aliases
        extra="forbid",         # raise on unknown fields
    )

    name: str
    age: int
Strict mode is particularly useful for configuration objects where silent coercion would hide bugs.
Computed Fields
from pydantic import BaseModel, computed_field

class Circle(BaseModel):
    radius: float

    @computed_field
    @property
    def area(self) -> float:
        return 3.14159 * self.radius ** 2
@computed_field properties are included in serialization (model.model_dump(), model.model_dump_json()) automatically.
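For instance, restating the Circle model so the snippet is self-contained, the computed area shows up in model_dump() alongside the stored field:

```python
from pydantic import BaseModel, computed_field

class Circle(BaseModel):
    radius: float

    @computed_field
    @property
    def area(self) -> float:
        return 3.14159 * self.radius ** 2

c = Circle(radius=1.0)
# The computed property is serialized like a regular field
print(c.model_dump())  # {'radius': 1.0, 'area': 3.14159}
```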
Serialization: model_serializer
from pydantic import BaseModel, model_serializer
from typing import Any, Dict

class Event(BaseModel):
    name: str
    timestamp: float

    @model_serializer
    def serialize_model(self) -> Dict[str, Any]:
        return {
            "event_name": self.name,
            "unix_ts": self.timestamp,
        }
This gives you full control over how the model is serialized to a dict or JSON, without losing validation on the input side.
JSON Schema Generation
import json
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="The article title")
    word_count: int = Field(gt=0)

print(json.dumps(Article.model_json_schema(), indent=2))
Output:
{
  "title": "Article",
  "type": "object",
  "properties": {
    "title": {
      "description": "The article title",
      "title": "Title",
      "type": "string"
    },
    "word_count": {
      "exclusiveMinimum": 0,
      "title": "Word Count",
      "type": "integer"
    }
  },
  "required": ["title", "word_count"]
}
This is used natively by FastAPI to generate OpenAPI documentation.
Performance: Pydantic v2 vs v1 Benchmarks
The Pydantic v2 release post (docs.pydantic.dev) includes official benchmarks comparing v2 against v1. The numbers are striking:
| Benchmark | Pydantic v1 | Pydantic v2 | Speedup |
|---|---|---|---|
| Model validation (simple) | 3.4 µs | 0.41 µs | ~8x |
| Model validation (complex) | 58 µs | 3.9 µs | ~15x |
| JSON serialization | 12 µs | 0.35 µs | ~34x |
| Schema generation | 2.8 ms | 0.14 ms | ~20x |
| model_validate (list of 100) | 540 µs | 28 µs | ~19x |
The headline figure from the Pydantic team is 5-50x faster than v1 across typical workloads. The Rust core (pydantic-core) eliminates the Python-level dispatch overhead that made v1 slow under load.
Dataclasses vs Pydantic v2 Performance
Dataclasses will always be faster for construction because they do no validation. If you benchmark raw object instantiation:
# dataclass: ~0.08 µs per instance (no validation, pure Python)
# Pydantic v2: ~0.4 µs per instance (full validation, Rust core)
Dataclasses are roughly 5x faster for construction. But this comparison is not fair: you are comparing a car with no seatbelts to one with airbags. If you add equivalent manual validation to a dataclass's __post_init__, the gap closes considerably — and Pydantic gives you that validation with better error messages, JSON schema, and serialization for free.
For high-throughput internal pipelines where data is already trusted and validated upstream, dataclasses' speed advantage is real. For API boundaries, the validation overhead of Pydantic v2 is negligible compared to network latency.
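If you want to reproduce the rough construction comparison on your own machine, a minimal timeit harness looks like this (absolute numbers will vary with hardware and Python version; only the ratio is meaningful):

```python
import timeit
from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class DCPoint:
    x: int
    y: int

class PDPoint(BaseModel):
    x: int
    y: int

n = 50_000
dc = timeit.timeit(lambda: DCPoint(x=1, y=2), number=n)
pd = timeit.timeit(lambda: PDPoint(x=1, y=2), number=n)

# Per-instance cost in microseconds
print(f"dataclass:   {dc / n * 1e6:.3f} µs")
print(f"pydantic v2: {pd / n * 1e6:.3f} µs")
```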
Key Differences at a Glance
| Feature | @dataclass | Pydantic v2 BaseModel |
|---|---|---|
| Runtime type validation | No | Yes |
| Type coercion | No | Yes (lax mode, configurable) |
| Strict mode | No | Yes (model_config) |
| JSON serialization | Manual | Built-in (model_dump_json()) |
| JSON schema generation | No | Yes (model_json_schema()) |
| __slots__ support | Yes (Python 3.10+) | Not on BaseModel (use pydantic.dataclasses.dataclass(slots=True)) |
| Frozen / immutable | Yes (frozen=True) | Yes (frozen=True in config) |
| Field constraints | No | Yes (min, max, regex, gt, lt…) |
| Custom validators | Manual (__post_init__) | Declarative (@field_validator) |
| Computed fields | Manual @property | @computed_field (serialized) |
| Cross-field validation | Manual | @model_validator |
| Standard library | Yes | No (install pydantic) |
| FastAPI integration | Partial | Native |
| Dependency weight | 0 | ~3 MB (pydantic-core wheel) |
Pydantic v1 → v2 Migration Cheatsheet
If you are maintaining a codebase on Pydantic v1, here is the map from v1 patterns to v2 equivalents.
Validators
# Pydantic v1
from pydantic import validator

class Model(BaseModel):
    name: str

    @validator("name")
    def name_must_be_upper(cls, v):
        return v.upper()

# Pydantic v2
from pydantic import field_validator

class Model(BaseModel):
    name: str

    @field_validator("name")
    @classmethod
    def name_must_be_upper(cls, v: str) -> str:
        return v.upper()
The key changes: @validator becomes @field_validator, and @classmethod is now required explicitly.
Root Validators
# Pydantic v1
from pydantic import root_validator

class Model(BaseModel):
    @root_validator
    def check_fields(cls, values):
        return values

# Pydantic v2
from pydantic import model_validator

class Model(BaseModel):
    @model_validator(mode="before")
    @classmethod
    def check_fields(cls, data):
        return data
Schema Generation
# Pydantic v1
schema = Model.schema()
schema_json = Model.schema_json()
# Pydantic v2
schema = Model.model_json_schema()
schema_json = Model.model_json_schema() # returns dict; use json.dumps() for string
Dict Conversion
# Pydantic v1
d = model.dict()
d = model.dict(exclude={"password"})
# Pydantic v2
d = model.model_dump()
d = model.model_dump(exclude={"password"})
JSON Serialization
# Pydantic v1
json_str = model.json()
# Pydantic v2
json_str = model.model_dump_json()
Parsing / Construction
# Pydantic v1
model = Model.parse_obj({"name": "Alice"})
model = Model.parse_raw('{"name": "Alice"}')
# Pydantic v2
model = Model.model_validate({"name": "Alice"})
model = Model.model_validate_json('{"name": "Alice"}')
Config Class
# Pydantic v1
class Model(BaseModel):
    class Config:
        allow_population_by_field_name = True
        orm_mode = True

# Pydantic v2
from pydantic import ConfigDict

class Model(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
        from_attributes=True,  # replaces orm_mode
    )
Complete v1 → v2 Rename Reference
| Pydantic v1 | Pydantic v2 |
|---|---|
| validator | field_validator |
| root_validator | model_validator |
| schema() | model_json_schema() |
| dict() | model_dump() |
| json() | model_dump_json() |
| parse_obj() | model_validate() |
| parse_raw() | model_validate_json() |
| copy() | model_copy() |
| Config.orm_mode | ConfigDict(from_attributes=True) |
| Config.allow_population_by_field_name | ConfigDict(populate_by_name=True) |
| Config.use_enum_values | ConfigDict(use_enum_values=True) |
| __fields__ | model_fields |
| __validators__ | __pydantic_validator__ |
Pydantic provides an official migration guide at docs.pydantic.dev/latest/migration/ and a bump-pydantic codemod tool that automates most of these renames.
Honorable Mention: attrs
attrs (install: pip install attrs) is the third major player in this space, predating dataclasses (dataclasses were directly inspired by attrs). It sits between dataclasses and Pydantic on the feature spectrum:
- Validators are supported (via attrs.field(validator=...) or the per-attribute @<name>.validator decorator)
- Converters (type coercion) are supported
- Slots and frozen instances are supported
- No JSON schema generation
- No Rust core — pure Python
attrs is worth considering if you need more structure than dataclasses but do not want Pydantic's dependency or its opinionated validation model. It is heavily used in libraries like cattrs for serialization. For new projects in 2026, Pydantic v2 is generally the better choice over attrs because of the Rust core performance and the FastAPI ecosystem.
Using Dataclasses and Pydantic Together
A common pattern in larger applications is to use Pydantic at the API boundary (for validation and serialization) and dataclasses for internal domain objects (for speed and zero-dependency simplicity).
from dataclasses import dataclass
from pydantic import BaseModel

# Internal domain object — no validation overhead
@dataclass(slots=True, frozen=True)
class OrderItem:
    product_id: int
    quantity: int
    unit_price: float

# API layer — full validation and serialization
class OrderRequest(BaseModel):
    items: list[dict]
    customer_id: int

    def to_domain(self) -> list[OrderItem]:
        return [
            OrderItem(
                product_id=item["product_id"],
                quantity=item["quantity"],
                unit_price=item["unit_price"],
            )
            for item in self.items
        ]
Pydantic also natively supports dataclasses. You can decorate a class with pydantic.dataclasses.dataclass instead of the standard one — you get validation on construction while keeping dataclass semantics:
from pydantic.dataclasses import dataclass
from pydantic import Field

@dataclass
class Coordinate:
    lat: float = Field(ge=-90, le=90)
    lon: float = Field(ge=-180, le=180)

Coordinate(lat=200, lon=0)  # raises ValidationError
This is a clean middle ground: dataclass interface (compatible with libraries that expect dataclasses), with Pydantic validation.
Decision Guide: Which to Use
Use @dataclass when:
Pure data containers with no validation needed. If data is already validated upstream or comes from trusted internal sources, dataclasses give you typed, documented structures with zero overhead.
# Internal DTO between layers of your application
@dataclass
class ParsedLogLine:
    timestamp: float
    level: str
    message: str
    host: str
Writing a library where you want to minimize dependencies. Adding Pydantic as a required dependency of your library forces it on all of your users. For libraries, stick with the standard library.
Memory-sensitive code processing millions of small objects. Use @dataclass(slots=True) to eliminate __dict__ overhead.
When you need hashability and immutability. @dataclass(frozen=True) makes instances hashable and immutable with no extra dependencies.
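Because frozen dataclasses hash by field values, equal-valued instances are interchangeable as dict keys and set members, which is exactly what you want from a value object:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Currency:
    code: str

rates = {Currency("USD"): 1.0, Currency("EUR"): 1.08}

# Lookup works by value: a freshly built Currency("USD") finds the entry
print(rates[Currency("USD")])  # 1.0

# Equal values collapse in a set
print(len({Currency("EUR"), Currency("EUR")}))  # 1
```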
Use Pydantic v2 when:
API request/response validation. Any data from an HTTP request is untrusted. Pydantic is the right tool — it validates, coerces, and generates error messages that you can return to clients.
from typing import Literal
from pydantic import BaseModel, EmailStr, Field  # EmailStr needs: pip install "pydantic[email]"

class CreateUserRequest(BaseModel):
    email: EmailStr
    password: str = Field(min_length=8)
    role: Literal["admin", "viewer", "editor"] = "viewer"
FastAPI schemas. FastAPI is built on Pydantic. Route parameters, request bodies, and response models are all defined as Pydantic models. Using dataclasses here means giving up automatic OpenAPI documentation, response validation, and the model_dump_json() fast path.
Configuration objects loaded from environment variables or files. Pydantic Settings (pydantic-settings) reads environment variables and .env files directly into typed, validated models with strict mode available.
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
    model_config = SettingsConfigDict(strict=True, env_file=".env")

    database_url: str
    debug: bool = False
    max_connections: int = 10
When you need JSON schema for documentation or code generation. Pydantic's model_json_schema() generates standards-compliant JSON Schema that tools like Swagger UI, Redoc, and code generators can consume directly.
When you want declarative validation constraints. Rather than writing if value < 0: raise ValueError(...) in __post_init__, Field(ge=0) is self-documenting and also reflected in the JSON schema.
Quick Reference Decision Table
| Your situation | Recommendation |
|---|---|
| Pure data containers, no validation | @dataclass |
| API request/response bodies | Pydantic v2 |
| Config from env vars / files | Pydantic v2 + pydantic-settings |
| Writing a library (no heavy deps) | @dataclass |
| FastAPI route models | Pydantic v2 (built in) |
| High-throughput internal pipeline | @dataclass(slots=True) |
| Need JSON schema generation | Pydantic v2 |
| Immutable value objects | @dataclass(frozen=True) |
| Cross-field validation | Pydantic v2 |
| Dataclass interface + validation | pydantic.dataclasses.dataclass |
Summary
Dataclasses and Pydantic v2 are not competitors in the sense that you pick one and abandon the other. They solve different problems at different layers.
Dataclasses are the right choice at the inside of your application: fast, dependency-free, hashable, slottable, and exactly as much structure as Python's type system can give you without a framework.
Pydantic v2 is the right choice at the edges of your application: anywhere data crosses a boundary from the outside world — an HTTP request, a config file, a database row via ORM, a message queue payload. The Rust core makes the validation cost low enough that there is no reason to skip it at those boundaries.
The migration from Pydantic v1 to v2 is mechanical. The bump-pydantic tool handles most of it automatically, and the rename table above covers the cases it misses. The performance improvement — 5 to 50x across typical workloads — makes the migration worthwhile on its own terms, before you factor in the cleaner API.
In 2026, the default answer for new Python projects is: @dataclass for internal domain models, Pydantic v2 for anything touching the outside world.