Context Engineering 2026: AGENTS.md, CLAUDE.md, and .cursorrules That Actually Work

Andrej Karpathy has publicly endorsed "context engineering" as a better term than "prompt engineering", and in 2026 it is an essential AI coding skill. It is not about typing clever things into a chat box. Context engineering is the discipline of building persistent, structured context that AI agents carry throughout an entire session, so they make fewer wrong assumptions and more correct decisions without constant hand-holding.

This tutorial shows you how to write AGENTS.md, CLAUDE.md, and .cursorrules files that actually improve outcomes, with real copy-paste-ready examples for Python FastAPI projects.

1. What Context Engineering Is

When you open a new chat with an AI assistant, the model knows nothing about your project. It does not know that you use PostgreSQL, not SQLite, that you never catch bare exceptions, or that your test suite must pass before any commit. It will invent plausible answers that fit a generic "Python project", and many of them will be wrong for your specific codebase.

Context engineering is the practice of writing structured files that solve this cold-start problem once and for all. These files sit in your repository and are automatically read by the agent at the start of every session:

AGENTS.md — the universal standard, read by OpenAI Codex, Claude Code, and Gemini CLI
CLAUDE.md — Claude Code-specific instructions for extended thinking, tool permissions, and memory
.cursorrules — Cursor AI's rules file for language and framework preferences

The key difference from prompt engineering: you write these once, commit them, and every agent session — yours and your teammates' — starts with the same correct understanding of the project.

2. Why It Matters

A 2025 study on agentic coding tasks found that agents operating without project context files completed tasks correctly roughly 30% of the time. The same tasks, with well-crafted context files in place, reached 90% success. The gap is not model capability — it is information availability.

The reason is straightforward. Agents make dozens of small decisions per task: which file to edit, which function to call, what naming convention to follow, whether to write a test. Each decision has a prior drawn from training data — which means "generic open-source Python." Your project is not generic. Every time the agent guesses wrong on one of these micro-decisions, you spend time reviewing, correcting, and re-running. Context files replace bad generic priors with good project-specific ones.

3. AGENTS.md: The Universal Standard

AGENTS.md is the file most major agent runtimes now look for first. Place it at the repository root. Keep it under 500 lines: agents work best with dense, scannable information, not prose.

What to include

Project overview — one paragraph: what it does, who uses it, what stage it is at
Tech stack — language version, framework, database, key libraries
Commands — exact build, test, lint, and run commands
Code style — formatter, linter, import order, naming conventions
Architecture notes — directory layout, module boundaries, data flow
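If you want to catch a missing section before an agent trips over it, a few lines of Python can check an AGENTS.md draft against the list above. This is a minimal sketch, assuming each item is written as a level-two markdown heading named `## Project`, `## Tech Stack`, `## Commands`, `## Code Style`, and `## Architecture`; those heading names and the function name `missing_sections` are illustrative choices, not part of any AGENTS.md specification.

```python
import re

# Assumed heading names, matching the example AGENTS.md in this article.
REQUIRED_SECTIONS = ("Project", "Tech Stack", "Commands", "Code Style", "Architecture")

# Level-two headings only ("## Tech Stack"); "### ..." and "# ..." do not match.
HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(text: str) -> list[str]:
    """Return the required section names that have no '## <name>' heading in text."""
    found = {match.group(1).strip().lower() for match in HEADING.finditer(text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]
```

Usage: pass it `Path("AGENTS.md").read_text(encoding="utf-8")` and fail the CI job if the returned list is non-empty. The check is deliberately case-insensitive so `## Code style` and `## Code Style` both count.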

Real AGENTS.md example (Python FastAPI project)

````markdown
# AGENTS.md

## Project
Inventory API: REST service for warehouse stock management.
Python 3.12, FastAPI 0.111, PostgreSQL 16, Redis 7.
Production on AWS ECS. ~15k requests/day. Internal tooling, no public users.

## Tech Stack
- Runtime: Python 3.12
- Web: FastAPI + uvicorn
- ORM: SQLAlchemy 2.x (async) with alembic migrations
- DB: PostgreSQL 16 (never SQLite, not even in tests)
- Key-value store: Redis 7 via redis-py async client (idempotency keys only)
- Auth: JWT (python-jose), tokens expire in 1h
- Validation: Pydantic v2 models only

## Commands
```bash
# Install
uv sync --all-extras

# Run (dev)
uvicorn app.main:app --reload --port 8000

# Test (always run the full suite, never a single file)
pytest -x --tb=short

# Lint + format (must pass before commit)
ruff check . --fix
ruff format .

# DB migrations
alembic upgrade head
alembic revision --autogenerate -m "description"
```

## Code Style
- Formatter: ruff (line length 100)
- No bare `except:`; always catch specific exceptions
- All endpoints must define `response_model` and `status_code`
- Use `async def` for all route handlers and service methods
- Type hints required on all function signatures
- Never import from `app.main` (causes circular imports)

## Architecture
app/
  main.py          # FastAPI app factory, lifespan hooks
  api/             # Route handlers only, no business logic here
  services/        # Business logic, one file per domain
  models/          # SQLAlchemy ORM models
  schemas/         # Pydantic request/response schemas
  core/            # Config, database session, auth utilities
  tests/           # Mirror app/ structure, use pytest fixtures in conftest.py

Data flows: request → api/ handler → services/ method → models/ query → response schema.
Never query the database directly from api/ handlers.

## Key Decisions
- We chose async SQLAlchemy because the service is I/O-bound, not CPU-bound.
- Redis is used for idempotency keys only, not general caching.
- All timestamps stored as UTC, converted to local on the frontend.
````

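The data-flow rule in the example (api/ handler → services/ method → models/ query → response schema) is easier for an agent to follow when the repository already contains code that obeys it. The sketch below shows the shape of that layering using only the standard library, under loudly stated assumptions: an in-memory `ItemRepository` stands in for the SQLAlchemy query layer, frozen dataclasses stand in for the ORM model and the Pydantic response schema, and the handler returns a `(status_code, body)` tuple instead of raising FastAPI's `HTTPException`. All class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Stand-in for a SQLAlchemy ORM model (models/)."""

    id: int
    name: str
    quantity: int


@dataclass(frozen=True)
class ItemOut:
    """Stand-in for a Pydantic response schema (schemas/)."""

    id: int
    name: str
    in_stock: bool


class ItemNotFoundError(Exception):
    """Domain exception: raised in services/, translated to an HTTP status in api/."""


class ItemRepository:
    """Stand-in for the models/ query layer; a real one would await an async session."""

    def __init__(self, rows: dict[int, Item]) -> None:
        self._rows = rows

    async def get(self, item_id: int) -> Item | None:
        return self._rows.get(item_id)


class ItemService:
    """Business logic (services/): knows domain rules, knows nothing about HTTP."""

    def __init__(self, repo: ItemRepository) -> None:
        self._repo = repo

    async def get_item(self, item_id: int) -> Item:
        item = await self._repo.get(item_id)
        if item is None:
            raise ItemNotFoundError(f"item {item_id} not found")
        return item


async def get_item_handler(
    item_id: int, service: ItemService
) -> tuple[int, ItemOut | dict[str, str]]:
    """Route handler (api/): talks to the service only, never to the repository."""
    try:
        item = await service.get_item(item_id)
    except ItemNotFoundError as exc:  # a specific exception, never a bare except
        return 404, {"detail": str(exc)}  # 404, not 500, for a missing resource
    return 200, ItemOut(id=item.id, name=item.name, in_stock=item.quantity > 0)
```

Usage: `asyncio.run(get_item_handler(1, service))`. In the real app the handler would raise `HTTPException(status_code=404, detail=...)` and the repository would query PostgreSQL; the point of the sketch is only that each layer calls the one directly below it.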


4. CLAUDE.md: Claude Code-Specific Instructions

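Claude Code reads `CLAUDE.md` at the start of every session. Use it for Claude-specific behaviour: extended thinking triggers, tool-use permissions, memory hints, and explicit do/don't rules. If your Claude Code version does not load `AGENTS.md` by itself, add a line containing `@AGENTS.md` to `CLAUDE.md` to import it rather than copying the shared rules. Keep in mind that permission rules written here are instructions to the model, not enforcement; hard limits on which commands may run belong in Claude Code's permission settings.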

Real CLAUDE.md example

```markdown
# CLAUDE.md

## Claude Code Configuration

### Thinking
Use extended thinking (budget_tokens >= 8000) for:
- Designing new service methods that touch multiple tables
- Debugging async race conditions
- Reviewing alembic migrations before applying

Do NOT use extended thinking for:
- Adding simple CRUD endpoints
- Fixing type errors
- Writing docstrings

### Tools
- You MAY run `pytest`, `ruff`, `alembic`, and `uvicorn` without asking.
- You MUST ask before running `alembic downgrade` or any destructive DB command.
- You MUST ask before editing `app/core/config.py`; env vars are prod-sensitive.
- Never run `git push` directly.

### Memory Hints
- The `get_db()` dependency yields an async session — always `await` db calls.
- `app/tests/conftest.py` has a `db_session` fixture that rolls back after each test.
- The `ItemService.bulk_update()` method has a known N+1 issue — do not call it in loops.

### Do / Don't

DO:
- Add a test for every new endpoint in `app/tests/api/`
- Use `httpx.AsyncClient` for endpoint tests (see existing tests for the pattern)
- Keep service methods under 50 lines — split if longer
- Use `logger = logging.getLogger(__name__)` at the top of each module

DON'T:
- Don't add new dependencies without asking — pyproject.toml changes need team review
- Don't use `print()` — use the logger
- Don't hard-code environment values — use `app/core/config.py` settings
- Don't create new Pydantic models in api/ files; put them in schemas/
```

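Two of the Do / Don't rules in this example (no `print()`, keep service methods under 50 lines) are mechanical enough to check with a script, so a rule the agent skips shows up in CI rather than in review. A minimal sketch using Python's standard `ast` module; the function name `rule_violations` is illustrative, it flags functions longer than `max_lines` (50 by default), and it only sees direct calls to the name `print`, not aliases.

```python
import ast


def rule_violations(source: str, max_lines: int = 50) -> list[str]:
    """Report print() calls and over-long functions in a module's source, ordered by line."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # Rule: "Don't use print(), use the logger" (direct calls to the builtin name only).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            found.append((node.lineno, "print() call, use the module logger"))
        # Rule: "Keep service methods under 50 lines" (flags anything longer than max_lines).
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                found.append(
                    (node.lineno, f"{node.name}() is {length} lines (limit {max_lines})")
                )
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]
```

Running it over each file in `app/services/` (for example from a pre-commit hook) gives a list of violations to fix; a violation that keeps recurring is a hint that the rule needs a more prominent place in the context file.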
5. .cursorrules: Cursor AI Rules

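.cursorrules sits at the repository root and configures Cursor's AI behaviour for all users of the project. Keep it focused on language, framework, testing, and style preferences. Recent Cursor versions treat .cursorrules as a legacy format and recommend Project Rules in a .cursor/rules/ directory, but the root file is still supported.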

Real .cursorrules example

```yaml
# .cursorrules

language: python
python_version: "3.12"

framework: fastapi
async: true

style:
  formatter: ruff
  line_length: 100
  imports: isort-compatible (ruff handles this)
  type_hints: required
  docstrings: Google style, only on public functions

testing:
  framework: pytest
  async_mode: asyncio (pytest-asyncio)
  fixtures: prefer conftest.py fixtures over inline setup
  coverage_target: 80%
  never_mock: database (use test DB with rollback fixture instead)

rules:
  - Prefer composition over inheritance
  - One responsibility per function; if it needs a comment to explain what it does, split it
  - Raise HTTPException in api/ handlers, raise domain exceptions in services/
  - All list endpoints must support pagination (limit/offset)
  - Return 404, not 500, when a resource is not found
```

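The pagination rule is easier to follow consistently when the project has one shared helper for it. A minimal sketch, assuming a default page size of 50 and a hard cap of 100 (both hypothetical values, not taken from the rules file): the helper validates `limit` and `offset`, slices the rows, and reports the total so clients know when to stop. In the real service the slicing would be a `LIMIT`/`OFFSET` clause in the SQL query rather than a Python slice.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

DEFAULT_LIMIT = 50  # hypothetical default page size
MAX_LIMIT = 100  # hypothetical cap so a client cannot request the whole table


@dataclass(frozen=True)
class Page(Generic[T]):
    items: list[T]
    total: int
    limit: int
    offset: int


def paginate(rows: Sequence[T], limit: int = DEFAULT_LIMIT, offset: int = 0) -> Page[T]:
    """Return one limit/offset page of rows; raise ValueError on out-of-range parameters."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {limit}")
    if offset < 0:
        raise ValueError(f"offset must be >= 0, got {offset}")
    return Page(
        items=list(rows[offset : offset + limit]),
        total=len(rows),
        limit=limit,
        offset=offset,
    )
```

A handler would expose `limit` and `offset` as query parameters and turn the `ValueError` into a 4xx response instead of letting it surface as a 500.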
6. The 8 Context Engineering Patterns

Regardless of which file you are writing, these eight patterns consistently produce better agent behaviour.

1. Direct answer first (inverted pyramid). Put the most actionable information at the top. Agents scan; they do not read linearly. Put commands before explanations.

2. Explicit constraints. Say "never use X, always use Y" rather than implying it. "Never query the database directly from api/ handlers" is clearer than "follow layered architecture".

3. Examples of desired output. Show a short example of a correctly written endpoint, test, or schema. Agents replicate patterns they have seen.

4. Architecture decisions with reasoning. Write why, not just what. "We chose async SQLAlchemy because the service is I/O-bound" prevents the agent from suggesting a sync rewrite that would technically work but miss the point.

5. Anti-patterns to avoid. Name the specific mistakes you have seen or want to prevent. "Don't over-abstract: a helper function used once is not a helper" gives the agent a concrete rule.

6. Command cheatsheet section. Exact, copy-paste-ready shell commands. Include flags. Agents that know the exact test command do not guess or omit --tb=short.

7. Testing requirements section. State what kind of tests are expected, where they live, and what fixtures to use. Ambiguity here causes agents to skip tests or write them in the wrong style.

8. Deployment/environment notes. Note which environment variables exist and where they are set. This prevents agents from hard-coding values or asking questions they could answer from the file.
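Pattern 8 works best when every environment variable is read in exactly one place, which is what the example CLAUDE.md means by "use `app/core/config.py` settings". The sketch below is a standard-library stand-in for such a module (a real FastAPI project would more likely use pydantic-settings); the variable names `DATABASE_URL`, `REDIS_URL`, and `JWT_EXPIRE_MINUTES` are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    jwt_expire_minutes: int = 60  # the example AGENTS.md says tokens expire in 1h


REQUIRED = ("DATABASE_URL", "REDIS_URL")  # hypothetical variable names


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if a required one is missing."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "60")),
    )
```

The deployment notes in the context file then only need to list each variable and where it is set, plus one rule: code never reads `os.environ` directly.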

7. Measuring Improvement

You do not need a formal study to measure whether your context files are working. Track these four signals for two weeks before adding the files and for two weeks after:

| Metric | Before | After (typical) |
|---|---|---|
| Tasks completed without correction | ~30% | ~80–90% |
| Re-prompts needed per task | 3–5 | 0–1 |
| Wrong files edited per session | 2–4 | 0 |
| Time to review agent output (minutes) | 10–20 | 2–5 |

The most reliable leading indicator is re-prompts per task — if you find yourself saying "no, I told you not to use print()" or "you need to write a test for that", the instruction belongs in a context file.
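If you keep even a rough log of agent tasks, the four signals take a few lines to compute. A minimal sketch, assuming one hypothetical `TaskLog` record per task with the fields shown; note that it averages wrong files per task rather than per session. Run it once on the "before" period and once on the "after" period and compare the two dictionaries.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TaskLog:
    corrected: bool  # did a human have to fix or redo the agent's output?
    reprompts: int  # follow-up prompts needed after the first request
    wrong_files: int  # files edited that should not have been touched
    review_minutes: float  # time spent reviewing the agent's output


def summarize(tasks: list[TaskLog]) -> dict[str, float]:
    """Compute the four signals from the table for one period of task logs."""
    if not tasks:
        raise ValueError("need at least one task")
    return {
        "completed_without_correction": sum(not t.corrected for t in tasks) / len(tasks),
        "reprompts_per_task": mean(t.reprompts for t in tasks),
        "wrong_files_per_task": mean(t.wrong_files for t in tasks),
        "review_minutes_per_task": mean(t.review_minutes for t in tasks),
    }
```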

8. FAQ

Does AGENTS.md replace CLAUDE.md? No. AGENTS.md is the baseline read by all agents. CLAUDE.md adds Claude Code-specific behaviour on top. Keep them separate so the universal rules stay portable.

How long should these files be? AGENTS.md: 200–400 lines. CLAUDE.md: 100–200 lines. .cursorrules: 50–100 lines. Beyond these lengths, the important rules start to get lost in the noise. If you need more, split into focused sub-files and link them.

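To keep those budgets from creeping up unnoticed, a small check can run in CI. A minimal sketch, assuming the upper limits from the answer above and that all three files live at the repository root; missing files are skipped because not every project uses all three tools. The names `LINE_BUDGETS` and `over_budget` are illustrative.

```python
from pathlib import Path

# Upper bounds from the FAQ: AGENTS.md 400, CLAUDE.md 200, .cursorrules 100 lines.
LINE_BUDGETS = {"AGENTS.md": 400, "CLAUDE.md": 200, ".cursorrules": 100}


def over_budget(repo_root: Path) -> dict[str, int]:
    """Return {filename: line_count} for each context file that exceeds its budget."""
    result: dict[str, int] = {}
    for name, budget in LINE_BUDGETS.items():
        path = repo_root / name
        if not path.is_file():
            continue  # skip tools the project does not use
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count > budget:
            result[name] = line_count
    return result
```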
Do I need all three files? Only if you use all three tools. Start with AGENTS.md — it has the broadest coverage. Add CLAUDE.md if your team uses Claude Code daily, and .cursorrules if you use Cursor.

Should these files be committed to the repository? Yes. Commit them to the repository root. They are project documentation as much as they are agent configuration. They help human reviewers understand project conventions too.

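What about monorepos? Place a root-level AGENTS.md for global conventions, and a second AGENTS.md in each service directory for service-specific rules. Most agents that support nested context files walk up the directory tree and merge what they find, but file names and precedence rules differ between tools such as Claude Code and Gemini CLI, so check each tool's documentation.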

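The nested-file lookup described above can be sketched as a walk from the working directory up to the repository root. This is an illustration of the idea, not the actual lookup algorithm of Codex, Claude Code, or Gemini CLI; it assumes the repository root is the nearest directory containing `.git`, and it returns files root-first so that, if you concatenate them, the most specific rules come last. The function name `find_context_files` is hypothetical.

```python
from pathlib import Path


def find_context_files(start: Path, filename: str = "AGENTS.md") -> list[Path]:
    """Collect `filename` from `start` up to the repo root (nearest dir with .git), root first."""
    found: list[Path] = []
    current = start.resolve()
    while True:
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
        # Stop at the repository root, or at the filesystem root if there is no .git.
        if (current / ".git").exists() or current.parent == current:
            break
        current = current.parent
    found.reverse()  # global conventions first, service-specific rules last
    return found
```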
My agent still ignores the rules. What now? Check if the rule is buried in a paragraph. Extract it to a bullet in a clearly labelled ## Do / Don't or ## Constraints section. Agents weight headings and bullets more heavily than inline prose.
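Finding rules that are buried in prose can be partly automated. A rough heuristic sketch: it flags lines outside code fences that are neither headings nor list items but contain words such as "never", "always", "must", or "don't". The keyword list is an assumption, and the check will produce some false positives, so treat its output as a review list rather than a failing test.

```python
import re

# Words that usually signal a rule; an assumption, extend for your team's phrasing.
RULE_WORDS = re.compile(r"\b(never|always|must|don't|do not)\b", re.IGNORECASE)
# Bullet or numbered list item, e.g. "- Never ...", "* Always ...", "3. Must ...".
LIST_ITEM = re.compile(r"^([-*+]|\d+\.)\s")
FENCE = "`" * 3  # three backticks, the markdown code-fence marker


def buried_rules(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for rule-like lines that are not headings or list items."""
    flagged: list[tuple[int, str]] = []
    in_code = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith(FENCE):
            in_code = not in_code  # toggle on opening and closing code fences
            continue
        if in_code or not line or line.startswith("#") or LIST_ITEM.match(line):
            continue  # code, blank lines, headings and list items are already scannable
        if RULE_WORDS.search(line):
            flagged.append((number, line))
    return flagged
```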

Context engineering is not a clever trick — it is good software practice applied to AI agents. The developers getting 90% task success rates in 2026 are not using better models; they are giving their agents better information to work with. Write the files once, keep them current as the project evolves, and watch the re-prompt rate drop.