## What is LiteLLM?
LiteLLM gives you a single unified API to call 100+ LLM providers — OpenAI, Anthropic Claude, Google Gemini, Ollama local models, Azure, and more — using the same Python interface.
Why use it?

- Switch providers without rewriting code
- Built-in fallbacks and retries
- Cost tracking across providers
- Production-ready proxy server
## Installation

```bash
pip install litellm
```
## Basic Usage: Same API for All Providers

```python
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Claude (same code, different model)
response = completion(
    model="claude-opus-4-5",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Local Ollama (same code again!)
response = completion(
    model="ollama/llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434"
)

# All return the same response format
print(response.choices[0].message.content)
```
## Environment Variables

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="..."
```
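If you would rather not rely on environment variables, keys can also be supplied per call. A minimal sketch, assuming the installed version supports the per-call `api_key` parameter:

```python
import os
from litellm import completion

# Pass the key explicitly, e.g. when it comes from a secrets manager at runtime
response = completion(
    model="claude-opus-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("ANTHROPIC_API_KEY"),
)
```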
## Streaming

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```
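Streaming also works with the async client. A small sketch, assuming `acompletion` yields an async iterator when `stream=True` is set:

```python
import asyncio
from litellm import acompletion

async def stream_haiku():
    # With stream=True, chunks arrive asynchronously as the model generates
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a haiku"}],
        stream=True,
    )
    async for chunk in response:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(stream_haiku())
```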
## Fallbacks: Never Go Down

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-opus-4-5", "ollama/llama3.2"],  # try these if gpt-4o fails
    num_retries=3
)
```
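If you need custom logic around fallbacks (per-model logging, different timeouts), the same effect is easy to reproduce by hand. A rough sketch that is not LiteLLM-specific, just a plain try/except loop over providers:

```python
from litellm import completion

def complete_with_fallbacks(messages, models=("gpt-4o", "claude-opus-4-5", "ollama/llama3.2")):
    """Try each model in order and return the first successful response."""
    last_error = None
    for model in models:
        try:
            return completion(model=model, messages=messages)
        except Exception as exc:  # provider errors surface here; log or inspect as needed
            last_error = exc
    raise last_error

response = complete_with_fallbacks([{"role": "user", "content": "Hello"}])
print(response.choices[0].message.content)
```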
## Cost Tracking

```python
import litellm
from litellm import completion

litellm.success_callback = ["langfuse"]  # optional: also log successful calls to Langfuse

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Check cost and token usage
print(f"Cost: ${response._hidden_params['response_cost']:.6f}")
print(f"Tokens: {response.usage.total_tokens}")
```
## Async Support

```python
import asyncio
from litellm import acompletion

async def main():
    tasks = [
        acompletion(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Question {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    for r in responses:
        print(r.choices[0].message.content)

asyncio.run(main())
```
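Firing many requests at once runs into provider rate limits quickly; capping concurrency with a semaphore is a simple guard. This is plain asyncio, nothing LiteLLM-specific:

```python
import asyncio
from litellm import acompletion

async def ask(semaphore, prompt):
    async with semaphore:  # at most 3 requests in flight at a time
        return await acompletion(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )

async def main():
    semaphore = asyncio.Semaphore(3)
    responses = await asyncio.gather(*(ask(semaphore, f"Question {i}") for i in range(10)))
    for r in responses:
        print(r.choices[0].message.content)

asyncio.run(main())
```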
## LiteLLM Proxy Server
The proxy lets you expose a unified OpenAI-compatible endpoint that any tool can use:
```yaml
# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-opus
    litellm_params:
      model: claude-opus-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: llama-local
    litellm_params:
      model: ollama/llama3.2
      api_base: http://localhost:11434

general_settings:
  master_key: sk-my-secret-key  # protect the proxy
```
Start the proxy:
```bash
litellm --config config.yaml --port 4000
```
Now any OpenAI-compatible client works:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-my-secret-key",
    base_url="http://localhost:4000"
)

# Route to Claude through the proxy
response = client.chat.completions.create(
    model="claude-opus",  # mapped to claude-opus-4-5 in config.yaml
    messages=[{"role": "user", "content": "Hello!"}]
)
```
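Streaming works through the proxy the same way it does against OpenAI directly; for example, streaming from the local Llama deployment registered above:

```python
from openai import OpenAI

client = OpenAI(api_key="sk-my-secret-key", base_url="http://localhost:4000")

# Stream tokens from the local Ollama model via the proxy
stream = client.chat.completions.create(
    model="llama-local",  # mapped to ollama/llama3.2 in config.yaml
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```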
## Load Balancing

List multiple deployments under the same model_name and the proxy distributes requests across them:
```yaml
model_list:
  - model_name: gpt-4o-lb
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY_1
  - model_name: gpt-4o-lb
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY_2  # second account for load balancing
```
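The same load balancing is available in code without running the proxy, via LiteLLM's Router. A sketch, assuming the Router accepts a model_list of deployments in this shape:

```python
import os
from litellm import Router

# Deployments that share a model_name are load balanced by the Router
router = Router(model_list=[
    {
        "model_name": "gpt-4o-lb",
        "litellm_params": {"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY_1"]},
    },
    {
        "model_name": "gpt-4o-lb",
        "litellm_params": {"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY_2"]},
    },
])

response = router.completion(
    model="gpt-4o-lb",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```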
## Docker Deployment

```yaml
# docker-compose.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    command: ["--config", "/app/config.yaml", "--port", "4000"]
```
## When to Use LiteLLM
- Multiple providers: switching between OpenAI and Claude based on cost or availability
- Teams: centralize API keys in the proxy, give teams virtual keys
- Cost control: set budgets per team or per virtual key
- Evaluation: test the same prompt across multiple models