## What is LiteLLM?
LiteLLM gives you a single unified API to call 100+ LLM providers — OpenAI, Anthropic Claude, Google Gemini, Ollama local models, Azure, and more — using the same Python interface.
Why use it?

- Switch providers without rewriting code
- Built-in fallbacks and retries
- Cost tracking across providers
- Production-ready proxy server
## Installation

```bash
pip install litellm
```
## Basic Usage: Same API for All Providers

```python
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Claude (same code, different model)
response = completion(
    model="claude-opus-4-5",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Local Ollama (same code again!)
response = completion(
    model="ollama/llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434"
)

# All return the same response format
print(response.choices[0].message.content)
```
## Environment Variables

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="..."
```
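If you would rather not rely on environment variables, keys can also be supplied per call. A minimal sketch, assuming the installed version supports the per-call `api_key` parameter:

```python
import os
from litellm import completion

# Pass the key explicitly, e.g. when it comes from a secrets manager at runtime
response = completion(
    model="claude-opus-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key=os.getenv("ANTHROPIC_API_KEY"),
)
```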
## Streaming

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```
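Streaming also works with the async client. A small sketch, assuming `acompletion` yields an async iterator when `stream=True` is set:

```python
import asyncio
from litellm import acompletion

async def stream_haiku():
    # With stream=True, chunks arrive asynchronously as the model generates
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a haiku"}],
        stream=True,
    )
    async for chunk in response:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(stream_haiku())
```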
## Fallbacks: Never Go Down

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-opus-4-5", "ollama/llama3.2"],  # try these if gpt-4o fails
    num_retries=3
)
```
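If you need custom logic around fallbacks (per-model logging, different timeouts), the same effect is easy to reproduce by hand. A rough sketch that is not LiteLLM-specific, just a plain try/except loop over providers:

```python
from litellm import completion

def complete_with_fallbacks(messages, models=("gpt-4o", "claude-opus-4-5", "ollama/llama3.2")):
    """Try each model in order and return the first successful response."""
    last_error = None
    for model in models:
        try:
            return completion(model=model, messages=messages)
        except Exception as exc:  # provider errors surface here; log or inspect as needed
            last_error = exc
    raise last_error

response = complete_with_fallbacks([{"role": "user", "content": "Hello"}])
print(response.choices[0].message.content)
```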
## Cost Tracking

```python
import litellm
from litellm import completion

litellm.success_callback = ["langfuse"]  # optional: also log successful calls to Langfuse

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Check cost and token usage
print(f"Cost: ${response._hidden_params['response_cost']:.6f}")
print(f"Tokens: {response.usage.total_tokens}")
```
## Async Support

```python
import asyncio
from litellm import acompletion

async def main():
    tasks = [
        acompletion(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Question {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    for r in responses:
        print(r.choices[0].message.content)

asyncio.run(main())
```
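Firing many requests at once runs into provider rate limits quickly; capping concurrency with a semaphore is a simple guard. This is plain asyncio, nothing LiteLLM-specific:

```python
import asyncio
from litellm import acompletion

async def ask(semaphore, prompt):
    async with semaphore:  # at most 3 requests in flight at a time
        return await acompletion(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )

async def main():
    semaphore = asyncio.Semaphore(3)
    responses = await asyncio.gather(*(ask(semaphore, f"Question {i}") for i in range(10)))
    for r in responses:
        print(r.choices[0].message.content)

asyncio.run(main())
```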
## LiteLLM Proxy Server
The proxy lets you expose a unified OpenAI-compatible endpoint that any tool can use:
```yaml
# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-opus
    litellm_params:
      model: claude-opus-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: llama-local
    litellm_params:
      model: ollama/llama3.2
      api_base: http://localhost:11434

general_settings:
  master_key: sk-my-secret-key  # protect the proxy
```
Start the proxy:
```bash
litellm --config config.yaml --port 4000
```
Now any OpenAI-compatible client works:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-my-secret-key",
    base_url="http://localhost:4000"
)

# Route to Claude through the proxy
response = client.chat.completions.create(
    model="claude-opus",  # mapped to claude-opus-4-5 in config.yaml
    messages=[{"role": "user", "content": "Hello!"}]
)
```
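Streaming works through the proxy the same way it does against OpenAI directly; for example, streaming from the local Llama deployment registered above:

```python
from openai import OpenAI

client = OpenAI(api_key="sk-my-secret-key", base_url="http://localhost:4000")

# Stream tokens from the local Ollama model via the proxy
stream = client.chat.completions.create(
    model="llama-local",  # mapped to ollama/llama3.2 in config.yaml
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```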
## Load Balancing

List multiple deployments under the same model_name and the proxy distributes requests across them:
```yaml
model_list:
  - model_name: gpt-4o-lb
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY_1
  - model_name: gpt-4o-lb
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY_2  # second account for load balancing
```
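The same load balancing is available in code without running the proxy, via LiteLLM's Router. A sketch, assuming the Router accepts a model_list of deployments in this shape:

```python
import os
from litellm import Router

# Deployments that share a model_name are load balanced by the Router
router = Router(model_list=[
    {
        "model_name": "gpt-4o-lb",
        "litellm_params": {"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY_1"]},
    },
    {
        "model_name": "gpt-4o-lb",
        "litellm_params": {"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY_2"]},
    },
])

response = router.completion(
    model="gpt-4o-lb",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```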
## Docker Deployment

```yaml
# docker-compose.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    command: ["--config", "/app/config.yaml", "--port", "4000"]
```
## When to Use LiteLLM
- Multiple providers: switching between OpenAI and Claude based on cost or availability
- Teams: centralize API keys in the proxy, give teams virtual keys
- Cost control: set budgets per team or per virtual key
- Evaluation: test the same prompt across multiple models