OpenTelemetry Python Tutorial 2026: Traces, Metrics, and Logs
Distributed systems fail in ways that single-process applications never did. A single user request may touch ten microservices, three databases, and two message queues before returning a response. When something goes wrong — a latency spike, a silent error, a cascade failure — you need to understand the entire chain, not just one service's log file. OpenTelemetry was built to solve exactly this problem, and by 2026 it has become the universal answer to the observability question for Python developers.
This tutorial is a complete, practical guide to instrumenting Python applications with OpenTelemetry. You will learn the conceptual model, install the SDK, apply both automatic and manual instrumentation to a FastAPI application, collect all three signals (traces, metrics, logs), deploy the OpenTelemetry Collector with Docker, and export data to Jaeger and Prometheus. Every code example is tested against the current stable releases.
TL;DR
- OpenTelemetry (OTel) is the CNCF's vendor-neutral standard for observability — one SDK, any backend.
- It defines three signals: traces (request flows), metrics (numeric measurements), and logs (events with context).
- Install with pip install opentelemetry-sdk opentelemetry-exporter-otlp.
- Use opentelemetry-instrument for zero-code auto-instrumentation of FastAPI, Django, and Flask.
- Add manual spans with tracer.start_as_current_span() for business-logic visibility.
- Propagate context across service boundaries with W3C TraceContext headers.
- Deploy the OTel Collector as a Docker sidecar; route traces to Jaeger and metrics to Prometheus.
- Correlate logs with traces by injecting trace_id into log records.
What Is OpenTelemetry and Why It Became the Industry Standard
OpenTelemetry is an open-source observability framework and a set of APIs, SDKs, and tooling for generating, collecting, and exporting telemetry data. It was formed in 2019 by merging OpenTracing and OpenCensus, two earlier competing standards, under the governance of the Cloud Native Computing Foundation (CNCF). Today it is the second-most-active CNCF project after Kubernetes.
The growth numbers are striking. OpenTelemetry's GitHub repositories collectively exceeded 200 million artifact downloads in 2025, representing roughly 70% year-over-year growth. The Python SDK alone accumulates tens of millions of PyPI downloads per month. Every major cloud provider (AWS, GCP, Azure), every major APM vendor (Datadog, New Relic, Dynatrace, Honeycomb, Grafana), and hundreds of open-source projects now support OTLP — the OpenTelemetry Protocol — as a first-class ingestion format.
The reason OpenTelemetry won is straightforward: it eliminated vendor lock-in at the instrumentation layer. Before OTel, switching your observability backend meant re-instrumenting every service from scratch. With OTel, you instrument once using standard APIs and redirect your data by changing a single environment variable. The collector pipeline adds a further layer of flexibility, allowing you to fan out the same telemetry stream to multiple backends simultaneously.
For Python developers, the benefits are concrete:
- A single pip install pulls in the entire SDK.
- Auto-instrumentation patches popular frameworks (FastAPI, Django, Flask, SQLAlchemy, Redis, requests, httpx, psycopg2, and many more) without touching application code.
- The same instrumentation code works whether you are running locally, in Docker, or in a Kubernetes pod.
- Context propagation between services is handled automatically through HTTP headers and message queue metadata.
The Three OTel Signals
OpenTelemetry organises observability data into three complementary signals. Understanding what each signal is for prevents the common mistake of trying to answer every question with logs alone.
Traces
A trace represents the complete journey of a single request through a distributed system. It is composed of spans — individual units of work, each with a start time, duration, attributes (key-value metadata), events (timestamped log-like annotations), and a status. Spans are linked by a shared trace_id and form a parent-child tree that maps exactly to the call graph of your system.
Traces answer the question: "What happened for this specific request, and where did the time go?"
Metrics
Metrics are numeric measurements aggregated over time. Unlike traces, which record individual events, metrics summarise behaviour at scale — request rates, error rates, latency percentiles, queue depths, memory usage. They are cheap to store and query as long as label cardinality stays bounded, which makes them the right signal for dashboards and alerting.
Metrics answer the question: "What is the system doing right now, and how does that compare to normal?"
Logs
Logs are timestamped, structured text records that describe discrete events. They have been the default observability primitive since the beginning of computing. OTel's contribution is not to replace your logging library but to bridge it — injecting trace_id and span_id into every log record so that you can navigate from a log line directly to the trace that produced it.
Logs answer the question: "What exactly happened at this moment in time?"
When all three signals share the same trace context, you can start an investigation with a metric alert, drill into the relevant traces, and then read the structured logs from the exact span that failed. This is the observability workflow OpenTelemetry enables.
Installing the SDK
The OpenTelemetry Python SDK is split into focused packages. Install the core SDK and the OTLP exporter (which sends data to the Collector or any OTLP-compatible backend):
pip install \
opentelemetry-sdk \
opentelemetry-exporter-otlp \
opentelemetry-instrumentation-fastapi \
opentelemetry-instrumentation-httpx \
opentelemetry-instrumentation-sqlalchemy
For Django projects, replace opentelemetry-instrumentation-fastapi with opentelemetry-instrumentation-django; for Flask, use opentelemetry-instrumentation-flask. A full list of instrumentation packages is available in the opentelemetry-python-contrib repository.
Pin specific versions in your requirements.txt for reproducible builds:
opentelemetry-sdk==1.25.0
opentelemetry-exporter-otlp==1.25.0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-instrumentation-httpx==0.46b0
opentelemetry-instrumentation-sqlalchemy==0.46b0
The opentelemetry-distro package provides a convenient meta-package plus the opentelemetry-instrument CLI:
pip install opentelemetry-distro
opentelemetry-bootstrap --action=install # auto-detects installed libraries and installs their instrumentation packages
Auto-Instrumentation: Zero-Code Setup
The fastest path to traces and metrics is opentelemetry-instrument, a wrapper script that patches your application at startup without any code changes.
# FastAPI
opentelemetry-instrument \
--traces_exporter otlp \
--metrics_exporter otlp \
--logs_exporter otlp \
--exporter_otlp_endpoint http://localhost:4317 \
--service_name my-fastapi-app \
uvicorn app.main:app --host 0.0.0.0 --port 8000
# Django
opentelemetry-instrument \
--traces_exporter otlp \
--metrics_exporter otlp \
--service_name my-django-app \
python manage.py runserver
# Flask
opentelemetry-instrument \
--traces_exporter otlp \
--metrics_exporter otlp \
--service_name my-flask-app \
flask run
You can also configure everything through environment variables, which is the preferred approach for container deployments:
export OTEL_SERVICE_NAME=my-fastapi-app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_PYTHON_LOG_CORRELATION=true # injects trace context into log records
opentelemetry-instrument uvicorn app.main:app --host 0.0.0.0 --port 8000
Auto-instrumentation covers: incoming HTTP requests (FastAPI, Django, Flask, Starlette), outgoing HTTP calls (requests, httpx, aiohttp), database queries (SQLAlchemy, psycopg2, pymongo, redis), messaging (kafka-python, pika for RabbitMQ), and more. Each instrumented call becomes a span automatically with standard HTTP and database semantic attributes populated.
Manual Tracing: TracerProvider, Tracer, and Span
Auto-instrumentation covers the infrastructure layer. To add visibility into your own business logic — a pricing calculation, a complex data transformation, a third-party API call that has no instrumentation package — you use the manual API.
Bootstrapping the SDK in Code
# telemetry.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME
def configure_tracing(service_name: str, otlp_endpoint: str = "http://localhost:4317") -> None:
resource = Resource(attributes={SERVICE_NAME: service_name})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
Call configure_tracing() once at application startup, before any request is served.
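As a quick illustration, an entry point can call the helper before any spans are created and then obtain a tracer as usual (a minimal sketch; it assumes the configure_tracing() helper from the telemetry.py module above):
# main.py
from opentelemetry import trace
from telemetry import configure_tracing
configure_tracing("order-service", otlp_endpoint="http://localhost:4317")
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("startup_check"):
    pass  # any spans created after configuration are exported via OTLP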
Creating Spans
# app/services/order_service.py
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
def process_order(order_id: str, user_id: str) -> dict:
with tracer.start_as_current_span("process_order") as span:
span.set_attribute("order.id", order_id)
span.set_attribute("user.id", user_id)
inventory = check_inventory(order_id)
payment = charge_payment(order_id, user_id)
span.add_event("payment_charged", attributes={"amount": payment["amount"]})
return {"status": "confirmed", "order_id": order_id}
def check_inventory(order_id: str) -> dict:
with tracer.start_as_current_span("check_inventory") as span:
span.set_attribute("order.id", order_id)
# ... database query ...
return {"available": True}
def charge_payment(order_id: str, user_id: str) -> dict:
with tracer.start_as_current_span("charge_payment") as span:
span.set_attribute("order.id", order_id)
span.set_attribute("payment.provider", "stripe")
# ... Stripe API call ...
return {"amount": 49.99, "currency": "USD"}
The with tracer.start_as_current_span(...) context manager automatically:
- Creates the span as a child of the currently active span (if any).
- Sets it as the active span for nested calls.
- Records start and end timestamps.
- Closes the span when the block exits, even if an exception is raised.
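The same call can also be used as a decorator when an entire function should become one span, since start_as_current_span returns a context manager that doubles as a decorator. A short sketch (calculate_shipping is a hypothetical business function):
@tracer.start_as_current_span("calculate_shipping")
def calculate_shipping(order_id: str) -> float:
    # runs inside the "calculate_shipping" span; attributes can still be added
    # via trace.get_current_span().set_attribute(...)
    return 4.99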
Span Attributes, Events, and Status Codes
Attributes, events, and status codes are the three ways to enrich a span with information beyond its name and timing.
Attributes are key-value pairs that describe the span. OTel defines semantic conventions for common attributes — use them instead of inventing your own names so that your data is queryable across teams and tools:
from opentelemetry.semconv.trace import SpanAttributes
with tracer.start_as_current_span("http.request") as span:
span.set_attribute(SpanAttributes.HTTP_METHOD, "POST")
span.set_attribute(SpanAttributes.HTTP_URL, "https://api.stripe.com/v1/charges")
span.set_attribute(SpanAttributes.HTTP_STATUS_CODE, 200)
span.set_attribute("payment.amount", 49.99)
span.set_attribute("payment.currency", "USD")
Events are timestamped log entries attached to a span. They record things that happened at a specific moment during the span's lifetime:
span.add_event("cache_miss", attributes={"cache.key": "product:42"})
span.add_event("retry_attempt", attributes={"retry.count": 1, "retry.delay_ms": 100})
Status codes communicate whether the span succeeded or failed. Always set the status on error so that your backend can display error rates correctly:
from opentelemetry.trace import StatusCode
with tracer.start_as_current_span("risky_operation") as span:
    try:
        result = risky_operation()
    except Exception as exc:
        span.record_exception(exc)                    # attaches the stack trace as an event
        span.set_status(StatusCode.ERROR, str(exc))   # marks the span as failed
        raise
    else:
        span.set_status(StatusCode.OK)
Context Propagation: W3C TraceContext and Baggage
Context propagation is what turns a collection of isolated spans into a distributed trace. When service A calls service B, it injects trace context into the request headers. Service B extracts that context and creates child spans under the same trace.
W3C TraceContext
OpenTelemetry uses the W3C TraceContext standard by default. The propagated header looks like:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^
version trace-id (128-bit hex) span-id (64-bit) flags
The opentelemetry-instrumentation-httpx and opentelemetry-instrumentation-requests packages inject and extract this header automatically for outgoing HTTP calls. For manual propagation:
import httpx
from opentelemetry import propagate
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
def call_downstream_service(url: str, payload: dict) -> dict:
headers = {}
propagate.inject(headers) # injects traceparent (and tracestate if present)
response = httpx.post(url, json=payload, headers=headers)
return response.json()
On the receiving service, extract the context from the incoming request:
from opentelemetry import propagate, trace
def handle_request(headers: dict) -> None:
ctx = propagate.extract(headers) # reconstructs the remote SpanContext
with tracer.start_as_current_span("handle_request", context=ctx) as span:
# this span is now a child of the upstream span
pass
FastAPI and Django auto-instrumentation handle this extraction automatically for every incoming request.
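For transports other than HTTP, such as message queues or background jobs, the same inject and extract calls work against any dict-like carrier. A hedged sketch, assuming the producer and consumer exchange a headers dict alongside each message:
from opentelemetry import propagate, trace
tracer = trace.get_tracer(__name__)
def publish_task(payload: dict) -> dict:
    # Producer: attach trace context to the message metadata
    message = {"payload": payload, "headers": {}}
    propagate.inject(message["headers"])
    return message  # hand the message to your queue client here
def consume_task(message: dict) -> None:
    # Consumer: continue the trace started by the producer
    ctx = propagate.extract(message["headers"])
    with tracer.start_as_current_span("consume_task", context=ctx):
        ...  # process the payload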
Baggage
Baggage allows you to propagate arbitrary key-value pairs alongside trace context. Unlike span attributes (which are local to a span), baggage is forwarded to every downstream service. Use it sparingly for values that need to cross service boundaries — for example, a tenant ID or a feature-flag value:
from opentelemetry import baggage, context
ctx = baggage.set_baggage("tenant.id", "acme-corp")
token = context.attach(ctx)
try:
# all downstream calls within this block carry the baggage header
call_downstream_service(url, payload)
finally:
context.detach(token)
# In the downstream service:
tenant_id = baggage.get_baggage("tenant.id")
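Note that baggage is only propagated; it is not automatically copied onto spans. If a baggage value should be queryable in your tracing backend, set it as a span attribute explicitly, for example (a small sketch):
from opentelemetry import baggage, trace
def tag_current_span_with_tenant() -> None:
    tenant_id = baggage.get_baggage("tenant.id")
    if tenant_id:
        # make the propagated value visible on the current span
        trace.get_current_span().set_attribute("tenant.id", tenant_id)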
Metrics API: Counter, Histogram, Gauge, UpDownCounter
OpenTelemetry provides four instrument types for different measurement scenarios.
Setting Up the Metrics SDK
# telemetry.py (extending the tracing setup above)
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
def configure_metrics(service_name: str, otlp_endpoint: str = "http://localhost:4317") -> None:
exporter = OTLPMetricExporter(endpoint=otlp_endpoint, insecure=True)
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=15_000)
provider = MeterProvider(
resource=Resource(attributes={SERVICE_NAME: service_name}),
metric_readers=[reader],
)
metrics.set_meter_provider(provider)
Using the Four Instrument Types
from opentelemetry import metrics
meter = metrics.get_meter(__name__)
# Counter: monotonically increasing value (requests served, errors, items processed)
request_counter = meter.create_counter(
name="http.server.request_count",
description="Total number of HTTP requests received",
unit="1",
)
# Histogram: distribution of measurements (latency, response size)
latency_histogram = meter.create_histogram(
name="http.server.request_duration",
description="HTTP request duration",
unit="ms",
)
# Gauge: a value that can go up or down arbitrarily (current temperature, cache size)
# Gauges are best expressed as ObservableGauge for values you poll rather than record:
def get_queue_depth() -> int:
return task_queue.qsize()
queue_depth_gauge = meter.create_observable_gauge(
name="task_queue.depth",
callbacks=[lambda opts: [metrics.Observation(get_queue_depth())]],
description="Current depth of the task queue",
unit="1",
)
# UpDownCounter: a value that increases and decreases (active connections, items in flight)
active_connections = meter.create_up_down_counter(
name="http.server.active_connections",
description="Number of active HTTP connections",
unit="1",
)
# Recording measurements:
def handle_request(method: str, path: str, status: int, duration_ms: float) -> None:
labels = {"http.method": method, "http.route": path, "http.status_code": str(status)}
request_counter.add(1, labels)
latency_histogram.record(duration_ms, labels)
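The UpDownCounter created above is used by calling add() with positive and negative deltas, and a histogram measurement is typically taken around the work being timed. A short sketch using the instruments defined earlier (conn.process() stands in for real work):
import time
def handle_connection(conn) -> None:
    active_connections.add(1)          # connection opened
    start = time.monotonic()
    try:
        conn.process()                 # placeholder for the actual request handling
    finally:
        duration_ms = (time.monotonic() - start) * 1000
        latency_histogram.record(duration_ms, {"http.route": "/orders"})
        active_connections.add(-1)     # connection closed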
Logs API: LogEmitter and the Python Logging Bridge
OTel's approach to logs is pragmatic: rather than replacing Python's standard logging module, it bridges it. A LoggingHandler intercepts every log record emitted by Python's logging system and forwards it to the OTel SDK — enriched with the current trace_id, span_id, and trace flags.
# telemetry.py (extending the setup above)
import logging
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.instrumentation.logging import LoggingInstrumentor
def configure_logging(service_name: str, otlp_endpoint: str = "http://localhost:4317") -> None:
exporter = OTLPLogExporter(endpoint=otlp_endpoint, insecure=True)
provider = LoggerProvider(
resource=Resource(attributes={SERVICE_NAME: service_name})
)
provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
set_logger_provider(provider)
    # Attach a handler so stdlib logging records are exported through the OTel pipeline
    logging.getLogger().addHandler(LoggingHandler(level=logging.NOTSET, logger_provider=provider))
    # Bridge trace context (trace_id, span_id) into standard Python log records
    LoggingInstrumentor().instrument(set_logging_format=True)
Once the bridge is active, every logging.getLogger(__name__).info(...) call is automatically forwarded to the OTel SDK and includes trace context:
import logging
logger = logging.getLogger(__name__)
def process_order(order_id: str) -> None:
with tracer.start_as_current_span("process_order"):
logger.info("Processing order", extra={"order.id": order_id})
# The log record will contain trace_id and span_id automatically
The resulting log output (in JSON format) looks like:
{
"timestamp": "2026-05-14T10:23:41.123Z",
"severity": "INFO",
"body": "Processing order",
"attributes": {"order.id": "ord-789"},
"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
"span_id": "00f067aa0ba902b7",
"trace_flags": "01"
}
Resource Detection: service.name, Host, and Process Attributes
A Resource describes the entity that produced the telemetry — the service, the host, the process, the container. Resource attributes appear on every span, metric, and log record. Without them, data from different services is indistinguishable.
from opentelemetry.sdk.resources import (
Resource,
SERVICE_NAME,
SERVICE_VERSION,
DEPLOYMENT_ENVIRONMENT,
)
resource = Resource.create({
SERVICE_NAME: "order-service",
SERVICE_VERSION: "2.4.1",
DEPLOYMENT_ENVIRONMENT: "production",
})
OTel also ships automatic resource detectors for common environments. Install the extras and let them run at startup:
pip install opentelemetry-sdk-extension-aws
pip install opentelemetry-resource-detector-container
from opentelemetry.sdk.resources import (
    OsResourceDetector,
    ProcessResourceDetector,
    Resource,
    get_aggregated_resources,
)
from opentelemetry.resource.detector.container import ContainerResourceDetector
from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
resource = get_aggregated_resources(
    detectors=[
        ProcessResourceDetector(),
        OsResourceDetector(),
        ContainerResourceDetector(),
        AwsEc2ResourceDetector(),  # populates cloud.* and host.* only when running on EC2
    ],
    initial_resource=Resource.create({SERVICE_NAME: "order-service"}),
)
The container detector automatically populates container.id from /proc/self/cgroup; on AWS, the EC2 detector adds cloud.provider, cloud.region, host.id, and more. These attributes become filterable dimensions in every observability backend.
Sampling Strategies: TraceIdRatioBased and ParentBased
In production, recording every trace is often impractical. A high-traffic service might generate millions of spans per minute — storing all of them is expensive and usually unnecessary. Sampling selects which traces to record.
OpenTelemetry provides two built-in samplers:
TraceIdRatioBased samples a fixed percentage of traces, determined by the trace_id. Because the sampling decision is based on the trace ID, the same trace will always be sampled or always be dropped, even across service boundaries:
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased, ParentBased
# Sample 10% of all traces
sampler = TraceIdRatioBased(rate=0.1)
provider = TracerProvider(resource=resource, sampler=sampler)
ParentBased respects the sampling decision made by the upstream service. If the parent span was sampled, the child span is sampled too, and vice versa. This is the default sampler and is almost always what you want in a microservices environment:
# Sample 10% at the root; all downstream spans follow the root's decision
sampler = ParentBased(root=TraceIdRatioBased(rate=0.1))
provider = TracerProvider(resource=resource, sampler=sampler)
Configure sampling via environment variables for deployment-time control:
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
For more advanced use cases — sampling based on attributes like error status, specific routes, or user tiers — implement a custom Sampler by subclassing opentelemetry.sdk.trace.sampling.Sampler.
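As an illustration, the sketch below always samples requests to a configurable set of routes and defers to a ratio sampler otherwise. It is a minimal example of the Sampler interface rather than a production-ready policy; the route set and the http.route attribute key are assumptions about how your spans are attributed:
from opentelemetry.sdk.trace.sampling import (
    Decision,
    ParentBased,
    Sampler,
    SamplingResult,
    TraceIdRatioBased,
)
class RouteAwareSampler(Sampler):
    """Always sample selected routes; fall back to another sampler otherwise."""
    def __init__(self, fallback: Sampler, always_sampled_routes: set):
        self._fallback = fallback
        self._routes = always_sampled_routes
    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        if attributes and attributes.get("http.route") in self._routes:
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes, trace_state)
        return self._fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )
    def get_description(self) -> str:
        return "RouteAwareSampler"
# Always keep /checkout traces; sample 10% of everything else at the root
sampler = ParentBased(root=RouteAwareSampler(TraceIdRatioBased(rate=0.1), {"/checkout"}))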
The OpenTelemetry Collector: Deploy with Docker
The OTel Collector is a standalone agent that receives telemetry from your application, processes it, and exports it to one or more backends. Running a Collector between your application and your observability backends provides several advantages:
- Your application always exports to localhost:4317 — no backend-specific configuration in application code.
- The Collector can fan out to multiple backends simultaneously.
- Processing pipelines can add attributes, filter noise, batch efficiently, and retry on failure.
- You can swap backends (e.g., migrate from Jaeger to Tempo) without redeploying applications.
Docker Compose Stack
# docker-compose.yml
version: "3.9"
services:
app:
build: .
environment:
OTEL_SERVICE_NAME: my-fastapi-app
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
OTEL_TRACES_EXPORTER: otlp
OTEL_METRICS_EXPORTER: otlp
OTEL_LOGS_EXPORTER: otlp
OTEL_PYTHON_LOG_CORRELATION: "true"
command: opentelemetry-instrument uvicorn app.main:app --host 0.0.0.0 --port 8000
ports:
- "8000:8000"
depends_on:
- otel-collector
otel-collector:
image: otel/opentelemetry-collector-contrib:0.101.0
command: ["--config=/etc/otel/config.yaml"]
volumes:
- ./otel-collector-config.yaml:/etc/otel/config.yaml
ports:
- "4317:4317" # OTLP gRPC receiver
- "4318:4318" # OTLP HTTP receiver
- "8888:8888" # Collector's own metrics (Prometheus format)
- "8889:8889" # Prometheus exporter endpoint
depends_on:
- jaeger
- prometheus
jaeger:
image: jaegertracing/all-in-one:1.57
ports:
- "16686:16686" # Jaeger UI
- "14250:14250" # gRPC for collector → Jaeger
environment:
COLLECTOR_OTLP_ENABLED: "true"
prometheus:
image: prom/prometheus:v2.52.0
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
ports:
- "9090:9090"
OTel Collector Configuration
# otel-collector-config.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 5s
send_batch_size: 1024
memory_limiter:
check_interval: 1s
limit_mib: 512
spike_limit_mib: 128
resource:
attributes:
- action: insert
key: environment
value: production
exporters:
otlp/jaeger:
    endpoint: jaeger:4317   # Jaeger all-in-one accepts OTLP gRPC on 4317
tls:
insecure: true
prometheus:
endpoint: 0.0.0.0:8889
namespace: myapp
debug:
verbosity: basic
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, batch, resource]
exporters: [otlp/jaeger, debug]
metrics:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [prometheus, debug]
logs:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [debug]
Prometheus Scrape Configuration
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: otel-collector
static_configs:
- targets: ["otel-collector:8889"] # metrics exported by the Collector
- job_name: collector-internal
static_configs:
- targets: ["otel-collector:8888"] # Collector's own health metrics
Start the entire stack with:
docker compose up -d
The Jaeger UI is available at http://localhost:16686. Prometheus is at http://localhost:9090.
Full Working FastAPI Example
The following is a complete, minimal FastAPI application that wires together tracing, metrics, and logs:
# app/main.py
import logging
from contextlib import asynccontextmanager
import httpx
from fastapi import FastAPI, HTTPException
from opentelemetry import metrics, trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.logging import LoggingInstrumentor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import DEPLOYMENT_ENVIRONMENT, SERVICE_NAME, SERVICE_VERSION, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
from opentelemetry.trace import StatusCode
logger = logging.getLogger(__name__)
OTLP_ENDPOINT = "http://otel-collector:4317"
SERVICE = "order-service"
def setup_telemetry() -> None:
resource = Resource(
attributes={
SERVICE_NAME: SERVICE,
SERVICE_VERSION: "1.0.0",
DEPLOYMENT_ENVIRONMENT: "production",
}
)
# Traces
trace_exporter = OTLPSpanExporter(endpoint=OTLP_ENDPOINT, insecure=True)
sampler = ParentBased(root=TraceIdRatioBased(rate=1.0)) # 100% in dev
tracer_provider = TracerProvider(resource=resource, sampler=sampler)
tracer_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
trace.set_tracer_provider(tracer_provider)
# Metrics
metric_exporter = OTLPMetricExporter(endpoint=OTLP_ENDPOINT, insecure=True)
reader = PeriodicExportingMetricReader(metric_exporter, export_interval_millis=15_000)
meter_provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(meter_provider)
    # Logs
    log_exporter = OTLPLogExporter(endpoint=OTLP_ENDPOINT, insecure=True)
    log_provider = LoggerProvider(resource=resource)
    log_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))
    set_logger_provider(log_provider)
    # Attach a handler so stdlib logging records are exported through the OTLP pipeline
    logging.getLogger().addHandler(LoggingHandler(level=logging.NOTSET, logger_provider=log_provider))
    LoggingInstrumentor().instrument(set_logging_format=True)
# Auto-instrument HTTP client
HTTPXClientInstrumentor().instrument()
@asynccontextmanager
async def lifespan(app: FastAPI):
setup_telemetry()
FastAPIInstrumentor.instrument_app(app)
logger.info("Telemetry configured", extra={"service": SERVICE})
yield
app = FastAPI(title="Order Service", lifespan=lifespan)
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
order_counter = meter.create_counter("orders.created", unit="1", description="Total orders created")
order_latency = meter.create_histogram("orders.processing_duration", unit="ms")
@app.post("/orders")
async def create_order(order_id: str, user_id: str, amount: float):
with tracer.start_as_current_span("create_order") as span:
span.set_attribute("order.id", order_id)
span.set_attribute("user.id", user_id)
span.set_attribute("order.amount", amount)
logger.info("Creating order", extra={"order_id": order_id, "user_id": user_id})
try:
import time
start = time.monotonic()
# Simulate downstream call (auto-instrumented by HTTPXClientInstrumentor)
async with httpx.AsyncClient() as client:
resp = await client.get(f"http://inventory-service/items/{order_id}")
if resp.status_code == 404:
raise HTTPException(status_code=422, detail="Item not in inventory")
duration_ms = (time.monotonic() - start) * 1000
order_counter.add(1, {"status": "success", "user.tier": "standard"})
order_latency.record(duration_ms, {"status": "success"})
span.set_status(StatusCode.OK)
span.add_event("order_confirmed", attributes={"order.id": order_id})
logger.info("Order created successfully", extra={"order_id": order_id})
return {"status": "confirmed", "order_id": order_id}
except HTTPException:
order_counter.add(1, {"status": "rejected"})
span.set_status(StatusCode.ERROR, "Item not available")
raise
except Exception as exc:
span.record_exception(exc)
span.set_status(StatusCode.ERROR, str(exc))
order_counter.add(1, {"status": "error"})
logger.error("Order processing failed", extra={"order_id": order_id, "error": str(exc)})
raise HTTPException(status_code=500, detail="Internal error") from exc
@app.get("/health")
async def health():
return {"status": "ok"}
Correlating Traces and Logs with trace_id Injection
The LoggingInstrumentor (shown above) automatically injects trace context into every Python logging record. When you enable JSON log formatting, each log line carries trace_id and span_id fields that link directly to the corresponding span in Jaeger or Tempo.
To make this work with structured logging libraries like structlog or python-json-logger:
# With python-json-logger
import logging
from pythonjsonlogger import jsonlogger
from opentelemetry import trace
class TraceContextFilter(logging.Filter):
"""Inject OTel trace context into every log record."""
def filter(self, record: logging.LogRecord) -> bool:
span = trace.get_current_span()
ctx = span.get_span_context()
if ctx and ctx.is_valid:
record.trace_id = format(ctx.trace_id, "032x")
record.span_id = format(ctx.span_id, "016x")
record.trace_flags = str(ctx.trace_flags)
else:
record.trace_id = "0" * 32
record.span_id = "0" * 16
record.trace_flags = "00"
return True
handler = logging.StreamHandler()
handler.setFormatter(jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(message)s %(trace_id)s %(span_id)s"))
handler.addFilter(TraceContextFilter())
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)
In Grafana, you can configure a derived field on your log data source to create clickable links from trace_id values directly to the corresponding trace in Tempo or Jaeger — turning every log line into a one-click navigation to the full distributed trace.
FAQ
Do I need to run the OTel Collector? Can I export directly from the application?
You can export directly to any OTLP-compatible backend by pointing OTEL_EXPORTER_OTLP_ENDPOINT at it. The Collector is optional but recommended for production: it handles batching, retries, tail-based sampling, and fan-out to multiple backends. Running it as a sidecar container adds minimal overhead.
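In code, pointing the exporter at a backend directly only requires the endpoint and, for most vendors, an authentication header. A sketch with a hypothetical endpoint and token:
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# hypothetical endpoint and token; substitute your backend's OTLP ingest URL and credentials
exporter = OTLPSpanExporter(
    endpoint="https://otlp.example-backend.com:4317",
    headers={"authorization": "Bearer YOUR_API_TOKEN"},
)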
What is the performance overhead of OTel instrumentation?
The SDK is designed to be fast. The BatchSpanProcessor and PeriodicExportingMetricReader export asynchronously and do not block request handling. In benchmarks on typical Python web services, the overhead is under 1ms per request. The largest cost is memory — keep the memory_limiter processor configured in the Collector.
Can I use OTel with async Python (asyncio)?
Yes. All OTel Python APIs are async-compatible. Context propagation works correctly across await points because OTel stores the active context in Python's contextvars, which asyncio carries into tasks and across coroutine boundaries.
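A small sketch showing that a span opened before an await remains the active parent for spans created afterwards:
import asyncio
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
async def fetch_profile(user_id: str) -> dict:
    with tracer.start_as_current_span("fetch_profile"):   # child of "handle_request"
        await asyncio.sleep(0.05)                          # stand-in for awaited I/O
        return {"user_id": user_id}
async def handle_request() -> None:
    with tracer.start_as_current_span("handle_request"):
        await fetch_profile("u-123")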
How do I instrument a Celery worker?
Install opentelemetry-instrumentation-celery and call CeleryInstrumentor().instrument() before starting the worker. The instrumentation creates a trace link between the task producer (e.g., a web request) and the Celery worker — preserving the end-to-end trace across the message queue boundary.
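The instrumentation is typically activated from the worker_process_init signal so that each worker process instruments itself after forking; a sketch following the pattern documented for the Celery instrumentation (the broker URL is an assumption):
from celery import Celery
from celery.signals import worker_process_init
from opentelemetry.instrumentation.celery import CeleryInstrumentor
app = Celery("tasks", broker="redis://localhost:6379/0")
@worker_process_init.connect(weak=False)
def init_tracing(*args, **kwargs):
    # instrument inside each worker process, after the fork
    CeleryInstrumentor().instrument()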
What is the difference between OTLP gRPC and OTLP HTTP?
Both carry the same data. gRPC (port 4317) uses protobuf framing and HTTP/2 multiplexing — more efficient for high-volume telemetry. HTTP (port 4318) uses protobuf or JSON over HTTP/1.1 — simpler to configure through corporate proxies and firewalls. Choose gRPC for internal services; HTTP for edge cases where gRPC is blocked.
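Switching a Python service from gRPC to HTTP export comes down to using the HTTP exporter class and the signal-specific path on port 4318 (a short sketch; the HTTP exporter lives in opentelemetry-exporter-otlp-proto-http, which the opentelemetry-exporter-otlp meta-package already includes):
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
# OTLP/HTTP sends each signal to its own path on port 4318
exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")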
How do I test that instrumentation is working without a backend?
Use the ConsoleSpanExporter and ConsoleMetricExporter during development — they print telemetry to stdout in human-readable form. Or set OTEL_TRACES_EXPORTER=console before running opentelemetry-instrument. No Collector or backend required.
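In code, the console exporter drops in exactly where the OTLP exporter would go; a minimal sketch for traces:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))  # prints each span to stdout
trace.set_tracer_provider(provider)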
Is OpenTelemetry stable enough for production?
The traces API and SDK reached stable status in 2021, and the metrics API and SDK in 2022. Logs support in the Python SDK is newer, and parts of it are still exposed through underscore-prefixed modules (opentelemetry._logs) that are marked experimental, although the OTLP logs data model itself is stable. The semantic conventions are actively being stabilized signal by signal. The project is used in production by thousands of organisations including major cloud providers.
Sources
- OpenTelemetry Python SDK — GitHub
- OpenTelemetry Python Contrib — GitHub
- OpenTelemetry Specification
- OpenTelemetry Collector Documentation
- W3C TraceContext Recommendation
- OpenTelemetry Semantic Conventions
- Jaeger Distributed Tracing
- Prometheus Monitoring System
- CNCF OpenTelemetry Project Page
- OpenTelemetry Python Docs — opentelemetry.io