
Grafana Loki Tutorial 2026: Log Aggregation Without the ELK Complexity

The Elastic Stack (Elasticsearch, Logstash, Kibana) has been the default answer to log aggregation for over a decade. It works. It also demands a JVM cluster with 16 GB or more of RAM on day one, a dedicated operations team to tune heap sizes, shard counts, and index lifecycle policies, and an Elastic license that grows expensive as your data does. Grafana Loki offers a different trade-off: it indexes only the metadata labels attached to each log stream, stores the raw log lines compressed in object storage, and offloads all the expensive full-text indexing to query time. The result is a system that runs comfortably on a single node for most small-to-medium workloads and scales horizontally when you need it.

Loki has seen roughly 80% year-over-year growth in adoption through 2025 and into 2026, driven primarily by teams already running Prometheus and Grafana who want a unified observability stack without adding a second complex technology. This tutorial covers everything from first install to production-grade Kubernetes log collection.

TL;DR

  • Loki stores logs as compressed chunks in object storage (S3, GCS, local filesystem). It indexes only stream labels, not log content, making storage 10x–100x cheaper than Elasticsearch for the same volume.
  • The PLG stack is Promtail (log shipper) + Loki (storage and query engine) + Grafana (visualization). Docker Compose gets you running in under five minutes.
  • LogQL is the query language. It looks like PromQL with a log selector prepended: {app="nginx"} |= "error".
  • Promtail runs as a DaemonSet on Kubernetes, reading from /var/log/pods and auto-discovering pods via the Kubernetes API.
  • Loki supports multi-tenancy via the X-Scope-OrgID HTTP header, structured log parsing via pipeline stages, and alerting via Grafana alert rules or the built-in ruler component.
  • For full-text search at write time, regex-heavy analytics, or long-retention compliance data, Elasticsearch is still the better tool. Loki wins on cost, operational simplicity, and Prometheus-native workflows.

What Is Loki? Loki vs ELK Stack vs Splunk

Loki is an open-source log aggregation system developed by Grafana Labs and released in 2018. Its design philosophy is deliberately borrowed from Prometheus: logs are identified by a set of key-value labels, and the storage backend holds only those labels in an index. Everything else — the raw log line — is stored compressed and unindexed in chunks.

When you query, Loki fetches the chunks that match your label selectors, decompresses them, and then applies filter expressions to find matching lines. This means reads are slightly more expensive than Elasticsearch for highly selective full-text queries, but writes and storage are far cheaper because there is no per-token index to maintain.
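To make that split concrete, here is a small illustrative query (the label and string values are made up): only the label matchers are resolved against the index, while the filter expressions run over the decompressed chunk contents at query time.

# Resolved against the index: only streams with these exact labels are fetched
# Applied during the chunk scan: the substring and regex filters
{app="checkout", env="production"} |= "timeout" |~ "order_id=[0-9]+"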

Cost and complexity comparison (2026 estimates for 50 GB/day ingest):

Dimension | Loki | ELK Stack | Splunk Cloud
Minimum RAM (single node) | 2 GB | 16 GB | Managed
Storage per GB of logs | ~0.10 GB (compressed, S3) | ~1.5 GB (index + data) | ~0.5 GB
License | AGPLv3 | SSPL (self-managed) | Proprietary
Estimated monthly cost (cloud) | $15–50 | $300–800 | $1,500+
Full-text index at write time | No | Yes | Yes
Native Prometheus integration | Yes | Via exporter | Via exporter
Kubernetes auto-discovery | Via Promtail | Via Filebeat | Via Splunk Connect

The ELK stack numbers assume a self-managed deployment on EC2 or equivalent with adequate heap memory. Splunk costs are based on their ingest-volume licensing model. Loki costs assume S3 storage at standard pricing and minimal compute for the Loki process itself.

The headline trade-off is this: ELK and Splunk index every token of every log line at ingest time. Queries are fast regardless of log volume because the index narrows results instantly. Loki defers that work to query time. If you need to search for a specific string across all logs without any label filter, Loki will scan every chunk, which is slow and expensive. If you structure your logs with good labels and use label selectors on every query, Loki is extremely fast and uses a fraction of the resources.

Loki Architecture

Loki can run as a single binary (the default for development) or as individual microservices (recommended for production scale). The key components are:

Distributor — the write path entry point. It receives log streams from shippers like Promtail, validates them, and fans them out to multiple ingesters using consistent hashing on the stream's label fingerprint. It also applies rate limiting and validation rules (max label count, max line size).

Ingester — holds log chunks in memory, grouped by stream. It accumulates lines until the chunk reaches a configured size or age, then flushes compressed chunks to the storage backend. Ingesters maintain a write-ahead log (WAL) so in-flight data survives a restart.

Querier — handles read requests. It queries both the ingesters (for recent data still in memory) and the long-term storage backend (for older chunks). It executes the actual LogQL filter logic after fetching chunks.

Query Frontend — an optional component that sits in front of queriers. It splits long-range queries into smaller shards, caches results, and retries failed sub-queries. It is the main reason large time-range queries are practical in production.

Compactor — runs as a background process to merge many small index files produced by separate ingester flushes into fewer, larger files. It also enforces retention by deleting chunks older than the configured retention period.

Ruler — evaluates recording rules and alerting rules against log data on a schedule, similar to the Prometheus ruler. Required if you want Loki-native alerts rather than Grafana alert rules.

Storage backend — Loki supports local filesystem (monolithic mode only), S3-compatible object stores (Amazon S3, MinIO, DigitalOcean Spaces), Google Cloud Storage, and Azure Blob Storage. The index can use the TSDB index format (recommended since Loki 2.8) or the older BoltDB Shipper format; legacy index stores such as Cassandra and Bigtable were removed in Loki 3.0.

For teams starting out, running Loki in single-binary mode with local filesystem storage is perfectly fine. When you outgrow a single node, switching to object storage requires only a configuration change — the data format is the same.
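As a sketch of what that growth path looks like, the same Loki binary can be started with different -target values to split the read and write paths once a shared object store is in place (the config path below is illustrative):

# Monolithic mode (the default): every component in one process
loki -config.file=/etc/loki/config.yaml -target=all

# Simple scalable mode: the same binary, split by role over shared object storage
loki -config.file=/etc/loki/config.yaml -target=write
loki -config.file=/etc/loki/config.yaml -target=read
loki -config.file=/etc/loki/config.yaml -target=backend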

Install the PLG Stack with Docker Compose

The following Docker Compose file brings up Promtail, Loki, and Grafana on a single host. It creates a named volume for Loki's local storage and mounts /var/log into Promtail for log scraping.

Create a project directory and the required config files:

mkdir -p ~/plg-stack/config && cd ~/plg-stack

config/loki-config.yaml — Loki server configuration:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32

analytics:
  reporting_enabled: false

config/promtail-config.yaml — Promtail configuration:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: docker-host
          __path__: /var/log/*.log

  - job_name: containers
    static_configs:
      - targets:
          - localhost
        labels:
          job: containerlogs
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - json:
          expressions:
            log: log
            stream: stream
            time: time
      - labels:
          stream:
      - output:
          source: log

docker-compose.yaml:

version: "3.8"

networks:
  plg:

volumes:
  loki-data:
  grafana-data:

services:
  loki:
    image: grafana/loki:3.4.2
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./config/loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - plg
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--output-document=-", "http://localhost:3100/ready"]
      interval: 10s
      timeout: 5s
      retries: 5

  promtail:
    image: grafana/promtail:3.4.2
    container_name: promtail
    volumes:
      - ./config/promtail-config.yaml:/etc/promtail/config.yaml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yaml
    networks:
      - plg
    depends_on:
      loki:
        condition: service_healthy
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.6.0
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
    networks:
      - plg
    depends_on:
      - loki
    restart: unless-stopped

Start the stack:

docker compose up -d
docker compose ps

Within about 30 seconds, docker compose ps should show Loki as healthy and Promtail and Grafana as running. Open Grafana at http://localhost:3000, log in with admin / changeme, navigate to Connections > Data Sources > Add data source, select Loki, and set the URL to http://loki:3100. Save and test. You should see "Data source connected and labels found."
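To confirm ingestion end to end before building dashboards, you can push a test line straight to Loki's HTTP API and query it back; the job label and message below are placeholders:

# Push a single test line (timestamp in nanoseconds since the Unix epoch)
curl -s -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d "{\"streams\": [{\"stream\": {\"job\": \"smoke-test\"}, \"values\": [[\"$(date +%s)000000000\", \"hello loki\"]]}]}"

# Read it back through the query API (defaults to roughly the last hour)
curl -sG http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="smoke-test"}'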

Configure Promtail: Scraping Log Files with Labels

Promtail is the recommended log shipper for Loki, though Loki also accepts logs from Fluentd, Fluentbit, the OpenTelemetry Collector, and the Loki Docker logging driver.

Labels are the most important design decision in a Loki deployment. Because Loki only indexes labels, your ability to narrow down queries efficiently depends entirely on choosing labels that are selective enough to be useful but low-cardinality enough that they do not explode the number of streams.

Good label candidates: app, env (production/staging), host, namespace, pod, level (info/warn/error).

Bad label candidates: user_id, request_id, trace_id — these have millions of distinct values and create millions of separate streams, which degrades ingester performance and increases memory usage.
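A quick way to sanity-check cardinality on a running instance is to list a label's values and the active streams through Loki's HTTP API; the label names here are whatever you have configured:

# Distinct values of the app label
curl -sG http://localhost:3100/loki/api/v1/label/app/values

# Active streams (label combinations) matching a selector
curl -sG http://localhost:3100/loki/api/v1/series --data-urlencode 'match[]={env="production"}'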

A more complete Promtail scrape config for a server running multiple applications:

scrape_configs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          app: nginx
          env: production
          host: web-01
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
      - labels:
          status:
      - metrics:
          http_requests_total:
            type: Counter
            description: "Total HTTP requests seen in nginx access log"
            source: status
            config:
              action: inc

  - job_name: app-json
    static_configs:
      - targets:
          - localhost
        labels:
          app: myservice
          env: production
          __path__: /var/log/myservice/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            msg: message
            ts: timestamp
      - labels:
          level:
      - timestamp:
          source: ts
          format: RFC3339Nano

The pipeline_stages block processes each log line before it is sent to Loki. You can extract fields from JSON or regex, promote extracted fields to labels, rewrite the log line, and even drop lines that match certain conditions.
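For example, a match stage can scope any of these transformations to a subset of streams; the sketch below is illustrative (selector and expression are assumptions) and drops readiness-probe noise only for nginx streams:

pipeline_stages:
  - match:
      selector: '{app="nginx"}'
      stages:
        - drop:
            expression: ".*GET /healthz.*"
            drop_counter_reason: healthcheck_noise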

LogQL Basics: Stream Selectors and Filter Expressions

LogQL is Loki's query language. Every query starts with a log stream selector — a set of label matchers enclosed in curly braces — followed by optional pipeline expressions.

Label matchers use four operators:

# Exact match
{app="nginx"}

# Regex match
{app=~"nginx|apache"}

# Negative exact match
{app!="nginx"}

# Negative regex
{app!~"nginx|apache"}

Filter expressions narrow results after the stream is selected:

# Lines containing the string "error"
{app="nginx"} |= "error"

# Lines NOT containing "healthcheck"
{app="nginx"} != "healthcheck"

# Lines matching a regex
{app="nginx"} |~ "5[0-9]{2}"

# Lines NOT matching a regex
{app="nginx"} !~ "2[0-9]{2}"

Pattern matching — Loki's | pattern expression uses a simplified pattern syntax that is faster than regex for common log formats:

{app="nginx"} | pattern `<ip> - - [<_>] "<method> <path> <_>" <status> <size>`
| status >= 500

Parser expressions — extract structured fields and make them available for filtering and formatting:

# Parse JSON log lines
{app="myservice"} | json | level="error"

# Parse logfmt (key=value format)
{app="myservice"} | logfmt | duration > 500ms

# Regex extraction
{app="nginx"} | regexp `(?P<status>\d{3}) (?P<bytes>\d+)$`

Line format — rewrite the displayed log line using extracted fields:

{app="myservice"} | json | line_format "{{.level}} {{.message}}"

Multiple pipeline stages chain with the pipe character. The query is evaluated left to right, so place the most selective filters first to minimize the data processed by later stages.
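As a rough illustration of that ordering (app name and field names are illustrative), the cheap line filters below cut the volume before the more expensive json parse and label filter run:

{app="myservice", env="production"}
  |= "payment"          # cheap substring filter first
  != "healthcheck"      # discard noise early
  | json                # parse only the surviving lines
  | level="error"       # filter on the extracted field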

LogQL Metric Queries

Metric queries wrap a log pipeline in an aggregation function to produce time-series data, exactly like PromQL. This is what lets you build graphs and alerts from log data.

rate and count_over_time — count log lines in a time range:

# Request rate per second, averaged over the last 5 minutes
rate({app="nginx"}[5m])

# Total errors in the last hour
count_over_time({app="nginx"} |= "error" [1h])

# Error rate as a fraction of total requests
sum(rate({app="nginx"} |= "error" [5m]))
/
sum(rate({app="nginx"}[5m]))

bytes_over_time and bytes_rate — measure log volume:

# Log volume in bytes per second
bytes_rate({app="nginx"}[5m])

# Total log volume per app over the last day
sum by (app) (bytes_over_time({app=~".+"}[24h]))

avg_over_time on extracted numeric fields:

# Average response time from JSON logs
avg_over_time(
  {app="myservice"} | json | unwrap duration_ms [5m]
)

# 99th percentile latency
quantile_over_time(0.99,
  {app="myservice"} | json | unwrap duration_ms [5m]
)

The unwrap expression extracts a numeric value from a parsed label, converting it to a metric. Without unwrap, metric aggregations count or size log lines; with unwrap, they aggregate numeric values extracted from the log content.
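Lines that fail to parse carry a __error__ label; filtering on __error__="" before the unwrap keeps malformed lines from breaking the aggregation. A minimal sketch, assuming a numeric duration_ms field and a pod label:

quantile_over_time(0.95,
  {app="myservice"} | json | __error__="" | unwrap duration_ms [5m]
) by (pod)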

Aggregations mirror PromQL:

# Error count by application and environment
sum by (app, env) (
  count_over_time({env="production"} |= "error" [5m])
)

# Top 10 apps by log volume
topk(10, sum by (app) (bytes_rate({env="production"}[5m])))

Grafana Dashboards: Log Panels and Derived Fields

Once Loki is configured as a data source in Grafana, you can build dashboards that mix log panels with metric graphs.

Adding a Logs panel:

  1. Create a new dashboard, add a panel, and select Logs as the visualization type.
  2. Set the data source to Loki.
  3. Enter a LogQL query such as {app="nginx", env="production"} |= "error".
  4. Enable Deduplication if your logs contain repeated identical lines.
  5. Enable Show time and Wrap lines in the panel options.

Variables for dynamic dashboards — add a dashboard variable to let users filter by application:

  • Variable type: Query
  • Data source: Loki
  • Query: label_values(app) — returns all distinct values of the app label
  • Use $app in your panel queries: {app="$app", env="$env"}

Derived fields link log lines to traces in Tempo or Jaeger. In the Loki data source settings, add a derived field:

  • Name: TraceID
  • Regex: traceID=(\w+) (or "trace_id":"(\w+)" for JSON logs)
  • URL: http://tempo:3200/jaeger/api/traces/${__value.raw} (for Tempo)
  • Internal link: enabled, pointing to your Tempo data source

When a log line contains a trace ID matching the regex, Grafana renders it as a clickable link that jumps directly to the corresponding trace in Tempo. This is the core of the Grafana observability loop: logs link to traces, traces link back to metrics.

Kubernetes Log Collection: Promtail DaemonSet with Helm

On Kubernetes, every pod writes its logs to /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log on the node's filesystem. Promtail runs as a DaemonSet — one pod per node — to read those files and ship them to Loki with automatically discovered labels.

Install with Helm:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki in single-binary (monolithic) mode
helm install loki grafana/loki \
  --namespace monitoring \
  --create-namespace \
  --set loki.auth_enabled=false \
  --set loki.commonConfig.replication_factor=1 \
  --set loki.storage.type=filesystem \
  --set singleBinary.replicas=1

# Install Promtail pointing at the Loki service
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set config.clients[0].url=http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push

Custom values.yaml for Promtail to add extra labels and drop noisy system logs:

config:
  clients:
    - url: http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push

  snippets:
    extraScrapeConfigs: |
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        pipeline_stages:
          - cri: {}
          - json:
              expressions:
                level: level
                msg: message
          - labels:
              level:
          - drop:
              expression: ".*kube-probe.*"
              drop_counter_reason: healthcheck
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: node

tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule

Apply the custom values:

helm upgrade promtail grafana/promtail \
  --namespace monitoring \
  -f values-promtail.yaml

With this configuration, every pod in the cluster is automatically scraped. Labels app, namespace, pod, container, and node are available as Loki stream labels, letting you query {namespace="production", app="api-server"} |= "panic" without any manual configuration per application.

Structured Logging: Parsing JSON Logs with Pipeline Stages

Modern applications typically emit JSON-structured logs. Loki's pipeline stages let you parse that structure and extract fields for use as labels, filters, and metrics.

Consider an application emitting logs like this:

{"timestamp":"2026-05-14T10:23:41Z","level":"error","service":"payment","traceId":"4bf92f3577b34da6","message":"charge failed","user_id":42891,"amount":99.99,"error":"card_declined"}

A Promtail pipeline to handle this log format:

pipeline_stages:
  # Parse the entire line as JSON
  - json:
      expressions:
        level: level
        service: service
        trace_id: traceId
        message: message
        error_type: error

  # Promote safe low-cardinality fields to labels
  - labels:
      level:
      service:

  # Drop debug logs in production to reduce volume
  - drop:
      source: level
      expression: "debug"
      drop_counter_reason: debug_dropped

  # Keep the full JSON line as the stored log line (an output stage with
  # source: message would rewrite it to just the message text, but that would
  # discard the fields needed for query-time re-parsing and derived-field links)

  # Extract timestamp from the log line itself
  - timestamp:
      source: timestamp
      format: RFC3339

After this pipeline runs, each log line has level and service as indexed labels, and debug lines never reach Loki. The full JSON line is stored, so trace_id remains available for derived field linking and query-time parsing without being promoted to a label (avoiding high cardinality).

You can then query:

{service="payment", level="error"} | json | error="card_declined"

The json parser in the query re-parses the stored log line, so you can filter on any field even if it was not promoted to a label at ingest time. The stream selector service="payment" narrows the chunks to fetch; the json | error="card_declined" filter runs against the decompressed lines.
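The same re-parsing works inside metric queries, so you can break errors down by a field that was never a label; the field names follow the example log above:

sum by (error) (
  count_over_time(
    {service="payment", level="error"} | json [5m]
  )
)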

Multi-Tenancy with Loki

Loki supports multiple tenants through HTTP header-based isolation. When auth_enabled: true is set in the Loki configuration, every API request must include an X-Scope-OrgID header. Requests with different org IDs are stored and queried in completely separate namespaces — one tenant cannot see another's data.

Enable multi-tenancy in loki-config.yaml:

auth_enabled: true

Configure Promtail to send a tenant ID:

clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-alpha

For Kubernetes deployments where different namespaces belong to different teams, you can use Promtail's relabeling to derive the tenant ID from the Kubernetes namespace label:

clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: fake  # default, overridden by relabeling

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: __tenant_id__

When querying in Grafana, set the X-Scope-OrgID header in the Loki data source configuration under HTTP Headers. You can provision separate Grafana data sources pointing at the same Loki instance with different tenant IDs, then use Grafana's team/folder permissions to control which teams can see which data source.
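Outside Grafana, the same header scopes any direct API call; a hedged example against the query endpoint (tenant name and selector are illustrative):

curl -sG http://localhost:3100/loki/api/v1/query_range \
  -H "X-Scope-OrgID: team-alpha" \
  --data-urlencode 'query={app="checkout"} |= "error"' \
  --data-urlencode 'limit=20'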

Alerting: Grafana Alert Rules on Log Patterns

There are two ways to alert on Loki data: Grafana alert rules (evaluated by the Grafana server) and Loki ruler rules (evaluated by Loki itself, similar to Prometheus recording and alerting rules). Both work; Grafana alert rules are easier to configure through the UI.

Grafana alert rule for high error rate:

  1. Navigate to Alerting > Alert Rules > New alert rule.
  2. Set the data source to Loki.
  3. Enter a metric query:
sum(rate({app="api-server", env="production"} |= "error" [5m])) > 0.1
  4. Set the condition: IS ABOVE threshold 0.1 (errors per second).
  5. Set evaluation interval to 1m, pending period to 5m (fires only after the condition holds for 5 minutes to reduce noise).
  6. Add labels: severity=critical, team=backend.
  7. Add a contact point (Slack, PagerDuty, email) in Alerting > Contact Points.

Loki ruler alerting rules (for alerts evaluated independently of Grafana):

# config/loki-rules.yaml
groups:
  - name: log-alerts
    interval: 1m
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({app="api-server", env="production"} |= "error" [5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate in api-server"
          description: "Error rate is {{ $value | printf \"%.2f\" }} errors/sec"

      - alert: PanicDetected
        expr: |
          count_over_time({app=~".+", env="production"} |= "panic" [1m]) > 0
        for: 0m
        labels:
          severity: page
        annotations:
          summary: "Panic detected in {{ $labels.app }}"

Place this file in the Loki rules directory (configured as /loki/rules in the example above) under a tenant subdirectory: /loki/rules/fake/loki-rules.yaml. Configure the ruler in loki-config.yaml:

ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /tmp/loki-rules
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
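With enable_api: true, you can check that the ruler has picked up the file by listing the loaded rule groups over HTTP (add an X-Scope-OrgID header if multi-tenancy is enabled); a quick sketch:

# Rule groups loaded from the rules directory
curl -s http://localhost:3100/loki/api/v1/rules

# Prometheus-compatible view, including rule state
curl -s http://localhost:3100/prometheus/api/v1/rules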

Loki vs Elasticsearch: When to Choose Each

Choose Loki when:

  • Your team already runs Prometheus and Grafana. The label-based mental model is identical, and you avoid adding a second observability paradigm.
  • Your log volume is growing fast and storage cost is a concern. Loki's object storage backend with compression typically achieves 10x–20x better storage efficiency than Elasticsearch.
  • You run Kubernetes and want zero-configuration per-pod log collection with automatic label discovery.
  • Your log queries are primarily filtered by application, environment, and severity — the kind of queries labels handle efficiently.
  • You want a simple operational footprint. A single Loki binary can replace an entire Elasticsearch cluster for many workloads.

Choose Elasticsearch when:

  • You need full-text search across log content without any label pre-filtering. Searching for an arbitrary string across all logs is fast in Elasticsearch because every token is indexed; in Loki it requires a full chunk scan.
  • You need complex aggregations at query time (cardinality, percentiles on arbitrary fields) without schema pre-planning.
  • You are ingesting logs from sources where you cannot add structured labels at collection time (legacy syslog, third-party appliances).
  • Your compliance requirements mandate specific data retention, audit, and chain-of-custody features that commercial Elasticsearch distributions provide.
  • Your team has existing Kibana dashboards and operational expertise that would be expensive to rebuild.

A common pattern in 2026 is running both: Loki for application logs (high-volume, label-friendly, cost-sensitive) and Elasticsearch for security event logs and audit trails (lower volume, full-text search required).

Retention and Storage: Object Storage Configuration

For production deployments, local filesystem storage is not appropriate — it cannot be shared across multiple Loki instances and is not durable without additional backup infrastructure. Switch to S3 or GCS.

S3 configuration in loki-config.yaml (start Loki with -config.expand-env=true so the ${...} credential references are expanded from environment variables):

common:
  storage:
    s3:
      endpoint: s3.amazonaws.com
      region: us-east-1
      bucketnames: my-loki-chunks
      access_key_id: ${AWS_ACCESS_KEY_ID}
      secret_access_key: ${AWS_SECRET_ACCESS_KEY}
      s3forcepathstyle: false

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
  aws:
    s3: s3://my-loki-chunks
    region: us-east-1

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  retention_period: 744h  # 31 days

For per-tenant retention (available when auth_enabled: true):

limits_config:
  retention_period: 744h  # default for all tenants

  # Override per tenant in the Loki runtime config
  per_tenant_override_config: /etc/loki/runtime-config.yaml

runtime-config.yaml (hot-reloaded without restart):

overrides:
  team-alpha:
    retention_period: 2160h  # 90 days for compliance team
  team-beta:
    retention_period: 168h   # 7 days for development team
    ingestion_rate_mb: 8

GCS configuration:

common:
  storage:
    gcs:
      bucket_name: my-loki-bucket

storage_config:
  gcs:
    bucket_name: my-loki-bucket

For GCS, authenticate via Workload Identity on GKE or set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to a service account key file.

MinIO (self-hosted S3-compatible) is a good option for on-premises deployments or local testing with an S3-compatible API:

docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"

Then set endpoint: minio:9000 and s3forcepathstyle: true in the Loki S3 config.
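A hedged sketch of the matching Loki storage block for that MinIO container (the bucket name is an assumption and must be created in MinIO first; the credentials are the local defaults above and should be changed for anything real):

common:
  storage:
    s3:
      endpoint: minio:9000
      insecure: true            # MinIO is plain HTTP in this local setup
      bucketnames: loki-chunks
      access_key_id: minioadmin
      secret_access_key: minioadmin
      s3forcepathstyle: true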

FAQ

Q: Can I migrate existing Elasticsearch or Splunk data to Loki? A: Not directly — the index formats are entirely different. For historical data, the common approach is to set both systems to run in parallel during a transition period (typically 30–90 days matching your retention window), then decommission the old system once the Loki retention covers the required history. There is no official migration tool.

Q: How does Loki handle log lines out of order? A: Since Loki 2.4, out-of-order writes are accepted by default within a window of half the max_chunk_age (one hour with the default 2h setting); lines arriving further behind the head of an active stream are rejected. Separately, samples older than reject_old_samples_max_age (168h in the config above) are rejected when reject_old_samples: true is set, so raise that value if your applications have long delays between log generation and shipping.

Q: What is the maximum log line size? A: The default maximum line size is 256 KB, configurable via max_line_size in limits_config. Lines exceeding this limit are rejected at the distributor (or truncated instead, if max_line_size_truncate is enabled).

Q: Can I use Loki without Promtail? A: Yes. Loki exposes a standard HTTP push API (/loki/api/v1/push) and, in Loki 3.x, a native OTLP logs ingestion endpoint. It is compatible with Fluentd (via the fluent-plugin-grafana-loki plugin), Fluent Bit (native output plugin), the OpenTelemetry Collector (via its OTLP exporters), Vector, Logstash (via logstash-output-loki), and the Docker logging driver (loki log driver). Many teams use Fluent Bit on Kubernetes for lower resource usage than Promtail.

Q: How do I debug missing logs in Loki? A: Check the Promtail /metrics endpoint (port 9080 by default) for promtail_sent_entries_total and promtail_dropped_entries_total. Check /targets to verify that Promtail discovered your log files. Check Loki's /metrics for loki_distributor_bytes_received_total. Check Grafana's Explore view with a very broad selector ({job=~".+"}) to confirm data is arriving. The Promtail positions.yaml file tracks read offsets — deleting it will re-read all log files from the beginning.
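A minimal set of shell checks covering those steps, assuming the Docker Compose setup from earlier (Promtail's port 9080 is not published in that file, so either publish it or run the first command inside the container):

# Did Promtail ship or drop anything?
curl -s http://localhost:9080/metrics | grep -E 'promtail_(sent|dropped)_entries_total'

# Is the Loki distributor receiving bytes?
curl -s http://localhost:3100/metrics | grep loki_distributor_bytes_received_total

# Broad query to confirm anything at all is stored
curl -sG http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job=~".+"}' --data-urlencode 'limit=5'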

Q: Is Loki production-ready in 2026? A: Yes. Grafana Labs runs Loki at petabyte scale internally, and large companies including Adobe, Deutsche Telekom, and Snyk run Loki in production. The TSDB index format (stable since Loki 2.8) and the microservices deployment mode are both mature. The main operational risk is cardinality explosion from poorly chosen labels — address this by reviewing stream counts regularly, for example with count(count_over_time({job=~".+"}[1h])) in Explore or via the /loki/api/v1/series API.

Leonardo Lazzaro

Software engineer and technical writer. 10+ years experience in DevOps, Python, and Linux systems.
