
eBPF and Cilium Tutorial 2026: Kubernetes Networking and Security


TL;DR

eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs inside the Linux kernel without writing kernel modules or patching kernel source. Cilium leverages eBPF to deliver a high-performance CNI for Kubernetes, replacing kube-proxy, enforcing L3/L4/L7 network policies, enabling mutual TLS without sidecars, and providing real-time observability through Hubble. In 2026, Cilium is the most widely adopted Kubernetes CNI in production, powering clusters at Google, Microsoft, AWS, and thousands of enterprises. This tutorial takes you from zero to a fully operational Cilium installation with network policies, Hubble observability, WireGuard encryption, and egress gateway configuration.

What you will have after this tutorial:

  • Cilium installed as your cluster CNI (kube-proxy replaced with eBPF)
  • L3/L4/L7 network policies enforced on real workloads
  • Hubble UI and CLI showing live network flows
  • WireGuard transparent encryption between nodes
  • Egress gateway controlling outbound cluster traffic
  • A working eBPF program written in C and loaded via libbpf

Prerequisites: A Linux machine (kernel 5.15+), kubectl, helm 3.x, and a working Kubernetes cluster (kind, k3s, or managed). Root or sudo access on cluster nodes is needed for some steps.
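Before starting, you can confirm a node's kernel is eBPF-ready; a quick sketch (the paths below are standard on modern distributions, and the last command requires bpftool):

# Kernel should be 5.15+ for everything in this tutorial
uname -r

# BTF type information must be exposed for CO-RE programs (see section 2.4)
ls /sys/kernel/btf/vmlinux

# Optional: enumerate the eBPF program and map types the kernel supports
sudo bpftool feature probe kernel | head -20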


1. What Is eBPF and Why It Is the #1 Kubernetes Networking Trend

eBPF stands for extended Berkeley Packet Filter. The original BPF was introduced in 1992 as a way to filter network packets in the kernel without copying them to user space. The "extended" version, merged into the Linux kernel around version 3.18 and dramatically expanded in kernels 4.x and 5.x, transformed BPF into a general-purpose in-kernel virtual machine capable of safely executing arbitrary logic at dozens of kernel hook points.

In the Kubernetes world, the conventional approach to networking relied on iptables (and later nftables) for service load balancing, and on per-pod network namespaces stitched together by CNI plugins running user-space daemons. Observability required injecting a sidecar proxy (Envoy, Linkerd-proxy) into every pod. These approaches work, but they have hard scalability ceilings:

  • iptables rules grow linearly with the number of services. A cluster with 10,000 services can accumulate hundreds of thousands of iptables rules, causing latency spikes on rule updates and high CPU usage during kube-proxy synchronization.
  • Sidecar proxies add latency (two extra TCP hops per connection), consume memory on every pod, and complicate upgrade rollouts.
  • Traditional CNI plugins rely on the kernel network stack traversal (veth pairs, iptables NAT, conntrack) for every packet, adding overhead measured in microseconds that compounds at scale.

eBPF eliminates these bottlenecks because eBPF programs run directly in the kernel, at the exact points where packets are processed, without user-space round trips. Key advantages:

  • No kernel modules required. eBPF programs are loaded and verified at runtime. You do not need to recompile the kernel or maintain out-of-tree modules.
  • No sidecars required for observability or policy. eBPF programs attach to cgroup hooks, socket hooks, and TC (traffic control) hooks that see every byte flowing in or out of every container automatically.
  • O(1) service lookup. Cilium replaces iptables DNAT rules with eBPF maps (hash tables in kernel memory), reducing service lookup from O(n) to O(1) regardless of cluster size.
  • Kernel-verified safety. The eBPF verifier statically checks every program before loading it, ensuring it cannot crash the kernel, access arbitrary memory, or loop infinitely.

According to the CNCF Annual Survey 2025, Cilium surpassed Flannel and Calico to become the most commonly used CNI plugin in production Kubernetes deployments, with adoption growing 47% year-over-year.


2. How eBPF Works: Kernel Hooks, Maps, Verifier, and BPF Programs

Understanding Cilium requires understanding the four fundamental eBPF primitives.

2.1 BPF Programs

A BPF program is a function written in restricted C (or Rust; Go projects typically compile C programs via tooling such as bpf2go), compiled to BPF bytecode by LLVM/Clang, and loaded into the kernel via the bpf() syscall. The program must terminate: loops are allowed only when the verifier can prove they are bounded, and it must not dereference arbitrary pointers.

Program types determine which kernel hook the program attaches to:

Program Type          Hook Location                                        Use Case
XDP                   Network driver RX path (before sk_buff allocation)   DDoS mitigation, fast packet drop
TC (Traffic Control)  Ingress/egress on a network interface                Per-pod policy, NAT, load balancing
kprobe/kretprobe      Any kernel function entry/exit                       Tracing, profiling
tracepoint            Static kernel trace points                           Stable tracing interface
cgroup/skb            cgroup socket buffer hooks                           Per-container network policy
sockops               Socket operations (connect, close)                   TCP acceleration, sockmap
sk_msg                Socket message redirection                           Zero-copy data path

Cilium primarily uses TC hooks and cgroup hooks to implement its data plane.

2.2 BPF Maps

BPF maps are key/value stores shared between BPF programs and user space. They persist across program invocations and are the primary mechanism for storing state (connection tracking tables, policy rules, metrics counters). Map types include:

  • BPF_MAP_TYPE_HASH — general-purpose hash table
  • BPF_MAP_TYPE_LRU_HASH — hash table with least-recently-used eviction (connection tracking)
  • BPF_MAP_TYPE_ARRAY — fixed-size array indexed by integer
  • BPF_MAP_TYPE_PERF_EVENT_ARRAY — ring buffer for sending events to user space
  • BPF_MAP_TYPE_SOCK_MAP — map of sockets for redirection

Cilium stores service endpoints, policy rules, and identity labels in BPF maps. When you apply a Kubernetes NetworkPolicy, the Cilium agent compiles it into BPF map entries — no iptables rules involved.
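You can watch this happen on a live node with bpftool; a sketch (the exact map names below are version-dependent):

# List BPF maps created by Cilium (run on a node or inside the agent pod)
sudo bpftool map list | grep -i cilium

# Dump one of Cilium's IPv4 service maps by name (name varies across versions)
sudo bpftool map dump name cilium_lb4_services_v2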

2.3 The eBPF Verifier

Before any BPF program runs, the kernel verifier performs static analysis:

  1. Control-flow graph check: Ensures there is no unreachable code and no backward edges, except for bounded loops (allowed since kernel 5.3).
  2. Register type tracking: Tracks the type of every register at every instruction. Pointer arithmetic is restricted.
  3. Memory access validation: Every pointer dereference must be proven safe before dereferencing.
  4. Resource limit: The verifier processes at most one million instructions per program (raised from 4,096 in kernel 5.2).

If verification fails, bpf() returns an error with a human-readable log explaining which instruction caused the violation.
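To make these checks concrete, here is a minimal sketch (a hypothetical file, verifier_reject.bpf.c) of a TC program the verifier rejects because it reads packet memory without first proving the access is in bounds:

// verifier_reject.bpf.c: illustrative only, this program FAILS verification
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tc")
int bad_read(struct __sk_buff *skb)
{
    void *data = (void *)(long)skb->data;

    // REJECTED: the verifier cannot prove data + 1 <= skb->data_end,
    // so this one-byte packet read is an invalid access
    __u8 first_byte = *(__u8 *)data;

    // The fix is a bounds check before the read:
    //   void *data_end = (void *)(long)skb->data_end;
    //   if (data + 1 > data_end) return 0;

    return first_byte;
}

char _license[] SEC("license") = "GPL";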

2.4 The BPF CO-RE Mechanism

BPF CO-RE (Compile Once, Run Everywhere) is the feature that makes modern eBPF programs portable across kernel versions. Using BTF (BPF Type Format) embedded in the kernel and the libbpf library, a BPF program compiled once can access kernel structures correctly even if field offsets differ between kernel versions — the loader patches the bytecode at load time using relocation information.
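A minimal sketch of CO-RE in practice, assuming you have generated vmlinux.h from the running kernel's BTF (bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h); the hook point and field chain here are illustrative:

// core_example.bpf.c: reads the parent PID of the current task portably
#include "vmlinux.h"             // kernel types dumped from BTF
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("kprobe/do_sys_openat2")
int trace_open(void *ctx)
{
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();

    // BPF_CORE_READ records a relocation; libbpf patches in the real
    // field offsets for the running kernel at load time
    pid_t ppid = BPF_CORE_READ(task, real_parent, tgid);

    bpf_printk("parent tgid: %d", ppid);
    return 0;
}

char _license[] SEC("license") = "GPL";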


3. Cilium Overview: CNI, Network Policy, Load Balancing, Security

Cilium is an open-source project (CNCF Graduated, 2023) that uses eBPF to provide:

  • CNI (Container Network Interface): IP address management (IPAM), routing between pods, and optional BGP peering via Cilium's built-in BGP control plane.
  • Network Policy: Kubernetes NetworkPolicy (L3/L4) plus Cilium's own CiliumNetworkPolicy (L3/L4/L7, including HTTP method/path filtering, DNS-based egress, and FQDN policies).
  • kube-proxy replacement: eBPF-based service load balancing that replaces iptables DNAT entirely.
  • Hubble: Built-in observability layer exposing per-flow, per-identity network telemetry via gRPC, a CLI, and a web UI.
  • Mutual authentication: SPIFFE-based mutual authentication between workload identities, performed by node agents without injecting sidecars.
  • Transparent encryption: WireGuard or IPsec between nodes.
  • Egress gateway: Route specific pod traffic through a designated gateway node with a stable external IP.
  • Cluster mesh: Multi-cluster service discovery and policy enforcement.

Cilium assigns every workload a numeric security identity derived from its Kubernetes labels. Policy enforcement happens based on identity, not IP address — this is a key architectural difference from iptables-based CNIs and is what enables Cilium to enforce policy even when pod IPs change.
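You can inspect these identities directly; a quick sketch (numeric IDs and label sets will differ per cluster):

# Identities are stored as cluster-scoped CRDs
kubectl get ciliumidentities

# The same view from inside an agent pod
kubectl exec -n kube-system ds/cilium -- cilium-dbg identity list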


4. Installing Cilium as CNI with Helm (kube-proxy Replaced by eBPF)

4.1 Cluster Preparation

For this tutorial, we use a kind cluster with kube-proxy disabled (required for the kube-proxy replacement feature):

# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
networking:
  disableDefaultCNI: true
  kubeProxyMode: none

Create the cluster:

kind create cluster --config kind-config.yaml --name cilium-demo

Verify nodes are in NotReady state (expected — no CNI yet):

kubectl get nodes

4.2 Add the Cilium Helm Repository

helm repo add cilium https://helm.cilium.io/
helm repo update

4.3 Install Cilium

The following Helm values enable the kube-proxy replacement, Hubble, WireGuard encryption, and the Hubble UI:

# cilium-values.yaml
kubeProxyReplacement: true

# Set these to your API server address and port; for kind you can query it with:
#   kubectl get nodes cilium-demo-control-plane \
#     -o jsonpath='{.status.addresses[0].address}'
# (shell substitution does not work inside a Helm values file)
k8sServiceHost: "<API_SERVER_IP>"
k8sServicePort: "6443"

ipam:
  mode: kubernetes

hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - port-distribution
      - icmp
      - http

encryption:
  enabled: true
  type: wireguard

operator:
  replicas: 1

# Enable Cilium's BGP control plane (optional)
bgpControlPlane:
  enabled: false

# Enable egress gateway (needed for section 9)
egressGateway:
  enabled: true

Install with Helm:

export API_SERVER_IP=$(kubectl get nodes cilium-demo-control-plane \
  -o jsonpath='{.status.addresses[0].address}')

helm install cilium cilium/cilium \
  --version 1.17.0 \
  --namespace kube-system \
  --values cilium-values.yaml \
  --set k8sServiceHost="${API_SERVER_IP}" \
  --set k8sServicePort=6443

4.4 Verify Installation

Install the Cilium CLI:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --remote-name-all \
  "https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz"
tar xzvf cilium-linux-amd64.tar.gz
sudo mv cilium /usr/local/bin

Run the connectivity test:

cilium status --wait
cilium connectivity test

All nodes should now be Ready and kube-proxy pods should be absent:

kubectl get nodes
kubectl get pods -n kube-system | grep kube-proxy   # should return nothing
kubectl get pods -n kube-system | grep cilium

5. Cilium Network Policies: L3/L4/L7 Policies with YAML Examples

Cilium supports both the standard Kubernetes NetworkPolicy and its own CiliumNetworkPolicy CRD. The CRD unlocks L7 filtering (HTTP, gRPC, Kafka, DNS) and identity-based rules.

5.1 Default Deny All

Start with a default-deny posture in your application namespace:

# default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Apply the policy:

kubectl apply -f default-deny.yaml

5.2 L3/L4 Allow Policy

Allow the frontend pods to receive traffic on port 8080 only from the api-gateway pods:

# frontend-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-gateway-to-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: frontend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
  policyTypes:
    - Ingress

5.3 L7 HTTP Policy with CiliumNetworkPolicy

The following policy allows GET requests to /api/v1/products but blocks POST (and any other method or path) from the frontend to the catalog service:

# l7-http-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: catalog-l7-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: catalog
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: /api/v1/products.*

5.4 DNS-Based Egress Policy (FQDN)

Allow pods in the data-pipeline namespace to reach only api.external-partner.com on HTTPS:

# fqdn-egress.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-partner
  namespace: data-pipeline
spec:
  endpointSelector:
    matchLabels:
      app: data-exporter
  egress:
    - toFQDNs:
        - matchName: api.external-partner.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"

The DNS rule is required because Cilium intercepts DNS responses to learn the IPs associated with FQDNs dynamically.
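You can watch this DNS interception live with the Hubble CLI (installed in section 6); a sketch:

# DNS queries from the namespace, with Cilium's verdict on each
hubble observe --namespace data-pipeline --protocol dns --follow

# The FQDN-to-IP cache the agent has learned so far
kubectl exec -n kube-system ds/cilium -- cilium-dbg fqdn cache list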

5.5 Verify Policy Enforcement

# Test L7 policy enforcement from a frontend pod
kubectl exec -n production deploy/frontend -- \
  curl -s http://catalog:8080/api/v1/products   # should succeed

kubectl exec -n production deploy/frontend -- \
  curl -s -X POST http://catalog:8080/api/v1/products  # should get 403 Forbidden

# List endpoints and their identities
kubectl exec -n kube-system ds/cilium -- cilium endpoint list

6. Hubble: Real-Time Network Observability

Hubble is Cilium's observability layer. Flow data is generated by the Cilium agent on each node (a DaemonSet), aggregated cluster-wide by hubble-relay (a Deployment), and exposed via a gRPC API, a CLI, and a web UI.

6.1 Install the Hubble CLI

HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --remote-name-all \
  "https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/hubble-linux-amd64.tar.gz"
tar xzvf hubble-linux-amd64.tar.gz
sudo mv hubble /usr/local/bin

6.2 Port-Forward to Hubble Relay

cilium hubble port-forward &
hubble status

6.3 Observe Live Flows

Watch all flows in the production namespace:

hubble observe --namespace production --follow

Watch only dropped flows (policy violations):

hubble observe --namespace production --verdict DROPPED --follow

Filter to a specific pod:

hubble observe \
  --namespace production \
  --pod frontend-6d8f9b-xxxx \
  --follow

Output flows as JSON (one object per line) for log ingestion:

hubble observe --namespace production --output json | head -5

6.4 Hubble UI

If you enabled the UI during installation, port-forward the UI service:

cilium hubble ui &
# Opens http://localhost:12000 in the browser

The Hubble UI shows a service dependency graph with live flow counts, latency, and drop rates between services. Each edge on the graph shows allowed/dropped counts and can be clicked to inspect individual flows.

6.5 Hubble Metrics in Prometheus

Cilium exposes Hubble metrics in Prometheus format. Add a ServiceMonitor to scrape them:

# hubble-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: hubble
  endpoints:
    - port: hubble-metrics
      interval: 10s

Key metrics to alert on:

  • hubble_drop_total — total dropped packets by reason and direction
  • hubble_flows_processed_total — total flows (use for traffic baseline)
  • hubble_http_requests_total — HTTP request counts by method, protocol, and status
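As a starting point, a hedged PrometheusRule that fires on sustained drops; the threshold is illustrative, so tune it to your traffic baseline:

# hubble-drop-alert.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hubble-drops
  namespace: kube-system
spec:
  groups:
    - name: hubble
      rules:
        - alert: HubblePolicyDropSpike
          # Narrow by the "reason" label if your metrics config exposes it
          expr: sum(rate(hubble_drop_total[5m])) > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Sustained packet drops observed by Hubble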

7. Service Mesh Without Sidecars: Mutual Authentication with Cilium

Traditional service meshes (Istio, Linkerd) achieve mutual TLS by injecting a sidecar proxy into every pod. Cilium achieves a comparable result with SPIFFE-based mutual authentication implemented by the eBPF datapath and node-level agents — no per-pod sidecar required.

7.1 Enable Mutual Authentication

In your Helm values:

authentication:
  mutual:
    spire:
      enabled: true
      install:
        enabled: true

Or update an existing installation:

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set authentication.mutual.spire.enabled=true \
  --set authentication.mutual.spire.install.enabled=true

7.2 Require Mutual Authentication in Policy

# mutual-auth-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: require-mutual-auth
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payments
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: checkout
      authentication:
        mode: required

With this policy, connections from checkout to payments will only be allowed if mutual SPIFFE authentication succeeds. Connections from workloads without a valid SPIFFE identity are dropped before any data is exchanged.
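To see authentication at work, two quick checks; the bpf auth subcommand ships alongside the mutual auth feature in recent Cilium releases:

# Negotiated auth entries between local and remote identities
kubectl exec -n kube-system ds/cilium -- cilium-dbg bpf auth list

# Failed handshakes surface as drops in Hubble
hubble observe --namespace production --verdict DROPPED --follow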


8. Transparent Encryption with WireGuard

Cilium's WireGuard integration encrypts all node-to-node traffic transparently. Individual pods require no configuration changes.

8.1 How It Works

When enabled, Cilium creates a WireGuard interface (cilium_wg0) on each node. All traffic leaving a node destined for a pod on another node is routed through this interface, encrypted using the WireGuard protocol (Curve25519 key exchange, ChaCha20-Poly1305 encryption). Keys are automatically rotated.
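WireGuard must be available in the node kernel (built in since Linux 5.6). A quick sketch to verify before enabling; the kernel config path varies by distribution:

# Either the module loads, or WireGuard is compiled into the kernel
sudo modprobe wireguard && echo "WireGuard available"
grep CONFIG_WIREGUARD= /boot/config-$(uname -r)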

8.2 Enable WireGuard

During installation (as shown above) or via upgrade:

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard

8.3 Verify Encryption

# Check WireGuard interface on a node
kubectl exec -n kube-system ds/cilium -- \
  wg show cilium_wg0

# Confirm encryption in Cilium status
kubectl exec -n kube-system ds/cilium -- \
  cilium status | grep -A3 Encryption

Expected output shows the WireGuard interface, peer count, and bytes transferred.

8.4 Encrypt Hubble Flows (Node-to-Relay)

Hubble relay traffic between nodes is also encrypted when WireGuard is enabled, because it transits the same node-to-node path. For relay-to-client TLS, have Cilium provision certificates automatically, here via cert-manager:

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.tls.enabled=true \
  --set hubble.tls.auto.enabled=true \
  --set hubble.tls.auto.method=certmanager \
  --set hubble.tls.auto.certManagerIssuerRef.group=cert-manager.io \
  --set hubble.tls.auto.certManagerIssuerRef.kind=ClusterIssuer \
  --set hubble.tls.auto.certManagerIssuerRef.name=cluster-ca

9. Egress Gateway: Controlling Outbound Traffic

In many enterprises, all outbound internet traffic must egress through a fixed IP for firewall allowlisting. Cilium's egress gateway feature routes pod traffic through a designated gateway node with a stable external IP, without requiring NAT rules in every pod or a separate NAT gateway appliance.

9.1 Label the Gateway Node

kubectl label node worker-node-1 egress-gateway=true

9.2 Create an EgressGatewayPolicy

# egress-gateway-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: data-pipeline-egress
spec:
  selectors:
    - podSelector:
        matchLabels:
          app: data-exporter
        namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: data-pipeline
  destinationCIDRs:
    - 0.0.0.0/0
  egressGateway:
    nodeSelector:
      matchLabels:
        egress-gateway: "true"
    egressIP: 203.0.113.50   # The stable external IP assigned to the gateway node

Apply the policy:

kubectl apply -f egress-gateway-policy.yaml

After applying this policy, all traffic from data-exporter pods in the data-pipeline namespace destined for the internet will be SNATed to 203.0.113.50 when leaving the cluster, regardless of which node the pod runs on.

9.3 Verify Egress Gateway

From a data-exporter pod:

kubectl exec -n data-pipeline deploy/data-exporter -- \
  curl -s https://ifconfig.me
# Should print 203.0.113.50

10. Cluster Mesh: Multi-Cluster Connectivity

Cilium Cluster Mesh connects multiple Kubernetes clusters so that pods in one cluster can reach services in another cluster using their normal Kubernetes service DNS names, with full network policy enforcement.

10.1 Enable Cluster Mesh on Each Cluster

Each cluster needs a unique name and cluster ID (1–255):

# On cluster-1
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set cluster.name=cluster-1 \
  --set cluster.id=1 \
  --set clustermesh.useAPIServer=true

# On cluster-2
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set cluster.name=cluster-2 \
  --set cluster.id=2 \
  --set clustermesh.useAPIServer=true

10.2 Connect the Clusters

# Use the cilium CLI with kubeconfig contexts for both clusters
cilium clustermesh connect \
  --context cluster-1 \
  --destination-context cluster-2

cilium clustermesh status --context cluster-1 --wait

10.3 Share a Service Across Clusters

Annotate a service to make it globally available:

# shared-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: payments
  namespace: production
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: payments
  ports:
    - port: 8080
      targetPort: 8080

After applying this in both clusters, DNS resolution of payments.production.svc.cluster.local in either cluster will return endpoints from both clusters, with load balancing and failover handled by Cilium.
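A sketch to confirm backends from both clusters are actually programmed (output formats vary slightly across versions):

# The payments service should list backends tagged with both cluster names
kubectl exec -n kube-system ds/cilium -- cilium-dbg service list

# Overall mesh health from either side
cilium clustermesh status --context cluster-1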


11. Writing a Simple eBPF Program in C with libbpf

To understand what Cilium does under the hood, it helps to write a minimal eBPF program yourself. This example counts packets on a network interface using a TC (traffic control) hook.

11.1 Install Build Dependencies

sudo apt-get install -y \
  clang llvm gcc \
  libbpf-dev \
  linux-headers-$(uname -r) \
  bpftool   # on Ubuntu, bpftool may ship in linux-tools-$(uname -r) instead

11.2 The eBPF Program (C)

// packet_counter.bpf.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/pkt_cls.h>

// Define a per-CPU array map to count packets (one entry)
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} packet_count SEC(".maps");

SEC("tc")
int count_packets(struct __sk_buff *skb)
{
    __u32 key = 0;
    __u64 *count;

    count = bpf_map_lookup_elem(&packet_count, &key);
    if (count)
        __sync_fetch_and_add(count, 1);

    return TC_ACT_OK;  // Pass the packet through
}

char _license[] SEC("license") = "GPL";

11.3 Compile the Program

clang -O2 -g -target bpf \
  -I/usr/include/$(uname -m)-linux-gnu \
  -c packet_counter.bpf.c \
  -o packet_counter.bpf.o

11.4 The Loader (C with libbpf)

// loader.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include <net/if.h>

int main(int argc, char **argv)
{
    const char *iface = argc > 1 ? argv[1] : "eth0";
    struct bpf_object *obj;
    struct bpf_program *prog;
    int prog_fd, map_fd, ifindex;
    __u32 key = 0;

    // Per-CPU array lookups return one value per possible CPU;
    // size the buffer with libbpf's helper instead of a fixed guess
    int ncpus = libbpf_num_possible_cpus();
    __u64 *count = calloc(ncpus, sizeof(*count));
    if (!count)
        return 1;

    // Load and verify the BPF object
    obj = bpf_object__open_file("packet_counter.bpf.o", NULL);
    if (libbpf_get_error(obj)) {
        fprintf(stderr, "Failed to open BPF object\n");
        return 1;
    }
    if (bpf_object__load(obj)) {
        fprintf(stderr, "Failed to load BPF object\n");
        return 1;
    }

    prog = bpf_object__find_program_by_name(obj, "count_packets");
    prog_fd = bpf_program__fd(prog);

    ifindex = if_nametoindex(iface);
    if (!ifindex) {
        fprintf(stderr, "Unknown interface: %s\n", iface);
        return 1;
    }

    // Attach to TC ingress using tc_bpf API
    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook,
        .ifindex = ifindex,
        .attach_point = BPF_TC_INGRESS);
    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts,
        .handle = 1, .priority = 1, .prog_fd = prog_fd);

    // Create the clsact qdisc if needed (-EEXIST just means it already exists)
    bpf_tc_hook_create(&hook);
    if (bpf_tc_attach(&hook, &opts)) {
        fprintf(stderr, "Failed to attach TC program\n");
        return 1;
    }

    map_fd = bpf_object__find_map_fd_by_name(obj, "packet_count");

    printf("Counting packets on %s. Press Ctrl+C to stop.\n", iface);
    while (1) {
        sleep(1);
        // Sum the per-CPU counters into one total
        bpf_map_lookup_elem(map_fd, &key, count);
        __u64 total = 0;
        for (int i = 0; i < ncpus; i++)
            total += count[i];
        printf("Packets received: %llu\n", total);
    }

    return 0;
}

Compile and run the loader (root is required to attach TC programs):

gcc -O2 -o loader loader.c -lbpf
sudo ./loader eth0

You should see packet counts increment each second. This is exactly the kind of data-plane logic that Cilium implements — except Cilium's BPF programs enforce policy, perform NAT, and export flow telemetry in the same pipeline.


12. Troubleshooting with cilium-dbg and Hubble

12.1 cilium-dbg: The Built-In Debugger

Starting with Cilium 1.14, the cilium-dbg binary (inside the Cilium agent pod) provides detailed diagnostics:

# Shell into a Cilium agent pod
kubectl exec -it -n kube-system ds/cilium -- bash

# Check overall health
cilium-dbg status --verbose

# Inspect a specific endpoint (pod)
cilium-dbg endpoint list
cilium-dbg endpoint get <ENDPOINT_ID>

# Dump the BPF policy map for an endpoint
cilium-dbg bpf policy get <ENDPOINT_ID>

# Check service BPF map (replaces iptables)
cilium-dbg bpf lb list

# Inspect connection tracking table
cilium-dbg bpf ct list global

# Trace a packet policy decision
cilium-dbg policy trace \
  --src-k8s-pod production:frontend-pod-xxx \
  --dst-k8s-pod production:catalog-pod-yyy \
  --dport 8080/TCP

The policy trace command is invaluable: it simulates a packet and tells you exactly which policy rule allows or denies it, without sending any real traffic.

12.2 Common Issues and Fixes

Pods cannot communicate after Cilium installation:

# Check if Cilium agent is running on every node
kubectl get pods -n kube-system -l k8s-app=cilium -o wide

# Check agent logs for errors
kubectl logs -n kube-system ds/cilium --tail=50

# Run the built-in connectivity test
cilium connectivity test --test-namespace cilium-test

Policy drops appearing in Hubble but policy looks correct:

# Observe drops with full context
hubble observe --verdict DROPPED --output json | jq '.flow | {
  src: .source,
  dst: .destination,
  reason: .drop_reason_desc,
  policy: .policy_match_type
}'

The drop_reason_desc field tells you whether the drop was a policy violation, an encryption failure, a missing identity, or a conntrack issue.

WireGuard handshake failures:

kubectl exec -n kube-system ds/cilium -- wg show cilium_wg0
# Check "latest handshake" for each peer — if >3 minutes ago, handshake failed
# Check node firewall allows UDP 51871 between nodes

High CPU from Cilium agent:

# Check BPF map pressure
kubectl exec -n kube-system ds/cilium -- \
  cilium-dbg bpf map list
# Look for maps near their max_entries limit: a full map (especially conntrack)
# forces aggressive garbage collection, and new entries start getting dropped

13. Limitations and When NOT to Use Cilium

Cilium is powerful, but it is not the right choice in every situation.

Kernel version requirements. Cilium's full feature set requires Linux kernel 5.10+. Enterprise distributions on older kernels (RHEL 7 ships 3.10, Ubuntu 18.04 ships 4.15) are not supported. Always check the Cilium system requirements before adopting.

Complexity of debugging. eBPF-based data planes are harder to debug than iptables. When something goes wrong, you need familiarity with BPF maps, bpftool, and Cilium's internal identity model. Teams without this expertise may spend significant time on incidents.

Windows nodes. Cilium does not support Windows worker nodes. Clusters with Windows workloads must use a different CNI (Calico, Flannel) for Windows nodes. Linux nodes in the same cluster can run Cilium with a hybrid CNI configuration.

Very small clusters. For clusters with fewer than 10 nodes and fewer than 100 services, the performance advantages of eBPF over iptables are negligible. The operational overhead of Cilium may not be justified. Flannel or Canal may be simpler choices.

ARM64 in restricted environments. Cilium supports arm64, but some eBPF program types have limited support on arm64 with older kernels. Test thoroughly before deploying to arm64 production nodes running kernels below 5.15.

Managed Kubernetes node access. GKE Autopilot, EKS Fargate, and AKS virtual nodes do not give you access to the underlying node OS. Cilium requires node-level access to load BPF programs. On these platforms you must use the managed CNI (GKE Dataplane V2 for GKE, which is itself Cilium-based, but managed).

L7 policy performance. L7 HTTP/gRPC policies are enforced via an Envoy instance embedded in the Cilium agent, not via pure eBPF. For very high-throughput paths (>100K RPS per pod), L7 policy enforcement adds measurable latency. Use L4 policies on hot paths and reserve L7 policies for control-plane or management traffic.


14. FAQ

Q: Does Cilium replace my service mesh (Istio/Linkerd) entirely?

A: For many use cases, yes. Cilium provides mutual authentication, transparent encryption, L7 policy, and rich observability without sidecars. However, Istio and Linkerd offer additional features: traffic management (retries, circuit breaking, canary deployments via weight-based routing), more mature mTLS certificate management workflows, and broader ecosystem integrations. Evaluate based on your specific requirements. Many teams use Cilium as the CNI and drop Istio entirely; others run Cilium with Istio's control plane but disable the Istio CNI in favor of Cilium's.

Q: Can I migrate from Calico/Flannel to Cilium without downtime?

A: In-place CNI migration without downtime is not officially supported and is operationally risky. The recommended approach is to provision a new node pool with Cilium, migrate workloads, and drain/delete the old node pool. Some cloud providers (EKS, GKE) support CNI migration via their node group replacement mechanisms.

Q: How does Cilium handle IPv6?

A: Cilium has first-class dual-stack IPv4/IPv6 support. Enable it during installation:

helm install cilium cilium/cilium \
  --set ipv6.enabled=true \
  --set routingMode=native \
  --set autoDirectNodeRoutes=true

Q: Is Cilium production-ready?

A: Yes. Cilium is CNCF Graduated, used by Google (GKE Dataplane V2), AWS (EKS Anywhere), Azure (AKS CNI Powered by Cilium), Bell Canada, Adobe, and thousands of other organizations. It has a strong security track record and a dedicated security response team.

Q: What is the difference between Cilium Enterprise and open-source Cilium?

A: Isovalent (acquired by Cisco in 2024) offers Cilium Enterprise with additional features: advanced RBAC for Hubble, compliance reporting, support SLAs, and integrations with SIEM platforms. The open-source version is fully functional for the vast majority of use cases.

Q: Can I use Cilium with Helm on OpenShift?

A: OpenShift uses OVN-Kubernetes as its default CNI and has deep integration with it. Replacing it with Cilium on OpenShift requires additional steps and is not officially supported by Red Hat. Cilium can run on OpenShift in "secondary CNI" mode or in specific configurations, but this is an advanced setup.


Leonardo Lazzaro

Software engineer and technical writer. 10+ years experience in DevOps, Python, and Linux systems.
