eBPF

Extended Berkeley Packet Filter // Kernel-level programmability // Linux 3.15+

What eBPF Actually Is

eBPF is a sandboxed virtual machine inside the Linux kernel. It lets you run custom bytecode at kernel hook points without modifying kernel source or loading kernel modules. Programs are verified at load time for safety (no infinite loops, no invalid memory access), then JIT-compiled to native machine code.

The "extended" part matters: classic BPF (cBPF, 1992) was a simple packet filter with 2 registers and 32-bit ops. eBPF (2014+) is a general-purpose in-kernel execution engine with 11 registers, 64-bit ops, maps (key-value stores), helper functions, and tail calls.

Programs attach to hook points: kprobes (any kernel function), uprobes (any userspace function), tracepoints (stable kernel instrumentation), XDP (network driver ingress), tc (traffic control), cgroup hooks, LSM hooks, and more. Each hook type defines what context the program receives and what it can do.

Why this matters: Before eBPF, if you wanted custom kernel instrumentation you had two options: write a kernel module (dangerous, version-specific, can crash the system) or modify the kernel source and recompile. eBPF gives you the same power with safety guarantees enforced by the verifier.

Kernel Execution Pipeline

USERSPACE
   BCC Python/Go         bpftrace            libbpf CO-RE
   (C source code)       (bpftrace script)   (pre-compiled .o)
        |                     |                    |
        v                     v                    v
   LLVM/Clang compiles to BPF bytecode (or loaded directly)
                              |
                              v
  -------------------- bpf() syscall --------------------
KERNEL
                              v
   VERIFIER     -- rejects unsafe programs (bounds checks, no loops*,
                   stack depth <= 512 bytes, no null derefs)
                              v
   JIT COMPILER -- bytecode -> native (x86_64, arm64, etc.)
                              |
                              v
   ATTACH to hook point (kprobe, tracepoint, XDP, etc.)
                              |
                              v
   BPF MAPS     <-- shared state between kernel prog & userspace
                    (hash, array, ringbuf, perf_event, LRU, stack_trace, ...)
                              |
                              v
   Userspace reads maps / perf buffer / ring buffer for output

* bounded loops allowed since kernel 5.3 (verifier proves termination)
Step 1: Write C -- BPF program source
Step 2: Compile -- LLVM -> BPF bytecode
Step 3: bpf() syscall -- load into kernel
Step 4: Verify -- safety proof
Step 5: JIT -- native machine code
Step 6: Attach -- kprobe / XDP / etc.

Core Concepts

BPF Verifier

Every BPF program passes through the verifier before execution. It walks all possible code paths and proves:

1. No unreachable instructions
2. No out-of-bounds memory access
3. All branches terminate (DAG check)
4. Stack usage <= 512 bytes
5. Only allowed helper functions called
6. R0 contains a valid return value before exit
7. Map pointers null-checked before deref

The verifier is your biggest enemy when writing BPF C. The error "R1 invalid mem access 'scalar'" means you tried to dereference a raw pointer without bpf_probe_read(). Every struct field read from a kernel pointer in kprobe context needs an explicit probe read.

BPF Maps

Maps are the shared data structures between kernel-side BPF programs and userspace. They persist across BPF program invocations and are the primary mechanism for collecting data.

BPF_MAP_TYPE_HASH: general k/v store
BPF_MAP_TYPE_ARRAY: fixed-size indexed
BPF_MAP_TYPE_PERF_EVENT_ARRAY: per-cpu event stream
BPF_MAP_TYPE_RINGBUF: shared ring buffer (5.8+)
BPF_MAP_TYPE_LRU_HASH: auto-evicting cache
BPF_MAP_TYPE_STACK_TRACE: call stack capture
BPF_MAP_TYPE_PERCPU_HASH: lock-free per-cpu

In BCC Python: BPF_HASH(counts) declares a hash map. BPF_PERF_OUTPUT(events) declares a perf buffer. Userspace polls with b.perf_buffer_poll().
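The kernel-side idiom these maps support is always the same: look up, null-check, update. A minimal Python model of the helper semantics (illustrative only -- BpfHashModel and count_event are made-up names, not BCC API):

```python
class BpfHashModel:
    """Toy stand-in for a BPF_HASH map's helper semantics."""
    def __init__(self):
        self._m = {}

    def lookup(self, key):
        # bpf_map_lookup_elem: returns a value pointer, or NULL if absent
        return self._m.get(key)

    def update(self, key, val):
        # bpf_map_update_elem with BPF_ANY: insert or overwrite
        self._m[key] = val

def count_event(counts, pid):
    # The canonical counting idiom; the verifier rejects programs
    # that skip the None (NULL) check before using the value.
    val = counts.lookup(pid)
    counts.update(pid, 1 if val is None else val + 1)

counts = BpfHashModel()
for _ in range(3):
    count_event(counts, 4242)
print(counts.lookup(4242))   # -> 3
```

In real BPF C the lookup returns a pointer and the increment happens in place; the null check is mandatory either way.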

Where BPF Programs Attach

kprobe / kretprobe: any kernel function entry/return
uprobe / uretprobe: any userspace function entry/return
tracepoint: stable kernel instrumentation points
raw_tracepoint: lower overhead, raw args
fentry / fexit: BTF-based, lower overhead than kprobes (5.5+)
perf_event: PMC sampling, CPU profiling

kprobe vs tracepoint: kprobes can hook ANY kernel function, but the interface can change between kernel versions. Tracepoints are stable across versions but only exist where kernel devs placed them. Use tracepoints when available, kprobes when you need to reach deeper.

XDP: driver-level packet processing (fastest)
tc (clsact): traffic control ingress/egress
sk_filter: socket-level packet filtering
sk_msg / sk_skb: sockmap redirect (service mesh)
lwt: lightweight tunnel encap/decap
cgroup/sock*: per-cgroup network policy

XDP performance: XDP runs before the kernel allocates an skb (socket buffer), so it can process packets at line rate -- Cloudflare uses it to mitigate multi-Tbps DDoS attacks. Actions: XDP_PASS, XDP_DROP, XDP_TX (bounce back out the same NIC), XDP_REDIRECT.

LSM: security module hooks (5.7+)
seccomp: syscall filtering
cgroup/device: device access control
cgroup/sysctl: sysctl interposition

LSM + eBPF: LSM hooks let BPF enforce security policy at the same kernel points as SELinux/AppArmor -- but dynamically, without rebooting or recompiling policy. Cilium uses this for Kubernetes network policy enforcement at the kernel level.

BCC (BPF Compiler Collection)

BCC provides a Python (and Lua) frontend for writing BPF programs. C code is embedded as a string, compiled at runtime via LLVM, and loaded into the kernel. This is the fastest path from zero to working BPF instrumentation but has trade-offs.

Advantages

rapid prototyping: write, run, iterate
rich Python API: map access, formatting
100+ built-in tools: production-ready
auto struct offsets: reads kernel headers

Trade-offs

requires LLVM + headers: on every target
compile at runtime: slow startup on ARM
not portable: tied to kernel version
heavy deps: ~300MB installed

Anatomy of a BCC Program

BCC Python
#!/usr/bin/env python3
from bcc import BPF

# ---- KERNEL SIDE (C) ----
prog = """
struct event_t {
    u32 pid;
    char comm[16];
    char fname[64];
    u64 size;
};

BPF_PERF_OUTPUT(events);              // declare perf buffer

int trace_read_entry(struct pt_regs *ctx,
                     struct file *file,
                     char __user *buf,
                     size_t count) {
    struct event_t evt = {};
    evt.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&evt.comm, sizeof(evt.comm));
    bpf_probe_read_kernel_str(&evt.fname, sizeof(evt.fname),
                               file->f_path.dentry->d_name.name);
    evt.size = count;
    events.perf_submit(ctx, &evt, sizeof(evt));
    return 0;
}
"""

# ---- USERSPACE SIDE (Python) ----
b = BPF(text=prog)                           # compile + load
b.attach_kprobe(event="vfs_read",
                fn_name="trace_read_entry") # attach to hook

def handle_event(cpu, data, size):           # callback for each event
    evt = b["events"].event(data)
    print(f"{evt.pid}  {evt.comm.decode()}  "
          f"{evt.fname.decode()}  {evt.size}")   # ctypes char[] -> bytes

b["events"].open_perf_buffer(handle_event)
while True:
    b.perf_buffer_poll()                     # blocks, calls handle_event

BCC's rewriter gotcha: The C string is compiled by LLVM into BPF bytecode at runtime. BCC's rewriter automatically converts struct member access into bpf_probe_read() calls -- but it misses some cases (like ntohs(sk->field)), which is why you hit verifier errors and need explicit bpf_probe_read_kernel().

Reference

bpf_get_current_pid_tgid(): pid in upper 32 bits, tid in lower 32
bpf_get_current_comm(buf, sz): process name (16 chars max)
bpf_ktime_get_ns(): monotonic nanoseconds
bpf_probe_read_kernel(dst, sz, src): safe kernel mem read
bpf_probe_read_user(dst, sz, src): safe userspace mem read
bpf_probe_read_kernel_str(): read kernel string safely
bpf_map_lookup_elem(map, &key): read from map (returns ptr or NULL)
bpf_map_update_elem(map, &key, &val, flags): write to map
bpf_map_delete_elem(map, &key): remove from map
bpf_perf_event_output(ctx, map, flags, data, sz): send to perf buffer
bpf_trace_printk(fmt, ...): debug only (slow, limited)
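The >> 32 shift you see in every BCC example comes from how bpf_get_current_pid_tgid() packs two IDs into one u64. A quick sketch of the unpacking in plain Python (split_pid_tgid is an illustrative name, not a BPF helper):

```python
def split_pid_tgid(pid_tgid):
    # Upper 32 bits: tgid (what userspace tools call the PID).
    # Lower 32 bits: kernel pid (what userspace calls the TID).
    return pid_tgid >> 32, pid_tgid & 0xFFFFFFFF

# A thread with TID 5678 inside process 1234 would see:
packed = (1234 << 32) | 5678
assert split_pid_tgid(packed) == (1234, 5678)
```

This is why evt.pid = bpf_get_current_pid_tgid() >> 32 in the BCC program above reports the process, not the thread.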

C-side macros (in the embedded C string):

BPF_HASH(name, key_t, val_t): hash map
BPF_ARRAY(name, val_t, max): array map
BPF_PERF_OUTPUT(name): perf buffer
BPF_PERCPU_ARRAY(name): per-cpu array

Python-side API:

b.attach_kprobe(event, fn_name): hook kernel func
b.attach_kretprobe(event, fn): hook return
b.attach_uprobe(name, sym, fn): hook userspace func
b.attach_tracepoint(tp, fn): stable kernel hook
b["map"].open_perf_buffer(cb): register event callback
b.perf_buffer_poll(): block + dispatch events
b["map"][key]: read map entry directly
Common kprobe targets:

File I/O
vfs_read / vfs_write: all file reads/writes
vfs_open: file open
do_sys_openat2: openat syscall impl

Networking
tcp_v4_connect: outbound TCP
tcp_finish_connect: handshake done
tcp_sendmsg / tcp_recvmsg: TCP data transfer
tcp_close: teardown
udp_sendmsg: UDP (DNS etc.)

Process
__arm64_sys_execve: exec (arm64)
__x64_sys_execve: exec (x86_64)
do_exit: process exit
wake_up_new_task: fork/clone

BPF Verifier Errors & Fixes

"R1 invalid mem access 'scalar'": dereferencing a kernel pointer directly. Fix: use bpf_probe_read_kernel().
"BPF stack limit exceeded" (512 bytes): struct too large for the BPF stack. Fix: use a BPF_PERCPU_ARRAY as scratch space.
"back-edge from insn X to Y": unbounded loop detected (pre-5.3 kernels). Fix: #pragma unroll, or bounded for-loops on 5.3+.
"invalid indirect read from stack": reading uninitialized stack memory. Fix: zero-init structs: struct foo bar = {};
"cannot pass map_value to helper": map value pointer used where a scalar is expected. Fix: dereference into a local var first.
"btf_vmlinux is malformed": kernel BTF data missing or broken. Fix: install linux-headers or build the kernel with CONFIG_DEBUG_INFO_BTF=y.

Toolchains Compared

BCC
(Python / Lua frontends, runtime compile)

Inline C compiled at load time. Best for prototyping and ad-hoc investigation. Requires LLVM + kernel headers on target.

startup: slow (LLVM compile)
portability: tied to kernel version
prototyping: fastest iteration
libbpf + CO-RE
(C / Go / Rust bindings, pre-compiled)

Compile Once, Run Everywhere. BPF programs compiled ahead of time with clang. BTF provides struct layout info at runtime. Production standard.

startup: instant (pre-compiled)
portability: cross-kernel (BTF)
prototyping: slower iteration
bpftrace
(awk-like DSL, one-liners)

High-level tracing language inspired by DTrace/awk. One-liners for quick investigation. Compiles to BPF under the hood.

startup: moderate
portability: needs BTF or headers
prototyping: one-liner speed

Cilium eBPF Go Library

cilium/ebpf is the Go library for loading and managing pre-compiled BPF programs. It's the production approach: write BPF C, compile with clang to .o, then load/attach/read from Go. No LLVM on the target machine. CO-RE + BTF handles cross-kernel portability.

Workflow:
1. Write the BPF C program (probe.bpf.c)
2. Compile: clang -g -O2 -target bpf -c probe.bpf.c -o probe.bpf.o (the -g is what emits the BTF that CO-RE needs)
3. Generate Go bindings: bpf2go or manual
4. Load from Go: ebpf.LoadCollection() / link.Kprobe()
5. Read maps: map.Lookup() / map.Iterate()

This is what Cilium itself uses for Kubernetes network policy, service mesh, and observability (Hubble). Also used by Cloudflare, Meta, and Netflix for production eBPF tooling.


bpftrace One-Liners

bpftrace
# count syscalls by process
$ bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# trace file opens with path
$ bpftrace -e 'tracepoint:syscalls:sys_enter_openat
  { printf("%s %s\n", comm, str(args.filename)); }'

# histogram of read() sizes
$ bpftrace -e 'tracepoint:syscalls:sys_exit_read /args.ret > 0/
  { @bytes = hist(args.ret); }'

# TCP connect with dest IP
$ bpftrace -e 'kprobe:tcp_v4_connect { printf("%s -> %s\n", comm,
    ntop(((struct sock *)arg0)->__sk_common.skc_daddr)); }'

# slow syscalls (>1ms)
$ bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
  tracepoint:raw_syscalls:sys_exit /@start[tid]/
  { $d = nsecs - @start[tid];
    if ($d > 1000000) { printf("%s %dms\n", comm, $d/1000000); }
    delete(@start[tid]); }'
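bpftrace's hist() buckets values by power of two. A sketch of that bucketing in plain Python (log2_bucket is an illustrative name, not bpftrace internals):

```python
from collections import Counter

def log2_bucket(n):
    # hist(): values in [2^k, 2^(k+1)) share one bucket; k = floor(log2 n)
    return n.bit_length() - 1 if n > 0 else -1

# e.g. read() return sizes collected by the histogram one-liner above
sizes = [1, 3, 4, 4, 100, 5000]
buckets = Counter(log2_bucket(s) for s in sizes)
print(dict(buckets))   # -> {0: 1, 1: 1, 2: 2, 6: 1, 12: 1}
```

Log buckets are what make in-kernel aggregation cheap: one shift-based index per event instead of storing every sample.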

eBPF Evolution

1992: BPF introduced. Steven McCanne and Van Jacobson. Simple packet filter in BSD. 2 registers, 32-bit.
2014: eBPF merged (Linux 3.15-3.18). Alexei Starovoitov. 11 registers, 64-bit, maps, verifier, JIT. Initial use: networking.
2015: kprobes + tracing support. BPF programs can attach to kernel functions. BCC project starts at PLUMgrid.
2016: Cilium founded / XDP merged (4.8). Thomas Graf. eBPF-based K8s networking. XDP: BPF at the network driver level.
2018: BTF (BPF Type Format). Struct layout metadata enables CO-RE: compile once, run on different kernel versions.
2019: bpftrace 0.9 / bounded loops (5.3). High-level tracing goes mainstream. Verifier gains bounded-loop support.
2020: LSM hooks (5.7) / fentry (5.5). BPF can enforce security policy. Low-overhead function entry/exit tracing.
2021: eBPF Foundation / Windows eBPF. Linux Foundation governance. Cross-platform eBPF runtime explorations.