eBPF is a sandboxed virtual machine inside the Linux kernel. It lets you run custom bytecode at kernel hook points without modifying kernel source or loading kernel modules. Programs are verified at load time for safety (no infinite loops, no invalid memory access), then JIT-compiled to native machine code.
The "extended" part matters: classic BPF (cBPF, 1992) was a simple packet filter with 2 registers and 32-bit ops. eBPF (2014+) is a general-purpose in-kernel execution engine with 11 registers, 64-bit ops, maps (key-value stores), helper functions, and tail calls.
Programs attach to hook points: kprobes (any kernel function), uprobes (any userspace function), tracepoints (stable kernel instrumentation), XDP (network driver ingress), tc (traffic control), cgroup hooks, LSM hooks, and more. Each hook type defines what context the program receives and what it can do.
Why this matters
Before eBPF, if you wanted custom kernel instrumentation you had two options: write a kernel module (dangerous, version-specific, can crash the system) or modify the kernel source and recompile. eBPF gives you the same power with safety guarantees enforced by the verifier.
Kernel Execution Pipeline
USERSPACE
    BCC Python/Go          bpftrace             libbpf CO-RE
         |                    |                      |
    C source code       bpftrace script       pre-compiled .o
         |                    |                      |
         v                    v                      v
    LLVM/Clang compiles to BPF bytecode (or loaded directly)
                              |
                              v
--------------------- bpf() syscall -----------------------
KERNEL                        |
                              v
    VERIFIER -- rejects unsafe programs (bounds checks, no loops*,
         |       stack depth <= 512 bytes, no null derefs)
         v
    JIT COMPILER -- bytecode -> native (x86_64, arm64, etc.)
         |
         v
    ATTACH to hook point (kprobe, tracepoint, XDP, etc.)
         |
         v
    BPF MAPS <-- shared state between kernel prog & userspace
    (hash, array, ringbuf, perf_event, LRU, stack_trace, ...)
         |
         v
    Userspace reads maps / perf buffer / ring buffer for output

* bounded loops allowed since kernel 5.3 (verifier proves termination)
Step 1: Write C        -- BPF program source
Step 2: Compile        -- LLVM -> BPF bytecode
Step 3: bpf() syscall  -- load into kernel
Step 4: Verify         -- safety proof
Step 5: JIT            -- native machine code
Step 6: Attach         -- kprobe / XDP / etc.
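The six steps above can be sketched in one short BCC script (a minimal sketch, not production code: it assumes root and an installed bcc; the function name count_execve and the map name counts are illustrative). The BPF C source (step 1) is embedded as a string; BCC compiles it with LLVM at load time (step 2), loads it via the bpf() syscall (step 3), the kernel verifies and JIT-compiles it (steps 4-5), and attach_kprobe() hooks it to a kernel function (step 6):

```python
# Steps 1-6 in one BCC script. The C below is compiled at runtime by LLVM.
bpf_text = r"""
BPF_HASH(counts, u32, u64);              // map: pid -> execve call count

int count_execve(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val;
    val = counts.lookup_or_try_init(&pid, &zero);
    if (val)                             // verifier requires the null check
        (*val)++;
    return 0;
}
"""

def main():
    from bcc import BPF                  # imported here so the sketch can be
    import time                          # read without bcc installed
    b = BPF(text=bpf_text)               # compile + load + verify + JIT
    b.attach_kprobe(event=b.get_syscall_fnname("execve"),
                    fn_name="count_execve")
    time.sleep(5)
    for pid, count in b["counts"].items():
        print(pid.value, count.value)

if __name__ == "__main__":
    main()
```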
Core Concepts
Core Concept
BPF Verifier
Every BPF program passes through the verifier before execution. It walks all possible code paths and proves:
1. No unreachable instructions
2. No out-of-bounds memory access
3. All branches terminate (DAG check)
4. Stack usage <= 512 bytes
5. Only allowed helper functions called
6. R0 contains valid return before exit
7. Map pointers null-checked before deref
The verifier is your biggest enemy when writing BPF C. "R1 invalid mem access 'scalar'" means you tried to dereference a raw kernel pointer directly. In kprobe context, every struct field read through a kernel pointer needs an explicit bpf_probe_read_kernel().
Core Concept
BPF Maps
Maps are the shared data structures between kernel-side BPF programs and userspace. They persist across BPF program invocations and are the primary mechanism for collecting data.
BPF_MAP_TYPE_HASH              -- general k/v store
BPF_MAP_TYPE_ARRAY             -- fixed-size indexed
BPF_MAP_TYPE_PERF_EVENT_ARRAY  -- per-cpu event stream
BPF_MAP_TYPE_RINGBUF           -- shared ring buffer (5.8+)
BPF_MAP_TYPE_LRU_HASH          -- auto-evicting cache
BPF_MAP_TYPE_STACK_TRACE       -- call stack capture
BPF_MAP_TYPE_PERCPU_HASH       -- lock-free per-cpu store
In BCC Python: BPF_HASH(counts) declares a hash map. BPF_PERF_OUTPUT(events) declares a perf buffer. Userspace polls with b.perf_buffer_poll().
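The BPF_PERF_OUTPUT / perf_buffer_poll() pattern looks like this (a minimal sketch: the probe name trace_open, struct data_t, and the openat target are illustrative, and it needs root plus bcc at runtime):

```python
# BCC perf-buffer pattern: kernel side submits events, userspace polls.
bpf_text = r"""
BPF_PERF_OUTPUT(events);                 // perf buffer read from userspace

struct data_t {
    u32  pid;
    char comm[16];
};

int trace_open(struct pt_regs *ctx) {
    struct data_t data = {};
    data.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    events.perf_submit(ctx, &data, sizeof(data));
    return 0;
}
"""

def main():
    from bcc import BPF
    b = BPF(text=bpf_text)
    b.attach_kprobe(event=b.get_syscall_fnname("openat"),
                    fn_name="trace_open")

    def handle(cpu, data, size):
        event = b["events"].event(data)  # ctypes struct generated by BCC
        print(event.pid, event.comm.decode())

    b["events"].open_perf_buffer(handle)
    while True:
        b.perf_buffer_poll()

if __name__ == "__main__":
    main()
```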
Where BPF Programs Attach
kprobe / kretprobe  -- any kernel function entry/return
uprobe / uretprobe  -- any userspace function entry/return
tracepoint          -- stable kernel instrumentation points
raw_tracepoint      -- lower overhead, raw args
fentry / fexit      -- BTF-based, minimal overhead (5.5+)
perf_event          -- PMC sampling, CPU profiling
kprobe vs tracepoint
kprobes can hook ANY kernel function but the interface can change between kernel versions. Tracepoints are stable across versions but only exist where kernel devs placed them. Use tracepoints when available, kprobes when you need to reach deeper.
XDP             -- driver-level packet processing (fastest)
tc (clsact)     -- traffic control ingress/egress
sk_filter       -- socket-level packet filtering
sk_msg / sk_skb -- sockmap redirect (service mesh)
lwt             -- lightweight tunnel encap/decap
cgroup/sock*    -- per-cgroup network policy
XDP performance
XDP runs before the kernel allocates an skb (socket buffer). This means it can process packets at line rate -- Cloudflare uses it to mitigate multi-Tbps DDoS attacks. Actions: XDP_PASS, XDP_DROP, XDP_TX (bounce back), XDP_REDIRECT.
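A minimal XDP sketch via BCC (illustrative: the program name, the eth0 device, and the drop-ICMP policy are made up for the example; needs root and bcc). Note the bounds checks against data_end before each header access -- the verifier rejects the program without them:

```python
# XDP program that drops ICMP and passes everything else.
bpf_text = r"""
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>

int xdp_drop_icmp(struct xdp_md *ctx) {
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)    // verifier-mandated bounds check
        return XDP_PASS;
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    return ip->protocol == IPPROTO_ICMP ? XDP_DROP : XDP_PASS;
}
"""

def main(dev="eth0"):
    from bcc import BPF
    b = BPF(text=bpf_text)
    fn = b.load_func("xdp_drop_icmp", BPF.XDP)
    b.attach_xdp(dev, fn, 0)            # detach later with b.remove_xdp(dev, 0)
```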
LSM           -- security module hooks (5.7+)
seccomp       -- syscall filtering
cgroup/device -- device access control
cgroup/sysctl -- sysctl interposition
LSM + eBPF
LSM hooks let BPF enforce security policy at the same kernel points as SELinux/AppArmor -- but dynamically, without rebooting or recompiling policy. Cilium uses this for Kubernetes network policy enforcement at the kernel level.
BCC (BPF Compiler Collection)
BCC provides a Python (and Lua) frontend for writing BPF programs. C code is embedded as a string, compiled at runtime via LLVM, and loaded into the kernel. This is the fastest path from zero to working BPF instrumentation but has trade-offs.
BCC's rewriter gotcha
The C string is compiled by LLVM into BPF bytecode at runtime. BCC's rewriter automatically converts struct member access into bpf_probe_read() calls -- but it misses some cases (like ntohs(sk->field)), which is why you hit verifier errors and need explicit bpf_probe_read_kernel().
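The standard workaround is to copy the field into a local variable with an explicit probe read, then byte-swap the copy. A sketch (the probe name and the tcp_v4_connect target are illustrative; needs root and bcc):

```python
# Explicit bpf_probe_read_kernel() where BCC's rewriter misses ntohs(sk->...).
bpf_text = r"""
#include <net/sock.h>

int trace_tcp_connect(struct pt_regs *ctx, struct sock *sk) {
    u16 dport = 0;
    // Copy the field out explicitly, then byte-swap the local copy.
    bpf_probe_read_kernel(&dport, sizeof(dport),
                          &sk->__sk_common.skc_dport);
    dport = ntohs(dport);
    bpf_trace_printk("dport=%d\n", dport);
    return 0;
}
"""

def main():
    from bcc import BPF
    b = BPF(text=bpf_text)
    b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_tcp_connect")
    b.trace_print()

if __name__ == "__main__":
    main()
```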
Reference
bpf_get_current_pid_tgid()           -- tgid (userspace PID) in upper 32 bits, thread ID in lower 32
bpf_get_current_comm(buf, sz)        -- process name (16 bytes incl. NUL)
bpf_ktime_get_ns()                   -- monotonic nanoseconds
bpf_probe_read_kernel(dst, sz, src)  -- safe kernel memory read
bpf_probe_read_user(dst, sz, src)    -- safe userspace memory read
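Decoding the u64 from bpf_get_current_pid_tgid() on the userspace side trips people up, so here is the split spelled out (pure Python, helper name is illustrative):

```python
def split_pid_tgid(pid_tgid):
    """Split the u64 returned by bpf_get_current_pid_tgid().

    The upper 32 bits hold the tgid (the PID as userspace sees it),
    the lower 32 bits the kernel task id (the thread ID).
    """
    tgid = pid_tgid >> 32
    tid = pid_tgid & 0xFFFFFFFF
    return tgid, tid

# A single-threaded process: PID 4242 with thread ID 4242.
assert split_pid_tgid((4242 << 32) | 4242) == (4242, 4242)
```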
R1 invalid mem access 'scalar'
Dereferencing a kernel pointer directly.
Fix: use bpf_probe_read_kernel()
BPF stack limit exceeded (512 bytes)
Struct too large for BPF stack.
Fix: use BPF_PERCPU_ARRAY as scratch space
back-edge from insn X to Y
Unbounded loop detected (pre-5.3 kernels).
Fix: #pragma unroll, or bounded for-loops on 5.3+
invalid indirect read from stack
Reading uninitialized stack memory.
Fix: zero-init structs: struct foo bar = {};
cannot pass map_value to helper
Map value pointer used where scalar expected.
Fix: dereference into local var first
btf_vmlinux is malformed
Kernel BTF data missing or broken.
Fix: install linux-headers or build kernel
with CONFIG_DEBUG_INFO_BTF=y
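Two of the fixes from the table above in BCC C form (a sketch with made-up struct and map names: a per-CPU array as scratch space for a struct too big for the 512-byte stack, and zero-initialization to avoid the uninitialized-stack error):

```python
bpf_text = r"""
struct big_event_t {
    char payload[1024];                  // too big for the 512-byte BPF stack
};

// Stack-limit fix: a single per-CPU scratch slot instead of a stack variable.
BPF_PERCPU_ARRAY(scratch, struct big_event_t, 1);

struct small_t { u32 pid; u64 ts; };

int probe_fn(struct pt_regs *ctx) {
    // "invalid indirect read" fix: zero-initialize before use.
    struct small_t s = {};
    s.pid = bpf_get_current_pid_tgid() >> 32;
    s.ts  = bpf_ktime_get_ns();

    int zero = 0;
    struct big_event_t *ev = scratch.lookup(&zero);
    if (!ev)                             // map-value null check for the verifier
        return 0;
    ev->payload[0] = 'x';
    return 0;
}
"""
```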
Toolchains Compared
BCC
Python / Lua frontends -- runtime compile
Inline C compiled at load time. Best for prototyping and ad-hoc investigation. Requires LLVM + kernel headers on target.
startup     -- slow (LLVM compile)
portability -- kernel-version tied
prototyping -- fastest iteration
libbpf + CO-RE
C / Go / Rust -- pre-compiled
Compile Once, Run Everywhere. BPF programs compiled ahead of time with clang. BTF provides struct layout info at runtime. Production standard.
startup     -- instant (pre-compiled)
portability -- cross-kernel (BTF)
prototyping -- slower iteration
bpftrace
awk-like DSL -- one-liners
High-level tracing language inspired by DTrace/awk. One-liners for quick investigation. Compiles to BPF under the hood.
startup     -- moderate
portability -- needs BTF or headers
prototyping -- one-liner speed
Cilium eBPF Go Library
cilium/ebpf is the Go library for loading and managing pre-compiled BPF programs. It's the production approach: write BPF C, compile with clang to .o, then load/attach/read from Go. No LLVM on the target machine. CO-RE + BTF handles cross-kernel portability.
Workflow:
1. Write BPF C program (probe.bpf.c)
2. Compile: clang -O2 -target bpf -c probe.bpf.c -o probe.bpf.o
3. Generate Go bindings: bpf2go or manual
4. Load from Go: ebpf.LoadCollection() / link.Kprobe()
5. Read maps: map.Lookup() / map.Iterate()
This is what Cilium itself uses for Kubernetes network policy, service mesh, and observability (Hubble). Also used by Cloudflare, Meta, and Netflix for production eBPF tooling.