Linux Namespaces // Cheatsheet

kernel 4.6+ (5.6+ for time ns) · util-linux · iproute2 · cgroups v2
deep dive · whitepaper · github.com/hed0rah/namespaces-fun

The Eight Types

type    CLONE flag        unshare flag   since kernel
mnt     CLONE_NEWNS       -m             2.4.19
uts     CLONE_NEWUTS      -u             2.6.19
ipc     CLONE_NEWIPC      -i             2.6.19
pid     CLONE_NEWPID      -p             2.6.24
net     CLONE_NEWNET      -n             2.6.29
user    CLONE_NEWUSER     -U             3.8
cgroup  CLONE_NEWCGROUP   -C             4.6
time    CLONE_NEWTIME     -T             5.6

namespaces + cgroups + image ≈ container

Three Syscalls

how userspace touches namespaces
clone     fork into a new ns (runc, crun)
unshare   move self into a new ns (unshare(1))
setns     join an existing ns by fd (nsenter(1))
# inspect via /proc
$ ls /proc/$$/ns/
cgroup ipc mnt net pid pid_for_children
time time_for_children user uts

$ readlink /proc/$$/ns/pid
pid:[4026531836]
# equal inode = same namespace
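The same inode comparison can be scripted without parsing the `pid:[inode]` strings. A small sketch (the helper name same_ns is hypothetical, not from the repo): the shell's `-ef` test follows both magic symlinks and compares device and inode, which is exactly the namespace identity.

```shell
# same_ns TYPE PID_A PID_B: exit 0 if both PIDs share the TYPE namespace,
# 1 if they don't, 2 on bad usage or unreadable links.
# Hypothetical helper; needs ptrace-read access (same user, or root).
same_ns() {
  [ $# -eq 3 ] || { echo "usage: same_ns TYPE PID_A PID_B" >&2; return 2; }
  [ -e "/proc/$2/ns/$1" ] && [ -e "/proc/$3/ns/$1" ] || return 2
  # -ef: same device and inode after following the symlinks
  [ "/proc/$2/ns/$1" -ef "/proc/$3/ns/$1" ]
}

# usage:
#   same_ns net 1 "$$" && echo "this shell shares init's network ns"
```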

unshare // create

make new ns and exec a program
# UTS only - change hostname
$ sudo unshare --uts bash

# PID ns needs --fork (bash becomes PID 1 in the new ns)
# and --mount-proc (fresh /proc, so ps sees only the new ns)
$ sudo unshare --pid --fork --mount-proc bash

# blank network stack (only lo, DOWN)
$ sudo unshare --net bash

# the works - basically a container
$ sudo unshare --pid --uts --mount --ipc \
       --net --cgroup --fork bash

# rootless: user ns first, no sudo
$ unshare --user --map-root-user bash
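To see that only the new namespace is affected, a rootless sketch (uts_demo is a hypothetical name). It assumes unprivileged user namespaces are allowed on the host, which is distro- and sysctl-dependent, and that a `hostname` binary is installed. `--map-root-user` makes the caller UID 0 in the new user ns, which owns the new UTS ns, so changing the hostname there is permitted.

```shell
# uts_demo NAME: set the hostname to NAME inside a throwaway user+UTS
# namespace and print it; the host's hostname is not touched.
uts_demo() {
  [ -n "${1:-}" ] || { echo "usage: uts_demo NAME" >&2; return 2; }
  # inside the sh -c script, "$1" is NAME (passed after the "sh" $0 slot)
  unshare --user --map-root-user --uts \
    sh -c 'hostname "$1" && hostname' sh "$1"
}

# usage:
#   uts_demo sandbox   # prints "sandbox"
#   hostname           # still the host's name
```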

nsenter // join

enter an existing namespace
# every namespace of PID 12345
$ sudo nsenter --target 12345 --all bash

# just the network ns (great for tcpdump)
$ sudo nsenter --target 12345 --net bash

# a docker container without docker exec
$ PID=$(docker inspect -f \
       '{{.State.Pid}}' mybox)
$ sudo nsenter --target $PID --all bash

# nsenter never talks to dockerd, so this works even if dockerd
# is wedged (docker inspect won't; get the PID from ps instead)
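Joining only the network ns is what makes the tcpdump case pleasant: the mount ns stays the host's, so the host's own tcpdump, ss or ip binaries run even when the container image doesn't ship them. A minimal wrapper sketch (netns_run is a hypothetical name); run it as root.

```shell
# netns_run PID CMD [ARGS...]: run CMD in PID's network namespace only.
# The mount ns is not joined, so CMD resolves to the host's binary.
netns_run() {
  [ $# -ge 2 ] || { echo "usage: netns_run PID CMD [ARGS...]" >&2; return 2; }
  local pid=$1
  shift
  [ -e "/proc/$pid/ns/net" ] || {
    echo "netns_run: cannot access /proc/$pid/ns/net" >&2; return 1; }
  nsenter --target "$pid" --net -- "$@"
}

# usage (as root):
#   netns_run "$PID" ss -tlnp          # listening sockets as the container sees them
#   netns_run "$PID" tcpdump -ni any   # capture on all of its interfaces
```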

ip netns // persist

named network namespaces (run as root)
$ ip netns add myns
$ ip netns list
$ ip netns exec myns bash
$ ip netns del myns

# veth pair: virtual cable
$ ip link add v0 type veth peer name v1
$ ip link set v1 netns myns
$ ip addr add 10.0.0.1/24 dev v0
$ ip link set v0 up
$ ip netns exec myns ip addr add \
       10.0.0.2/24 dev v1
$ ip netns exec myns ip link set v1 up
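A quick check that the cable works, plus teardown. This sketch assumes the exact setup above (myns, 10.0.0.1 on v0, 10.0.0.2 on v1) and root; veth_check is a hypothetical helper name. `ping -W1` caps each wait at one second so a broken link fails fast.

```shell
# veth_check NS HOST_IP NS_IP: ping host -> ns, then ns -> host.
veth_check() {
  [ $# -eq 3 ] || { echo "usage: veth_check NS HOST_IP NS_IP" >&2; return 2; }
  ping -c1 -W1 "$3" >/dev/null &&
    ip netns exec "$1" ping -c1 -W1 "$2" >/dev/null
}

# usage (as root):
#   veth_check myns 10.0.0.1 10.0.0.2 && echo "veth OK"
# teardown: deleting myns destroys v1 once nothing else holds the ns,
# and removing one end of a veth pair removes its peer v0 too
#   ip netns del myns
```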

nsm CLI // this repo

/usr/local/bin/nsm after install
$ nsm list                 # all ns on host
$ nsm tree                 # by type
$ nsm inspect $(pidof nginx)
$ nsm diff 1 $$            # vs init
$ nsm ps --ns-type net     # grouped
$ nsm monitor              # live events

# named ns (under /run/nsm/)
$ sudo nsm create mybox --type net
$ sudo nsm enter mybox
$ sudo nsm exec mybox -- ip addr
$ sudo nsm destroy mybox

/proc // everything

where namespaces show up
/proc/PID/ns/        magic symlinks, one per ns type (+ *_for_children)
/proc/PID/status     NSpid: host PID first, in-ns PID last
/proc/PID/uid_map    user ns UID mapping
/proc/PID/cgroup     cgroup path (virtualised by cgroup ns)
/var/run/netns/      ip netns bind-mount targets
/run/nsm/<n>/        nsm-managed ns metadata
# two PIDs for one process in a new pid ns
$ grep NSpid /proc/12345/status
NSpid:    12345    1
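To pull both numbers out for scripting, a sketch with a hypothetical helper. NSpid: lists the process's PID in each pid ns, from the one /proc was mounted in down to the process's own, so the first field is the outer PID and the last is the PID inside. /proc/PID/status is world-readable, so this needs no root.

```shell
# nspid PID: print "OUTER INNER" from the NSpid: line of /proc/PID/status.
# The two match for a process that isn't in a nested pid ns.
nspid() {
  [ -n "${1:-}" ] || { echo "usage: nspid PID" >&2; return 2; }
  [ -r "/proc/$1/status" ] || return 1
  awk '/^NSpid:/ { print $2, $NF; found = 1 }
       END { exit (found ? 0 : 1) }' "/proc/$1/status"
}

# usage:
#   nspid 12345   # "12345 1" for the example above
```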

Inspection // lsns

discover namespaces system-wide
$ sudo lsns                       # all
$ sudo lsns -t net                # filter
$ sudo lsns -t pid -o NS,PID,COMMAND

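# find isolated processes (likely containers); run as root,
# since reading other users' /proc/PID/ns links needs it
$ init=$(readlink /proc/1/ns/pid)
$ for d in /proc/[0-9]*/; do
    p=$(basename "$d")
    ns=$(readlink "$d/ns/pid" 2>/dev/null) || continue
    [ "$ns" != "$init" ] && echo "$p in non-init pid ns"
  done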
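A census variant counts processes per namespace instead of listing them. This is a sketch with a hypothetical name; without root it only sees your own processes' links, and processes that exit mid-scan are simply skipped.

```shell
# ns_census [TYPE]: count processes per TYPE namespace (default: net),
# most populated first. Each output line is "COUNT TYPE:[inode]".
ns_census() {
  local t=${1:-net}
  for d in /proc/[0-9]*; do
    # unreadable or vanished processes print nothing
    readlink "$d/ns/$t" 2>/dev/null
  done | sort | uniq -c | sort -rn
}

# usage (as root for a host-wide view):
#   ns_census pid
```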

Container // from scratch

demos/07-mini-container.sh

Production runtimes also drop capabilities and add seccomp, AppArmor/SELinux, and CNI for networking.
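demos/07-mini-container.sh isn't reproduced here. For a sense of the shape only, a rough sketch, assuming ROOTFS is an extracted root filesystem (busybox, for example) with /bin/sh, mount, hostname and an empty /proc directory. mini_container is a hypothetical name, not the demo's; it uses chroot for brevity, which is weaker than the pivot_root real runtimes use, and skips everything in the production list above.

```shell
# mini_container ROOTFS [HOSTNAME]: /bin/sh from ROOTFS in new pid, uts,
# mount, ipc, net and cgroup namespaces. Needs root. Hypothetical helper.
mini_container() {
  [ -d "${1:-}" ] || { echo "usage: mini_container ROOTFS [HOSTNAME]" >&2; return 2; }
  # --fork: chroot, then the exec'd /bin/sh, becomes PID 1 in the new pid ns
  unshare --pid --uts --mount --ipc --net --cgroup --fork \
    chroot "$1" /bin/sh -c '
      mount -t proc proc /proc   # private mount ns: not visible on the host
      hostname "$1"
      exec /bin/sh' sh "${2:-mybox}"
}

# usage (as root):
#   mini_container /srv/rootfs demo   # then try: ps, hostname
```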

Pitfalls // gotchas

ways to make namespaces hurt

Recipes // copy-paste

things you actually do
# enter a docker container's network
$ P=$(docker inspect -f '{{.State.Pid}}' X)
$ sudo nsenter --target $P --net bash

# NAT a netns to the world (as root; uses the myns/veth setup)
$ sysctl -w net.ipv4.ip_forward=1
$ iptables -t nat -A POSTROUTING \
    -s 10.0.0.0/24 -j MASQUERADE
$ ip netns exec myns ip route add \
    default via 10.0.0.1
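The same three steps as a re-runnable sketch (netns_nat is a hypothetical name; run as root). `iptables -C` checks whether the rule already exists, so repeated runs don't stack duplicate MASQUERADE rules, and `ip route replace` doesn't fail if a default route is already set. On hosts where the FORWARD chain policy is DROP (Docker sets this), forwarded traffic also needs an ACCEPT rule, not shown.

```shell
# netns_nat NS SUBNET GW: enable forwarding, masquerade SUBNET, and give
# NS a default route via GW. Safe to run more than once.
netns_nat() {
  [ $# -eq 3 ] || { echo "usage: netns_nat NS SUBNET GW" >&2; return 2; }
  sysctl -w net.ipv4.ip_forward=1 >/dev/null || return 1
  # -C exits non-zero when the rule is missing; only then append it
  iptables -t nat -C POSTROUTING -s "$2" -j MASQUERADE 2>/dev/null ||
    iptables -t nat -A POSTROUTING -s "$2" -j MASQUERADE || return 1
  ip netns exec "$1" ip route replace default via "$3"
}

# usage (as root): netns_nat myns 10.0.0.0/24 10.0.0.1
```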

# cgroup v2 limits (no namespace involved)
# needs +memory in /sys/fs/cgroup/cgroup.subtree_control
$ sudo mkdir /sys/fs/cgroup/mybox
$ echo 67108864 | sudo tee \
    /sys/fs/cgroup/mybox/memory.max      # 64 MiB
$ echo $$ | sudo tee \
    /sys/fs/cgroup/mybox/cgroup.procs    # move this shell in
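A sketch bundling the three steps. cg_limit and the CGROOT override are hypothetical: CGROOT defaults to /sys/fs/cgroup and exists only so the logic can be dry-run against a scratch directory. On a real cgroupfs, memory.max exists only if the memory controller is enabled in the parent's cgroup.subtree_control, so the write fails otherwise.

```shell
# cg_limit NAME MIB [PID]: create cgroup NAME, cap memory at MIB MiB,
# optionally move PID into it. Run as root against the real cgroupfs.
cg_limit() {
  [ $# -ge 2 ] || { echo "usage: cg_limit NAME MIB [PID]" >&2; return 2; }
  local dir="${CGROOT:-/sys/fs/cgroup}/$1"
  mkdir -p "$dir" || return 1
  # memory.max takes bytes: MiB * 1024 * 1024 (64 -> 67108864)
  echo $(( $2 * 1024 * 1024 )) > "$dir/memory.max" || return 1
  [ -z "${3:-}" ] || echo "$3" > "$dir/cgroup.procs"
}

# usage (as root): cg_limit mybox 64 "$$"
```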

Find & Debug // compare

when something is in the wrong ns
# is this process isolated from init? (as root)
$ for ns in /proc/"$PID"/ns/*; do
    t=$(basename "$ns")
    a=$(readlink "$ns")
    b=$(readlink "/proc/1/ns/$t")
    [ "$a" != "$b" ] && echo "ISOL: $t"
  done

# shorthand via nsm
$ nsm diff 1 $PID
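To compare any two processes with plain /proc rather than one against init, a sketch (ns_diff is a hypothetical name and not how nsm implements diff). It prints one line per differing type and nothing when they match; reading another user's links needs root.

```shell
# ns_diff PID_A PID_B: print "TYPE: A vs B" for every namespace type in
# which the two processes differ; silent when they share all of them.
ns_diff() {
  [ $# -eq 2 ] || { echo "usage: ns_diff PID_A PID_B" >&2; return 2; }
  local ns t a b
  for ns in "/proc/$1/ns/"*; do
    t=${ns##*/}
    a=$(readlink "$ns") || return 1
    b=$(readlink "/proc/$2/ns/$t") || return 1
    [ "$a" = "$b" ] || echo "$t: $a vs $b"
  done
}

# usage (as root): ns_diff 1 "$PID"
```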

# watch for short-lived ns (runtimes spawn them)
$ sudo nsm monitor