A single-cluster, node-to-node encrypted overlay for Kubernetes. Each node runs one WireGuard tunnel to every other node; cross-node pod-to-pod traffic is steered into those tunnels by eBPF and encrypted in-kernel by WireGuard. A leader-elected controller distributes peers/keys/policy to per-node daemons over warm bidirectional gRPC streams, driven entirely by Custom Resources. State lives only in the Kubernetes API — no Kafka, no external Redis.
Implements the contract in
CLAUDE.md. Section references below (e.g. §8) point at that document.
Quick links: Docs site (make docs → a landing page + docs UI) ·
Helm chart (make helm-install) ·
Dashboards — built-in web UI (--dashboard-addr, default :8082) and the
meshtop terminal view (make build-meshtop).
        CONTROLLER (Deployment, 2 replicas, leader-elected)
        - informer event handlers on CRDs + Node + Pod (event-driven)
        - warm gRPC stream to every daemon (snapshot + resync)
        - internal cron: zero-downtime 3-phase key rotation
              ^                               ^
    warm gRPC |                               | warm gRPC (bidi)
        +-----+------+                +-------+-----+
        | DAEMON n1  |                | DAEMON n2   | (DaemonSet 1/node)
        | wg netlink |                | wg netlink  |
        | eBPF tc(x) |                | eBPF tc(x)  |
        +-----+------+                +-------+-----+
              +------- WireGuard/UDP 51820 ---+
Control plane = userspace decides what should be true (peers, keys, routes, blocks). Data plane = kernel: WireGuard does crypto, eBPF does per-packet steering by reading maps. The only bridge is the daemon writing eBPF maps + configuring wg via netlink; kernel programs only ever read maps (§1, §14).
api/v1alpha1/ CRD Go types + generated deepcopy
proto/ mesh.proto (warm-stream protocol) + generated Go (meshpb)
internal/controller/ desiredstate.go (pure, tested) - informers - streamserver - rotation - leader - manager - status
internal/daemon/ wg.go - routing.go (fwmark + MTU/MSS) - ebpf.go - streamclient.go - daemon.go - *_bpfel.{go,o} (committed bpf2go output)
bpf/ classify.bpf.c - deliver.bpf.c - mesh_maps.h - headers/
cmd/{controller,daemon}/ entrypoints
deploy/ crds/ - namespace - rbac - controller-deployment - daemon-daemonset - samples
test/e2e/ kind-based acceptance harness (§12)
| Kind | Purpose |
|---|---|
| VpbluNode | a node's identity; daemon publishes only its public key into status |
| TunnelPolicy | FullMesh or Explicit peering, listen port, keepalive |
| RoutePolicy | which pod CIDR is reachable via which node's tunnel (overrides) |
| BlockPolicy | CIDR->CIDR drops + node Isolate/Drain (highest precedence) |
| RotationPolicy | cron schedule + overlap window for key rotation |
Precedence: BlockPolicy > RoutePolicy > TunnelPolicy > derived-from-Node — see
internal/controller/desiredstate.go, the pure, exhaustively table-tested heart
of the controller.
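The precedence chain above can be illustrated with a small pure function. This is a minimal sketch, not the contents of desiredstate.go: the `Source`, `Decision`, and `Resolve` names and the shape of a decision are assumptions made for illustration only.

```go
package main

import "fmt"

// Source identifies which layer produced a decision. A higher value wins.
// These names are hypothetical; they only mirror the README's ordering.
type Source int

const (
	FromNode         Source = iota // derived from the Node object (lowest)
	FromTunnelPolicy               // TunnelPolicy
	FromRoutePolicy                // RoutePolicy
	FromBlockPolicy                // BlockPolicy (highest)
)

func (s Source) String() string {
	return [...]string{"Node", "TunnelPolicy", "RoutePolicy", "BlockPolicy"}[s]
}

// Decision is one layer's opinion about traffic to a pod CIDR.
type Decision struct {
	Source  Source
	ViaNode string // node whose tunnel carries the traffic
	Drop    bool   // true when the layer wants the traffic dropped
}

// Resolve returns the decision from the highest-precedence layer.
// It is pure: no I/O, no clocks, so it is easy to table-test.
func Resolve(candidates []Decision) (Decision, bool) {
	if len(candidates) == 0 {
		return Decision{}, false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.Source > best.Source {
			best = c
		}
	}
	return best, true
}

func main() {
	got, _ := Resolve([]Decision{
		{Source: FromNode, ViaNode: "n2"},
		{Source: FromRoutePolicy, ViaNode: "n3"},
	})
	fmt.Printf("%s via %s drop=%v\n", got.Source, got.ViaNode, got.Drop)
}
```

Keeping the resolution step free of side effects is what makes exhaustive table tests practical: each test row is just a slice of inputs and one expected decision.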
Prereqs: Go >= 1.24, and for regeneration: protoc, controller-gen, clang.
make generate # proto + deepcopy + CRDs + eBPF (bpf2go)
make build # ./build/controller, ./build/daemon
make test # unit tests incl. desiredstate precedence tables (§12.9)
make docker # vpblu-controller:latest, vpblu-daemon:latest
The compiled eBPF objects (internal/daemon/*_bpfel.o) are committed, so
go build works without clang; rerun make bpf only when the C changes.
make deploy # namespace + CRDs + RBAC + controller + daemon
make samples # FullMesh TunnelPolicy + RotationPolicy
The daemon needs privileged / CAP_NET_ADMIN+BPF+SYS_ADMIN, host network, the
host bpffs (/sys/fs/bpf), and a host path for the 0600 private key
(/var/lib/mesh). The daemon has no Kubernetes RBAC — the controller pushes
everything over the stream (§7.3).
WireGuard adds ~60–80 bytes of overhead. If wg-mesh keeps the node MTU,
full-size encrypted frames exceed the path MTU and are silently dropped (DF set)
while handshakes and small packets succeed — the mesh "works" until the first big
packet. The daemon therefore, in internal/daemon/routing.go:
- sets wg-mesh MTU = primaryMTU - overhead (default 80, --mtu-overhead), and
- clamps TCP MSS on the wg path to MTU - 40 (best-effort iptables TCPMSS).
Acceptance §12.8 exercises this with ping -s 1472 -M do and large iperf3.
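The arithmetic behind those two settings is small enough to show directly. The sketch below is illustrative and is not the code in routing.go: the function names are hypothetical, and the 40-byte figure is assumed to be a 20-byte IPv4 header plus a 20-byte TCP header without options.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	defaultOverhead = 80 // bytes; the documented default for --mtu-overhead
	ipv4TCPHeaders  = 40 // assumed: 20-byte IPv4 header + 20-byte TCP header
)

// TunnelMTU returns the MTU for the WireGuard interface, given the MTU of
// the node's primary interface and the configured encapsulation overhead.
func TunnelMTU(primaryMTU, overhead int) (int, error) {
	mtu := primaryMTU - overhead
	if overhead < 0 || mtu <= ipv4TCPHeaders {
		return 0, errors.New("overhead leaves no room for a TCP payload")
	}
	return mtu, nil
}

// ClampedMSS returns the TCP MSS that fits inside the tunnel MTU.
func ClampedMSS(tunnelMTU int) int {
	return tunnelMTU - ipv4TCPHeaders
}

func main() {
	mtu, err := TunnelMTU(1500, defaultOverhead)
	if err != nil {
		panic(err)
	}
	fmt.Println("wg-mesh MTU:", mtu, "TCP MSS:", ClampedMSS(mtu))
}
```

With a 1500-byte primary MTU and the default overhead, this gives a tunnel MTU of 1420 and an MSS of 1380, so a full-size TCP segment still fits in one underlay frame after encapsulation.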
- Event-driven for latency: informer Add/Update/Delete handlers enqueue a recompute and push targeted commands.
- Snapshot + resync for correctness: on every (re)connect and on a 5-min periodic tick the controller pushes a full idempotent SyncState; every command is generation-stamped so the daemon ignores stale/reordered ones.
This mirrors how real informer-based controllers behave. See
internal/controller/{informers,streamserver}.go and the genGuard in
internal/daemon/streamclient.go.
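A generation guard of the kind described above can be sketched in a few lines. This is an assumption-laden illustration rather than the genGuard in streamclient.go: the `GenGuard` type, the `Admit` method, and the choice to re-admit a command at the current generation (safe only because commands are idempotent) are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// GenGuard remembers the highest generation applied so far.
// Hypothetical type; the real guard may track more state.
type GenGuard struct {
	mu   sync.Mutex
	last uint64
}

// Admit reports whether a command stamped with gen should be applied.
// Older generations are rejected as stale or reordered. An equal
// generation is admitted again, assuming commands are idempotent.
func (g *GenGuard) Admit(gen uint64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if gen < g.last {
		return false
	}
	g.last = gen
	return true
}

func main() {
	var g GenGuard
	// Generation 2 arrives after 3 and is ignored.
	for _, gen := range []uint64{1, 3, 2, 3, 4} {
		fmt.Printf("gen %d applied=%v\n", gen, g.Admit(gen))
	}
}
```

The guard is what lets the controller combine targeted event-driven pushes with periodic full snapshots without worrying about delivery order on the stream.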
Controller cron drives Prepare -> Distribute -> Activate -> Retire per node,
one at a time: the daemon stages a new keypair while keeping the old active, every
peer is taught to accept both keys during the overlap window, the daemon switches,
then the old key is dropped and zeroized. Any failure before Retire rolls back —
the old key is still valid, so no packet drops. See internal/controller/rotation.go.
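The per-node phase sequence and its rollback rule can be sketched as a small driver loop. Only the phase names come from the description above; the `Steps` struct, `RotateNode`, and `errInjected` are hypothetical, and the sketch assumes that a failure inside Retire itself does not trigger a rollback, which the text does not specify.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase names follow the README; everything else here is illustrative.
type Phase int

const (
	Prepare Phase = iota
	Distribute
	Activate
	Retire
)

func (p Phase) String() string {
	return [...]string{"Prepare", "Distribute", "Activate", "Retire"}[p]
}

// errInjected simulates a failing phase in the demo below.
var errInjected = errors.New("injected failure")

// Steps bundles the work for each phase with a rollback hook.
type Steps struct {
	Run      func(Phase) error
	Rollback func(failed Phase)
}

// RotateNode runs the phases in order for a single node. A failure before
// Retire calls Rollback; the old key was never dropped, so traffic continues.
func RotateNode(s Steps) error {
	for _, p := range []Phase{Prepare, Distribute, Activate, Retire} {
		if err := s.Run(p); err != nil {
			if p < Retire {
				s.Rollback(p)
			}
			return fmt.Errorf("rotation failed in %s: %w", p, err)
		}
	}
	return nil
}

func main() {
	var log []string
	err := RotateNode(Steps{
		Run: func(p Phase) error {
			log = append(log, p.String())
			if p == Distribute {
				return errInjected
			}
			return nil
		},
		Rollback: func(p Phase) { log = append(log, "rollback after "+p.String()) },
	})
	fmt.Println(log, err)
}
```

Running nodes one at a time through a loop like this keeps the blast radius of a failed rotation to a single node whose old key is still accepted by every peer.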
make e2e # 3-node kind cluster + Flannel base CNI, runs §12 checks
make bpf-verify (load the eBPF objects through the in-kernel verifier, §8.3) and
make e2e both need privileges and a real kernel — and because kind nodes are
containers that share the host kernel, running them on your workstation would load
our tc/eBPF programs and create WireGuard interfaces in your kernel. To keep the
host untouched, run them inside a throwaway VM whose kernel we discard afterward:
make kvm-validate # boots an Ubuntu cloud VM via KVM, provisions the toolchain,
# rsyncs the repo, then runs bpf-verify + e2e inside the guest
make kvm-down # power off + destroy the VM
Pure qemu + cloud-init (no libvirt, no host package installs). Because daemon
startup loads — and therefore verifies — the BPF, the e2e run covers bpf-verify
too. Iterate without rebooting: hack/kvm-validate.sh sync && hack/kvm-validate.sh e2e;
shell in with hack/kvm-validate.sh ssh; VM state lives under .kvm/ (gitignored).
Requires /dev/kvm, qemu-system-x86_64, qemu-img, genisoimage, ssh, rsync.
| # | Criterion | Where |
|---|---|---|
| 1 | kind 3-node deploy | test/e2e/run.sh, deploy/ |
| 2 | FullMesh peers form | desiredstate.go + streamserver/wg.go |
| 3 | only UDP/51820 on wire | routing.go fwmark steer + wg |
| 4 | BlockPolicy drops <1s | classify.bpf.c + ebpf.go SetBlocks |
| 5 | node delete drops peer | informers.go removeNodePeerEverywhere + SyncState |
| 6 | zero-drop rotation | rotation.go + wg.go Prepare/Activate/Retire |
| 7 | controller down -> dataplane lives | kernel wg + pinned maps; reconnect SyncState |
| 8 | large-packet test | routing.go MTU/MSS |
| 9 | desiredstate unit tables | desiredstate_test.go |
In: single-cluster node-to-node encryption, pod-IP L3 steering, blocks, rotation. Out (this milestone): cluster federation, a full CNI/IPAM, ClusterIP/kube-proxy replacement, Kafka/audit streaming.