vpblu — Encrypted Infra-Cluster Mesh

A single-cluster, node-to-node encrypted overlay for Kubernetes. Each node runs one WireGuard tunnel to every other node; cross-node pod-to-pod traffic is steered into those tunnels by eBPF and encrypted in-kernel by WireGuard. A leader-elected controller distributes peers/keys/policy to per-node daemons over warm bidirectional gRPC streams, driven entirely by Custom Resources. State lives only in the Kubernetes API — no Kafka, no external Redis.

Implements the contract in CLAUDE.md. Section references below (e.g. §8) point at that document.

Quick links: Docs site (make docs → a landing page + docs UI) · Helm chart (make helm-install) · Dashboards — built-in web UI (--dashboard-addr, default :8082) and the meshtop terminal view (make build-meshtop).

Architecture

            CONTROLLER (Deployment, 2 replicas, leader-elected)
            - informer event handlers on CRDs + Node + Pod   (event-driven)
            - warm gRPC stream to every daemon                (snapshot + resync)
            - internal cron: zero-downtime 3-phase key rotation
                      ^                              ^
        warm gRPC     |                              |   warm gRPC (bidi)
                +-----+------+                +-------+-----+
                | DAEMON n1  |                | DAEMON n2   |   (DaemonSet 1/node)
                | wg netlink |                | wg netlink  |
                | eBPF tc(x) |                | eBPF tc(x)  |
                +-----+------+                +-------+-----+
                      +------- WireGuard/UDP 51820 ---+

Control plane = userspace decides what should be true (peers, keys, routes, blocks). Data plane = kernel: WireGuard does crypto, eBPF does per-packet steering by reading maps. The only bridge is the daemon writing eBPF maps + configuring wg via netlink; kernel programs only ever read maps (§1, §14).

Repository layout

api/v1alpha1/        CRD Go types + generated deepcopy
proto/               mesh.proto (warm-stream protocol) + generated Go (meshpb)
internal/controller/ desiredstate.go (pure, tested) - informers - streamserver
                     - rotation - leader - manager - status
internal/daemon/     wg.go - routing.go (fwmark + MTU/MSS) - ebpf.go - streamclient.go
                     - daemon.go - *_bpfel.{go,o} (committed bpf2go output)
bpf/                 classify.bpf.c - deliver.bpf.c - mesh_maps.h - headers/
cmd/{controller,daemon}/   entrypoints
deploy/              crds/ - namespace - rbac - controller-deployment - daemon-daemonset - samples
test/e2e/            kind-based acceptance harness (§12)

CRDs (policy is data — §4)

Kind	Purpose
`VpbluNode`	a node's identity; daemon publishes only its public key into status
`TunnelPolicy`	`FullMesh` or `Explicit` peering, listen port, keepalive
`RoutePolicy`	which pod CIDR is reachable via which node's tunnel (overrides)
`BlockPolicy`	CIDR->CIDR drops + node `Isolate`/`Drain` (highest precedence)
`RotationPolicy`	cron schedule + overlap window for key rotation

Precedence: BlockPolicy > RoutePolicy > TunnelPolicy > derived-from-Node — see internal/controller/desiredstate.go, the pure, exhaustively table-tested heart of the controller.
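As a rough illustration of that ordering, the sketch below resolves the fate of traffic to one destination pod CIDR. The `Inputs` and `Decision` types and the `Resolve` function are hypothetical and are not taken from desiredstate.go; only the precedence order itself comes from this README.

```go
package main

import "fmt"

// Inputs is a hypothetical, flattened view of what each layer says about
// one destination pod CIDR. None of these fields come from the real CRDs.
type Inputs struct {
	Blocked   bool   // a BlockPolicy drop or Isolate/Drain applies
	RouteVia  string // node named by a RoutePolicy override, "" if none
	TunnelVia string // peer selected under the TunnelPolicy, "" if none
	NodeVia   string // node that owns the CIDR, derived from Node objects
}

// Decision records the outcome and which layer produced it.
type Decision struct {
	Drop   bool
	Via    string
	Source string
}

// Resolve walks the layers from highest to lowest precedence and returns
// the first one that has an opinion.
func Resolve(in Inputs) Decision {
	switch {
	case in.Blocked:
		return Decision{Drop: true, Source: "BlockPolicy"}
	case in.RouteVia != "":
		return Decision{Via: in.RouteVia, Source: "RoutePolicy"}
	case in.TunnelVia != "":
		return Decision{Via: in.TunnelVia, Source: "TunnelPolicy"}
	default:
		return Decision{Via: in.NodeVia, Source: "Node"}
	}
}

func main() {
	in := Inputs{RouteVia: "n2", NodeVia: "n3"}
	fmt.Printf("%+v\n", Resolve(in)) // RoutePolicy overrides the derived owner

	in.Blocked = true
	fmt.Printf("%+v\n", Resolve(in)) // BlockPolicy overrides everything
}
```

Keeping the resolution a pure function of its inputs is what makes table-driven tests practical: each row is one `Inputs` value and one expected `Decision`.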

Build

Prereqs: Go >= 1.24, and for regeneration: protoc, controller-gen, clang.

make generate     # proto + deepcopy + CRDs + eBPF (bpf2go)
make build        # ./build/controller, ./build/daemon
make test         # unit tests incl. desiredstate precedence tables (§12.9)
make docker       # vpblu-controller:latest, vpblu-daemon:latest

The compiled eBPF objects (internal/daemon/*_bpfel.o) are committed, so go build works without clang; rerun make bpf only when the C changes.

Deploy

make deploy        # namespace + CRDs + RBAC + controller + daemon
make samples       # FullMesh TunnelPolicy + RotationPolicy

The daemon needs privileged mode or the capabilities CAP_NET_ADMIN, CAP_BPF, and CAP_SYS_ADMIN, plus host networking, the host bpffs (/sys/fs/bpf), and a host path for the 0600 private key (/var/lib/mesh). The daemon has no Kubernetes RBAC; the controller pushes everything over the stream (§7.3).

The MTU/MSS trap (§10 — read twice)

WireGuard adds ~60–80 bytes of overhead. If wg-mesh keeps the node MTU, full-size encrypted frames exceed the path MTU and are silently dropped (DF set) while handshakes and small packets succeed, so the mesh "works" until the first big packet. The daemon therefore, in internal/daemon/routing.go:

- sets wg-mesh MTU = primaryMTU - overhead (default 80, --mtu-overhead), and
- clamps TCP MSS on the wg path to MTU - 40 (best-effort iptables TCPMSS).

Acceptance §12.8 exercises this with ping -s 1472 -M do and large iperf3.
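The arithmetic behind these numbers is small enough to check by hand. The sketch below is not the code in routing.go; the function names are hypothetical, and the 1500-byte primary MTU is an assumed example value. The 40 in "MTU - 40" is taken here to be the IPv4 header (20 bytes) plus the TCP header (20 bytes).

```go
package main

import "fmt"

const (
	ipv4Header = 20 // bytes, without options
	tcpHeader  = 20 // bytes, without options
	icmpHeader = 8  // bytes, ICMP echo header
)

// TunnelMTU is the MTU to set on the WireGuard interface so that an
// encrypted frame still fits inside the primary interface's MTU.
func TunnelMTU(primaryMTU, overhead int) int {
	return primaryMTU - overhead
}

// ClampedMSS is the largest TCP segment payload that fits in one
// unfragmented IPv4 packet of the given MTU.
func ClampedMSS(mtu int) int {
	return mtu - ipv4Header - tcpHeader
}

// MaxPingPayload is the largest `ping -s` value that fits in one
// unfragmented IPv4 packet of the given MTU.
func MaxPingPayload(mtu int) int {
	return mtu - ipv4Header - icmpHeader
}

func main() {
	primary := 1500 // assumed primary interface MTU
	mtu := TunnelMTU(primary, 80)

	fmt.Println("wg-mesh MTU:", mtu)                            // 1420
	fmt.Println("clamped MSS:", ClampedMSS(mtu))                // 1380
	fmt.Println("ping -s at node MTU:", MaxPingPayload(primary)) // 1472
	fmt.Println("ping -s at wg MTU:", MaxPingPayload(mtu))      // 1392
}
```

Under these assumptions, the 1472-byte payload used in the acceptance test is exactly the size that fills a 1500-byte frame, which is the case that breaks when the tunnel MTU is left at the node MTU.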

Warm connection + correctness model

- Event-driven for latency: informer Add/Update/Delete handlers enqueue a recompute and push targeted commands.
- Snapshot + resync for correctness: on every (re)connect and on a 5-min periodic tick the controller pushes a full idempotent SyncState; every command is generation-stamped so the daemon ignores stale/reordered ones.

This mirrors how real informer-based controllers behave. See internal/controller/{informers,streamserver}.go and the genGuard in internal/daemon/streamclient.go.
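A generation guard of this kind can be very small. The sketch below is a hypothetical stand-in, not the genGuard from streamclient.go: it assumes generations are unsigned integers that the controller only ever increases, and it chooses to re-admit a command carrying the current generation because commands are described as idempotent.

```go
package main

import (
	"fmt"
	"sync"
)

// GenGuard remembers the highest generation applied so far.
// Hypothetical type; the real guard may track more state.
type GenGuard struct {
	mu   sync.Mutex
	last uint64
}

// Admit reports whether a command stamped with gen should be applied.
// Strictly older generations are stale or reordered and are rejected.
// An equal generation is admitted again, which is safe only because
// applying the same command twice is assumed to be a no-op.
func (g *GenGuard) Admit(gen uint64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if gen < g.last {
		return false
	}
	g.last = gen
	return true
}

func main() {
	var g GenGuard
	// Generation 3 arrives late, after 5 has already been applied.
	for _, gen := range []uint64{4, 5, 3, 5, 6} {
		fmt.Printf("gen %d admitted: %v\n", gen, g.Admit(gen))
	}
}
```

With a guard like this in front of every handler, the periodic SyncState and the targeted event-driven commands can race freely: whichever arrives with the older generation is simply ignored.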

Key rotation (zero-downtime, §9)

Controller cron drives Prepare -> Distribute -> Activate -> Retire per node, one at a time: the daemon stages a new keypair while keeping the old active, every peer is taught to accept both keys during the overlap window, the daemon switches, then the old key is dropped and zeroized. Any failure before Retire rolls back — the old key is still valid, so no packet drops. See internal/controller/rotation.go.
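The per-node sequence can be pictured as a short loop with a rollback hook. The sketch below is illustrative only and does not reproduce rotation.go: `StepFunc`, `RotateNode`, `RotateAll`, and `ErrInjected` are hypothetical names, stopping the whole run at the first failing node is an assumption, and so is the choice not to roll back when Retire itself fails.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is one step of the rotation described above.
type Phase int

const (
	Prepare Phase = iota
	Distribute
	Activate
	Retire
)

func (p Phase) String() string {
	return [...]string{"Prepare", "Distribute", "Activate", "Retire"}[p]
}

// ErrInjected is a demo error used to simulate a failing phase.
var ErrInjected = errors.New("injected failure")

// StepFunc performs one phase for one node (hypothetical signature).
type StepFunc func(node string, p Phase) error

// RotateNode runs the phases in order. A failure before Retire triggers
// rollback; the old key is still active or accepted at that point.
func RotateNode(node string, step StepFunc, rollback func(node string)) error {
	for p := Prepare; p <= Retire; p++ {
		if err := step(node, p); err != nil {
			if p < Retire {
				rollback(node)
			}
			return fmt.Errorf("rotate %s: %s: %w", node, p, err)
		}
	}
	return nil
}

// RotateAll rotates nodes one at a time and stops at the first failure.
func RotateAll(nodes []string, step StepFunc, rollback func(node string)) error {
	for _, n := range nodes {
		if err := RotateNode(n, step, rollback); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	step := func(node string, p Phase) error {
		if node == "n2" && p == Activate {
			return ErrInjected // simulate n2 failing to switch keys
		}
		fmt.Println(node, p)
		return nil
	}
	rollback := func(node string) { fmt.Println(node, "rollback") }

	err := RotateAll([]string{"n1", "n2", "n3"}, step, rollback)
	fmt.Println("error:", err, "| injected:", errors.Is(err, ErrInjected))
}
```

In this run n1 completes all four phases, n2 is rolled back during Activate, and n3 is never touched, so at most one node is mid-rotation at any time.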

End-to-end

make e2e          # 3-node kind cluster + Flannel base CNI, runs §12 checks

Validating on a disposable KVM guest

make bpf-verify (load the eBPF objects through the in-kernel verifier, §8.3) and make e2e both need privileges and a real kernel — and because kind nodes are containers that share the host kernel, running them on your workstation would load our tc/eBPF programs and create WireGuard interfaces in your kernel. To keep the host untouched, run them inside a throwaway VM whose kernel we discard afterward:

make kvm-validate   # boots an Ubuntu cloud VM via KVM, provisions the toolchain,
                    # rsyncs the repo, then runs bpf-verify + e2e inside the guest
make kvm-down       # power off + destroy the VM

Pure qemu + cloud-init (no libvirt, no host package installs). Because daemon startup loads — and therefore verifies — the BPF, the e2e run covers bpf-verify too. Iterate without rebooting: hack/kvm-validate.sh sync && hack/kvm-validate.sh e2e; shell in with hack/kvm-validate.sh ssh; VM state lives under .kvm/ (gitignored). Requires /dev/kvm, qemu-system-x86_64, qemu-img, genisoimage, ssh, rsync.

Acceptance criteria -> code map (§12)

#	Criterion	Where
1	kind 3-node deploy	`test/e2e/run.sh`, `deploy/`
2	FullMesh peers form	`desiredstate.go` + `streamserver`/`wg.go`
3	only UDP/51820 on wire	`routing.go` fwmark steer + wg
4	BlockPolicy drops <1s	`classify.bpf.c` + `ebpf.go` SetBlocks
5	node delete drops peer	`informers.go` removeNodePeerEverywhere + SyncState
6	zero-drop rotation	`rotation.go` + `wg.go` Prepare/Activate/Retire
7	controller down -> dataplane lives	kernel wg + pinned maps; reconnect SyncState
8	large-packet test	`routing.go` MTU/MSS
9	desiredstate unit tables	`desiredstate_test.go`

Scope

In: single-cluster node-to-node encryption, pod-IP L3 steering, blocks, rotation. Out (this milestone): cluster federation, a full CNI/IPAM, ClusterIP/kube-proxy replacement, Kafka/audit streaming.
