Use ListAll() to reduce nft subprocesses during resync #12471
caseydavenport wants to merge 1 commit into projectcalico:master
Replace separate List("map") and List("chain") calls in
loadDataplaneState with a single ListAll() call, halving the number
of nft subprocesses spawned during each resync cycle.
Also replace the unscoped "nft list ruleset" debug dump with a
table-scoped "nft list table" call. The unscoped command parses
objects from all kernel nftables tables, which can crash older nft
binaries when other tables contain udata written by newer nft
versions.
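
As a rough illustration of the subprocess saving described above, the sketch below compares the two call shapes against a fake lister that counts calls. The `objectLister` interface, `fakeNft` type, helper names, and object names are all hypothetical stand-ins rather than the real knftables or Felix code, and the `"map"`/`"chain"` keys are an assumption about how the `ListAll()` result is indexed.

```go
package main

import (
	"context"
	"fmt"
)

// objectLister is a hypothetical stand-in for the listing calls discussed
// above; it is not the real knftables interface.
type objectLister interface {
	List(ctx context.Context, objectType string) ([]string, error)
	ListAll(ctx context.Context) (map[string][]string, error)
}

// fakeNft counts listing calls; each call stands in for one nft subprocess.
type fakeNft struct {
	objects map[string][]string
	calls   int
}

func (f *fakeNft) List(_ context.Context, objectType string) ([]string, error) {
	f.calls++
	return f.objects[objectType], nil
}

func (f *fakeNft) ListAll(_ context.Context) (map[string][]string, error) {
	f.calls++
	return f.objects, nil
}

// loadSeparately mirrors the old shape: one listing call per object type.
func loadSeparately(ctx context.Context, nft objectLister) ([]string, []string, error) {
	maps, err := nft.List(ctx, "map")
	if err != nil {
		return nil, nil, err
	}
	chains, err := nft.List(ctx, "chain")
	if err != nil {
		return nil, nil, err
	}
	return maps, chains, nil
}

// loadOnce mirrors the new shape: a single call, then a lookup per type.
// The "map" and "chain" keys are assumed to match the List() type names.
func loadOnce(ctx context.Context, nft objectLister) ([]string, []string, error) {
	all, err := nft.ListAll(ctx)
	if err != nil {
		return nil, nil, err
	}
	return all["map"], all["chain"], nil
}

// countCalls runs both shapes against fresh fakes and reports how many
// listing calls each one made.
func countCalls() (separate, once int) {
	objects := map[string][]string{
		"map":   {"example-map"},
		"chain": {"example-chain-a", "example-chain-b"},
	}
	a, b := &fakeNft{objects: objects}, &fakeNft{objects: objects}
	_, _, _ = loadSeparately(context.Background(), a)
	_, _, _ = loadOnce(context.Background(), b)
	return a.calls, b.calls
}

func main() {
	separate, once := countCalls()
	fmt.Printf("separate: %d calls, ListAll: %d call\n", separate, once)
}
```

The fake only counts calls, so it says nothing about the real cost of each nft invocation; it just makes the "two calls become one" claim concrete.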
Pull request overview
This PR optimizes Felix’s nftables resync path by using knftables ListAll() to fetch all object names in a single invocation, reducing subprocess spawning, and scopes error-time diagnostics to the Calico table to avoid crashes caused by parsing other tables’ udata.
Changes:
- Replace separate List("map")/List("chain") calls with a single ListAll() call and plumb the resulting object names into map and chain resync logic.
- Update the maps dataplane resync API to accept a caller-provided context and pre-fetched map names.
- Replace nft list ruleset debug dumping on transaction failure with a table-scoped nft list table ... dump.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| felix/nftables/table.go | Uses ListAll() during resync; threads chain names into hash/rule loading; scopes debug dumps to table. |
| felix/nftables/maps.go | Changes LoadDataplaneState to accept (ctx, mapNames) and removes internal List("map") call. |
| felix/nftables/table_layer.go | Updates table-layer wrapper to the new maps resync signature and adds context import. |
| felix/nftables/maps_test.go | Adjusts unit tests to call maps resync with pre-fetched map names via a helper. |

    } else {
        t.logCxt.WithError(err).Warn("Failed to list all nftables objects")
    }
    // Fall through — maps and chains will get empty slices, which is
    // correct when the table doesn't exist yet.
    allObjects = map[string][]string{}

On ListAll() failure (non-NotFound), this falls through to an empty object map. That can make Felix treat an existing table as empty, clearing the in-memory dataplane view and potentially causing noisy/failed re-programming attempts. Consider returning early on unexpected ListAll errors, or falling back to separate List("map")/List("chain") calls instead of assuming empty state.

Suggested change:

        // Fall through — maps and chains will get empty slices, which is
        // correct when the table doesn't exist yet.
        allObjects = map[string][]string{}
    } else {
        t.logCxt.WithError(err).Warn("Failed to list all nftables objects")
        return
    }

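
The distinction this review comment draws, between "table not found" and any other listing failure, can be sketched in isolation as follows. This is a minimal illustration, not the PR's code: `resolveObjects` and both sentinel errors are hypothetical names, and the real code would use whatever not-found check its nftables library provides.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, used here for illustration only.
var (
	errNotFound = errors.New("table not found")
	errTimeout  = errors.New("nft timed out")
)

// resolveObjects decides what the resync should see after a listing attempt.
// The boolean is false when the caller should skip this resync and keep its
// previous in-memory view.
func resolveObjects(all map[string][]string, err error) (map[string][]string, bool) {
	switch {
	case err == nil:
		return all, true
	case errors.Is(err, errNotFound):
		// The table has not been created yet, so an empty view is accurate.
		return map[string][]string{}, true
	default:
		// Unexpected failure: do not pretend an existing table is empty.
		return nil, false
	}
}

func main() {
	listed := map[string][]string{"chain": {"example-chain"}}
	cases := []struct {
		all map[string][]string
		err error
	}{
		{listed, nil},
		{nil, errNotFound},
		{nil, errTimeout},
	}
	for _, c := range cases {
		objects, proceed := resolveObjects(c.all, c.err)
		fmt.Printf("err=%v proceed=%v objects=%v\n", c.err, proceed, objects)
	}
}
```

Returning a "proceed" flag keeps the decision in one place; the alternative the comment mentions, falling back to per-type listing calls, would replace the default branch rather than the not-found branch.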

    ctx, cancel := context.WithTimeout(context.Background(), t.contextTimeout)
    defer cancel()

The same timeout-scoped ctx is used for both ListAll() and the subsequent map resync (which may issue many ListElements calls). This couples their time budgets and can cause premature "context deadline exceeded" errors during large resyncs. Consider using separate contexts/timeouts per operation (or let Maps.LoadDataplaneState derive its own per-call timeouts).
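
To make the coupling concrete, here is a small, self-contained sketch of the two budgeting styles. It does not use Felix code: `slowStep`, `runShared`, `runSeparate`, and `demo` are hypothetical helpers, and the 250ms/400ms durations are arbitrary values chosen so that two steps fit individually but not together.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowStep returns an operation that takes d to finish unless its context
// expires first.
func slowStep(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// runShared runs every step under one deadline, so later steps only get
// whatever time the earlier ones left over.
func runShared(timeout time.Duration, steps ...func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	for _, step := range steps {
		if err := step(ctx); err != nil {
			return err
		}
	}
	return nil
}

// runSeparate gives each step its own fresh deadline.
func runSeparate(timeout time.Duration, steps ...func(context.Context) error) error {
	for _, step := range steps {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		err := step(ctx)
		cancel()
		if err != nil {
			return err
		}
	}
	return nil
}

// demo runs two 250ms steps under a 400ms timeout in both styles.
func demo() (sharedErr, separateErr error) {
	step := slowStep(250 * time.Millisecond)
	sharedErr = runShared(400*time.Millisecond, step, step)
	separateErr = runSeparate(400*time.Millisecond, step, step)
	return sharedErr, separateErr
}

func main() {
	sharedErr, separateErr := demo()
	fmt.Println("shared hit deadline:", errors.Is(sharedErr, context.DeadlineExceeded))
	fmt.Println("separate error:", separateErr)
}
```

The trade-off is that per-operation timeouts no longer bound the total wall-clock time of the resync, so a caller that needs an overall limit would still want an outer deadline.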

    // Dump our table's state for debugging. We scope this to our
    // own table rather than using "nft list ruleset" to avoid
    // parsing objects from other tables that may contain udata
    // written by a newer nft, which can crash older nft binaries.
    cmd := t.newCmd("nft", "list", "table", t.name)

nft list table is being invoked with only the table name. Since Felix creates both IPv4 and IPv6 tables with the same name (e.g. "calico"), this may dump the wrong table or fail depending on nft's argument parsing. Consider including the nftables family (ip/ip6/arp) in the command invocation (store the family on NftablesTable so it can be used here).
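
A family-qualified version of the dump command might be assembled roughly as below. This is only a sketch of the reviewer's suggestion: `listTableArgs` is a hypothetical helper, the `ip6`/`calico` values are example inputs, and the command is constructed but never executed here.

```go
package main

import (
	"fmt"
	"os/exec"
)

// listTableArgs builds the nft arguments for a dump scoped to one table in
// one family. family is expected to be an nftables family such as "ip" or
// "ip6".
func listTableArgs(family, table string) []string {
	return []string{"list", "table", family, table}
}

func main() {
	// Build (but do not run) the command for the IPv6 table.
	cmd := exec.Command("nft", listTableArgs("ip6", "calico")...)
	fmt.Println(cmd.Args)
}
```

Storing the family alongside the table name, as the comment proposes, would let the same helper serve both the IPv4 and IPv6 tables.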
Replace separate List("map") and List("chain") calls in loadDataplaneState() with a single ListAll() call, halving the number of nft subprocesses spawned during each resync cycle. ListAll() was added in knftables v0.0.21 (already on master) specifically for this use case: it runs nft --json --terse list table <family> <table> once and returns all object names grouped by type.

Also replace the unscoped nft list ruleset debug dump (fired on transaction error) with a table-scoped nft list table call. The unscoped command parses objects from all kernel nftables tables, which can crash older nft binaries when other tables contain udata written by newer nft versions. See #11750 for details on the nft udata crash.

This is the same optimization kube-proxy made in kubernetes/kubernetes#137501.