Skip to content

Use ListAll() to reduce nft subprocesses during resync#12471

Draft
caseydavenport wants to merge 1 commit intoprojectcalico:masterfrom
caseydavenport:casey-nft-listall
Draft

Use ListAll() to reduce nft subprocesses during resync#12471
caseydavenport wants to merge 1 commit intoprojectcalico:masterfrom
caseydavenport:casey-nft-listall

Conversation

@caseydavenport
Copy link
Copy Markdown
Member

Replace separate List("map") and List("chain") calls in loadDataplaneState() with a single ListAll() call, halving the number of nft subprocesses spawned during each resync cycle. ListAll() was added in knftables v0.0.21 (already on master) specifically for this use case — it runs nft --json --terse list table <family> <table> once and returns all object names grouped by type.

Also replace the unscoped nft list ruleset debug dump (fired on transaction error) with a table-scoped nft list table call. The unscoped command parses objects from all kernel nftables tables, which can crash older nft binaries when other tables contain udata written by newer nft versions. See #11750 for details on the nft udata crash.

This is the same optimization kube-proxy made in kubernetes/kubernetes#137501.

None

Replace separate List("map") and List("chain") calls in
loadDataplaneState with a single ListAll() call, halving the number
of nft subprocesses spawned during each resync cycle.

Also replace the unscoped "nft list ruleset" debug dump with a
table-scoped "nft list table" call. The unscoped command parses
objects from all kernel nftables tables, which can crash older nft
binaries when other tables contain udata written by newer nft
versions.
@caseydavenport caseydavenport requested a review from a team as a code owner April 15, 2026 03:28
@caseydavenport caseydavenport added docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact labels Apr 15, 2026
Copilot AI review requested due to automatic review settings April 15, 2026 03:28
@marvin-tigera marvin-tigera added this to the Calico v3.33.0 milestone Apr 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes Felix’s nftables resync path by using knftables ListAll() to fetch all object names in a single invocation, reducing subprocess spawning, and scopes error-time diagnostics to the Calico table to avoid crashes caused by parsing other tables’ udata.

Changes:

  • Replace separate List("map") / List("chain") calls with a single ListAll() call and plumb the resulting object names into map and chain resync logic.
  • Update the maps dataplane resync API to accept a caller-provided context and pre-fetched map names.
  • Replace nft list ruleset debug dumping on transaction failure with a table-scoped nft list table ... dump.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
felix/nftables/table.go Uses ListAll() during resync; threads chain names into hash/rule loading; scopes debug dumps to table.
felix/nftables/maps.go Changes LoadDataplaneState to accept (ctx, mapNames) and removes internal List("map") call.
felix/nftables/table_layer.go Updates table-layer wrapper to the new maps resync signature and adds context import.
felix/nftables/maps_test.go Adjusts unit tests to call maps resync with pre-fetched map names via a helper.

Comment thread felix/nftables/table.go
Comment on lines +658 to +663
} else {
t.logCxt.WithError(err).Warn("Failed to list all nftables objects")
}
// Fall through — maps and chains will get empty slices, which is
// correct when the table doesn't exist yet.
allObjects = map[string][]string{}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On ListAll() failure (non-NotFound), this falls through to an empty object map. That can make Felix treat an existing table as empty, clearing the in-memory dataplane view and potentially causing noisy/failed re-programming attempts. Consider returning early on unexpected ListAll errors, or falling back to separate List("map")/List("chain") calls instead of assuming empty state.

Suggested change
} else {
t.logCxt.WithError(err).Warn("Failed to list all nftables objects")
}
// Fall through — maps and chains will get empty slices, which is
// correct when the table doesn't exist yet.
allObjects = map[string][]string{}
// Fall through — maps and chains will get empty slices, which is
// correct when the table doesn't exist yet.
allObjects = map[string][]string{}
} else {
t.logCxt.WithError(err).Warn("Failed to list all nftables objects")
return
}

Copilot uses AI. Check for mistakes.
Comment thread felix/nftables/table.go
Comment on lines +652 to +653
ctx, cancel := context.WithTimeout(context.Background(), t.contextTimeout)
defer cancel()
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same timeout-scoped ctx is used for both ListAll() and the subsequent map resync (which may issue many ListElements calls). This couples their time budgets and can cause premature context deadline exceeded during large resyncs. Consider using separate contexts/timeouts per operation (or let Maps.LoadDataplaneState derive its own per-call timeouts).

Copilot uses AI. Check for mistakes.
Comment thread felix/nftables/table.go
Comment on lines +1206 to +1210
// Dump our table's state for debugging. We scope this to our
// own table rather than using "nft list ruleset" to avoid
// parsing objects from other tables that may contain udata
// written by a newer nft, which can crash older nft binaries.
cmd := t.newCmd("nft", "list", "table", t.name)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nft list table is being invoked with only the table name. Since Felix creates both IPv4 and IPv6 tables with the same name (e.g. "calico"), this may dump the wrong table or fail depending on nft's argument parsing. Consider including the nftables family (ip/ip6/arp) in the command invocation (store the family on NftablesTable so it can be used here).

Copilot uses AI. Check for mistakes.
@caseydavenport caseydavenport marked this pull request as draft April 15, 2026 03:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants