Skip to content

[caclmgrd]: Preserve externally-owned INPUT rules across CACL rebuild#396

Open
Xichen96 wants to merge 1 commit into
sonic-net:masterfrom
Xichen96:dev/xichenlin/caclmgrd-dhcp-server-syslog
Open

[caclmgrd]: Preserve externally-owned INPUT rules across CACL rebuild#396
Xichen96 wants to merge 1 commit into
sonic-net:masterfrom
Xichen96:dev/xichenlin/caclmgrd-dhcp-server-syslog

Conversation

@Xichen96

@Xichen96 Xichen96 commented Jun 19, 2026

Copy link
Copy Markdown

Why I did it

Fixes sonic-net/sonic-buildimage#27584

caclmgrd flushes and rebuilds the host INPUT chain from CONFIG_DB on every control-plane ACL (CACL) change. The dhcp_server container (the only bridge-mode container) installs a docker0 syslog exception directly in the kernel:

-A INPUT -i docker0 -p udp --dport 514 -m comment --comment dhcp_server_syslog -j ACCEPT

That rule is not in CONFIG_DB, so the rebuild wipes it and the catch-all DROP then silently drops dhcp_server syslog until the container restarts.

Work item tracking
  • Microsoft ADO (number only): N/A

How I did it

Make caclmgrd preserve INPUT rules it does not own instead of clobbering them:

  • caclmgrd never comments its own rules, so an iptables --comment marks an externally-owned rule. An explicit ignore list IGNORED_INPUT_RULE_COMMENTS (currently dhcp_server_syslog) says which to keep.
  • Before the flush, snapshot matching rules (iptables -S INPUT) and re-add them verbatim early in the rebuild — before any DROP.

Design notes (concerns considered)

  • Ownership boundary — the comment is the marker; caclmgrd only mirrors these rules (never creates/deletes them), so dhcp_server stays the sole owner via its existing start/stop add/remove. No container change.
  • No drop gap — policy is ACCEPT during the rebuild and the catch-all DROP is appended last, while the preserved rule is re-added early, so a DROP never coexists with the rule missing. Verified step-by-step that syslog is accepted at every point of the rebuild.
  • Container keeps control — the rule is re-added byte-identically (from iptables -S), so the container's own -C/-D still match it after a rebuild. Verified.
  • Contained scopecaclmgrd-only: it never writes CONFIG_DB, signals, or restarts dhcp_server. (A rare simultaneous flush/container-update can momentarily duplicate the rule; harmless and self-healing.)

How to verify it

  1. iptables -S INPUT | grep dhcp_server_syslog — rule present.
  2. sonic-db-cli CONFIG_DB HSET "ACL_TABLE|TEST_CACL" "policy_desc" "test" "type" "CTRLPLANE" "stage" "ingress" "services@" "SSH"
  3. iptables -S INPUT | grep dhcp_server_syslog — still present, above the catch-all DROP (previously it disappeared).
  4. pytest tests/caclmgrd/caclmgrd_preserved_input_rules_test.py

Which release branch to backport (provide reason below if selected)

  • 202511

The rule and the CACL flush-and-rebuild both exist on 202511 (the rule was added by sonic-net/sonic-buildimage#26637, included in 202511), so the same drop happens there.

Description for the changelog

[caclmgrd] Preserve externally-owned INPUT rules (e.g. the dhcp_server docker0 syslog rule, tagged by an iptables comment) across the control-plane ACL flush-and-rebuild.

@mssonicbld

Copy link
Copy Markdown

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@Xichen96 Xichen96 force-pushed the dev/xichenlin/caclmgrd-dhcp-server-syslog branch from 149cbfe to 68f7e81 Compare June 19, 2026 08:00
@mssonicbld

Copy link
Copy Markdown

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@Xichen96 Xichen96 changed the title [caclmgrd] Regenerate dhcp_server syslog rule during CACL rebuild [caclmgrd]: Preserve externally-owned INPUT rules across CACL rebuild Jun 19, 2026
@Xichen96 Xichen96 force-pushed the dev/xichenlin/caclmgrd-dhcp-server-syslog branch from 68f7e81 to 7a1650c Compare June 19, 2026 12:36
@mssonicbld

Copy link
Copy Markdown

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates caclmgrd (the host control-plane ACL manager) to preserve specific externally-owned iptables INPUT rules across the daemon’s flush-and-rebuild cycle, preventing those kernel-only rules (e.g., the dhcp_server docker0 syslog exception) from being silently removed after a CACL update.

Changes:

  • Add an allowlist (IGNORED_INPUT_RULE_COMMENTS) and logic to snapshot matching commented INPUT rules before a rebuild and re-add them early in the rebuilt chain.
  • Insert preserved rules into the generated iptables command sequence before any DROP rules are appended.
  • Add unit tests covering snapshot filtering, graceful failure, ordering relative to the catch-all DROP, and namespace scoping.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
scripts/caclmgrd Snapshots allowlisted commented INPUT rules pre-flush and re-inserts them early during CACL rebuild; adds helper get_preserved_input_rules().
tests/caclmgrd/caclmgrd_preserved_input_rules_test.py Adds unit tests validating preservation behavior, ordering before DROP, failure handling, and host-namespace-only behavior.

Comment thread scripts/caclmgrd
Comment on lines +1106 to +1115
for line in output.splitlines():
tokens = line.split()
if len(tokens) < 2 or tokens[0] != '-A' or tokens[1] != 'INPUT':
continue
if '--comment' not in tokens:
continue
comment = tokens[tokens.index('--comment') + 1].strip('"')
if comment not in self.IGNORED_INPUT_RULE_COMMENTS:
continue
preserved.append(self.iptables_cmd_ns_prefix[namespace] + ['iptables'] + tokens)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — switched to shlex.split() with defensive --comment extraction so comments containing spaces (quoted by iptables -S) stay byte-identical on re-add and the owning service's -C/-D keep matching. Added a unit test covering a spaced comment. Fixed in the latest push.

caclmgrd owns the host INPUT chain and flushes/rebuilds it from CONFIG_DB
on every control-plane ACL change. Rules installed by OTHER services live
only in the kernel and are silently wiped by that rebuild. The concrete
case is the dhcp_server container docker0 syslog (UDP 514) ACCEPT
(sonic-net/sonic-buildimage#27584): after any CACL update it is dropped
until dhcp_server restarts, breaking syslog from the bridge-mode container.

Make caclmgrd not clobber externally-owned INPUT rules. caclmgrd never
comments its own rules, so an iptables --comment is an unambiguous marker
of external ownership. Before the flush, snapshot any INPUT rule whose
--comment is in an explicit ignore list (IGNORED_INPUT_RULE_COMMENTS,
currently just "dhcp_server_syslog") and re-add it verbatim early in the
rebuild -- before the ip2me and catch-all DROPs. The INPUT policy is ACCEPT
throughout the rebuild and the catch-all DROP is appended last, so there is
no window where the rule is missing while a DROP exists; no syslog is lost.

The owning service keeps full lifecycle control: caclmgrd never creates or
deletes the rule, it only mirrors it, and re-adds it byte-identically so the
container own -C/-D keep matching. Host namespace / IPv4 only. No change to
the dhcp_server container.

Fixes sonic-net/sonic-buildimage#27584

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xichen96 <lukelin0907@gmail.com>
@Xichen96 Xichen96 force-pushed the dev/xichenlin/caclmgrd-dhcp-server-syslog branch from 7a1650c to 80744ee Compare June 19, 2026 12:56
@mssonicbld

Copy link
Copy Markdown

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@Xichen96

Copy link
Copy Markdown
Author

@Bojun-Feng this addresses sonic-net/sonic-buildimage#27584 with a different approach than #390 — instead of caclmgrd regenerating the rule from FEATURE state, it preserves the externally-owned INPUT rule (identified by its iptables --comment) across the CACL flush-and-rebuild, so dhcp_server stays the sole owner and the container is unchanged. Would appreciate your review.

cc @tirupatihemanth @yejianquan @vmittal-msft @Sourabh-Kumar7 — review requests sent, feedback welcome.

@Xichen96

Copy link
Copy Markdown
Author

@StormLiangMS adding you for visibility — caclmgrd fix for the dhcp_server syslog drop after control-plane ACL updates (fixes sonic-net/sonic-buildimage#27584), with a 202511 backport requested. Happy to walk through the approach.

@tirupatihemanth

tirupatihemanth commented Jun 22, 2026

Copy link
Copy Markdown

Hi @Xichen96 I understand that we are trying to preserve specific commented filtering rules here. Since dhcp container is in bridge networking and we will always need this rule for dhcp syslog traffic - can we make this as part of default CONFIG_DB instead of adding the rule preservation logic in caclmgrd?

The pov I am looking at is dhcp_server should modify kernel ip table rules through CONFIG_DB and not directly.
CMIIW

++ @StormLiangMS

@StormLiangMS StormLiangMS requested a review from yyynini June 24, 2026 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: CACL updates remove dhcp_server syslog iptables rule

4 participants