Skip to content

pimd: clean stale upstream NHT tracking on RP delete#22505

Open
hnattamaisub wants to merge 1 commit into
FRRouting:masterfrom
hnattamaisub:pim_crash
Open

pimd: clean stale upstream NHT tracking on RP delete#22505
hnattamaisub wants to merge 1 commit into
FRRouting:masterfrom
hnattamaisub:pim_crash

Conversation

@hnattamaisub

Copy link
Copy Markdown
Contributor

Issue:
Faced a crash in below sequence.
pim_upstream_mroute_iif_update+0x16 -> pimd/pim_mroute.c:1256
pimd(+0x78a94) -> pimd/pim_nht.c
pim_nexthop_update+0x587 -> pimd/pim_nht.c

Rootcause:
pim_rpf_update() tracks upstreams in pnc->upstream_hash keyed by up->upstream_addr.

Some RP delete/BSM cleanup paths clear RPF state and then mutate up->upstream_addr to PIMADDR_ANY without first removing the upstream from the old RP nexthop cache bucket. If the upstream was previously tracked under the old RP address, a later Zebra NHT update for that address can still walk the stale upstream entry.

Fix:
1)Guard channel_oil at appropriate places.
2)Wrote an api for cleanup.
RP delete and BSM cleanup now remove the upstream from the old NHT bucket before up->upstream_addr is changed to PIMADDR_ANY. This prevents the old pnc->upstream_hash from retaining a stale upstream pointer.

@frrbot frrbot Bot added the pim label Jun 29, 2026
@hnattamaisub hnattamaisub marked this pull request as draft June 29, 2026 08:02
@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a crash triggered when Zebra fires an NHT update for an RP address after the upstream's upstream_addr has been reset to PIMADDR_ANY — the upstream pointer was left stale in the old pnc->upstream_hash, so pim_update_upstream_nh_helper could walk it and dereference a NULL channel_oil.

  • Introduces pim_nht_delete_tracked_upstream, which detaches an upstream from its NHT bucket (and calls pim_nht_drop_maybe) before upstream_addr is clobbered; called in both the pim_rp_del and pim_bsm_clear "RP not found" paths.
  • Adds NULL guards for up->channel_oil in pim_upstream_mroute_iif_update, pim_update_upstream_nh_helper, and pim_upstream_rpf_clear to prevent the NULL dereference directly observed in the crash backtrace.

Confidence Score: 5/5

The targeted crash path is correctly closed by removing the stale upstream pointer before resetting upstream_addr; all touched code paths have been traced and the changes are consistent with the existing NHT lifecycle.

The two-pronged fix — detaching the upstream from its old NHT bucket before any address mutation, and guarding against NULL channel_oil — directly addresses the reported crash sequence. The new helper is idempotent (hash_release is a no-op when the entry is absent) and pim_nht_drop_maybe correctly preserves the pnc when other references remain. No new logic errors were introduced.

pimd/pim_bsm.c warrants a second look: the new cleanup call covers the RP-not-found branch, but the parallel RP-found branch (calls pim_upstream_update directly) was not modified.

Important Files Changed

Filename Overview
pimd/pim_nht.c Adds pim_nht_delete_tracked_upstream helper that removes an upstream from its NHT bucket (with pim_nht_drop_maybe) before the upstream's address is reset to ANY; also guards pim_upstream_mroute_iif_update call with a channel_oil NULL check.
pimd/pim_nht.h Declares the new pim_nht_delete_tracked_upstream API with a clear doc comment explaining its purpose and safety contract.
pimd/pim_rp.c Calls pim_nht_delete_tracked_upstream before pim_upstream_rpf_clear / upstream_addr = ANY in both the rp_all and general RP-not-found branches; the pim_nht_delete_tracked call that precedes the loop already removes those upstreams, so these are belt-and-suspenders but correct.
pimd/pim_bsm.c Adds pim_nht_delete_tracked_upstream only in the RP-not-found branch of pim_bsm_clear; the parallel RP-found branch calls pim_upstream_update without first detaching the upstream from the old NHT bucket.
pimd/pim_mroute.c Adds an early NULL return for c_oil in pim_upstream_mroute_iif_update, preventing the crash observed in the backtrace when c_oil->up is dereferenced immediately after.
pimd/pim_rpf.c Guards the pim_upstream_mroute_iif_update call in pim_upstream_rpf_clear with a channel_oil NULL check, which is now redundant with the guard added inside the callee but remains good defensive practice.

Reviews (2): Last reviewed commit: "pimd: clean stale upstream NHT tracking ..." | Re-trigger Greptile

Comment thread pimd/pim_nht.c
Issue:
Faced a crash in below sequence.
pim_upstream_mroute_iif_update+0x16 -> pimd/pim_mroute.c:1256
pimd(+0x78a94)                     -> pimd/pim_nht.c
pim_nexthop_update+0x587           -> pimd/pim_nht.c

Rootcause:
pim_rpf_update() tracks upstreams in pnc->upstream_hash keyed by
up->upstream_addr.

Some RP delete/BSM cleanup paths clear RPF state and then mutate
up->upstream_addr to PIMADDR_ANY without first removing the upstream from
the old RP nexthop cache bucket. If the upstream was previously tracked
under the old RP address, a later Zebra NHT update for that address can
still walk the stale upstream entry.

Fix:
1)Guard channel_oil at appropriate places.
2)Wrote an api for cleanup.
RP delete and BSM cleanup now remove the upstream from the old NHT bucket before
up->upstream_addr is changed to PIMADDR_ANY. This prevents the old
pnc->upstream_hash from retaining a stale upstream pointer.

Signed-off-by: harini <hnattamaisub@nvidia.com>
@hnattamaisub hnattamaisub marked this pull request as ready for review June 29, 2026 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant