pimd: clean stale upstream NHT tracking on RP delete #22505
Greptile Summary
This PR fixes a crash triggered when Zebra fires an NHT update for an RP address after the upstream's
Confidence Score: 5/5
The targeted crash path is correctly closed by removing the stale upstream pointer before resetting upstream_addr; all touched code paths have been traced and the changes are consistent with the existing NHT lifecycle.
The two-pronged fix (detaching the upstream from its old NHT bucket before any address mutation, and guarding against NULL channel_oil) directly addresses the reported crash sequence. The new helper is idempotent (hash_release is a no-op when the entry is absent) and pim_nht_drop_maybe correctly preserves the pnc when other references remain. No new logic errors were introduced.
pimd/pim_bsm.c warrants a second look: the new cleanup call covers the RP-not-found branch, but the parallel RP-found branch (which calls pim_upstream_update directly) was not modified.
Signed-off-by: harini <hnattamaisub@nvidia.com>
Issue:
pimd crashed with the following backtrace:
pim_upstream_mroute_iif_update+0x16 -> pimd/pim_mroute.c:1256
pimd(+0x78a94) -> pimd/pim_nht.c
pim_nexthop_update+0x587 -> pimd/pim_nht.c
Root cause:
pim_rpf_update() tracks upstreams in pnc->upstream_hash keyed by up->upstream_addr.
Some RP delete/BSM cleanup paths clear RPF state and then mutate up->upstream_addr to PIMADDR_ANY without first removing the upstream from the old RP nexthop cache bucket. If the upstream was previously tracked under the old RP address, a later Zebra NHT update for that address can still walk the stale upstream entry.
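
The stale-entry condition described above can be reproduced in a toy model. This is a minimal sketch, not pimd code: `toy_upstream`, `toy_pnc`, `toy_track`, `toy_count_stale` and `toy_rp_delete_buggy` are hypothetical names, a fixed-size array stands in for `pnc->upstream_hash`, and `TOY_ADDR_ANY` stands in for `PIMADDR_ANY`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ADDR_ANY 0u     /* stand-in for PIMADDR_ANY (assumption) */
#define TOY_MAX_TRACKED 8

/* Hypothetical, simplified upstream: only the fields this example needs. */
struct toy_upstream {
	uint32_t upstream_addr; /* RP address the upstream resolves through */
	void *channel_oil;      /* NULL once RPF/mroute state is cleared */
};

/* Hypothetical nexthop-cache entry: one bucket per tracked address. */
struct toy_pnc {
	uint32_t addr;
	struct toy_upstream *tracked[TOY_MAX_TRACKED];
	size_t count;
};

/* Track an upstream under the bucket for its current address. */
static int toy_track(struct toy_pnc *pnc, struct toy_upstream *up)
{
	assert(pnc != NULL && up != NULL);
	if (pnc->count == TOY_MAX_TRACKED)
		return -1;
	pnc->tracked[pnc->count++] = up;
	return 0;
}

/*
 * Simulate the walk done on an NHT update for pnc->addr: count tracked
 * upstreams whose address no longer matches the bucket (stale entries).
 */
static size_t toy_count_stale(const struct toy_pnc *pnc)
{
	size_t stale = 0;

	assert(pnc != NULL);
	for (size_t i = 0; i < pnc->count; i++)
		if (pnc->tracked[i]->upstream_addr != pnc->addr)
			stale++;
	return stale;
}

/* Buggy ordering: clear state and mutate the address, bucket untouched. */
static void toy_rp_delete_buggy(struct toy_upstream *up)
{
	assert(up != NULL);
	up->channel_oil = NULL;
	up->upstream_addr = TOY_ADDR_ANY;
}
```

After `toy_rp_delete_buggy()` the bucket still holds the pointer, so a later walk reaches an upstream whose `channel_oil` is NULL, which is the shape of the failure in the backtrace.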
Fix:
1) Guard against a NULL channel_oil at the relevant call sites.
2) Add a helper API for the cleanup.
RP delete and BSM cleanup now remove the upstream from the old NHT bucket before up->upstream_addr is changed to PIMADDR_ANY. This prevents the old pnc->upstream_hash from retaining a stale upstream pointer.
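
The ordering of the fix can be sketched in the same toy style. This is an illustration under stated assumptions, not the patch itself: all `toy_*` names are hypothetical, an array replaces the real hash, and `iif_updates` is an invented counter standing in for the incoming-interface update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ADDR_ANY 0u     /* stand-in for PIMADDR_ANY (assumption) */
#define TOY_MAX_TRACKED 8

struct toy_upstream {
	uint32_t upstream_addr;
	void *channel_oil;
	int iif_updates;        /* hypothetical counter of applied updates */
};

struct toy_pnc {
	uint32_t addr;
	struct toy_upstream *tracked[TOY_MAX_TRACKED];
	size_t count;
};

static bool toy_track(struct toy_pnc *pnc, struct toy_upstream *up)
{
	assert(pnc != NULL && up != NULL);
	if (pnc->count == TOY_MAX_TRACKED)
		return false;
	pnc->tracked[pnc->count++] = up;
	return true;
}

/* Idempotent removal: true if up was tracked, false (no-op) otherwise. */
static bool toy_untrack(struct toy_pnc *pnc, const struct toy_upstream *up)
{
	assert(pnc != NULL && up != NULL);
	for (size_t i = 0; i < pnc->count; i++) {
		if (pnc->tracked[i] == up) {
			/* Swap the last entry into the freed slot. */
			pnc->tracked[i] = pnc->tracked[--pnc->count];
			pnc->tracked[pnc->count] = NULL;
			return true;
		}
	}
	return false;
}

/* Fixed ordering: leave the old bucket first, then reset the address. */
static void toy_rp_delete_fixed(struct toy_pnc *old_pnc,
				struct toy_upstream *up)
{
	toy_untrack(old_pnc, up);
	up->channel_oil = NULL;
	up->upstream_addr = TOY_ADDR_ANY;
}

/* NHT update walk with the NULL guard; returns upstreams updated. */
static size_t toy_nexthop_update(struct toy_pnc *pnc)
{
	size_t updated = 0;

	assert(pnc != NULL);
	for (size_t i = 0; i < pnc->count; i++) {
		struct toy_upstream *up = pnc->tracked[i];

		if (up->channel_oil == NULL)
			continue; /* nothing to update; skip, do not crash */
		up->iif_updates++;
		updated++;
	}
	return updated;
}
```

The two measures are independent: detaching first keeps the bucket free of stale pointers, while the guard keeps the walk safe if a path that was not converted still leaves one behind.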