Skip to content

felix/bpf/ut: drain shared ring buffer in BPF UT helpers#12515

Open
lucastigera wants to merge 1 commit intoprojectcalico:masterfrom
lucastigera:fix-bpf-flow-log-ringbuf-drain
Open

felix/bpf/ut: drain shared ring buffer in BPF UT helpers#12515
lucastigera wants to merge 1 commit intoprojectcalico:masterfrom
lucastigera:fix-bpf-flow-log-ringbuf-drain

Conversation

@lucastigera
Copy link
Copy Markdown
Contributor

@lucastigera lucastigera commented Apr 16, 2026

Summary

cali_rb_evnt is a pinned, shared BPF_MAP_TYPE_RINGBUF. Its consumer/producer positions are kernel state and persist across ringbuf.New() calls, so a reader opened by one test sees any records left unread by an earlier test. ParsePolicyVerdict then slices past the end of, e.g., an 84-byte EVENT_PROTO_STATS record (emitted by the enterprise `TestKprobe`) and panics:

panic: runtime error: slice bounds out of range [:88] with capacity 76
github.com/projectcalico/calico/felix/bpf/events.ParsePolicyVerdict
  felix/bpf/events/events.go:268
github.com/projectcalico/calico/felix/bpf/ut_test.TestFlowLogV6Events
  felix/bpf/ut/flow_log_events_test.go:57

Fix:

  • Add a `newTestRingBuf()` helper in `bpf_prog_test.go` that opens a reader on `cali_rb_evnt` and immediately drains any leftover records, so `rb.Next()` only returns events produced after the call.
  • Convert all six ring-buffer-consuming tests (`flow_log_events_test.go`, `ringbuf_events_test.go`) to use the helper so future tests get the same protection for free.
  • Assert the expected event type in the flow log tests so any future mismatch fails with a clear message rather than panicking inside the parser.

The OSS test suite alone cannot reproduce the panic because only the enterprise fork has kprobe programs that emit stray records into `cali_rb_evnt` during earlier tests, but the fix hardens the pattern for everyone.

Test plan

  • `make ut-bpf FOCUS=TestFlowLog` — all subtests pass.
  • `make ut-bpf FOCUS=TestRingBuf` — all subtests pass.
  • Confirmed the original panic reproduces on enterprise fork before the fix and no longer reproduces after cherry-picking this change.

Release note:

None

cali_rb_evnt is a pinned, shared BPF_MAP_TYPE_RINGBUF. Its consumer
and producer positions are kernel state that persist across
ringbuf.New() calls, so a reader opened by one test sees any records
left unread by earlier tests. ParsePolicyVerdict then slices past the
end of, e.g., an 84-byte EVENT_PROTO_STATS record (emitted by the
enterprise TestKprobe) and panics:

    panic: runtime error: slice bounds out of range [:88] with capacity 76

Introduce a newTestRingBuf() helper in bpf_prog_test.go that opens a
reader on cali_rb_evnt and immediately drains any leftover records, so
rb.Next() only returns events produced after the call. Convert all six
ring-buffer-consuming tests (flow_log_events_test.go, ringbuf_events_test.go)
to use it so future tests get the same protection for free. Also assert
the expected event type in the flow log tests, so a mismatch fails with
a clear message instead of panicking inside ParsePolicyVerdict.
@lucastigera lucastigera requested a review from a team as a code owner April 16, 2026 21:34
Copilot AI review requested due to automatic review settings April 16, 2026 21:34
@marvin-tigera marvin-tigera added this to the Calico v3.33.0 milestone Apr 16, 2026
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Apr 16, 2026
@lucastigera lucastigera added docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact and removed release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Apr 16, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens Felix BPF unit tests that consume the shared pinned ring buffer (cali_rb_evnt) by ensuring each test starts from a clean read position, preventing leftover records from earlier tests (including enterprise-only producers) from causing mis-parsing and panics.

Changes:

  • Added a newTestRingBuf() helper that opens the shared ring buffer reader and drains any pre-existing events.
  • Updated ring-buffer-consuming UTs to use newTestRingBuf() instead of constructing readers directly.
  • Added explicit event-type assertions in flow log tests to fail clearly on unexpected event types.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
felix/bpf/ut/bpf_prog_test.go Adds newTestRingBuf() helper that drains the shared pinned ring buffer on open.
felix/bpf/ut/ringbuf_events_test.go Switches ring buffer tests to use the draining helper; updates comments around clean-state assumptions.
felix/bpf/ut/flow_log_events_test.go Uses the draining helper and asserts event types before parsing policy verdicts.

Comment on lines +98 to 99
rb := newTestRingBuf()

Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestRingBufReaderRecovery creates a ring buffer reader without a defer rb.Close(). If an assertion fails before the explicit rb.Close() call, the reader can be leaked (mmap/epoll fds), which may impact subsequent tests. Consider adding a deferred close immediately after creation (even if you still close early to test re-opening).

Copilot uses AI. Check for mistakes.
Comment on lines +696 to +700
func newTestRingBuf() *ringbuf.RingBuffer {
rb, err := ringbuf.New(ringBufMap, rbSize)
Expect(err).NotTo(HaveOccurred())
rb.Drain()
return rb
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newTestRingBuf() depends on rbSize, which is currently declared in ringbuf_events_test.go, and initMapsOnce() hard-codes the ring buffer map size as 1024*1024. That cross-file coupling makes it easy for these values to drift or for the helper to stop compiling if the constant is moved/guarded by build tags. Consider defining a single shared constant (e.g., in bpf_prog_test.go) and using it both for ringbuf.Map(..., size) and ringbuf.New(..., size) (and have tests reference that constant).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants