zebra: clean up VRF handling by using dataplane provided vrf_id #20318

Merged
mjstapp merged 3 commits into FRRouting:master from maxime-leroy:zebra_vrf_id_cleanup on May 20, 2026

Conversation

@maxime-leroy

Contributor

Zebra was implicitly assuming that the VRF netdevice ifindex could be used as the VRF identifier, leading to casts from ifindex to vrf_id_t. While this works with the Linux kernel dataplane, it does not respect the dataplane API, which exposes these identifiers as distinct concepts.

The code is updated to use the VRF identifier explicitly provided by the dataplane context and removes the need for casting ifindex to vrf_id_t. In addition, the VRF delete path is clarified by no longer reading unused dataplane fields, avoiding confusion about which identifiers are actually required.

There is no behavior change for the Linux kernel dataplane.
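
To make the distinction concrete, here is a minimal sketch, not zebra's actual code. The `sketch_*` types and functions are hypothetical stand-ins for `ifindex_t`, `vrf_id_t` and the dataplane context; they only illustrate why a cast from ifindex is correct when the two values coincide and wrong otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for zebra's ifindex_t and vrf_id_t. */
typedef int32_t sketch_ifindex_t;
typedef uint32_t sketch_vrf_id_t;

/* Hypothetical, simplified dataplane context carrying both identifiers. */
struct sketch_ctx {
	sketch_ifindex_t ifindex; /* index of the VRF netdevice */
	sketch_vrf_id_t vrf_id;   /* VRF identifier chosen by the dataplane */
};

/* Old pattern: assume the VRF netdevice ifindex is also the VRF id. */
static inline sketch_vrf_id_t sketch_vrf_id_by_cast(const struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	return (sketch_vrf_id_t)ctx->ifindex;
}

/* New pattern: use the identifier the dataplane provides explicitly. */
static inline sketch_vrf_id_t sketch_vrf_id_from_ctx(const struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->vrf_id;
}
```

For a context where both fields hold the same value (the Linux kernel case described above), the two helpers agree; for a dataplane that numbers VRFs independently of interfaces, only the second one returns the right identifier.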

@github-actions


This pull request has conflicts, please resolve those before we can evaluate the pull request.

@maxime-leroy

Contributor Author

ci:rerun

@donaldsharp donaldsharp self-requested a review January 6, 2026 15:58
Comment thread zebra/interface.c Outdated
Comment thread zebra/interface.c Outdated
Comment thread zebra/interface.c Outdated

@vjardin vjardin left a comment

Contributor

thanks, LGTM

@vjardin

vjardin commented Jan 14, 2026

Contributor

Let's wait for the end of the CI just in case. I'd prefer that someone external ack and merge it, since I am Maxime's colleague.

Comment thread zebra/zebra_trace.h

@mjstapp mjstapp left a comment

Contributor

there's some confusion here - but I'm not sure that this PR isn't making things harder to understand.
I agree that it's confusing to repeatedly cast a value that appears as an ifindex - that's not a great pattern. but I can imagine a very small couple of lines that would correct that in interface.c - I don't see that reorganizing the whole vrf device change flow is helping.

@maxime-leroy

Contributor Author

> there's some confusion here - but I'm not sure that this PR isn't making things harder to understand. I agree that it's confusing to repeatedly cast a value that appears as an ifindex - that's not a great pattern. but I can imagine a very small couple of lines that would correct that in interface.c - I don't see that reorganizing the whole vrf device change flow is helping.

There are two separate commits in this PR.

The second commit is the actual fix: interface_vrf_change() was implicitly assuming that the VRF netdevice ifindex could be used as zebra’s vrf_id (via casting).
This holds true for the Linux kernel dataplane (VRF ID == VRF ifindex), so there is no behavior change there, but it is incorrect for non-kernel dataplanes such as Grout (DPDK) where vrf_id != ifindex.
The code now uses the dataplane-provided vrf_id (dplane_ctx_get_ifp_vrf_id()), which fixes VRF handling for Grout.

The first commit is only a cleanup: in the netlink backend ifp_table_id is set for RTM_NEWLINK VRF events but not for RTM_DELLINK,
and VRF deletion does not require table_id anyway. Splitting VRF handling into interface_vrf_update() / interface_vrf_del() avoids relying on an unset dplane field and makes the logic clearer.
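
As a rough illustration of that asymmetry (a sketch under stated assumptions, not the netlink backend itself): the hypothetical `sketch_backend_event()` below fills `table_id` only for a new-link event, so a delete handler that never reads the field cannot pick up a meaningless 0. All `sketch_*` names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event kinds standing in for RTM_NEWLINK / RTM_DELLINK. */
enum sketch_event { SKETCH_NEWLINK, SKETCH_DELLINK };

/* Hypothetical, simplified dataplane context. */
struct sketch_ctx {
	enum sketch_event event;
	uint32_t table_id; /* stays 0 unless the backend fills it in */
};

/* Mimics the behavior described above: only NEWLINK populates table_id. */
static inline struct sketch_ctx sketch_backend_event(enum sketch_event ev,
						     uint32_t kernel_table)
{
	struct sketch_ctx ctx = { .event = ev, .table_id = 0 };

	if (ev == SKETCH_NEWLINK)
		ctx.table_id = kernel_table;
	return ctx;
}

/* Update path: table_id is a required input and is read from the context. */
static inline uint32_t sketch_vrf_update(const struct sketch_ctx *ctx)
{
	assert(ctx->event == SKETCH_NEWLINK);
	return ctx->table_id;
}

/* Delete path: never reads ctx->table_id, so the unset field is not used. */
static inline bool sketch_vrf_del(const struct sketch_ctx *ctx)
{
	assert(ctx->event == SKETCH_DELLINK);
	return true;
}
```

With two handlers, the inputs each path depends on are visible in the function bodies rather than implied by the caller.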

If needed, the first commit can be ditched. However, in my view it is a nice-to-have cleanup that clarifies the delete vs update requirements.

@maxime-leroy maxime-leroy force-pushed the zebra_vrf_id_cleanup branch 2 times, most recently from c52e4a6 to 15e7bb3 Compare January 14, 2026 15:36
@mjstapp

mjstapp commented Jan 14, 2026

Contributor

yes, I understood the two commits, I think.
my point was: your choice to "clean up" what is the common handler function had a significant impact. you could just have ensured that the vrf_id value was conveyed. but instead, you removed the handler, replaced it with two new functions, and ran into the LTTng traces that are involved. a two-line change turned into 100 lines. that has an impact on the reviewers as well - we have to try to confirm that your changes retain the important logic that you propose moving into separate functions, and we have to sort out the tracing API issues.
so: I think you should not try to make that "clean up" change, and just get the vrf_id straight.

> If needed, the first commit can be ditched. However, in my view it is a nice-to-have cleanup that clarifies the delete vs update requirements.

@maxime-leroy

Contributor Author

> yes, I understood the two commits, I think. my point was: your choice to "clean up" what is the common handler function had a significant impact. you could just have ensured that the vrf_id value was conveyed. but instead, you removed the handler, replaced it with two new functions, and ran into the LTTng traces that are involved. a two-line change turned into 100 lines. that has an impact on the reviewers as well - we have to try to confirm that your changes retain the important logic that you propose moving into separate functions, and we have to sort out the tracing API issues. so: I think you should not try to make that "clean up" change, and just get the vrf_id straight.

Thanks for the detailed feedback — I understand the concern about review surface / churn.

That said, I’d like to clarify the intent and why I still think the split is justified:

  • The commits are intentionally split:

    The first commit is cleanup-only, and the second commit contains the actual fix (using the dplane-provided vrf_id).
    This is deliberate to separate “mechanical refactor” from “functional change” and make the review easier.

  • No functional change is introduced by the first commit:

    This is explicitly stated in the commit log and PR description, and I also mentioned during the weekly meeting that the first commit was only a split/cleanup with no functional change. The Linux kernel dataplane behavior is unchanged.

  • The cleanup is not only cosmetic:

    In the netlink backend, ifp_table_id is set for RTM_NEWLINK VRF events but not for RTM_DELLINK. The delete path does not require table_id, and reading/tracing it on delete can produce misleading trace data (e.g., a delete event always carrying table_id == 0). Splitting the flow makes the required inputs explicit: update uses table_id, delete does not. This avoids consuming an unset dataplane field and emitting invalid trace payloads.

  • About the LTTng issue:
    While working on this PR, I introduced a compilation issue related to the new tracepoints in zebra/interface.c during a rebase. Sorry about that — I was not familiar with LTTng tracing at the time. I am working on fixing this in the current series.

On the positive side, this led to adding CI coverage for --enable-lttng (thanks to @vjardin), which also uncovered an existing LTTng compilation issue already present on master. I fixed that master issue as well.

So I agree the diff is larger than the minimal vrf_id accessor change, but it is not arbitrary churn: it makes the delete/update requirements explicit and avoids relying on/tracing unset dataplane fields. The actual behavior change remains isolated in the second commit.

@frrbot frrbot Bot added the bfd label Jan 14, 2026
@vjardin vjardin removed the bfd label Jan 15, 2026
@vjardin

vjardin commented Jan 15, 2026

Contributor

@mjstapp: Fair point on the size, but the split makes the contracts explicit (delete doesn't need table_id/ns_id, update does). With commit 1 being pure refactor and commit 2 the isolated fix, I think it lands in a better place for future maintenance.

@maxime-leroy

Contributor Author

@mjstapp I added the extra {} blocks in interface_vrf_update() / interface_vrf_del()
in the first commit to preserve the original indentation and minimize the diff,
so it should be easier to review.

I’ve added a follow-up commit that removes these redundant blocks and runs
clang-format.

Hope this makes the review easier on your side.

@maxime-leroy

Contributor Author

ci:rerun

@vjardin

vjardin commented Jan 25, 2026

Contributor

@donaldsharp and @mjstapp: I discussed this directly with @maxime-leroy and I do not see additional issues with this pull request. Is there still a strong concern? If not, let's merge: it is a harmless cleanup that helps provide generic support.

@mjstapp

mjstapp commented Jan 26, 2026

Contributor

I think I've been pretty clear about the issue. A useful 10-line change has turned into a 100-line change ... and that just seems like something that's unnecessary.

> @donaldsharp and @mjstapp: I discussed this directly with @maxime-leroy and I do not see additional issues with this pull request. Is there still a strong concern? If not, let's merge: it is a harmless cleanup that helps provide generic support.

@maxime-leroy

Contributor Author

> I think I've been pretty clear about the issue. A useful 10-line change has turned into a 100-line change ... and that just seems like something that's unnecessary.

> @donaldsharp and @mjstapp: I discussed this directly with @maxime-leroy and I do not see additional issues with this pull request. Is there still a strong concern? If not, let's merge: it is a harmless cleanup that helps provide generic support.

@mjstapp, I hear you about keeping this small. While working on this PR I noticed two related issues in the VRF delete path:

  • table-id is not set on delete, yet we still read it via dplane_ctx_get_ifp_table_id(ctx)
  • the delete trace always reports table-id = 0, which is misleading

The function split was my way to avoid using/printing a meaningless value there.
You’re right that if the PR only focused on using the correct API to get vrf_id, it would be ~10 lines.
When I notice additional issues in the code I’m touching, I usually try to fix them as well.

If you prefer, I can instead keep the change minimal and simply stop tracing table-id on delete.
Note that with this minimal change, table-id would still be retrieved by the caller (via dplane_ctx_get_ifp_table_id(ctx)) and passed to interface_vrf_change() even on delete, even though it has no meaning there.

@frrbot frrbot Bot added the bfd label Jan 27, 2026
@github-actions github-actions Bot added size/M and removed size/L labels Jan 27, 2026
@maxime-leroy

Contributor Author

I've reworked the patch as requested. The diff is now +30/-38 lines compared to the previous +97/-75 (68 changed lines instead of 172), which is roughly a 60% reduction in diff size.

I still think that splitting interface_vrf_change() into two functions could result in cleaner code, but I understand that this may not be the preferred approach for this particular fix. For reference, I’ve pushed an alternative version here:
https://github.com/maxime-leroy/frr/commits/zebra_vrf_id_cleanup_split_version/

Thanks for the review and feedback.

@maxime-leroy

Contributor Author

@greptileai

@greptile-apps

greptile-apps Bot commented Jan 27, 2026


Greptile Overview

Greptile Summary

This PR refactors VRF handling in zebra to properly respect the dataplane abstraction by using the VRF identifier explicitly provided by dplane_ctx_get_ifp_vrf_id() instead of casting ifindex to vrf_id_t.

Key changes:

  • interface_vrf_change() signature updated to accept a VRF pointer (for DELETE) or vrf_id parameter (for UPDATE), eliminating the implicit assumption that VRF ifindex equals vrf_id
  • For DELETE operations: VRF pointer is captured before if_delete_update() and passed directly, avoiding redundant vrf_lookup_by_id() call
  • For UPDATE operations: Uses dataplane-provided vrf_id from dplane_ctx_get_ifp_vrf_id(ctx) instead of casting ifindex
  • Tracepoint updated to use vrf_id_t instead of ifindex_t for semantic correctness

Impact:

  • No behavior change for Linux kernel dataplane (where vrf_id == VRF ifindex)
  • Enables support for non-kernel dataplanes like Grout (DPDK) where VRF ID ≠ VRF interface ifindex
  • Improves code clarity by making VRF ID handling explicit rather than implicit

Confidence Score: 5/5

  • This PR is safe to merge with no risks identified
  • The refactoring is well structured across three logical commits and maintains backward compatibility with the Linux kernel dataplane while enabling support for alternative dataplanes. The changes are purely semantic: they replace implicit ifindex-to-vrf_id casts with explicit dataplane-provided values. All call sites are properly updated, and the VRF pointer handling in the delete path is protected by a null check at the call site.
  • No files require special attention

Important Files Changed

Filename | Overview
zebra/interface.c | Refactored VRF handling to use dataplane-provided vrf_id and pass VRF pointer in delete path to eliminate redundant lookup
zebra/zebra_trace.h | Updated tracepoint to use vrf_id instead of ifindex, aligning with semantic changes in interface_vrf_change

Sequence Diagram

sequenceDiagram
    participant DP as Dataplane
    participant ZIDH as zebra_if_dplane_ifp_handling
    participant IVC as interface_vrf_change
    participant VRF as VRF subsystem

    alt VRF Interface Delete
        DP->>ZIDH: DPLANE_OP_INTF_DELETE
        Note over ZIDH: zif_type == ZEBRA_IF_VRF
        ZIDH->>ZIDH: vrf = ifp->vrf (capture before delete)
        ZIDH->>ZIDH: if_delete_update(&ifp)
        ZIDH->>IVC: interface_vrf_change(op, vrf, 0, NULL, 0, ns_id)
        Note over IVC: Use vrf->vrf_id, vrf->name, vrf->data.l.table_id
        IVC->>VRF: vrf_delete(vrf)
    else VRF Interface Update
        DP->>ZIDH: DPLANE_OP_INTF_UPDATE
        Note over ZIDH: zif_type == ZEBRA_IF_VRF
        ZIDH->>ZIDH: vrf_id = dplane_ctx_get_ifp_vrf_id(ctx)
        ZIDH->>ZIDH: tableid = dplane_ctx_get_ifp_table_id(ctx)
        ZIDH->>IVC: interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id)
        Note over IVC: Use vrf_id from dataplane (no ifindex cast)
        IVC->>VRF: vrf_update(vrf_id, name)
        IVC->>VRF: vrf_enable(vrf)
    end

@maxime-leroy maxime-leroy requested a review from vjardin January 27, 2026 18:54
@maxime-leroy maxime-leroy force-pushed the zebra_vrf_id_cleanup branch from 775dd4a to 37d493b Compare April 13, 2026 07:13
@maxime-leroy

Contributor Author

ci:rerun

@maxime-leroy

maxime-leroy commented Apr 21, 2026

Contributor Author

Hi @mjstapp, gentle ping whenever you have a moment. The patch has been reduced to the minimal version you asked for (+30/-38) since January 27 and is sitting ready.

FWIW, greptile (configured by @mwinter-osr on an unrelated FRR PR) also recommends splitting this kind of function: opensourcerouting#234 (comment)

Thanks!

zebra_if_dplane_ifp_handling() was reading dplane_ctx_get_ifp_table_id()
for VRF events.

In the netlink dplane backend, ifp_table_id is only set via
netlink_vrf_change(), which is invoked from netlink_link_change() for
RTM_NEWLINK VRF events. It is not set for RTM_DELLINK. This causes
interface_vrf_change() to receive table_id as 0 on VRF deletes.

For delete operations, save the VRF pointer before if_delete_update()
and retrieve table_id from vrf->data.l.table_id. For add/update
operations, continue using the dplane-provided table_id.
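
The delete-path ordering described above could be sketched roughly as follows. This is an illustration only: `sketch_if_delete()` is a hypothetical stand-in for if_delete_update(), and the structs are reduced to the fields the example needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced versions of the VRF and interface structures. */
struct sketch_vrf {
	uint32_t vrf_id;
	uint32_t table_id;
};

struct sketch_ifp {
	struct sketch_vrf *vrf;
};

/*
 * Hypothetical stand-in for if_delete_update(): after it returns, the
 * caller can no longer reach the VRF through the interface.
 */
static inline void sketch_if_delete(struct sketch_ifp **ifp)
{
	(*ifp)->vrf = NULL;
	*ifp = NULL;
}

/*
 * Delete handling: capture the VRF first, then take table_id from that
 * saved state instead of the (unset) value carried by the context.
 */
static inline uint32_t sketch_handle_vrf_delete(struct sketch_ifp **ifp,
						uint32_t ctx_table_id)
{
	struct sketch_vrf *vrf;

	assert(ifp != NULL && *ifp != NULL);
	vrf = (*ifp)->vrf; /* capture before the interface goes away */
	sketch_if_delete(ifp);
	(void)ctx_table_id; /* 0 on delete in this sketch, deliberately ignored */
	return vrf ? vrf->table_id : 0;
}
```

The point of the ordering is that the pointer is read while the interface still references its VRF; reading it after the delete call would yield nothing useful in this model.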

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
@maxime-leroy maxime-leroy force-pushed the zebra_vrf_id_cleanup branch from 37d493b to add4e75 Compare April 21, 2026 12:28
Comment thread zebra/zebra_trace.h Outdated
Comment thread zebra/interface.c
interface_vrf_change() was implicitly assuming that the VRF netdevice
ifindex could be used as zebra's vrf_id.

This assumption holds true for the Linux kernel dataplane, where the VRF
ID is defined as the ifindex of the VRF interface, so this change does
not alter kernel behavior.

However, the dataplane API already exposes both concepts explicitly via
dplane_ctx_get_ifindex() and dplane_ctx_get_ifp_vrf_id(ctx). Using the
proper accessor avoids casting an ifindex to vrf_id_t and better respects
the dataplane abstraction.

On interface updates, the vrf_id provided by the dataplane is now used
directly. On interface deletion (DELLINK), where the dataplane context
may no longer carry vrf information, zebra relies on the existing ifp
state (ifp->vrf->vrf_id) before if_delete_update() is called. The table_id
is also retrieved from ifp->vrf->data.l.table_id since
dplane_ctx_get_ifp_table_id() is not set for RTM_DELLINK events.

The if_vrf_change LTTng tracepoint is moved out of interface_vrf_change()
and into the caller zebra_if_dplane_ifp_handling(), where the original
ifindex from the dplane context is still available. This preserves the
existing tracepoint ABI (field name "ifindex", type ifindex_t) so that
existing trace consumers are not affected.
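
A small sketch of that arrangement, assuming a hypothetical trace sink rather than the real LTTng macros: the handler receives only the vrf_id, while the caller, which still holds the context, records the ifindex under the unchanged field name and type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for zebra's ifindex_t and vrf_id_t. */
typedef int32_t sketch_ifindex_t;
typedef uint32_t sketch_vrf_id_t;

/* Hypothetical, simplified dataplane context. */
struct sketch_ctx {
	sketch_ifindex_t ifindex;
	sketch_vrf_id_t vrf_id;
};

/* Hypothetical trace record; the field keeps its name and type. */
struct sketch_trace {
	sketch_ifindex_t ifindex;
	int emitted;
};

static inline void sketch_trace_vrf_change(struct sketch_trace *t,
					   sketch_ifindex_t ifindex)
{
	t->ifindex = ifindex;
	t->emitted++;
}

/* The handler works on vrf_id only and no longer emits the trace. */
static inline sketch_vrf_id_t sketch_vrf_change(sketch_vrf_id_t vrf_id)
{
	return vrf_id;
}

/* The caller still has the context, so it traces the original ifindex. */
static inline sketch_vrf_id_t sketch_handle(const struct sketch_ctx *ctx,
					    struct sketch_trace *t)
{
	assert(ctx != NULL && t != NULL);
	sketch_trace_vrf_change(t, ctx->ifindex);
	return sketch_vrf_change(ctx->vrf_id);
}
```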

This is required for non-kernel dataplanes such as Grout (DPDK), where
the VRF ID is not equal to the VRF interface ifindex. In that case, using
the correct vrf_id fixes VRF handling between zebra and the Grout
dataplane.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>

In the delete path, we already hold a reference to the vrf structure
before calling if_delete_update(). Passing this pointer directly to
interface_vrf_change() avoids a redundant vrf_lookup_by_id() call and
allows accessing vrf->name, vrf->vrf_id, and vrf->data.l.table_id
directly.

For update operations, the vrf pointer is passed as NULL and the
function continues to use the vrf_id, name, and tableid parameters
from the dataplane context.
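
One way to picture the resulting calling convention is the sketch below. It is not the real interface_vrf_change() signature; `sketch_vrf_change()` and its types are hypothetical and reduced to the identifier selection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation kinds standing in for the dataplane ops. */
enum sketch_op { SKETCH_OP_DELETE, SKETCH_OP_UPDATE };

/* Hypothetical, reduced VRF structure. */
struct sketch_vrf {
	uint32_t vrf_id;
	uint32_t table_id;
};

/*
 * Returns the vrf_id the operation acts on.
 * Delete: the caller passes the VRF it captured, so no lookup is needed.
 * Update: the caller passes NULL and the dataplane-provided vrf_id.
 */
static inline uint32_t sketch_vrf_change(enum sketch_op op,
					 const struct sketch_vrf *vrf,
					 uint32_t vrf_id)
{
	if (op == SKETCH_OP_DELETE) {
		assert(vrf != NULL);
		return vrf->vrf_id;
	}
	assert(vrf == NULL);
	return vrf_id;
}
```

A delete call would look like `sketch_vrf_change(SKETCH_OP_DELETE, &vrf, 0)` and an update like `sketch_vrf_change(SKETCH_OP_UPDATE, NULL, vrf_id)`, mirroring the two call shapes described in the commit message.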

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
@maxime-leroy maxime-leroy force-pushed the zebra_vrf_id_cleanup branch from add4e75 to 3c06982 Compare May 19, 2026 10:19

@mjstapp mjstapp left a comment

Contributor

Thanks, looks good to me now

@vjardin vjardin left a comment

Contributor

Ok

@mjstapp mjstapp merged commit e5fc994 into FRRouting:master May 20, 2026
22 checks passed