zebra: clean up VRF handling by using dataplane provided vrf_id#20318
Conversation
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
54daa00 to
df97d46
Compare
|
ci:rerun |
1 similar comment
|
ci:rerun |
df97d46 to
c772877
Compare
c772877 to
8905597
Compare
8905597 to
bb025aa
Compare
|
Let's wait for the end of the CI just in case. I'd prefer someone external to ack't+merge it since I am a Maxime's colleague. |
mjstapp
left a comment
There was a problem hiding this comment.
there's some confusion here - but I'm not sure that this PR isn't making things harder to understand.
I agree that it's confusing to repeatedly cast a value that appears as an ifindex - that's not a great pattern. but I can imagine a very small couple of lines that would correct that in interface.c - I don't see that reorganizing the whole vrf device change flow is helping.
There are two separate commits in this PR. The second commit is the actual fix: interface_vrf_change() was implicitly assuming that the VRF netdevice ifindex could be used as zebra’s vrf_id (via casting). The first commit is only a cleanup: in the netlink backend ifp_table_id is set for RTM_NEWLINK VRF events but not for RTM_DELLINK, If needed, the first commit can be ditched. However, in my view it is a nice-to-have cleanup that clarifies the delete vs update requirements. |
c52e4a6 to
15e7bb3
Compare
|
yes, I understood the two commits, I think.
|
Thanks for the detailed feedback — I understand the concern about review surface / churn. That said, I’d like to clarify the intent and why I still think the split is justified:
On the positive side, this led to adding CI coverage for --enable-lttng (thanks to @vjardin), which also uncovered an existing LTTng compilation issue already present on master. I fixed that master issue as well. So I agree the diff is larger than the minimal vrf_id accessor change, but it is not arbitrary churn: it makes the delete/update requirements explicit and avoids relying on/tracing unset dataplane fields. The actual behavior change remains isolated in the second commit. |
15e7bb3 to
33f7ada
Compare
33f7ada to
1988a00
Compare
|
@mjstapp: Fair point on the size, but the split makes the contracts explicit (delete doesn't need table_id/ns_id, update does). With commit 1 being pure refactor and commit 2 the isolated fix, I think it lands in a better place for future maintenance. |
1988a00 to
9cf9fc8
Compare
|
@mjstapp I added the extra {} blocks in interface_vrf_update() / interface_vrf_del() I’ve added a follow-up commit that removes these redundant blocks and runs Hope this makes the review easier on your side. |
|
ci:rerun |
|
@donaldsharp and @mjstapp : I had direct discussion with @maxime-leroy about it. I do not see additional issues with this pull request. Can you argue if there is a strong concern ? if not, let's merge, it is a hurt-less clean up that helps to have a generic support. |
|
I think I've been pretty clear about the issue. A useful 10-line change has turned into a 100-line change ... and that just seems like something that's unnecessary.
|
@mjstapp , I hear you about keeping this small. While working on this PR I noticed two related issues in the VRF delete path:
The function split was my way to avoid using/printing a meaningless value there. If you prefer, I can instead keep the change minimal and simply stop tracing table-id on delete. |
9cf9fc8 to
b047b04
Compare
b047b04 to
50b4ac7
Compare
50b4ac7 to
775dd4a
Compare
|
I've reworked the patch as requested. The diff is now +30/-38 lines compared to the previous +97/-75, which is roughly a 60% reduction in diff size. I still think that splitting interface_vrf_change() into two functions could result in cleaner code, but I understand that this may not be the preferred approach for this particular fix. For reference, I’ve pushed an alternative version here: Thanks for the review and feedback. |
Greptile OverviewGreptile SummaryThis PR refactors VRF handling in zebra to properly respect the dataplane abstraction by using the VRF identifier explicitly provided by Key changes:
Impact:
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant DP as Dataplane
participant ZIDH as zebra_if_dplane_ifp_handling
participant IVC as interface_vrf_change
participant VRF as VRF subsystem
alt VRF Interface Delete
DP->>ZIDH: DPLANE_OP_INTF_DELETE
Note over ZIDH: zif_type == ZEBRA_IF_VRF
ZIDH->>ZIDH: vrf = ifp->vrf (capture before delete)
ZIDH->>ZIDH: if_delete_update(&ifp)
ZIDH->>IVC: interface_vrf_change(op, vrf, 0, NULL, 0, ns_id)
Note over IVC: Use vrf->vrf_id, vrf->name, vrf->data.l.table_id
IVC->>VRF: vrf_delete(vrf)
else VRF Interface Update
DP->>ZIDH: DPLANE_OP_INTF_UPDATE
Note over ZIDH: zif_type == ZEBRA_IF_VRF
ZIDH->>ZIDH: vrf_id = dplane_ctx_get_ifp_vrf_id(ctx)
ZIDH->>ZIDH: tableid = dplane_ctx_get_ifp_table_id(ctx)
ZIDH->>IVC: interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id)
Note over IVC: Use vrf_id from dataplane (no ifindex cast)
IVC->>VRF: vrf_update(vrf_id, name)
IVC->>VRF: vrf_enable(vrf)
end
|
775dd4a to
37d493b
Compare
|
ci:rerun |
|
Hi @mjstapp, gentle ping whenever you have a moment. The patch has been reduced to the minimal version you asked for (+30/-38) since January 27 and is sitting ready. FWIW, greptile (configured by @mwinter-osr on an unrelated FRR PR) also recommends splitting this kind of function: opensourcerouting#234 (comment) Thanks! |
zebra_if_dplane_ifp_handling() was reading dplane_ctx_get_ifp_table_id() for VRF events. In the netlink dplane backend, ifp_table_id is only set via netlink_vrf_change(), which is invoked from netlink_link_change() for RTM_NEWLINK VRF events. It is not set for RTM_DELLINK. This causes interface_vrf_change() to receive table_id as 0 on VRF deletes. For delete operations, save the VRF pointer before if_delete_update() and retrieve table_id from vrf->data.l.table_id. For add/update operations, continue using the dplane-provided table_id. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
37d493b to
add4e75
Compare
interface_vrf_change() was implicitly assuming that the VRF netdevice ifindex could be used as zebra's vrf_id. This assumption holds true for the Linux kernel dataplane, where the VRF ID is defined as the ifindex of the VRF interface, so this change does not alter kernel behavior. However, the dataplane API already exposes both concepts explicitly via dplane_ctx_get_ifindex() and dplane_ctx_get_ifp_vrf_id(ctx). Using the proper accessor avoids casting an ifindex to vrf_id_t and better respects the dataplane abstraction. On interface updates, the vrf_id provided by the dataplane is now used directly. On interface deletion (DELLINK), where the dataplane context may no longer carry vrf information, zebra relies on the existing ifp state (ifp->vrf->vrf_id) before if_delete_update() is called. The table_id is also retrieved from ifp->vrf->data.l.table_id since dplane_ctx_get_ifp_table_id() is not set for RTM_DELLINK events. The if_vrf_change LTTng tracepoint is moved out of interface_vrf_change() and into the caller zebra_if_dplane_ifp_handling(), where the original ifindex from the dplane context is still available. This preserves the existing tracepoint ABI (field name "ifindex", type ifindex_t) so that existing trace consumers are not affected. This is required for non-kernel dataplanes such as Grout (DPDK), where the VRF ID is not equal to the VRF interface ifindex. In that case, using the correct vrf_id fixes VRF handling between zebra and the Grout dataplane. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
In the delete path, we already hold a reference to the vrf structure before calling if_delete_update(). Passing this pointer directly to interface_vrf_change() avoids a redundant vrf_lookup_by_id() call and allows accessing vrf->name, vrf->vrf_id, and vrf->data.l.table_id directly. For update operations, the vrf pointer is passed as NULL and the function continues to use the vrf_id, name, and tableid parameters from the dataplane context. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
add4e75 to
3c06982
Compare
mjstapp
left a comment
There was a problem hiding this comment.
Thanks, looks good to me now
Zebra was implicitly assuming that the VRF netdevice ifindex could be used as the VRF identifier, leading to casts from ifindex to vrf_id_t. While this works with the Linux kernel dataplane, it does not respect the dataplane API, which exposes these identifiers as distinct concepts.
The code is updated to use the VRF identifier explicitly provided by the dataplane context and removes the need for casting ifindex to vrf_id_t. In addition, the VRF delete path is clarified by no longer reading unused dataplane fields, avoiding confusion about which identifiers are actually required.
There is no behavior change for the Linux kernel dataplane.