Skip to content

Pcep more testing#256

Open
suphawitwanl wants to merge 3 commits into
aitest-base-pr-21896from
aitest-sw-pr-21896
Open

Pcep more testing#256
suphawitwanl wants to merge 3 commits into
aitest-base-pr-21896from
aitest-sw-pr-21896

Conversation

@suphawitwanl

Copy link
Copy Markdown

Fix a crash in pathd when it receives commands in a certain order, add a test that provides more coverage to pathd and pceplib

donaldsharp and others added 3 commits May 14, 2026 15:33
The validation code crashes, because the validation
code is asking for the nb code to assert if the path
does not exist yet.  Modify the code such that the
non-existence of the policy yet is not a crashable
event in validation.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently the testing coverage of pathd/pceplib is abysmmally small.
Let's add a bit more testing to get greater code coverage of this
code.  This test suite adds a basic harness to allow for interactions
with pceplib as well.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
config.json details

Settings:
- Strictness 2, status check enabled, re-reviews on every push
- Comment types: logic, syntax, best_practices, security

Rule structure:
Each rule has an id, description, scope, and severity.
- Scope: glob patterns that limit which files the rule applies to
  (e.g. **/*.c, **/*.yang, lib/**)
- Severity: high (P0/P1 — must fix), medium (P2 — should fix), low
  (NOTE — suggestion)

Rules cover:
- Memory safety: XMALLOC/XCALLOC/XFREE enforcement, no raw malloc/free
- String safety: no strcpy/strcat/sprintf, enforce strlcpy/snprintf
- Banned functions: no system(), no fork()+exec()
  - Added popen() to banned-functions rule (same issue with SIGINT to be
    ignored and break daemon shutdown)
- Typesafe containers: reject legacy linklist.h/hash.h, enforce lib/typesafe.h
- CLI conventions: DEFPY required, YANG model required for YANGified daemons
- YANG/northbound: API-agnostic callbacks, JSON output backed by YANG models
- Packet parsing: bounds checking on all network message parsing
- RCU safety: lock discipline, no direct XFREE on RCU-protected data
- Logging: zlog_* only, debug guarded by CLI flags
- Licensing: SPDX headers, GPL compatibility checks
- Formatting: include order, whitespace discipline, printfrr usage
- Commit messages: subsystem prefix, imperative mood, Signed-off-by
- Topotest coverage (P0): blocks merge if daemon code changes without
  tests/topotests/ changes
- User docs (P0): blocks merge if CLI/YANG changes without doc/user/ updates
  - Mark User docs as Prority 1 checks to run first
  - Flag doc updates as mandatory for features
- Topotest coverage: check for coverage as priority 2 (after used docs)
  - Flag Topotests coverage as mandatory for features

Signed-off-by: Suphawit Wanlaung <suphawitw@netdef.org>
@greptile-apps

greptile-apps Bot commented May 14, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a crash in pathd's northbound config callback when a policy name is set at the same time as the parent policy is created (VALIDATE fires before APPLY, so the policy isn't yet in the running config), and adds a new topotest that brings the PCEP PCC session up against a lightweight Python PCE simulator and tears it down.

  • pathd/path_nb_config.c: changes nb_running_get_entry(..., true) to false in pathd_srte_policy_name_modify and adds a policy && guard in the VALIDATE branch — correct fix, but the subsequent APPLY branch still dereferences policy without a NULL check.
  • tests/topotests/pathd_pcep_topo1/: new topotest directory with a Python PCE simulator (pce_sim.py), FRR config, and a pytest file covering session-up, message-counter verification, standalone pcep_pcc binary, and session-down-on-PCE-death scenarios. One test function deliberately skips show sr-te pcep counters because that command crashes pathd (pre-existing bug, documented inline).

Confidence Score: 3/5

The C change fixes a real crash but leaves a residual NULL dereference risk in the APPLY path; the topotest additions are well-structured with no blocking concerns.

The core bug fix in path_nb_config.c converts a controlled abort (abort_if_not_found=true) into a silent potential NULL dereference on the APPLY code path. In normal operation the policy entry will be present during APPLY, so the crash won't reproduce under typical conditions — but the code no longer has an explicit guard. The topotest files are solid: they use json_cmp, run_and_expect, and the pytest framework correctly, and they cover session lifecycle end-to-end.

pathd/path_nb_config.c — specifically the APPLY branch of pathd_srte_policy_name_modify around the strlcpy call.

Important Files Changed

Filename Overview
pathd/path_nb_config.c Bug fix for crash in pathd_srte_policy_name_modify during NB_EV_VALIDATE on a not-yet-applied policy. The VALIDATE path is correctly fixed but the APPLY path now dereferences policy without a NULL guard.
tests/topotests/pathd_pcep_topo1/test_pathd_pcep_topo1.py New topotest exercising pathd PCEP PCC session lifecycle; uses json_cmp, run_and_expect, and pytest framework correctly. One test deliberately avoids a crashing command, documented with an inline comment.
tests/topotests/pathd_pcep_topo1/pce/pce_sim.py New minimal PCEP PCE simulator script; correctly implements Open/Keepalive exchange with Stateful+SR TLVs; threaded per-connection model is appropriate for a test helper.
tests/topotests/pathd_pcep_topo1/r1/frr.conf New FRR configuration for the topotest router; defines SR-TE policy, segment-list, and PCEP PCE config matching the simulator parameters.
tests/topotests/pathd_pcep_topo1/init.py Empty Python package marker; required by pytest for topotest directory discovery.

Sequence Diagram

sequenceDiagram
    participant NB as FRR Northbound
    participant CB as name_modify callback
    participant RC as Running Config

    Note over NB,RC: BEFORE fix — crash on first-time policy name set
    NB->>CB: NB_EV_VALIDATE
    CB->>RC: "nb_running_get_entry abort=true"
    RC-->>CB: NULL (policy not yet applied)
    CB->>CB: assert/abort crash

    Note over NB,RC: AFTER fix
    NB->>CB: NB_EV_VALIDATE
    CB->>RC: "nb_running_get_entry abort=false"
    RC-->>CB: NULL
    CB->>CB: "policy==NULL so name not fixed, return NB_OK"
    NB->>CB: NB_EV_APPLY
    CB->>RC: "nb_running_get_entry abort=false"
    RC-->>CB: policy ptr (parent already applied)
    CB->>CB: "strlcpy policy->name — no NULL guard"

    Note over NB,CB: topotest PCEP session lifecycle
    participant R1 as FRR r1 pathd PCC
    participant PCE as pce_sim.py

    R1->>PCE: TCP connect port 4189
    R1->>PCE: PCEP Open with SR and Stateful TLVs
    PCE->>R1: PCEP Open with SR and Stateful TLVs
    PCE->>R1: Keepalive ack PCC Open
    R1->>PCE: Keepalive ack PCE Open
    Note over R1,PCE: Session UP verified by test
    loop Periodic keepalives
        R1->>PCE: Keepalive
        PCE->>R1: Keepalive
    end
    Note over PCE: pce_sim killed
    PCE--xR1: TCP closed or dead timer expires
    Note over R1: pcepSessionsConnected drops to 0
Loading

Comments Outside Diff (2)

  1. pathd/path_nb_config.c, line 303-307 (link)

    P1 Unguarded NULL dereference in NB_EV_APPLY path

    nb_running_get_entry(..., false) is now used for both VALIDATE and APPLY events. The VALIDATE path is correctly guarded with if (policy && ...). However, the APPLY path at line 304 calls strlcpy(policy->name, ...) with no NULL check on policy. If nb_running_get_entry returns NULL during APPLY — which, while unlikely in normal flow, can happen if the parent policy's APPLY step was somehow skipped or failed — this silently segfaults instead of the explicit abort-with-diagnostics that the original true flag provided. The proper fix is to look up the entry with true only in the APPLY branch, or to add an explicit NULL check with flog_err + return NB_ERR_RESOURCE before the strlcpy.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: pathd/path_nb_config.c
    Line: 303-307
    
    Comment:
    **Unguarded NULL dereference in NB_EV_APPLY path**
    
    `nb_running_get_entry(..., false)` is now used for both VALIDATE and APPLY events. The VALIDATE path is correctly guarded with `if (policy && ...)`. However, the APPLY path at line 304 calls `strlcpy(policy->name, ...)` with no NULL check on `policy`. If `nb_running_get_entry` returns NULL during APPLY — which, while unlikely in normal flow, can happen if the parent policy's APPLY step was somehow skipped or failed — this silently segfaults instead of the explicit abort-with-diagnostics that the original `true` flag provided. The proper fix is to look up the entry with `true` only in the APPLY branch, or to add an explicit NULL check with `flog_err` + `return NB_ERR_RESOURCE` before the `strlcpy`.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. tests/topotests/pathd_pcep_topo1/test_pathd_pcep_topo1.py, line 1246-1255 (link)

    P2 Known crash in show sr-te pcep counters left unexercised

    The comment at lines 1248–1255 documents that show sr-te pcep counters crashes pathd in mt_count_free() on ASAN/development builds via path_pcep_cli.c::path_pcep_cli_show_srte_pcep_counters. The test works around this by using the textual per-session output instead. While the workaround is reasonable for unblocking test landing, the underlying memory-management bug in copy_counter_group() should be tracked and fixed so this test can be strengthened to validate counter values directly.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: tests/topotests/pathd_pcep_topo1/test_pathd_pcep_topo1.py
    Line: 1246-1255
    
    Comment:
    **Known crash in `show sr-te pcep counters` left unexercised**
    
    The comment at lines 1248–1255 documents that `show sr-te pcep counters` crashes pathd in `mt_count_free()` on ASAN/development builds via `path_pcep_cli.c::path_pcep_cli_show_srte_pcep_counters`. The test works around this by using the textual per-session output instead. While the workaround is reasonable for unblocking test landing, the underlying memory-management bug in `copy_counter_group()` should be tracked and fixed so this test can be strengthened to validate counter values directly.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
pathd/path_nb_config.c:303-307
**Unguarded NULL dereference in NB_EV_APPLY path**

`nb_running_get_entry(..., false)` is now used for both VALIDATE and APPLY events. The VALIDATE path is correctly guarded with `if (policy && ...)`. However, the APPLY path at line 304 calls `strlcpy(policy->name, ...)` with no NULL check on `policy`. If `nb_running_get_entry` returns NULL during APPLY — which, while unlikely in normal flow, can happen if the parent policy's APPLY step was somehow skipped or failed — this silently segfaults instead of the explicit abort-with-diagnostics that the original `true` flag provided. The proper fix is to look up the entry with `true` only in the APPLY branch, or to add an explicit NULL check with `flog_err` + `return NB_ERR_RESOURCE` before the `strlcpy`.

### Issue 2 of 2
tests/topotests/pathd_pcep_topo1/test_pathd_pcep_topo1.py:1246-1255
**Known crash in `show sr-te pcep counters` left unexercised**

The comment at lines 1248–1255 documents that `show sr-te pcep counters` crashes pathd in `mt_count_free()` on ASAN/development builds via `path_pcep_cli.c::path_pcep_cli_show_srte_pcep_counters`. The test works around this by using the textual per-session output instead. While the workaround is reasonable for unblocking test landing, the underlying memory-management bug in `copy_counter_group()` should be tracked and fixed so this test can be strengthened to validate counter values directly.

Reviews (1): Last reviewed commit: "greptile: Add Greptile configuration for..." | Re-trigger Greptile

@suphawitwanl

Copy link
Copy Markdown
Author

@greptileai what is the rules that list in ./greptile

@greptile-apps

greptile-apps Bot commented May 14, 2026

Copy link
Copy Markdown

There is no .greptile directory or file in the opensourcerouting/frr repository. It doesn't appear to exist.

If you're referring to Greptile's custom review rules, those are typically defined in .greptile/rules.md. You can create one to customize how I review code — see the docs at greptile.com/docs/code-review/custom-standards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants