Add monitor sampling-rate UI, configurable trace fetch page size, and monitor comparison view #1127

Open
nadheesh wants to merge 7 commits into wso2:main from nadheesh:eval-monitor-sampling-pagination-compare

@nadheesh nadheesh commented Jun 22, 2026

Contributor

Summary

Three related improvements to eval monitors:

Trace fetch pagination: page size is now configurable and memory use is bounded (#669)

TraceFetcher previously hardcoded a 1000-trace page size, so up to 1000 fully-parsed traces (with nested spans/payloads) could be held in memory per page. The page size is now a constructor arg (default 10, overridable per call), so peak memory stays bounded regardless of how many traces match a monitor's time window. evaluation-job sets it via a TRACE_FETCH_PAGE_SIZE constant. Also fixed the monitor-start log to print the sampling rate at 2 decimals so values like 0.25 no longer display as 0.2.

Sampling rate UI for monitors (#1126)

The samplingRate field existed in form state and was submitted, but had no visible control — every monitor was silently created with a hidden 25% default. Added a slider (1–100%) in the Data Collection section of the monitor create form, changed the default to 100% (full sampling), and tightened validation to reject 0%.
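For illustration, a small sketch of how a 1–100% slider value could map to the fraction the job accepts, and why the monitor-start log needed two decimals. `percent_to_fraction` is a hypothetical helper; the actual conversion lives in the console code and may differ.

```python
def percent_to_fraction(sampling_percent: int) -> float:
    """Convert a 1-100 slider value to a fraction in (0, 1]; reject anything else."""
    if not 1 <= sampling_percent <= 100:
        raise ValueError(f"sampling rate must be between 1 and 100, got {sampling_percent}")
    return sampling_percent / 100


rate = percent_to_fraction(25)
one_decimal = f"{rate:.1f}"   # "0.2": one decimal misreports the rate
two_decimals = f"{rate:.2f}"  # "0.25"
```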

Compare results across monitors (#1101)

Added a side-by-side monitor comparison view (new compare/:monitorId route), with supporting changes to the monitor view, agent performance card, and evaluation summary card.

Testing

  • amp-evaluation: ruff check / ruff format --check / mypy src — clean; unit tests pass.
  • evaluation-job: ruff check / ruff format --check / mypy main.py — clean; unit tests pass.
  • console: eslint and tsc build pass for the eval, core-ui, and types packages.

Closes #669
Closes #1126
Closes #1101

Summary by CodeRabbit

  • New Features
    • Added “Compare Monitors” page for side-by-side evaluation monitor comparison with radar charts and summary cards.
    • Added a Compare action on monitor detail pages to launch the comparison flow.
  • Bug Fixes
    • Default sampling rate for monitor creation/duplication is now 100%.
    • Sampling rate validation tightened to accept only values from 1% to 100%.
  • Improvements
    • Enhanced trace fetching with paginated iteration and deterministic trace sampling.
    • Improved evaluation score calculations and radar tooltip/customization options.
  • Tests
    • Expanded coverage for sampling, pagination, and sampling-rate forwarding behavior.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Contributor


Warning

Review limit reached

@nadheesh, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 46 minutes and 36 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26091af6-5218-45cf-8c27-c96f6ff7f486

📥 Commits

Reviewing files that changed from the base of the PR and between 59452e6 and 5e70301.

📒 Files selected for processing (4)
  • console/workspaces/core-ui/src/Route/Route.tsx
  • console/workspaces/core-ui/src/pages/index.tsx
  • console/workspaces/libs/types/src/routes/generated-route.map.ts
  • console/workspaces/libs/types/src/routes/routes.map.ts
Walkthrough

This PR implements three features: a monitor comparison page that renders side-by-side radar charts and evaluation summaries for two monitors; a sampling-rate slider (defaulting to 100%) for monitor creation with backend Monitor.run plumbing; and iterator-based paginated trace fetching with deterministic SHA-256 trace sampling in the evaluation backend.

Changes

Compare Monitors UI

| Layer / File(s) | Summary |
| --- | --- |
| Shared score utilities and card extensions: console/workspaces/pages/eval/src/utils/monitorScoreUtils.ts, console/workspaces/pages/eval/src/subComponents/AgentPerformanceCard.tsx, console/workspaces/pages/eval/src/subComponents/EvaluationSummaryCard.tsx | Extracts getMean, computeLevelSummaries, computeAverageScore into a new utility module; adds connectNulls, index signature, RadarTooltipContent, optional title/renderTooltipContent to AgentPerformanceCard; adds optional title to EvaluationSummaryCard. |
| Compare button in ViewMonitor: console/workspaces/pages/eval/src/ViewMonitor.Component.tsx | Imports useListMonitors and shared score utilities; adds compareAnchorEl state and a "Compare" button with dropdown that navigates to the compare route with ?with= param; refactors inline useMemo score computations to use shared utilities. |
| CompareMonitor page component: console/workspaces/pages/eval/src/CompareMonitor.Component.tsx | New page component reading monitorId/with/sourceTimeRange/targetTimeRange from URL; performs dual monitor metadata and score fetching; computes union radar dataset and series config; renders AgentPerformanceCard with custom tooltip and two EvaluationSummaryCard blocks; shows error Alert when no target is selected. |
| Route and page registry wiring: console/workspaces/libs/types/src/routes/routes.map.ts, console/workspaces/libs/types/src/routes/generated-route.map.ts, console/workspaces/pages/eval/src/index.ts, console/workspaces/core-ui/src/pages/index.tsx, console/workspaces/core-ui/src/Route/Route.tsx | Registers compare/:monitorId in route maps, exports CompareMonitorComponent and compareMonitor metadata from the eval page index, creates LazyCompareMonitorComponent in the core-ui registry, and adds the <Route> under monitorBase. |

Sampling Rate: UI and Backend

| Layer / File(s) | Summary |
| --- | --- |
| Sampling rate slider, schema, and default changes: console/workspaces/pages/eval/src/form/schema.ts, console/workspaces/pages/eval/src/CreateMonitor.Component.tsx, console/workspaces/pages/eval/src/subComponents/CreateMonitorForm.tsx | Tightens samplingRate Zod refinement to > 0; changes duplicate and new-monitor defaults from 25 to 100 and payload conversion fallback from 0 to 100; adds a Slider (1–100%) with marks and validation caption to the form. |
| Backend sample_rate wiring: libs/amp-evaluation/src/amp_evaluation/runner.py, evaluation-job/main.py, evaluation-job/test_main.py | Adds sample_rate: Optional[float] to Monitor.run with lazy fetch/sample/parse generator; adds TRACE_FETCH_PAGE_SIZE constant, --sampling-rate CLI validation ((0,1]), and sample_rate=args.sampling_rate forwarding in main.py; extends integration tests to assert forwarding and invalid-rate rejection. |
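As a rough illustration of the `--sampling-rate` validation described above, here is a sketch using an argparse `type` callable that accepts only values in (0, 1]. The helper names, the default of 1.0, and the help wording are assumptions, not the contents of `evaluation-job/main.py`.

```python
import argparse


def sampling_rate(value: str) -> float:
    """argparse type: accept a float in (0, 1], reject everything else."""
    try:
        rate = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}")
    # NaN fails this comparison too, so it is rejected along with 0 and values above 1.
    if not 0.0 < rate <= 1.0:
        raise argparse.ArgumentTypeError(f"sampling rate must be in (0, 1], got {rate}")
    return rate


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="evaluation job (sketch)")
    parser.add_argument(
        "--sampling-rate",
        type=sampling_rate,
        default=1.0,  # assumed default for this sketch
        help="fraction of traces to evaluate, in (0, 1]; 0 is rejected",
    )
    return parser


args = build_parser().parse_args(["--sampling-rate", "0.25"])
```

Stating the interval in the help text keeps the documented contract in line with what the validator enforces.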

Trace Fetch Pagination and Deterministic Sampling

| Layer / File(s) | Summary |
| --- | --- |
| TraceFetcher iterator paging and sample_traces: libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py, libs/amp-evaluation/src/amp_evaluation/trace/__init__.py | Adds page_size to TraceFetcher.__init__, introduces _fetch_page helper, replaces eager fetch_traces with an iterator-based paging loop with deduplication and stop conditions; adds sample_traces with SHA-256 deterministic sampling; exports sample_traces from trace/__init__.py. |
| Runner lazy iterable consumption: libs/amp-evaluation/src/amp_evaluation/runner.py | Changes _fetch_traces return to Iterable[OTELTrace], updates _evaluate_traces to accept Iterable[Trace] with iterator-driven while loop, materializes traces with list() in Experiment._fetch_and_match_traces. |
| Pagination and sampling tests: libs/amp-evaluation/tests/test_trace_fetcher.py, libs/amp-evaluation/tests/test_eval_runner.py | Adds TestFetchTracesPagination covering multi-page, cursor, dedup, max_traces, and page-size rules; adds TestSampleTraces for determinism, retention rate, and invalid rates; adds TestMonitorFetchAndSamplePipeline with _FakeFetcher asserting fetch, sample pipeline, and traces= bypass. |
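One common way to get deterministic sampling from SHA-256 is to hash the trace ID and keep the trace when the hash falls below a threshold proportional to the rate. The sketch below shows that technique only; `keep_trace` and `sample_ids` are hypothetical names, and the PR's `sample_traces` may hash different fields or use a different threshold.

```python
import hashlib
from typing import Iterable, Iterator


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Keep a trace when the first 64 bits of its SHA-256 fall below rate * 2**64."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # stable integer in [0, 2**64)
    return bucket < sample_rate * 2**64


def sample_ids(trace_ids: Iterable[str], sample_rate: float) -> Iterator[str]:
    """Lazily filter IDs; the same ID always gets the same decision."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    return (tid for tid in trace_ids if keep_trace(tid, sample_rate))


ids = [f"trace-{i}" for i in range(1000)]
first_run = list(sample_ids(ids, 0.25))
second_run = list(sample_ids(ids, 0.25))
```

Because the decision depends only on the ID and the rate, re-running a monitor over the same window selects the same traces, and a rate of 1.0 keeps everything.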

Sequence Diagram(s)

sequenceDiagram
  rect rgba(70, 130, 180, 0.5)
    note over EvaluationJob,TraceFetcher: Backend Evaluation Pipeline
    EvaluationJob->>EvaluationJob: validate sampling_rate in (0, 1]
    EvaluationJob->>TraceFetcher: TraceFetcher(page_size=TRACE_FETCH_PAGE_SIZE)
    EvaluationJob->>Monitor: run(start_time, end_time, sample_rate)
    Monitor->>TraceFetcher: fetch_traces(start_time, end_time) → Iterator
    TraceFetcher->>TraceObserverAPI: POST /traces/export page 1
    TraceObserverAPI-->>TraceFetcher: traces + totalCount
    TraceFetcher->>TraceObserverAPI: POST /traces/export page 2 (cursor advanced)
    TraceObserverAPI-->>TraceFetcher: traces + totalCount
    TraceFetcher-->>Monitor: lazy OTELTrace iterator
    Monitor->>Monitor: sample_traces(iterator, sample_rate) → filtered iterator
    Monitor->>Monitor: parse + yield Trace objects
    Monitor->>BaseRunner: _evaluate_traces(Iterable[Trace])
    BaseRunner-->>EvaluationJob: RunResult
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~90 minutes

Possibly related PRs

  • wso2/agent-manager#1102: Adds the duplicateFrom monitor creation mode and pre-fills samplingRate, directly overlapping with this PR's CreateMonitor.Component.tsx sampling-rate default and payload conversion changes.

Poem

🐇 A radar spins two monitors side by side,
Pages of traces now lazily glide,
SHA-256 flips a coin for each trace,
Sliders set sampling at a confident pace,
Compare the webs — let insights reside! 🕸️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 56.45% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the three main changes: sampling-rate UI, configurable trace fetch page size, and monitor comparison view. |
| Description check | ✅ Passed | The PR description clearly outlines the three enhancements, testing performed, and references to closed issues, aligning well with the template structure despite not using formal sections. |
| Linked Issues check | ✅ Passed | All code changes directly address the three linked issues: pagination support for trace fetching [#669], sampling rate UI configuration [#1126], and monitor comparison view [#1101]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the three linked issues with no out-of-scope modifications detected across the codebase. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
console/workspaces/pages/eval/src/CompareMonitor.Component.tsx (1)

287-291: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use theme tokens for series colors instead of hardcoded hex values.

Line 287 and Line 291 use hardcoded colors, which can drift from theme variants (including high-contrast modes). Use palette tokens for both series colors.

As per coding guidelines, console/**/*.{ts,tsx,js,jsx} should "Use theme tokens via the sx prop instead of hardcoded colors and spacing values".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@console/workspaces/pages/eval/src/CompareMonitor.Component.tsx` around lines
287 - 291, The sourceColor and targetColor variables contain hardcoded hex color
values (`#3f8cff` and `#f59e0b` respectively) that can drift from theme variants
including high-contrast modes. Replace these hardcoded colors with appropriate
theme palette tokens from the palette object. For sourceColor, it already uses a
fallback to palette?.primary.main, so ensure it consistently uses theme tokens.
For targetColor, instead of the hardcoded `#f59e0b` string, access and use an
appropriate palette token such as palette?.warning.main or another suitable
theme color that maintains high contrast and aligns with the design system's
color tokens.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@console/workspaces/pages/eval/src/CompareMonitor.Component.tsx`:
- Around line 308-312: Remove the nullish coalescing operator that defaults to 0
in the sourceValue and targetValue assignments. Instead of using (meanA ?? 0) *
100 and (meanB ?? 0) * 100, check if meanA and meanB are not null before
multiplying by 100, otherwise preserve null to represent "no data" in the radar
chart. This prevents "all skipped" data from being plotted as a score of 0,
which distorts the comparison. Apply the same fix to the similar code at lines
323-324.
- Around line 150-160: The sourceTimeRange and targetTimeRange useMemo hooks are
casting arbitrary string values from searchParams directly to TraceListTimeRange
enum type without runtime validation. Add a validation function that checks if a
value is a valid TraceListTimeRange enum member before casting it. Apply this
validation to both the sourceTimeRange useMemo (which reads
searchParams.get("sourceTimeRange")) and the targetTimeRange useMemo (which
reads searchParams.get("targetTimeRange")), using it to verify the retrieved
values are legitimate enum values before assignment, and falling back to
TraceListTimeRange.SEVEN_DAYS if validation fails.

In `@console/workspaces/pages/eval/src/CreateMonitor.Component.tsx`:
- Around line 83-85: The sampling rate clamping calculation uses Math.max(0,
...) as the lower bound, but the form schema now requires a minimum sampling
rate of 1 instead of 0. When duplicating older monitors with a sampling rate of
0, this will result in an invalid prefilled value. Change the lower bound in
Math.max from 0 to 1 in the sourceMonitor.samplingRate calculation to align with
the new valid range of 1-100.

In `@evaluation-job/main.py`:
- Around line 563-569: The validation logic for the sampling_rate argument in
the code block starting at line 563 enforces that the value must be in the range
(0, 1] (exclusive of zero, inclusive of one). Find the argparse argument
definition for --sampling-rate and update its help text to accurately reflect
this constraint instead of documenting it as (0.0-1.0), which incorrectly
implies that zero is a valid value. Ensure the help text matches the actual
validation contract being enforced.

In `@libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py`:
- Around line 496-497: Add validation for the page_size parameter immediately
after it is set from either the input parameter or the default self.page_size
value. Check if page_size is less than or equal to zero and raise an appropriate
exception (such as ValueError) with a clear error message indicating that
page_size must be a positive integer. This validation should occur before the
seen_ids set is initialized to fail fast at the API boundary rather than relying
on backend error handling.
- Around line 502-510: The code currently silently returns when encountering a
page containing only previously seen traceId values, which can result in
undercounting traces when the API reports a higher total count. Instead of
silently returning in the condition checking if not new_traces, detect this
pagination stall scenario by comparing whether _total_count indicates more data
exists beyond what has been processed, and surface this condition (through
logging, raising an exception, or other error handling) rather than silently
truncating the iteration. This ensures pagination stalls are visible and don't
silently cause incomplete trace evaluation.
- Around line 499-516: The max_traces check currently happens after the trace is
yielded in the for loop iterating over new_traces, which means when
max_traces=0, one trace is still yielded before the condition triggers and
returns. Move the max_traces check to before the yield statement so that if
max_traces is set to 0 or would be exceeded, the function returns without
yielding any additional traces. Check if max_traces is not None and yielded >=
max_traces before the yield trace line to ensure the limit is respected from the
first trace.
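
The three fetcher.py findings above can be sketched together as follows. This assumes an offset-based page source that returns `(traces, total_count)`; `fetch_traces_sketch`, `PageSource`, and `PaginationStalled` are hypothetical names and are not the code under review.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical page source: (offset, limit) -> (traces, total_count).
PageSource = Callable[[int, int], Tuple[List[Dict[str, str]], int]]


class PaginationStalled(RuntimeError):
    """A page added nothing new although the API reports more traces."""


def fetch_traces_sketch(
    fetch_page: PageSource, page_size: int, max_traces: Optional[int] = None
) -> Iterator[Dict[str, str]]:
    # Validate eagerly, before the generator is created, so callers fail fast.
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")

    def _iterate() -> Iterator[Dict[str, str]]:
        seen_ids: Set[str] = set()
        yielded = 0
        offset = 0
        while True:
            if max_traces is not None and yielded >= max_traces:
                return  # covers max_traces=0 without fetching anything
            traces, total_count = fetch_page(offset, page_size)
            if not traces:
                return
            progressed = False
            for trace in traces:
                if trace["traceId"] in seen_ids:
                    continue  # deduplicate across pages
                if max_traces is not None and yielded >= max_traces:
                    return  # check the limit before yielding, not after
                seen_ids.add(trace["traceId"])
                progressed = True
                yield trace
                yielded += 1
            if not progressed:
                if len(seen_ids) < total_count:
                    raise PaginationStalled(
                        f"saw {len(seen_ids)} of {total_count} traces, then only duplicates"
                    )
                return
            offset += len(traces)

    return _iterate()


ALL = [{"traceId": f"t{i}"} for i in range(5)]


def good_source(offset: int, limit: int) -> Tuple[List[Dict[str, str]], int]:
    return ALL[offset:offset + limit], len(ALL)


def stuck_source(offset: int, limit: int) -> Tuple[List[Dict[str, str]], int]:
    return ALL[:limit], len(ALL)  # always returns the first page


everything = list(fetch_traces_sketch(good_source, page_size=2))
none_at_all = list(fetch_traces_sketch(good_source, page_size=2, max_traces=0))
```

Raising on a stall is one of the options the review lists; logging a warning and stopping would be a milder alternative.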


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a6d2212c-c48a-4dde-b70e-5488dfaf8475

📥 Commits

Reviewing files that changed from the base of the PR and between 37cc24c and 4b10b27.

📒 Files selected for processing (20)
  • console/workspaces/core-ui/src/Route/Route.tsx
  • console/workspaces/core-ui/src/pages/index.tsx
  • console/workspaces/libs/types/src/routes/generated-route.map.ts
  • console/workspaces/libs/types/src/routes/routes.map.ts
  • console/workspaces/pages/eval/src/CompareMonitor.Component.tsx
  • console/workspaces/pages/eval/src/CreateMonitor.Component.tsx
  • console/workspaces/pages/eval/src/ViewMonitor.Component.tsx
  • console/workspaces/pages/eval/src/form/schema.ts
  • console/workspaces/pages/eval/src/index.ts
  • console/workspaces/pages/eval/src/subComponents/AgentPerformanceCard.tsx
  • console/workspaces/pages/eval/src/subComponents/CreateMonitorForm.tsx
  • console/workspaces/pages/eval/src/subComponents/EvaluationSummaryCard.tsx
  • console/workspaces/pages/eval/src/utils/monitorScoreUtils.ts
  • evaluation-job/main.py
  • evaluation-job/test_main.py
  • libs/amp-evaluation/src/amp_evaluation/runner.py
  • libs/amp-evaluation/src/amp_evaluation/trace/__init__.py
  • libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py
  • libs/amp-evaluation/tests/test_eval_runner.py
  • libs/amp-evaluation/tests/test_trace_fetcher.py

Comment thread console/workspaces/pages/eval/src/CompareMonitor.Component.tsx Outdated
Comment thread console/workspaces/pages/eval/src/CompareMonitor.Component.tsx
Comment thread console/workspaces/pages/eval/src/CreateMonitor.Component.tsx
Comment thread evaluation-job/main.py
Comment thread libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py
Comment thread libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py Outdated
Comment thread libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py

Development

Successfully merging this pull request may close these issues.

  • Allow sampling rate to monitors to control cost
  • Ability to compare results across monitors
  • Add pagination support for Eval Monitors

2 participants