Add monitor sampling-rate UI, configurable trace fetch page size, and monitor comparison view#1127
Add monitor sampling-rate UI, configurable trace fetch page size, and monitor comparison view#1127nadheesh wants to merge 7 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 46 minutes and 36 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR implements three features: a monitor comparison page that renders side-by-side radar charts and evaluation summaries for two monitors; a sampling-rate slider (defaulting to 100%) for monitor creation with backend ChangesCompare Monitors UI
Sampling Rate: UI and Backend
Trace Fetch Pagination and Deterministic Sampling
Sequence Diagram(s)sequenceDiagram
rect rgba(70, 130, 180, 0.5)
note over EvaluationJob,TraceFetcher: Backend Evaluation Pipeline
EvaluationJob->>EvaluationJob: validate sampling_rate in (0, 1]
EvaluationJob->>TraceFetcher: TraceFetcher(page_size=TRACE_FETCH_PAGE_SIZE)
EvaluationJob->>Monitor: run(start_time, end_time, sample_rate)
Monitor->>TraceFetcher: fetch_traces(start_time, end_time) → Iterator
TraceFetcher->>TraceObserverAPI: POST /traces/export page 1
TraceObserverAPI-->>TraceFetcher: traces + totalCount
TraceFetcher->>TraceObserverAPI: POST /traces/export page 2 (cursor advanced)
TraceObserverAPI-->>TraceFetcher: traces + totalCount
TraceFetcher-->>Monitor: lazy OTELTrace iterator
Monitor->>Monitor: sample_traces(iterator, sample_rate) → filtered iterator
Monitor->>Monitor: parse + yield Trace objects
Monitor->>BaseRunner: _evaluate_traces(Iterable[Trace])
BaseRunner-->>EvaluationJob: RunResult
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~90 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
console/workspaces/pages/eval/src/CompareMonitor.Component.tsx (1)
287-291: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winUse theme tokens for series colors instead of hardcoded hex values.
Line 287 and Line 291 use hardcoded colors, which can drift from theme variants (including high-contrast modes). Use palette tokens for both series colors.
As per coding guidelines,
console/**/*.{ts,tsx,js,jsx}should "Use theme tokens via thesxprop instead of hardcoded colors and spacing values".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@console/workspaces/pages/eval/src/CompareMonitor.Component.tsx` around lines 287 - 291, The sourceColor and targetColor variables contain hardcoded hex color values (`#3f8cff` and `#f59e0b` respectively) that can drift from theme variants including high-contrast modes. Replace these hardcoded colors with appropriate theme palette tokens from the palette object. For sourceColor, it already uses a fallback to palette?.primary.main, so ensure it consistently uses theme tokens. For targetColor, instead of the hardcoded `#f59e0b` string, access and use an appropriate palette token such as palette?.warning.main or another suitable theme color that maintains high contrast and aligns with the design system's color tokens.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@console/workspaces/pages/eval/src/CompareMonitor.Component.tsx`:
- Around line 308-312: Remove the nullish coalescing operator that defaults to 0
in the sourceValue and targetValue assignments. Instead of using (meanA ?? 0) *
100 and (meanB ?? 0) * 100, check if meanA and meanB are not null before
multiplying by 100, otherwise preserve null to represent "no data" in the radar
chart. This prevents "all skipped" data from being plotted as a score of 0,
which distorts the comparison. Apply the same fix to the similar code at lines
323-324.
- Around line 150-160: The sourceTimeRange and targetTimeRange useMemo hooks are
casting arbitrary string values from searchParams directly to TraceListTimeRange
enum type without runtime validation. Add a validation function that checks if a
value is a valid TraceListTimeRange enum member before casting it. Apply this
validation to both the sourceTimeRange useMemo (which reads
searchParams.get("sourceTimeRange")) and the targetTimeRange useMemo (which
reads searchParams.get("targetTimeRange")), using it to verify the retrieved
values are legitimate enum values before assignment, and falling back to
TraceListTimeRange.SEVEN_DAYS if validation fails.
In `@console/workspaces/pages/eval/src/CreateMonitor.Component.tsx`:
- Around line 83-85: The sampling rate clamping calculation uses Math.max(0,
...) as the lower bound, but the form schema now requires a minimum sampling
rate of 1 instead of 0. When duplicating older monitors with a sampling rate of
0, this will result in an invalid prefilled value. Change the lower bound in
Math.max from 0 to 1 in the sourceMonitor.samplingRate calculation to align with
the new valid range of 1-100.
In `@evaluation-job/main.py`:
- Around line 563-569: The validation logic for the sampling_rate argument in
the code block starting at line 563 enforces that the value must be in the range
(0, 1] (exclusive of zero, inclusive of one). Find the argparse argument
definition for --sampling-rate and update its help text to accurately reflect
this constraint instead of documenting it as (0.0-1.0), which incorrectly
implies that zero is a valid value. Ensure the help text matches the actual
validation contract being enforced.
In `@libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py`:
- Around line 496-497: Add validation for the page_size parameter immediately
after it is set from either the input parameter or the default self.page_size
value. Check if page_size is less than or equal to zero and raise an appropriate
exception (such as ValueError) with a clear error message indicating that
page_size must be a positive integer. This validation should occur before the
seen_ids set is initialized to fail fast at the API boundary rather than relying
on backend error handling.
- Around line 502-510: The code currently silently returns when encountering a
page containing only previously seen traceId values, which can result in
undercounting traces when the API reports a higher total count. Instead of
silently returning in the condition checking if not new_traces, detect this
pagination stall scenario by comparing whether _total_count indicates more data
exists beyond what has been processed, and surface this condition (through
logging, raising an exception, or other error handling) rather than silently
truncating the iteration. This ensures pagination stalls are visible and don't
silently cause incomplete trace evaluation.
- Around line 499-516: The max_traces check currently happens after the trace is
yielded in the for loop iterating over new_traces, which means when
max_traces=0, one trace is still yielded before the condition triggers and
returns. Move the max_traces check to before the yield statement so that if
max_traces is set to 0 or would be exceeded, the function returns without
yielding any additional traces. Check if max_traces is not None and yielded >=
max_traces before the yield trace line to ensure the limit is respected from the
first trace.
---
Nitpick comments:
In `@console/workspaces/pages/eval/src/CompareMonitor.Component.tsx`:
- Around line 287-291: The sourceColor and targetColor variables contain
hardcoded hex color values (`#3f8cff` and `#f59e0b` respectively) that can drift
from theme variants including high-contrast modes. Replace these hardcoded
colors with appropriate theme palette tokens from the palette object. For
sourceColor, it already uses a fallback to palette?.primary.main, so ensure it
consistently uses theme tokens. For targetColor, instead of the hardcoded
`#f59e0b` string, access and use an appropriate palette token such as
palette?.warning.main or another suitable theme color that maintains high
contrast and aligns with the design system's color tokens.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a6d2212c-c48a-4dde-b70e-5488dfaf8475
📒 Files selected for processing (20)
console/workspaces/core-ui/src/Route/Route.tsxconsole/workspaces/core-ui/src/pages/index.tsxconsole/workspaces/libs/types/src/routes/generated-route.map.tsconsole/workspaces/libs/types/src/routes/routes.map.tsconsole/workspaces/pages/eval/src/CompareMonitor.Component.tsxconsole/workspaces/pages/eval/src/CreateMonitor.Component.tsxconsole/workspaces/pages/eval/src/ViewMonitor.Component.tsxconsole/workspaces/pages/eval/src/form/schema.tsconsole/workspaces/pages/eval/src/index.tsconsole/workspaces/pages/eval/src/subComponents/AgentPerformanceCard.tsxconsole/workspaces/pages/eval/src/subComponents/CreateMonitorForm.tsxconsole/workspaces/pages/eval/src/subComponents/EvaluationSummaryCard.tsxconsole/workspaces/pages/eval/src/utils/monitorScoreUtils.tsevaluation-job/main.pyevaluation-job/test_main.pylibs/amp-evaluation/src/amp_evaluation/runner.pylibs/amp-evaluation/src/amp_evaluation/trace/__init__.pylibs/amp-evaluation/src/amp_evaluation/trace/fetcher.pylibs/amp-evaluation/tests/test_eval_runner.pylibs/amp-evaluation/tests/test_trace_fetcher.py
Summary
Three related improvements to eval monitors:
Trace fetch pagination — page size is now configurable and memory-bound (#669)
TraceFetcherpreviously hardcoded a 1000-trace page size, so up to 1000 fully-parsed traces (with nested spans/payloads) could be held in memory per page. The page size is now a constructor arg (default 10, overridable per call), so peak memory stays bounded regardless of how many traces match a monitor's time window.evaluation-jobsets it via aTRACE_FETCH_PAGE_SIZEconstant. Also fixed the monitor-start log to print the sampling rate at 2 decimals so values like0.25no longer display as0.2.Sampling rate UI for monitors (#1126)
The
samplingRatefield existed in form state and was submitted, but had no visible control — every monitor was silently created with a hidden 25% default. Added a slider (1–100%) in the Data Collection section of the monitor create form, changed the default to 100% (full sampling), and tightened validation to reject 0%.Compare results across monitors (#1101)
Added a side-by-side monitor comparison view (new
compare/:monitorIdroute), with supporting changes to the monitor view, agent performance card, and evaluation summary card.Testing
amp-evaluation:ruff check/ruff format --check/mypy src— clean; unit tests pass.evaluation-job:ruff check/ruff format --check/mypy main.py— clean; unit tests pass.console:eslintandtscbuild pass for the eval, core-ui, and types packages.Closes #669
Closes #1126
Closes #1101
Summary by CodeRabbit