Streamlit applications for exploring EVA results.
Interactive dashboard for visualizing and comparing results. Launch it with:

```bash
streamlit run apps/analysis.py
```

By default, the app looks for runs in the `output/` directory. You can change this in the sidebar or by setting the `EVA_OUTPUT_DIR` environment variable:
```bash
EVA_OUTPUT_DIR=path/to/results streamlit run apps/analysis.py
```

The app provides three views:

**Cross-Run Comparison** — Compare aggregate metrics across multiple runs. Filter by model, provider, and pipeline type. Includes an EVA scatter plot (accuracy vs. experience) and per-metric bar charts.
**Run Overview** — Drill into a single run: per-category metric breakdowns, score distributions, and a full records table with per-metric scores.

**Record Detail** — Deep-dive into individual conversation records:
- Audio playback (mixed recording)
- Transcript with color-coded speaker turns
- Metric scores with explanations
- Conversation trace: tool calls, LLM calls, and audit log entries with a timeline view
- Database state diff (expected vs. actual)
- User goal, persona, and ground truth from the evaluation record
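The database state diff above compares an expected snapshot against the actual one. A minimal sketch of that kind of comparison, assuming snapshots are plain dicts (`diff_db_state` is an illustrative name, not the app's actual API):

```python
def diff_db_state(expected: dict, actual: dict) -> dict:
    """Return entries that were added, removed, or changed between two snapshots."""
    added = {k: actual[k] for k in actual.keys() - expected.keys()}
    removed = {k: expected[k] for k in expected.keys() - actual.keys()}
    changed = {
        k: {"expected": expected[k], "actual": actual[k]}
        for k in expected.keys() & actual.keys()
        if expected[k] != actual[k]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

An empty diff (all three sub-dicts empty) would mean the conversation left the database exactly in its expected state.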
Sidebar controls:

- **Output Directory** — Path to the directory containing run folders
- **View** — Switch between the three views above
- **Run Selection** — Pick a run (with metadata summary)
- **Record Selection** — Pick a record within the selected run
- **Trial Selection** — If a record has multiple trials, pick one
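The run and record pickers imply a nested folder layout. One way that discovery might look, as a sketch assuming an `<output>/<run>/<record>` directory structure (which may not match EVA's actual on-disk format; both function names are hypothetical):

```python
from pathlib import Path

def list_runs(output_dir: Path) -> list[str]:
    """List run folder names, sorted newest-first by name (one folder per run assumed)."""
    return sorted((p.name for p in output_dir.iterdir() if p.is_dir()), reverse=True)

def list_records(output_dir: Path, run: str) -> list[str]:
    """List record folders inside a run (assumed layout: <output>/<run>/<record>)."""
    return sorted(p.name for p in (output_dir / run).iterdir() if p.is_dir())
```

The sidebar would populate its selectboxes from these lists, narrowing from run to record (and, where present, to trial).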


