diff --git a/benchmarks/prompt_eval/.env.example b/benchmarks/prompt_eval/.env.example new file mode 100644 index 00000000..4133831a --- /dev/null +++ b/benchmarks/prompt_eval/.env.example @@ -0,0 +1,4 @@ + +BASE_URLS=https://example/v1/;https://example2/v1/; +API_KEYS=sk-api_key1;sk-api_key2; +MODELS=model1;model2; diff --git a/benchmarks/prompt_eval/Report_query_decomposition.md b/benchmarks/prompt_eval/Report_query_decomposition.md new file mode 100644 index 00000000..ad8a3e2f --- /dev/null +++ b/benchmarks/prompt_eval/Report_query_decomposition.md @@ -0,0 +1,63 @@ +# Query Reformulation Prompt Evaluation + +**Dataset:** [datasets/query_decomposition.json](datasets/query_decomposition.json) (80 cases — 20 D1, 40 D2, 20 D3) +**Scope:** reformulation + decomposition only. Temporal filters are out of scope (see [Report_temporal_filter.md](Report_temporal_filter.md)). + +## Dataset + +| Tier | # Cases | Description | +|------|--------:|-------------| +| **D1** | 20 | Standalone queries that do **not** need decomposition. | +| **D2** | 40 | Queries that **definitely need** decomposition, 2 to 4 sub-queries. Clear signals: distinct entities, distinct time periods, unrelated dimensions, or exclusions. | +| **D3** | 20 | **Ambiguous** queries. Surface features suggest decomposition (conjunctions, comparisons) but the semantics may require one retrieval or several — e.g. trends, interactions, joint effects, multi-attribute-for-one-subject. | + +20 domains (finance, healthcare, legal, engineering, science, HR, education, marketing, real_estate, technology, environment, logistics, agriculture, energy, manufacturing, public_policy, retail, telecommunications, insurance, aviation), each represented 3–5 times across tiers. + +Gold labels live in `expected_queries` (shape: `SearchQueries`, so it deserializes directly into the pipeline's Pydantic model). 
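Each gold label is plain JSON with a `query_list` array, so the count-based comparison defined under Metrics below can be sliced per difficulty tier using only the standard library. This is a minimal sketch, not the evaluator shipped in this PR. `load_cases`, `n_subqueries` and `count_match_by_tier` are hypothetical helpers, and only the field names visible in the dataset file are assumed (`id`, `difficulty`, `expected_queries`, `query_list`). The `generated` mapping is an assumed in-memory format, not the layout of the results file.

```python
import json
from collections import defaultdict


def load_cases(path: str) -> list[dict]:
    """Load the dataset file (a JSON array of cases)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def n_subqueries(search_queries: dict) -> int:
    """Number of sub-queries in a SearchQueries-shaped dict."""
    return len(search_queries["query_list"])


def count_match_by_tier(cases: list[dict], generated: dict[int, dict]) -> dict[int, tuple[int, int]]:
    """Return {difficulty: (count_matches, total_cases)}.

    `generated` maps a case id to the SearchQueries-shaped dict produced for it
    (hypothetical in-memory format). A missing id is counted as a miss.
    """
    tally: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for case in cases:
        bucket = tally[case["difficulty"]]
        bucket[1] += 1  # one more case in this tier
        gen = generated.get(case["id"])
        if gen is not None and n_subqueries(gen) == n_subqueries(case["expected_queries"]):
            bucket[0] += 1  # sub-query counts agree
    return {tier: (hits, total) for tier, (hits, total) in sorted(tally.items())}
```

Calling `count_match_by_tier(load_cases("datasets/query_decomposition.json"), generated)` would return one `(matches, total)` pair per tier. Summing those pairs gives the overall count_match figure.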
Relative dates in `query` strings are pre-resolved to the dataset's current date (2026-04-17); `temporal_filters` is null throughout. + +## Metrics + +1. **decomposition_count_matching** — `len(generated.query_list) == len(expected.query_list)`. +2. **decomposition_semantic_coverage** — LLM-as-judge boolean: do the generated sub-queries, taken together, cover every expected sub-query (count-insensitive, order-insensitive)? When coverage is incomplete, the judge returns a short rationale naming the missing expected sub-query. + +Reported slices: overall and per-difficulty (D1 / D2 / D3). + +## Results + +Full raw output: [results/result_query_decomposition.json](results/result_query_decomposition.json). Judge: `Qwen3-VL-8B-Instruct-FP8`. + +**Overall** + +| Prompt | Model | count_match | semantic_coverage | +|---|---|---|---| +| v0 | Mistral-Small-3.1-24B-Instruct-2503 | 66/80 (82.5%) | 74/80 (92.5%) | +| v0 | Qwen3-VL-8B-Instruct-FP8 | 58/80 (72.5%) | 69/80 (86.2%) | +| **v1** | **Mistral-Small-3.1-24B-Instruct-2503** | **69/80 (86.2%)** | **76/80 (95.0%)** | +| v1 | Qwen3-VL-8B-Instruct-FP8 | 66/80 (82.5%) | 71/80 (88.8%) | + +**Per-difficulty (count_match · semantic_coverage)** + +| Prompt | Model | D1 (n=20) | D2 (n=40) | D3 (n=20) | +|---|---|---|---|---| +| v0 | Mistral-Small | 19/20 · 20/20 | 39/40 · 38/40 | **8/20** · 16/20 | +| v0 | Qwen3-VL-8B | 20/20 · 20/20 | 28/40 · 35/40 | 10/20 · 14/20 | +| v1 | Mistral-Small | 16/20 · 20/20 | 38/40 · 38/40 | **15/20** · 18/20 | +| v1 | Qwen3-VL-8B | 20/20 · 20/20 | 35/40 · 36/40 | 11/20 · 15/20 | + +### v0 vs v1 — Mistral-Small + +- **v1 handles D3 much better than v0** (D3 count_match: v0 8/20 vs v1 15/20): v0 splits "interaction / joint-effect / multi-attribute-for-one-subject" questions that should stay as one (id 68, 69, 71, 75, 79, 80), while v1 keeps them together. v1 also correctly splits the two bounded-range trend cases (id 61, 65) per the "evolution / trend over a bounded range" rule.
+- **v1 is slightly weaker on D1** (D1 count_match: v0 19/20 vs v1 16/20). Chat-history turns pull prior-turn topics into the reformulation and trigger spurious splits (id 2, 5, 11). +- **Shared failures on both prompts**: id 30 (Lambda/GCF collapsed), id 63 (Kafka/RabbitMQ — prior gRPC turn leaks in), id 66 (HR onboarding under-split), id 70 (air/sea freight), id 74 (EASA/FAA under-split). + +### v0 vs v1 — Qwen3-VL-8B + +- v1 gains on count_match (+8) and coverage (+2), mostly on D2 (count_match: v0 28/40 vs v1 35/40), as the explicit split rules embolden Qwen to separate multi-entity/region/time-period queries it previously collapsed. +- Qwen still under-splits comparative questions and misses both bounded-range trend cases (id 61, 65) — it keeps them as one query despite the rule. + +### Common hard cases (both models, both prompts) + +- **id 30** — "Compare AWS Lambda and Google Cloud Functions" stays as a single comparison query. +- **id 63** — "Kafka vs RabbitMQ": earlier gRPC turn in the history contaminates the reformulation. +- **id 70** — "air freight vs sea freight": emitted as a single comparison instead of two independent lookups. +- **id 74** — "EASA vs FAA certification": same pattern. diff --git a/benchmarks/prompt_eval/Report_temporal_filter.md b/benchmarks/prompt_eval/Report_temporal_filter.md new file mode 100644 index 00000000..48a3d949 --- /dev/null +++ b/benchmarks/prompt_eval/Report_temporal_filter.md @@ -0,0 +1,78 @@ +# Temporal Filter Generation Prompt Evaluation (v2) + +**Dataset:** [datasets/temporal_filter.json](datasets/temporal_filter.json) (40 cases — 20 positive, 20 negative) +**Scope:** whether the model emits `temporal_filters` when (and only when) it should, and whether the emitted predicates are correct. Decomposition is out of scope (see [Report_query_decomposition.md](Report_query_decomposition.md)).
+ +## Dataset + +Each case has the minimal schema `{id, messages, query_with_temporal_filter}`: + +| Class | # Cases | Description | +|------|--------:|-------------| +| **Positive** (`true`) | 20 | User restricts by document creation/authoring/publication date. Covers all resolution rules: today, yesterday, this/last week, this/last month, this/last year, past N days/weeks/months, recent/latest, since X, before X, bare MONTH, in YEAR, specific date, exclusion, multi-entity with shared time, plus multi-turn context (3- and 5-message). | +| **Negative** (`false`) | 20 | Filter must be null. Three sub-patterns: (a) dates that describe the topic/subject ("2024 sustainability report", "trends 2020→2025", "2016 US election"); (b) no temporal reference (policy, how-to, trivia); (c) conversational fillers (greetings, thanks). Includes a 5-message negative where the last turn pivots to a pure topic question. | + +Document types and verbs are varied on purpose — design specs, incident reports, PRs, commits, lab results, audit logs, invoices, legal briefs, slide decks, meeting minutes, safety bulletins, etc. — so the evaluation does not reduce to "the model learned the word *uploaded*". + +## Metrics + +Positive class = a filter **was** / **should have been** emitted. + +1. **filter_detection_accuracy** — (TP + TN) / N. +2. **filter_detection_precision** — TP / (TP + FP). How often an emitted filter was actually wanted. +3. **filter_detection_recall** — TP / (TP + FN). How often a wanted filter was actually emitted. +4. **filter_detection_f1** — harmonic mean of the two. +5. **filter_correctness** — LLM-as-judge boolean on TP cases only: given the chat history, current date, and generator output JSON, are the predicates correct as a whole (operator, field, ISO values, closed-vs-open intervals, exclusion split)? + +Judge is invoked **only on TP** (filter expected and emitted). Precision/recall capture the detection decision; correctness captures the filter body. 
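The four detection metrics follow directly from the confusion counts. The sketch below assumes a hypothetical `Confusion` dataclass and `detection_metrics` helper; it is not the evaluator's own code. The report does not say how it handles the degenerate case where no filter is emitted at all, so returning 0.0 for an undefined precision, recall, or F1 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts where 'positive' means a filter was / should have been emitted."""
    tp: int
    fp: int
    fn: int
    tn: int


def detection_metrics(c: Confusion) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the filter-detection decision."""
    n = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / n
    # Undefined ratios fall back to 0.0 (assumption, not stated in the report).
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With the v0 + Mistral-Small counts from the Results section (10 / 0 / 10 / 20), this gives accuracy 0.75, precision 1.0, recall 0.5 and F1 of about 0.667. Those values match the 75.0% / 100.0% / 50.0% / 66.7% row. Only the filter *body* needs the LLM judge; these four numbers are pure arithmetic.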
+ +## Results + +Full raw output: [results/result_filter_generation.json](results/result_filter_generation.json). Judge: `Qwen3-VL-8B-Instruct-FP8`. Current date at eval time: Sunday, April 19, 2026. + +**Overall** + +| Prompt | Model | Acc | Precision | Recall | F1 | TP / FP / FN / TN | filter_correctness (TP only) | +|---|---|---:|---:|---:|---:|:-:|---:| +| v0 | Mistral-Small-3.1-24B-Instruct-2503 | 75.0% | 100.0% | 50.0% | 66.7% | 10 / 0 / 10 / 20 | 8/10 (80.0%) | +| v0 | Qwen3-VL-8B-Instruct-FP8 | 57.5% | 100.0% | 15.0% | 26.1% | 3 / 0 / 17 / 20 | 2/3 (66.7%) | +| **v1** | **Mistral-Small-3.1-24B-Instruct-2503** | **100.0%** | **100.0%** | **100.0%** | **100.0%** | 20 / 0 / 0 / 20 | **19/20 (95.0%)** | +| v1 | Qwen3-VL-8B-Instruct-FP8 | 92.5% | 87.0% | 100.0% | 93.0% | 20 / 3 / 0 / 17 | 17/20 (85.0%) | + +(Numbers above are from a matched-conditions rerun with `DATASET_CURRENT_DATE = "Sunday, April 19, 2026"` pinned in the evaluator. Minor v0/v1 drift vs earlier archived runs reflects generator + LLM-as-judge variance.) + +### v0 vs v1 + +v0 contains no `temporal_filters` rules, so both models default to "no filter" and collapse on recall (Mistral 50%, Qwen 15%). v1's explicit resolution table brings both to 100% recall. No false positives under v0 — the cost of v0 is recall only, not precision. + +Exclusion (id 14) illustrates the v0 body failure: without a rule, Mistral emits three contradictory predicates (`>= 2025-04-19 AND < 2025-03-01 AND >= 2025-04-…`); Qwen collapses to the full year including March. v1 fixes this for Mistral. + +### v1 — Mistral-Small (winner) + +Perfect detection (20/20 TP, 0/20 FP). Two judge-rejected filter bodies among TP: + +- **id 20 "Latest safety bulletins"** → emitted a 12-month window `[2025-07-20, 2026-07-20)` extending into the future. Prompt rule is `recent/latest → past 90 days`, expected `[2026-01-19, 2026-04-20)`. Real generator bug. 
+- **id 18 "Commits pushed since last Monday"** → added a spurious upper bound `< 2026-04-19`. Prompt rule is `since X → one predicate >= X`. Real generator bug. + +### v1 — Qwen3-VL-8B + +Perfect recall but 3 false positives (TN 17/20). Two of them are cases where a year describes the **content**, not the document creation date: + +| id | Query | Bad filter | +|---|---|---| +| 30 | "Findings in the 2024 annual sustainability report." | `created_at ∈ [2024-01-01, 2025-01-01)` | +| 34 | "Effects of climate change on Arctic sea ice between 2010 and 2020." | `created_at ∈ [2010-01-01, 2021-01-01)` | + +Likely cause: v1's topic-vs-creation section is short, and its single null example ("Q3 2024 reporting template") does not cover research/historical framings with spanning year ranges. + +The judge rejected 3 of the 20 TP filter bodies (17/20); two of the rejections are: + +- **id 18 "Commits pushed since last Monday"** → added a spurious upper bound `< 2026-04-20`. Prompt rule is `since X → one predicate >= X`. Correct rejection. +- **id 9 "Recent SRE incident reports"** → emitted past 10 days instead of past 90. Correct rejection. + +## Recommendations + +1. **Ship v1 + Mistral-Small-24B as the production pairing** — 100% detection and 95% filter correctness (19/20), with the two remaining body bugs on "latest" and "since X" worth a targeted prompt tweak. +2. **Patch v1's topic-vs-creation section.** Add null examples for the two patterns Qwen still trips on: `"findings in the YEAR report"` and `"events in YEAR"` / `"between YEAR1 and YEAR2"` when the year is the subject. +3. **Reinforce `since X` and `latest` rules.** Both Mistral and Qwen emit an unwanted upper bound on "since last Monday"; Mistral emits an over-wide, future-extending window for "latest". A short prompt clarification or an additional example should address both. +4. **Upgrade the judge.** Qwen3-VL-8B as judge is flaky (one call returned `None` on id 8) and occasionally misreads the prompt's resolution rules. A stronger judge would tighten the correctness metric.
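Two generator rules are broken in the cases above: `recent/latest → past 90 days` and `since X → one predicate >= X`. Both are small enough to pin down as reference resolvers, which could serve as expected values when checking the prompt tweaks in recommendation 3. This is a minimal sketch under stated assumptions. `resolve_latest` and `resolve_since` are hypothetical helpers. Each predicate is a hypothetical `(field, operator, ISO date)` tuple, and `created_at` is the field name seen in the Qwen false-positive table. The pipeline's actual temporal-filter schema is not shown in this report. The 90-day window is treated as half-open, from `today - 90 days` up to but excluding tomorrow. At the pinned date of Sunday, April 19, 2026, this reproduces the expected `[2026-01-19, 2026-04-20)` for id 20.

```python
from datetime import date, timedelta

# Hypothetical predicate shape: (field, operator, ISO date).
Predicate = tuple[str, str, str]


def resolve_latest(today: date, field: str = "created_at") -> list[Predicate]:
    """`recent/latest` -> past 90 days as a half-open interval [today - 90d, tomorrow)."""
    start = today - timedelta(days=90)
    end = today + timedelta(days=1)  # exclusive bound that still includes today
    return [(field, ">=", start.isoformat()), (field, "<", end.isoformat())]


def resolve_since(anchor: date, field: str = "created_at") -> list[Predicate]:
    """`since X` -> exactly one lower-bound predicate and no upper bound."""
    return [(field, ">=", anchor.isoformat())]
```

`resolve_latest(date(2026, 4, 19))` yields a `>= 2026-01-19` and a `< 2026-04-20` predicate. `resolve_since(...)` deliberately returns a single predicate, so any emitted upper bound, such as the `< 2026-04-19` on id 18, counts as a deviation.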
diff --git a/benchmarks/prompt_eval/datasets/query_decomposition.json b/benchmarks/prompt_eval/datasets/query_decomposition.json new file mode 100644 index 00000000..22c01728 --- /dev/null +++ b/benchmarks/prompt_eval/datasets/query_decomposition.json @@ -0,0 +1,2026 @@ +[ + { + "id": 1, + "difficulty": 1, + "domain": "finance", + "expected_queries": { + "query_list": [ + { + "query": "Total revenue in Q3 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was our total revenue in Q3 2024?" + } + ] + }, + { + "id": 2, + "difficulty": 1, + "domain": "healthcare", + "expected_queries": { + "query_list": [ + { + "query": "Recommended flu vaccination schedule for adults over 65", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Can you summarize the current hospital policy on visitor hours?" + }, + { + "role": "assistant", + "content": "According to the facility policy, general visitor hours are 10:00 to 20:00, with ICU access limited to two immediate family members at a time. Pediatric wards allow one parent to stay overnight." + }, + { + "role": "user", + "content": "Helpful. And what's the minimum staffing ratio required for the night shift?" + }, + { + "role": "assistant", + "content": "The current policy requires a minimum of one nurse per four patients on the night shift for general wards, and one nurse per two patients in the ICU." + }, + { + "role": "user", + "content": "what is the recommended flu vaccination schedule for adults over 65?" + } + ] + }, + { + "id": 3, + "difficulty": 1, + "domain": "legal", + "expected_queries": { + "query_list": [ + { + "query": "Statute of limitations for breach of contract in California", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the statute of limitations for breach of contract in California?" 
+ } + ] + }, + { + "id": 4, + "difficulty": 1, + "domain": "engineering", + "expected_queries": { + "query_list": [ + { + "query": "Maximum rainfall intensity for a 10-year return period in Toulouse", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the maximum rainfall intensity for a 10-year return period in Toulouse?" + } + ] + }, + { + "id": 5, + "difficulty": 1, + "domain": "science", + "expected_queries": { + "query_list": [ + { + "query": "Average mutation rate of human mitochondrial DNA", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Can you remind me what CRISPR-Cas9 is primarily used for in gene editing?" + }, + { + "role": "assistant", + "content": "CRISPR-Cas9 is a genome-editing system that allows researchers to add, remove, or alter specific sections of DNA by targeting precise sequences with a guide RNA and then cutting with the Cas9 nuclease." + }, + { + "role": "user", + "content": "Got it. Switching topic — what is the average mutation rate of human mitochondrial DNA?" + } + ] + }, + { + "id": 6, + "difficulty": 1, + "domain": "HR", + "expected_queries": { + "query_list": [ + { + "query": "Parental leave entitlement for employees in the French office", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We have teams in Paris, Berlin and London and I'm reviewing benefit policies." + }, + { + "role": "assistant", + "content": "Those three offices operate under different national labor frameworks, so benefit entitlements such as leave, healthcare, and pension contributions vary significantly between them." + }, + { + "role": "user", + "content": "What is the parental leave entitlement for employees in our French office?" 
+ } + ] + }, + { + "id": 7, + "difficulty": 1, + "domain": "education", + "expected_queries": { + "query_list": [ + { + "query": "Number of students who graduated from the computer science program in the 2024-2025 academic year", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How many students enrolled in the computer science program last year?" + }, + { + "role": "assistant", + "content": "According to the registrar's annual report, the CS program had 342 new enrollments in the 2024-2025 academic year, a 6% increase over the prior year." + }, + { + "role": "user", + "content": "And how many graduated?" + } + ] + }, + { + "id": 8, + "difficulty": 1, + "domain": "marketing", + "expected_queries": { + "query_list": [ + { + "query": "Average engagement rate for sponsored Instagram posts targeting the 25-35 age group", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the average engagement rate for sponsored posts on Instagram targeting the 25-35 age group?" + } + ] + }, + { + "id": 9, + "difficulty": 1, + "domain": "real_estate", + "expected_queries": { + "query_list": [ + { + "query": "Average rental yield for residential properties in Lyon", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the average rental yield for residential properties in Lyon?" + } + ] + }, + { + "id": 10, + "difficulty": 1, + "domain": "technology", + "expected_queries": { + "query_list": [ + { + "query": "Recommended steps for migrating a MySQL database to PostgreSQL", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the recommended steps for migrating a MySQL database to PostgreSQL?" 
+ } + ] + }, + { + "id": 11, + "difficulty": 1, + "domain": "environment", + "expected_queries": { + "query_list": [ + { + "query": "Total energy consumption of the Paris headquarters in 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What's the current headcount at the Paris HQ?" + }, + { + "role": "assistant", + "content": "According to the HR directory, the Paris HQ currently employs approximately 1,200 people across engineering, sales, and corporate functions." + }, + { + "role": "user", + "content": "Thanks. Unrelated question — what was the total energy consumption of our Paris headquarters in 2024?" + } + ] + }, + { + "id": 12, + "difficulty": 1, + "domain": "logistics", + "expected_queries": { + "query_list": [ + { + "query": "Current lead time for ocean freight from Shanghai to Rotterdam", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the current lead time for ocean freight from Shanghai to Rotterdam?" + } + ] + }, + { + "id": 13, + "difficulty": 1, + "domain": "agriculture", + "expected_queries": { + "query_list": [ + { + "query": "Optimal irrigation frequency for winter wheat during the tillering stage", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the optimal irrigation frequency for winter wheat during the tillering stage?" + } + ] + }, + { + "id": 14, + "difficulty": 1, + "domain": "energy", + "expected_queries": { + "query_list": [ + { + "query": "Capacity factor of the wind farm in Normandy", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the capacity factor of our wind farm in Normandy?" 
+ } + ] + }, + { + "id": 15, + "difficulty": 1, + "domain": "manufacturing", + "expected_queries": { + "query_list": [ + { + "query": "Rework procedure for solder joint failures on Line C", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "A batch on Line C just failed solder joint inspection. What is the acceptable defect threshold for that line?" + }, + { + "role": "assistant", + "content": "Based on the Line C quality standards document, the acceptable defect threshold for solder joint inspections is 0.5% of joints per batch. Batches exceeding this rate are quarantined for rework." + }, + { + "role": "user", + "content": "What is the rework procedure for solder joint failures on Line C?" + } + ] + }, + { + "id": 16, + "difficulty": 1, + "domain": "public_policy", + "expected_queries": { + "query_list": [ + { + "query": "Current carbon tax rate per ton of CO2 in France", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the carbon tax rate per ton of CO2 in France?" + } + ] + }, + { + "id": 17, + "difficulty": 1, + "domain": "retail", + "expected_queries": { + "query_list": [ + { + "query": "Safety stock level for the premium wireless headphones", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We launched a new line of premium wireless headphones last quarter and sales are trending above plan." + }, + { + "role": "assistant", + "content": "That's encouraging. Strong early sales typically trigger a review of replenishment parameters so that demand spikes don't cause stockouts on the new SKU." + }, + { + "role": "user", + "content": "What is the safety stock level for them?" 
+ } + ] + }, + { + "id": 18, + "difficulty": 1, + "domain": "telecommunications", + "expected_queries": { + "query_list": [ + { + "query": "Packet loss rate on the fiber backbone between Paris and Marseille", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the packet loss rate on our fiber backbone between Paris and Marseille?" + } + ] + }, + { + "id": 19, + "difficulty": 1, + "domain": "insurance", + "expected_queries": { + "query_list": [ + { + "query": "Loss ratio for the commercial property insurance line", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Our gross written premium grew significantly last year. What drove that growth?" + }, + { + "role": "assistant", + "content": "According to the annual report, GWP growth was primarily driven by a 18% increase in commercial property lines and expansion into the SME segment, which added approximately €42M in new premiums." + }, + { + "role": "user", + "content": "What is the loss ratio for our commercial property line?" + } + ] + }, + { + "id": 20, + "difficulty": 1, + "domain": "aviation", + "expected_queries": { + "query_list": [ + { + "query": "Next C-check due date for aircraft with registration F-GKXA", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "When is the next C-check due for aircraft registration F-GKXA?" + } + ] + }, + { + "id": 21, + "difficulty": 2, + "domain": "finance", + "expected_queries": { + "query_list": [ + { + "query": "Returns on the S&P 500 index fund in 2025", + "temporal_filters": null + }, + { + "query": "Returns on the emerging markets fund in 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "I'm reviewing portfolio options and narrowed the choice down to the S&P 500 index fund and our emerging markets fund." 
+ }, + { + "role": "assistant", + "content": "Those two funds sit at different ends of the risk spectrum: the S&P 500 fund tracks large-cap US equities with lower volatility, while the emerging markets fund carries higher volatility but also higher growth potential over long horizons." + }, + { + "role": "user", + "content": "What were their returns last year?" + } + ] + }, + { + "id": 22, + "difficulty": 2, + "domain": "finance", + "expected_queries": { + "query_list": [ + { + "query": "Total revenue in Q1 2024", + "temporal_filters": null + }, + { + "query": "Total revenue in Q2 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was our total revenue in Q3 and Q4 of 2024?" + }, + { + "role": "assistant", + "content": "Based on the consolidated financial statements, Q3 2024 revenue was €412M and Q4 2024 revenue was €478M, bringing second-half revenue to €890M." + }, + { + "role": "user", + "content": "And Q1 and Q2?" + } + ] + }, + { + "id": 23, + "difficulty": 2, + "domain": "legal", + "expected_queries": { + "query_list": [ + { + "query": "Data privacy requirements in the European Union", + "temporal_filters": null + }, + { + "query": "Data privacy requirements in Brazil", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the data privacy requirements in the EU and in Brazil?" + } + ] + }, + { + "id": 24, + "difficulty": 2, + "domain": "engineering", + "expected_queries": { + "query_list": [ + { + "query": "Fatigue resistance rating for carbon steel under cyclic loading", + "temporal_filters": null + }, + { + "query": "Fatigue resistance rating for stainless steel under cyclic loading", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We're choosing between carbon steel and stainless steel for structural beams on a bridge project." + }, + { + "role": "assistant", + "content": "Both are viable options. 
Carbon steel offers higher tensile strength at lower cost, while stainless steel provides better corrosion resistance and longer service life in aggressive environments. The choice typically depends on the exposure conditions and lifecycle cost targets." + }, + { + "role": "user", + "content": "What are the fatigue resistance ratings for carbon steel and stainless steel under cyclic loading?" + } + ] + }, + { + "id": 25, + "difficulty": 2, + "domain": "science", + "expected_queries": { + "query_list": [ + { + "query": "Average Arctic sea ice extent in summer 2015", + "temporal_filters": null + }, + { + "query": "Average Arctic sea ice extent in summer 2020", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was the average Arctic sea ice extent in summer 2015 and summer 2020?" + } + ] + }, + { + "id": 26, + "difficulty": 2, + "domain": "HR", + "expected_queries": { + "query_list": [ + { + "query": "Employee satisfaction score for the London office", + "temporal_filters": null + }, + { + "query": "Employee satisfaction score for the Singapore office", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the employee satisfaction scores for our London office and our Singapore office?" + } + ] + }, + { + "id": 27, + "difficulty": 2, + "domain": "education", + "expected_queries": { + "query_list": [ + { + "query": "Student outcome results for project-based learning", + "temporal_filters": null + }, + { + "query": "Student outcome results for traditional lecture-based instruction", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the student outcome results for project-based learning versus traditional lecture-based instruction?" 
+ } + ] + }, + { + "id": 28, + "difficulty": 2, + "domain": "marketing", + "expected_queries": { + "query_list": [ + { + "query": "Click-through rate for the summer campaign on Facebook", + "temporal_filters": null + }, + { + "query": "Click-through rate for the summer campaign on Instagram", + "temporal_filters": null + }, + { + "query": "Click-through rate for the summer campaign on Google Ads", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was the click-through rate for our summer campaign on Facebook, Instagram, and Google Ads?" + } + ] + }, + { + "id": 29, + "difficulty": 2, + "domain": "real_estate", + "expected_queries": { + "query_list": [ + { + "query": "Average commercial lease rate in downtown Manhattan", + "temporal_filters": null + }, + { + "query": "Average commercial lease rate in the Miami financial district", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the average commercial lease rates in downtown Manhattan and in the Miami financial district?" + } + ] + }, + { + "id": 30, + "difficulty": 2, + "domain": "technology", + "expected_queries": { + "query_list": [ + { + "query": "Serverless pricing and cold start latency for AWS Lambda", + "temporal_filters": null + }, + { + "query": "Serverless pricing and cold start latency for Google Cloud Functions", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We just migrated our auth service to Kubernetes. What's the recommended way to manage secrets across environments?" + }, + { + "role": "assistant", + "content": "The typical recommendation is to use Kubernetes Secrets with an external secret manager such as HashiCorp Vault or AWS Secrets Manager, integrated via a sync operator. This keeps credentials out of manifests and enables per-environment rotation." + }, + { + "role": "user", + "content": "Understood. 
New topic — compare the serverless pricing and cold start latency of AWS Lambda and Google Cloud Functions." + } + ] + }, + { + "id": 31, + "difficulty": 2, + "domain": "environment", + "expected_queries": { + "query_list": [ + { + "query": "Total CO2 emissions for the main campus in Q1 2026", + "temporal_filters": null + }, + { + "query": "Total CO2 emissions for the main campus in Q3 2026", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Our sustainability report shows a reduction in total emissions this year. Which quarter drove the biggest drop?" + }, + { + "role": "assistant", + "content": "According to the emissions tracking data, Q3 showed the largest reduction, down 14% versus the prior year, mainly due to the switch to renewable energy at the main manufacturing facility in July." + }, + { + "role": "user", + "content": "What were the total CO2 emissions for the main campus in Q1 and Q3 of this year?" + } + ] + }, + { + "id": 32, + "difficulty": 2, + "domain": "logistics", + "expected_queries": { + "query_list": [ + { + "query": "Average delivery time for shipments to Southeast Asia", + "temporal_filters": null + }, + { + "query": "Average delivery time for shipments to Western Europe", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the average delivery times for shipments to Southeast Asia and to Western Europe?" + } + ] + }, + { + "id": 33, + "difficulty": 2, + "domain": "agriculture", + "expected_queries": { + "query_list": [ + { + "query": "Per-hectare yield for wheat in the 2025 harvest season", + "temporal_filters": null + }, + { + "query": "Per-hectare yield for corn in the 2025 harvest season", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What were the per-hectare yields for wheat and corn this harvest season?" 
+ } + ] + }, + { + "id": 34, + "difficulty": 2, + "domain": "energy", + "expected_queries": { + "query_list": [ + { + "query": "Solar energy generation at our farm in summer 2024", + "temporal_filters": null + }, + { + "query": "Solar energy generation at our farm in winter 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How did solar energy generation at our farm compare between summer 2024 and winter 2024?" + } + ] + }, + { + "id": 35, + "difficulty": 2, + "domain": "manufacturing", + "expected_queries": { + "query_list": [ + { + "query": "Overall defect rate for Assembly Line A in Q2 2026", + "temporal_filters": null + }, + { + "query": "Overall defect rate for Assembly Line B in Q2 2026", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the overall defect rates for Assembly Line A and Assembly Line B this quarter?" + } + ] + }, + { + "id": 36, + "difficulty": 2, + "domain": "public_policy", + "expected_queries": { + "query_list": [ + { + "query": "Current rent control measures in Berlin", + "temporal_filters": null + }, + { + "query": "Current rent control measures in New York City", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What rent control measures are currently in place in Berlin and in New York City?" + } + ] + }, + { + "id": 37, + "difficulty": 2, + "domain": "retail", + "expected_queries": { + "query_list": [ + { + "query": "Total online sales figures for Black Friday", + "temporal_filters": null + }, + { + "query": "Total online sales figures for Cyber Monday", + "temporal_filters": null + }, + { + "query": "Total online sales figures for Christmas week", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What were total online sales figures for Black Friday, Cyber Monday, and Christmas week?" 
+ } + ] + }, + { + "id": 38, + "difficulty": 2, + "domain": "telecommunications", + "expected_queries": { + "query_list": [ + { + "query": "Average 5G download speeds in urban areas on our network", + "temporal_filters": null + }, + { + "query": "Average 5G download speeds in rural areas on our network", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the average 5G download speeds in urban areas versus rural areas on our network?" + } + ] + }, + { + "id": 39, + "difficulty": 2, + "domain": "insurance", + "expected_queries": { + "query_list": [ + { + "query": "Average claim resolution time for auto insurance claims", + "temporal_filters": null + }, + { + "query": "Average claim resolution time for home insurance claims", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the average claim resolution time for auto insurance versus home insurance claims?" + } + ] + }, + { + "id": 40, + "difficulty": 2, + "domain": "aviation", + "expected_queries": { + "query_list": [ + { + "query": "On-time departure rate in January 2025", + "temporal_filters": null + }, + { + "query": "On-time departure rate in February 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Can you pull up our operational KPIs for early 2025?" + }, + { + "role": "assistant", + "content": "Sure, the main operational KPIs include on-time departure rate, load factor, baggage mishandling rate, and flight completion factor. Which metric would you like to look at first?" + }, + { + "role": "user", + "content": "Let's focus on on-time departures. What was our rate in January and February of 2025?" 
+ } + ] + }, + { + "id": 41, + "difficulty": 2, + "domain": "finance", + "expected_queries": { + "query_list": [ + { + "query": "Returns of the Growth Fund in Q3 2024", + "temporal_filters": null + }, + { + "query": "Returns of the Growth Fund in Q4 2024", + "temporal_filters": null + }, + { + "query": "Returns of the Value Fund in Q3 2024", + "temporal_filters": null + }, + { + "query": "Returns of the Value Fund in Q4 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What were the returns of the Growth Fund and the Value Fund in Q3 and Q4 of 2024?" + } + ] + }, + { + "id": 42, + "difficulty": 2, + "domain": "healthcare", + "expected_queries": { + "query_list": [ + { + "query": "Average ER wait time at the Boston hospital", + "temporal_filters": null + }, + { + "query": "Average ER wait time at the Philadelphia hospital", + "temporal_filters": null + }, + { + "query": "Cardiology appointment lead time at the Boston hospital", + "temporal_filters": null + }, + { + "query": "Cardiology appointment lead time at the Philadelphia hospital", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the average ER wait times and cardiology appointment lead times at our Boston and Philadelphia hospitals?" + } + ] + }, + { + "id": 43, + "difficulty": 2, + "domain": "energy", + "expected_queries": { + "query_list": [ + { + "query": "Percentage of electricity from renewables at the Texas plant", + "temporal_filters": null + }, + { + "query": "Percentage of electricity from renewables at the Ohio plant", + "temporal_filters": null + }, + { + "query": "Percentage of electricity from renewables at the Oregon plant", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What percentage of electricity comes from renewables at our Texas, Ohio, and Oregon plants?" 
+ } + ] + }, + { + "id": 44, + "difficulty": 2, + "domain": "legal", + "expected_queries": { + "query_list": [ + { + "query": "Antitrust cases filed against Google between 2023 and 2026", + "temporal_filters": null + }, + { + "query": "Antitrust cases filed against Apple between 2023 and 2026", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We're seeing a lot of antitrust activity in the tech sector lately." + }, + { + "role": "assistant", + "content": "That's correct. The past three years have seen a significant uptick in enforcement actions, particularly targeting large platform companies in the EU and the US. The EU's Digital Markets Act has been a major driver of new cases." + }, + { + "role": "user", + "content": "What antitrust cases have been filed against Google and Apple in the last three years?" + } + ] + }, + { + "id": 45, + "difficulty": 2, + "domain": "manufacturing", + "expected_queries": { + "query_list": [ + { + "query": "Overall equipment effectiveness for the Shenzhen factory in H1 2024", + "temporal_filters": null + }, + { + "query": "Overall equipment effectiveness for the Shenzhen factory in H2 2024", + "temporal_filters": null + }, + { + "query": "Overall equipment effectiveness for the Munich factory in H1 2024", + "temporal_filters": null + }, + { + "query": "Overall equipment effectiveness for the Munich factory in H2 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was the overall equipment effectiveness for the Shenzhen factory and the Munich factory in H1 and H2 of 2024?" 
+ } + ] + }, + { + "id": 46, + "difficulty": 2, + "domain": "education", + "expected_queries": { + "query_list": [ + { + "query": "Number of students enrolled in the fall 2024 semester", + "temporal_filters": null + }, + { + "query": "Number of students enrolled in the spring 2025 semester", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How many students enrolled in the fall 2024 semester versus the spring 2025 semester?" + } + ] + }, + { + "id": 47, + "difficulty": 2, + "domain": "marketing", + "expected_queries": { + "query_list": [ + { + "query": "Net Promoter Score for the ProLine series", + "temporal_filters": null + }, + { + "query": "Net Promoter Score for the EcoLine series", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "The ProLine series and the EcoLine series are our two flagship product lines this year." + }, + { + "role": "assistant", + "content": "Both lines have been prominent in recent marketing reports. ProLine targets professional users with premium pricing, while EcoLine targets sustainability-conscious consumers at a mid-tier price point." + }, + { + "role": "user", + "content": "What are their Net Promoter Scores?" + } + ] + }, + { + "id": 48, + "difficulty": 2, + "domain": "logistics", + "expected_queries": { + "query_list": [ + { + "query": "Daily order processing volume at the Memphis hub in March 2025", + "temporal_filters": null + }, + { + "query": "Daily order processing volume at the Memphis hub in April 2025", + "temporal_filters": null + }, + { + "query": "Daily order processing volume at the Rotterdam hub in March 2025", + "temporal_filters": null + }, + { + "query": "Daily order processing volume at the Rotterdam hub in April 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was the daily order processing volume at our Memphis hub and our Rotterdam hub in March and April 2025?" 
+ } + ] + }, + { + "id": 49, + "difficulty": 2, + "domain": "agriculture", + "expected_queries": { + "query_list": [ + { + "query": "Average soil nitrogen levels in our soybean fields", + "temporal_filters": null + }, + { + "query": "Average soil nitrogen levels in our cotton fields", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the average soil nitrogen levels in our soybean fields versus our cotton fields?" + } + ] + }, + { + "id": 50, + "difficulty": 2, + "domain": "technology", + "expected_queries": { + "query_list": [ + { + "query": "API uptime percentage for January 2025", + "temporal_filters": null + }, + { + "query": "API uptime percentage for February 2025", + "temporal_filters": null + }, + { + "query": "API uptime percentage for March 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Our SLA targets uptime above 99.9%. Did we meet that in Q1 2025?" + }, + { + "role": "assistant", + "content": "In Q1 2025, overall API uptime averaged 99.87%, which fell slightly below the 99.9% SLA target. January was the weakest month at 99.71% due to a storage incident on January 14th." + }, + { + "role": "user", + "content": "What was the API uptime percentage for January, February, and March 2025?" + } + ] + }, + { + "id": 51, + "difficulty": 2, + "domain": "retail", + "expected_queries": { + "query_list": [ + { + "query": "Revenue per square foot for East Coast stores in Q1 2026", + "temporal_filters": null + }, + { + "query": "Revenue per square foot for West Coast stores in Q1 2026", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was the revenue per square foot for our East Coast stores versus our West Coast stores last quarter?" 
+ } + ] + }, + { + "id": 52, + "difficulty": 2, + "domain": "insurance", + "expected_queries": { + "query_list": [ + { + "query": "Number of flood insurance claims in 2023", + "temporal_filters": null + }, + { + "query": "Number of flood insurance claims in 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How did the number of flood insurance claims compare between 2023 and 2024?" + } + ] + }, + { + "id": 53, + "difficulty": 2, + "domain": "telecommunications", + "expected_queries": { + "query_list": [ + { + "query": "Monthly churn rate for the Premium plan", + "temporal_filters": null + }, + { + "query": "Monthly churn rate for the Basic plan", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We currently run two subscription tiers: the Premium plan and the Basic plan." + }, + { + "role": "assistant", + "content": "Got it. Those two tiers typically have very different retention dynamics: Premium subscribers usually churn at a lower rate due to bundled features, while Basic subscribers are more price-sensitive and churn higher." + }, + { + "role": "user", + "content": "What is the monthly churn rate for each of them?" + } + ] + }, + { + "id": 54, + "difficulty": 2, + "domain": "public_policy", + "expected_queries": { + "query_list": [ + { + "query": "Current minimum wage in France", + "temporal_filters": null + }, + { + "query": "Current minimum wage in Japan", + "temporal_filters": null + }, + { + "query": "Current minimum wage in Australia", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the current minimum wage in France, Japan, and Australia?" 
+ } + ] + }, + { + "id": 55, + "difficulty": 2, + "domain": "aviation", + "expected_queries": { + "query_list": [ + { + "query": "Load factor for the New York-London route", + "temporal_filters": null + }, + { + "query": "Load factor for the Los Angeles-Tokyo route", + "temporal_filters": null + }, + { + "query": "Fuel cost per seat-mile for the New York-London route", + "temporal_filters": null + }, + { + "query": "Fuel cost per seat-mile for the Los Angeles-Tokyo route", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the load factor and fuel cost per seat-mile for our New York-London and Los Angeles-Tokyo routes?" + } + ] + }, + { + "id": 56, + "difficulty": 2, + "domain": "science", + "expected_queries": { + "query_list": [ + { + "query": "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for sickle cell disease", + "temporal_filters": null + }, + { + "query": "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for beta-thalassemia", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "CRISPR therapies have been advancing fast recently." + }, + { + "role": "assistant", + "content": "Yes, 2023 and 2024 saw major milestones. The FDA approved the first CRISPR-based therapy for sickle cell disease in December 2023, and several beta-thalassemia trials reported highly promising results around the same period." + }, + { + "role": "user", + "content": "What are the latest clinical trial results for CRISPR treatments for sickle cell disease and for beta-thalassemia?" 
+ } + ] + }, + { + "id": 57, + "difficulty": 2, + "domain": "HR", + "expected_queries": { + "query_list": [ + { + "query": "Voluntary turnover rate for engineering in 2023", + "temporal_filters": null + }, + { + "query": "Voluntary turnover rate for engineering in 2024", + "temporal_filters": null + }, + { + "query": "Voluntary turnover rate for marketing in 2023", + "temporal_filters": null + }, + { + "query": "Voluntary turnover rate for marketing in 2024", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What were the voluntary turnover rates for engineering and marketing in 2023 and 2024?" + } + ] + }, + { + "id": 58, + "difficulty": 2, + "domain": "environment", + "expected_queries": { + "query_list": [ + { + "query": "Annual water consumption at our manufacturing operations", + "temporal_filters": null + }, + { + "query": "Annual water consumption at our data center operations", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the annual water consumption at our manufacturing operations versus our data center operations?" + } + ] + }, + { + "id": 59, + "difficulty": 2, + "domain": "real_estate", + "expected_queries": { + "query_list": [ + { + "query": "Median home sale price in the Austin metro area in Q1 2025", + "temporal_filters": null + }, + { + "query": "Median home sale price in the Austin metro area in Q2 2025", + "temporal_filters": null + }, + { + "query": "Median home sale price in the Austin metro area in Q3 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What was the median home sale price in the Austin metro area in Q1, Q2, and Q3 of 2025?" 
+ } + ] + }, + { + "id": 60, + "difficulty": 2, + "domain": "finance", + "expected_queries": { + "query_list": [ + { + "query": "Financial documents from January or February 2025", + "temporal_filters": null + }, + { + "query": "Financial documents from April to December 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Show me the financial documents from last year except for those filed in March." + } + ] + }, + { + "id": 61, + "difficulty": 3, + "domain": "finance", + "ambiguity_reason": "Evolution trend over a bounded range — split per year so each year can be retrieved independently.", + "expected_queries": { + "query_list": [ + { + "query": "Operating costs in 2024", + "temporal_filters": null + }, + { + "query": "Operating costs in 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How have our operating costs evolved over the past two years?" + } + ] + }, + { + "id": 62, + "difficulty": 3, + "domain": "healthcare", + "ambiguity_reason": "Two patient groups follow distinct clinical protocols typically documented separately — separate retrieval per group is needed despite the comparative framing.", + "expected_queries": { + "query_list": [ + { + "query": "Standard discharge procedure for elderly patients", + "temporal_filters": null + }, + { + "query": "Standard discharge procedure for pediatric patients", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What is the standard discharge procedure for elderly patients versus pediatric patients?" 
+ } + ] + }, + { + "id": 63, + "difficulty": 3, + "domain": "technology", + "ambiguity_reason": "Two distinct systems each with independent documentation — a comparison query still requires two separate retrievals to gather the source material.", + "expected_queries": { + "query_list": [ + { + "query": "Main architectural features of Kafka", + "temporal_filters": null + }, + { + "query": "Main architectural features of RabbitMQ", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Can you explain what gRPC is and where it's usually a better choice than REST?" + }, + { + "role": "assistant", + "content": "gRPC is a high-performance RPC framework built on HTTP/2 and protocol buffers. It typically outperforms REST in microservice-to-microservice communication, streaming use cases, and low-latency environments where strong typing matters." + }, + { + "role": "user", + "content": "Thanks. I'll move to another topic — what are the main architectural differences between Kafka and RabbitMQ?" + } + ] + }, + { + "id": 64, + "difficulty": 3, + "domain": "legal", + "ambiguity_reason": "Two legal concepts mentioned but the question is about their interaction — a relational question that is semantically one unit.", + "expected_queries": { + "query_list": [ + { + "query": "Differences between trade secrets and patents in terms of protection duration and disclosure requirements", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We're advising a client on whether to patent their process or keep it as a trade secret." + }, + { + "role": "assistant", + "content": "That's a common strategic decision. Patents offer time-limited exclusive rights in exchange for public disclosure, while trade secrets provide indefinite protection as long as the information stays confidential. 
The right choice depends on how easily the process could be reverse-engineered and how long the competitive advantage needs to last." + }, + { + "role": "user", + "content": "How do trade secrets and patents differ in terms of protection duration and disclosure requirements?" + } + ] + }, + { + "id": 65, + "difficulty": 3, + "domain": "environment", + "ambiguity_reason": "Trend question over a bounded range — split per year so each year can be retrieved independently.", + "expected_queries": { + "query_list": [ + { + "query": "Amazon deforestation rates in 2021", + "temporal_filters": null + }, + { + "query": "Amazon deforestation rates in 2022", + "temporal_filters": null + }, + { + "query": "Amazon deforestation rates in 2023", + "temporal_filters": null + }, + { + "query": "Amazon deforestation rates in 2024", + "temporal_filters": null + }, + { + "query": "Amazon deforestation rates in 2025", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What has been the trend in Amazon deforestation rates over the last five years?" + } + ] + }, + { + "id": 66, + "difficulty": 3, + "domain": "HR", + "ambiguity_reason": "Sales and customer success have distinct onboarding processes typically documented separately — one retrieval per department is needed despite the shared framing.", + "expected_queries": { + "query_list": [ + { + "query": "Onboarding process for sales new hires", + "temporal_filters": null + }, + { + "query": "Onboarding process for customer success new hires", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What does the onboarding process look like for sales and customer success new hires?" 
+ } + ] + }, + { + "id": 67, + "difficulty": 3, + "domain": "science", + "ambiguity_reason": "Two substances but the question is about their interaction as a combined effect — splitting fragments a question that only makes sense as a whole.", + "expected_queries": { + "query_list": [ + { + "query": "Known interactions between metformin and ibuprofen", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the known interactions between metformin and ibuprofen?" + } + ] + }, + { + "id": 68, + "difficulty": 3, + "domain": "retail", + "ambiguity_reason": "Two seasons referenced but the question asks for a typical pattern — a trend question, not two independent data lookups.", + "expected_queries": { + "query_list": [ + { + "query": "Typical comparison of product return rates between summer and winter seasons", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Our return rates seem to spike at certain times of year." + }, + { + "role": "assistant", + "content": "That's a common pattern in retail. Return rates typically peak in January following holiday gift purchases, and can also rise after summer sales events. The average return rate across categories runs about 18% in January versus 9% in summer months based on industry benchmarks." + }, + { + "role": "user", + "content": "How do product return rates typically compare between the summer and winter seasons?" 
+ } + ] + }, + { + "id": 69, + "difficulty": 3, + "domain": "engineering", + "ambiguity_reason": "Two properties asked for one material — a multi-attribute lookup for a single subject, not two independent searches.", + "expected_queries": { + "query_list": [ + { + "query": "Thermal conductivity and fire resistance ratings of ceramic fiber insulation", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the thermal conductivity and fire resistance ratings of ceramic fiber insulation?" + } + ] + }, + { + "id": 70, + "difficulty": 3, + "domain": "logistics", + "ambiguity_reason": "Air and sea freight are separate logistics categories with distinct documentation — cost and lead time for each require independent retrieval despite the comparative framing.", + "expected_queries": { + "query_list": [ + { + "query": "Cost and lead time for air freight transatlantic shipments", + "temporal_filters": null + }, + { + "query": "Cost and lead time for sea freight transatlantic shipments", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the cost and lead time differences between air freight and sea freight for transatlantic shipments?" + } + ] + }, + { + "id": 71, + "difficulty": 3, + "domain": "public_policy", + "ambiguity_reason": "Short-term and long-term effects of one policy are part of one coherent question, not two independent retrievals.", + "expected_queries": { + "query_list": [ + { + "query": "Short-term and long-term effects of the Finnish UBI pilot on employment and well-being", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What were the short-term and long-term effects of the Finnish UBI pilot on employment and well-being?" 
+ } + ] + }, + { + "id": 72, + "difficulty": 3, + "domain": "manufacturing", + "ambiguity_reason": "Surface bubbling and adhesion failure are distinct failure modes with separate root cause analyses in QA documentation — each requires its own retrieval.", + "expected_queries": { + "query_list": [ + { + "query": "Root causes of surface bubbling on the automotive paint line", + "temporal_filters": null + }, + { + "query": "Root causes of adhesion failure on the automotive paint line", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "Our automotive paint line has been flagging two recurring defects: surface bubbling and adhesion failure." + }, + { + "role": "assistant", + "content": "Those are both common finish defects. They typically have different upstream origins — bubbling often relates to solvent or temperature control issues, while adhesion failures usually point to surface preparation or primer compatibility problems." + }, + { + "role": "user", + "content": "What are their root causes on our line?" + } + ] + }, + { + "id": 73, + "difficulty": 3, + "domain": "education", + "ambiguity_reason": "Two grade levels but the question is about a pedagogical approach in general — not a data query per grade level.", + "expected_queries": { + "query_list": [ + { + "query": "Application of differentiated instruction in elementary and middle school classrooms", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How is differentiated instruction applied in elementary and middle school classrooms?" 
+ } + ] + }, + { + "id": 74, + "difficulty": 3, + "domain": "aviation", + "ambiguity_reason": "EASA and FAA are independent regulatory bodies with entirely separate frameworks — one retrieval per agency is needed even though the question asks for a comparison.", + "expected_queries": { + "query_list": [ + { + "query": "EASA certification requirements for commercial drone operations", + "temporal_filters": null + }, + { + "query": "FAA certification requirements for commercial drone operations", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "We're pursuing certification on both sides of the Atlantic for our new drone." + }, + { + "role": "assistant", + "content": "Dual certification under EASA and FAA is common for commercial operators targeting international markets. The processes share some technical standards but differ significantly in procedural requirements and approval timelines, with FAA typically faster for initial approvals." + }, + { + "role": "user", + "content": "How do EASA and FAA certification requirements differ for commercial drone operations?" + } + ] + }, + { + "id": 75, + "difficulty": 3, + "domain": "marketing", + "ambiguity_reason": "Two audience segments but the question is about positioning strategy for both at once — a single strategic question.", + "expected_queries": { + "query_list": [ + { + "query": "Positioning strategy for new enterprise software targeting both IT decision-makers and end users", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How should we position our new enterprise software to both IT decision-makers and end users?" 
+ } + ] + }, + { + "id": 76, + "difficulty": 3, + "domain": "insurance", + "ambiguity_reason": "Two risk factors but the question asks how they jointly affect a premium — a combined underwriting question, semantically one unit.", + "expected_queries": { + "query_list": [ + { + "query": "Joint effect of flood zone classification and building age on commercial property insurance premiums", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How do flood zone classification and building age jointly affect commercial property insurance premiums?" + } + ] + }, + { + "id": 77, + "difficulty": 3, + "domain": "agriculture", + "ambiguity_reason": "Aphids and rootworm are distinct pests with separate treatment protocols in agronomic documentation — independent retrieval per pest is needed despite the shared crop context.", + "expected_queries": { + "query_list": [ + { + "query": "Recommended treatments for aphids in corn crops", + "temporal_filters": null + }, + { + "query": "Recommended treatments for rootworm in corn crops", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the recommended treatments for aphids and rootworm in corn crops?" + } + ] + }, + { + "id": 78, + "difficulty": 3, + "domain": "telecommunications", + "ambiguity_reason": "Sub-6 GHz and mmWave are distinct frequency bands with separate technical specifications — one retrieval per band is needed even though the question asks for a unified comparison.", + "expected_queries": { + "query_list": [ + { + "query": "Coverage and throughput characteristics of sub-6 GHz bands for 5G deployment", + "temporal_filters": null + }, + { + "query": "Coverage and throughput characteristics of mmWave bands for 5G deployment", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "What are the coverage and throughput trade-offs between sub-6 GHz and mmWave bands for 5G deployment?" 
+ } + ] + }, + { + "id": 79, + "difficulty": 3, + "domain": "real_estate", + "ambiguity_reason": "Two market conditions but the question asks how they interact to affect returns — a holistic investment strategy question.", + "expected_queries": { + "query_list": [ + { + "query": "Combined effect of rising interest rates and increasing housing supply on residential property investment returns", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How do rising interest rates and increasing housing supply together affect residential property investment returns?" + } + ] + }, + { + "id": 80, + "difficulty": 3, + "domain": "energy", + "ambiguity_reason": "Two storage technologies compared in a single decision context — a comparison question is one semantic unit, not two independent lookups.", + "expected_queries": { + "query_list": [ + { + "query": "Comparison of lithium-ion battery systems and pumped hydro storage in terms of scalability and round-trip efficiency", + "temporal_filters": null + } + ] + }, + "messages": [ + { + "role": "user", + "content": "How do lithium-ion battery systems compare to pumped hydro storage in terms of scalability and round-trip efficiency?" 
+ } + ] + } +] \ No newline at end of file diff --git a/benchmarks/prompt_eval/datasets/temporal_filter.json b/benchmarks/prompt_eval/datasets/temporal_filter.json new file mode 100644 index 00000000..dbc6c441 --- /dev/null +++ b/benchmarks/prompt_eval/datasets/temporal_filter.json @@ -0,0 +1,296 @@ +[ + { + "id": 1, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Show me design specs authored today."} + ] + }, + { + "id": 2, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Which incident reports were filed yesterday?"} + ] + }, + { + "id": 3, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Pull up the code reviews posted this week."} + ] + }, + { + "id": 4, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "I'm reviewing our quarterly filings."}, + {"role": "assistant", "content": "Sure — they're organized by submission date in the regulatory portal."}, + {"role": "user", "content": "Show me the ones filed last week."} + ] + }, + { + "id": 5, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Contracts drafted last month."} + ] + }, + { + "id": 6, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "What court decisions were issued last year?"} + ] + }, + { + "id": 7, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Lab results from the past 7 days."} + ] + }, + { + "id": 8, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Patient records updated in the past 30 days."} + ] + }, + { + "id": 9, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Show me recent SRE incident reports."} + ] + }, + { + "id": 10, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Legal briefs submitted since January 1, 2026."} + ] + }, + { + 
"id": 11, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Invoices issued before March 2026."} + ] + }, + { + "id": 12, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Audit logs recorded in 2024."} + ] + }, + { + "id": 13, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Slide decks produced in February."} + ] + }, + { + "id": 14, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Show me audit logs from last year except March."} + ] + }, + { + "id": 15, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Design specs authored on January 15, 2026."} + ] + }, + { + "id": 16, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Hi, I'm on SRE triage rotation this morning."}, + {"role": "assistant", "content": "Sure — I can help. Which service are you looking into?"}, + {"role": "user", "content": "Checkout has been flaky in production."}, + {"role": "assistant", "content": "Got it. 
I can pull relevant incident reports for you."}, + {"role": "user", "content": "Show me the ones filed yesterday."} + ] + }, + { + "id": 17, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Emails written last month from the Paris and Berlin offices."} + ] + }, + { + "id": 18, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Commits pushed since last Monday."} + ] + }, + { + "id": 19, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "I'm reviewing our Q1 financial filings."}, + {"role": "assistant", "content": "Sure — those are from our Q1 2026 reporting cycle."}, + {"role": "user", "content": "Are they all in the portal?"}, + {"role": "assistant", "content": "Yes, every Q1 filing has been uploaded."}, + {"role": "user", "content": "Show me only the ones submitted last week."} + ] + }, + { + "id": 20, + "query_with_temporal_filter": true, + "messages": [ + {"role": "user", "content": "Latest safety bulletins."} + ] + }, + { + "id": 21, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Give me the Q3 2024 revenue report."} + ] + }, + { + "id": 22, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What is the 2025 strategic roadmap?"} + ] + }, + { + "id": 23, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What is the standard onboarding procedure for new engineers?"} + ] + }, + { + "id": 24, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Hello!"} + ] + }, + { + "id": 25, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Thanks, that's very helpful."} + ] + }, + { + "id": 26, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Can you help me understand the 2008 financial crisis?"}, + {"role": "assistant", "content": "Sure — which aspect interests 
you?"}, + {"role": "user", "content": "The role of subprime mortgages."}, + {"role": "assistant", "content": "Subprime mortgages were high-risk loans that were packaged and resold as securities, which amplified losses when borrowers defaulted."}, + {"role": "user", "content": "Thanks. And what about the 2020 economic downturn?"} + ] + }, + { + "id": 27, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What was the carbon tax rate in France in 2024?"} + ] + }, + { + "id": 28, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Summarize trends in AI research from 2020 to 2025."} + ] + }, + { + "id": 29, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What are the key clauses in the 2023 French labor reform?"} + ] + }, + { + "id": 30, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What were the findings in the 2024 annual sustainability report?"} + ] + }, + { + "id": 31, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What's the current minimum wage in Germany?"} + ] + }, + { + "id": 32, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Explain how OAuth 2.0 works."} + ] + }, + { + "id": 33, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Give me a summary of the Paris Olympics 2024."} + ] + }, + { + "id": 34, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What are the effects of climate change on Arctic sea ice between 2010 and 2020?"} + ] + }, + { + "id": 35, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What happened in the 2016 US presidential election?"} + ] + }, + { + "id": 36, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What are the side effects of ibuprofen?"} + ] + }, + { + "id": 37, 
+ "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "Compare the main feature differences between Python 3.11 and 3.12."} + ] + }, + { + "id": 38, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What was the outcome of the Nuremberg trials?"} + ] + }, + { + "id": 39, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "How do I reset my password on the corporate VPN?"} + ] + }, + { + "id": 40, + "query_with_temporal_filter": false, + "messages": [ + {"role": "user", "content": "What are best practices for REST API versioning?"} + ] + } +] diff --git a/benchmarks/prompt_eval/eval_query_decomposition.py b/benchmarks/prompt_eval/eval_query_decomposition.py new file mode 100644 index 00000000..1635bd89 --- /dev/null +++ b/benchmarks/prompt_eval/eval_query_decomposition.py @@ -0,0 +1,574 @@ +""" +Prompt evaluation script for query reformulation prompts. + +Loads the query_decomposition dataset, runs each test case through one or more +prompt templates against every model defined in the `MODELS` dict, and scores +two decomposition metrics. + +Metrics (this pass): + 1. decomposition_count_matching — generated query count equals expected + 2. decomposition_semantic_coverage — LLM-as-judge boolean: does the generated + split, taken as a whole, semantically + cover every expected sub-query? The + count does not have to match exactly; + only semantic coverage matters. + +Usage: + uv run python eval_query_decomposition.py [OPTIONS] + +Options: + --dataset PATH Path to the dataset JSON file + (default: datasets/query_decomposition.json) + --prompt PATH Path to a specific prompt template file. + If omitted, all *.txt files in ./prompts/ are evaluated. + --output PATH Write JSON results to this file. 
+ +Required environment (candidate models under evaluation — semicolon-separated): + BASE_URLS, API_KEYS, MODELS + +Optional environment (LLM-as-judge for semantic coverage; defaults to the +first candidate model if unset): + JUDGE_BASE_URL, JUDGE_API_KEY, JUDGE_MODEL +""" + +from __future__ import annotations + +import argparse +import asyncio +import json +import os +from dataclasses import asdict, dataclass, field +from datetime import datetime +from pathlib import Path +from typing import Literal + +from dotenv import load_dotenv +from langchain_openai import ChatOpenAI +from pydantic import BaseModel, Field +from tqdm.asyncio import tqdm + +load_dotenv() + +# --------------------------------------------------------------------------- +# Models to evaluate — configured via .env +# --------------------------------------------------------------------------- + + +def _parse_env_list(key: str) -> list[str]: + return [v for v in os.environ.get(key, "").split(";") if v.strip()] + + +def _build_models() -> dict[str, dict]: + base_urls = _parse_env_list("BASE_URLS") + api_keys = _parse_env_list("API_KEYS") + models = _parse_env_list("MODELS") + if not (base_urls and api_keys and models): + return {} + if not (len(base_urls) == len(api_keys) == len(models)): + raise ValueError( + f"BASE_URLS ({len(base_urls)}), API_KEYS ({len(api_keys)}), and MODELS ({len(models)}) " + "must have the same number of semicolon-separated entries." + ) + return { + model: {"base_url": base_url, "api_key": api_key, "model": model} + for base_url, api_key, model in zip(base_urls, api_keys, models) + } + + +MODELS: dict[str, dict] = _build_models() + + +def _judge_config() -> dict | None: + base_url = os.environ.get("JUDGE_BASE_URL") + api_key = os.environ.get("JUDGE_API_KEY") + model = os.environ.get("JUDGE_MODEL") + if base_url and api_key and model: + return {"base_url": base_url, "api_key": api_key, "model": model} + # Fall back to the first candidate model. 
+ if MODELS: + first = next(iter(MODELS.values())) + return dict(first) + return None + + +# --------------------------------------------------------------------------- +# Reference date — pinned so relative-date expressions in the dataset +# resolve deterministically across reruns. Gold labels were built against +# 2026-04-17; change this only if you regenerate the gold. +# --------------------------------------------------------------------------- + +DATASET_CURRENT_DATE = datetime(2026, 4, 17).strftime("%A, %B %d, %Y, %H:%M:%S") + + +# --------------------------------------------------------------------------- +# Pydantic models — mirrors openrag/components/pipeline.py +# --------------------------------------------------------------------------- + + +class TemporalPredicate(BaseModel): + field: Literal["created_at"] = Field(default="created_at") + operator: Literal["==", "!=", ">", "<", ">=", "<="] + value: str = Field(description='ISO 8601 datetime with timezone, e.g. "2026-03-15T00:00:00+00:00".') + + +class Query(BaseModel): + query: str = Field(description="A semantically enriched, descriptive query for vector similarity search.") + temporal_filters: list[TemporalPredicate] | None = Field( + default=None, + description="Date predicates on created_at, AND-combined. Null when no creation-date restriction.", + ) + + +class SearchQueries(BaseModel): + """Search queries for semantic retrieval.""" + + query_list: list[Query] = Field(..., description="Search sub-queries to retrieve relevant documents.") + + +class CoverageJudgment(BaseModel): + """LLM-as-judge output for decomposition_semantic_coverage.""" + + covered: bool = Field( + description="True if the generated sub-queries, taken as a whole, semantically cover the information need of every expected sub-query. The number of generated sub-queries does NOT have to match the expected count — only the semantic coverage matters." 
+ ) + reasoning: str | None = Field( + default=None, + description="Only set when covered=false. One or two sentences naming which expected sub-query is NOT covered by any generated sub-query. Leave null when covered=true.", + ) + + +# --------------------------------------------------------------------------- +# Result dataclasses +# --------------------------------------------------------------------------- + + +@dataclass +class CaseResult: + id: int + difficulty: int + domain: str + n_expected_queries: int + n_generated_queries: int + decomposition_count_match: bool + decomposition_semantic_coverage: bool + coverage_reasoning: str | None + expected_queries: list[str] + generated_queries: list[str] + error: str | None = None + + +@dataclass +class ModelReport: + model_name: str + timestamp: str + prompt_path: str + dataset_path: str + judge_model: str + total: int = 0 + errors: int = 0 + # decomposition_count_matching + count_match_passed: int = 0 + count_match_accuracy: float = 0.0 + # decomposition_semantic_coverage + semantic_coverage_passed: int = 0 + semantic_coverage_accuracy: float = 0.0 + by_difficulty: dict = field(default_factory=dict) + cases: list[dict] = field(default_factory=list) + + +# --------------------------------------------------------------------------- +# Core evaluation logic +# --------------------------------------------------------------------------- + + +def build_llm_messages(prompt: str, messages: list[dict]) -> list[dict]: + """Build the two-message list sent to the LLM, mirroring pipeline.py.""" + chat_history = "".join(f"{m['role']}: {m['content']}\n" for m in messages) + return [ + {"role": "system", "content": prompt}, + {"role": "user", "content": f"Here is the chat history: \n{chat_history}\n"}, + ] + + +def _model_kwargs(base_url: str) -> dict: + """Return call-time kwargs; omit vLLM-specific extra_body for OpenAI endpoints.""" + kwargs: dict = {"max_completion_tokens": 512} + if "openai.com" not in base_url: + 
kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}} + return kwargs + + +def format_prompt(template: str, last_message: str) -> str: + """Fill {current_date} and {query_language} placeholders.""" + try: + from langdetect import detect # type: ignore + + lang = detect(last_message) + except Exception: + lang = "en" + + return template.format( + current_date=DATASET_CURRENT_DATE, + query_language=lang, + ) + + +JUDGE_SYSTEM_PROMPT = """You are an impartial evaluator judging whether a set of GENERATED sub-queries semantically covers a set of EXPECTED sub-queries. + +A generated sub-query "covers" an expected sub-query when it targets the same information need: same entity/subject, same time period (if any), same dimension/aspect. Wording need not match — coverage is about retrieval intent. The generated split does NOT have to match the expected count; what matters is that every expected information need is addressed by at least one generated sub-query. + +Return JSON with: +- covered: boolean — true iff EVERY expected sub-query is semantically covered by at least one generated sub-query. +- reasoning: only set this field when covered=false; give one or two sentences naming which expected sub-query is missing. When covered=true, leave reasoning null. +""" + + +def _format_judge_input(expected: list[str], generated: list[str]) -> str: + exp_lines = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(expected)) + gen_lines = "\n".join(f"{i + 1}. 
{q}" for i, q in enumerate(generated)) if generated else "(none)" + return f"EXPECTED sub-queries:\n{exp_lines}\n\nGENERATED sub-queries:\n{gen_lines}\n" + + +async def judge_semantic_coverage( + expected: list[str], + generated: list[str], + judge: ChatOpenAI, + judge_base_url: str, +) -> CoverageJudgment: + """Call the judge LLM to decide whether the generated split covers all expected.""" + if not expected: + return CoverageJudgment(covered=True, reasoning=None) + if not generated: + return CoverageJudgment(covered=False, reasoning="No generated queries produced.") + messages = [ + {"role": "system", "content": JUDGE_SYSTEM_PROMPT}, + {"role": "user", "content": _format_judge_input(expected, generated)}, + ] + judgment: CoverageJudgment = await judge.bind(**_model_kwargs(judge_base_url)).ainvoke(messages) + # Defensive: the judge is instructed to leave reasoning null on covered=true, + # but enforce it here too so callers can rely on the invariant. + if judgment.covered: + judgment.reasoning = None + return judgment + + +async def run_case( + case: dict, + prompt_template: str, + query_generator: ChatOpenAI, + model_base_url: str, + coverage_judge: ChatOpenAI, + judge_base_url: str, +) -> CaseResult: + """Run a single test case and return its result.""" + messages = case["messages"] + last_message = messages[-1]["content"] + prompt = format_prompt(prompt_template, last_message) + llm_messages = build_llm_messages(prompt, messages) + + expected_queries = [q["query"] for q in case["expected_queries"]["query_list"]] + n_expected = len(expected_queries) + + generated_queries: list[str] = [] + n_generated = 0 + error: str | None = None + + try: + output: SearchQueries = await query_generator.bind(**_model_kwargs(model_base_url)).ainvoke(llm_messages) + generated_queries = [q.query for q in output.query_list] + n_generated = len(generated_queries) + except Exception as exc: + error = f"generator: {exc}" + + count_match = n_generated == n_expected and error is None + + 
# Semantic coverage (judge) — runs even when counts mismatch, unless generator errored fatally. + covered = False + coverage_reasoning: str | None = None + if error is None: + try: + judgment = await judge_semantic_coverage( + expected_queries, generated_queries, coverage_judge, judge_base_url + ) + covered = judgment.covered + coverage_reasoning = judgment.reasoning + except Exception as exc: + error = f"coverage_judge: {exc}" + + return CaseResult( + id=case["id"], + difficulty=case["difficulty"], + domain=case["domain"], + n_expected_queries=n_expected, + n_generated_queries=n_generated, + decomposition_count_match=count_match, + decomposition_semantic_coverage=covered, + coverage_reasoning=coverage_reasoning, + expected_queries=expected_queries, + generated_queries=generated_queries, + error=error, + ) + + +async def run_eval_for_model( + model_name: str, + model_cfg: dict, + judge_cfg: dict, + dataset: list[dict], + prompt_template: str, + prompt_path: str, + dataset_path: str, +) -> ModelReport: + """Run all dataset cases for one model, with a tqdm progress bar.""" + model_base_url = model_cfg["base_url"] + # Use function_calling (the LangChain default) so that the Pydantic schema is + # passed to the model as a tool definition. json_mode only forces "some JSON" + # and relies on the prompt to describe the schema — v0-style prompts that do + # not prescribe the output shape would otherwise fail every case. 
+ query_generator = ChatOpenAI( + base_url=model_base_url, + api_key=model_cfg.get("api_key", "EMPTY"), + model=model_cfg["model"], + temperature=0.1, + ).with_structured_output(SearchQueries, method="function_calling") + + judge_base_url = judge_cfg["base_url"] + judge_base = ChatOpenAI( + base_url=judge_base_url, + api_key=judge_cfg.get("api_key", "EMPTY"), + model=judge_cfg["model"], + temperature=0.0, + ) + coverage_judge = judge_base.with_structured_output(CoverageJudgment, method="function_calling") + + tasks = [ + run_case( + case, + prompt_template, + query_generator, + model_base_url, + coverage_judge, + judge_base_url, + ) + for case in dataset + ] + + results: list[CaseResult] = [] + for coro in tqdm( + asyncio.as_completed(tasks), + total=len(tasks), + desc=f"{model_name}", + unit="case", + leave=True, + ): + results.append(await coro) + + return build_model_report(results, model_name, prompt_path, dataset_path, judge_cfg["model"]) + + +# --------------------------------------------------------------------------- +# Reporting +# --------------------------------------------------------------------------- + + +def build_model_report( + results: list[CaseResult], + model_name: str, + prompt_path: str, + dataset_path: str, + judge_model: str, +) -> ModelReport: + report = ModelReport( + model_name=model_name, + timestamp=datetime.now().isoformat(), + prompt_path=prompt_path, + dataset_path=dataset_path, + judge_model=judge_model, + ) + + by_diff: dict[int, dict] = {} + + for r in results: + report.total += 1 + if r.error: + report.errors += 1 + if r.decomposition_count_match: + report.count_match_passed += 1 + if r.decomposition_semantic_coverage: + report.semantic_coverage_passed += 1 + + bucket = by_diff.setdefault( + r.difficulty, + {"total": 0, "errors": 0, "count_match_passed": 0, "semantic_coverage_passed": 0}, + ) + bucket["total"] += 1 + if r.error: + bucket["errors"] += 1 + if r.decomposition_count_match: + bucket["count_match_passed"] += 1 + if 
r.decomposition_semantic_coverage: + bucket["semantic_coverage_passed"] += 1 + + report.count_match_accuracy = report.count_match_passed / report.total if report.total else 0.0 + report.semantic_coverage_accuracy = report.semantic_coverage_passed / report.total if report.total else 0.0 + + for bucket in by_diff.values(): + total = bucket["total"] or 1 + bucket["count_match_accuracy"] = bucket["count_match_passed"] / total + bucket["semantic_coverage_accuracy"] = bucket["semantic_coverage_passed"] / total + report.by_difficulty = {str(k): v for k, v in sorted(by_diff.items())} + + for r in results: + report.cases.append(asdict(r)) + + return report + + +def print_model_summary(report: ModelReport) -> None: + print() + print(f" Model : {report.model_name} (judge: {report.judge_model})") + print(f" Total : {report.total} Errors: {report.errors}") + print( + f" count_match : {report.count_match_passed:>3}/{report.total:<3} ({report.count_match_accuracy:.1%})" + ) + print( + f" semantic_coverage : {report.semantic_coverage_passed:>3}/{report.total:<3} " + f"({report.semantic_coverage_accuracy:.1%})" + ) + for diff, stats in report.by_difficulty.items(): + err_tag = f" [{stats['errors']} errors]" if stats["errors"] else "" + print( + f" D{diff}: count_match {stats['count_match_passed']:>3}/{stats['total']:<3} " + f"({stats['count_match_accuracy']:.1%}) | " + f"coverage {stats['semantic_coverage_passed']:>3}/{stats['total']:<3} " + f"({stats['semantic_coverage_accuracy']:.1%})" + f"{err_tag}" + ) + + +# --------------------------------------------------------------------------- +# CLI +# --------------------------------------------------------------------------- + +HERE = Path(__file__).parent + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser( + description="Evaluate query_decomposition prompts (decomposition count matching + semantic coverage)." 
+ ) + parser.add_argument( + "--dataset", + default=str(HERE / "datasets" / "query_decomposition.json"), + help="Path to the dataset JSON file", + ) + parser.add_argument( + "--prompt", + default=None, + help="Path to a specific prompt template file (default: evaluate all *.txt files in ./prompts/)", + ) + parser.add_argument("--output", default=None, help="Write JSON results to this file") + return parser.parse_args() + + +async def main() -> None: + args = parse_args() + + if not MODELS: + print("No models configured. Set BASE_URLS/API_KEYS/MODELS in the environment.") + return + + judge_cfg = _judge_config() + if not judge_cfg: + print("No judge model configured (JUDGE_BASE_URL/JUDGE_API_KEY/JUDGE_MODEL) and no fallback available.") + return + + # Validate model configs before doing any work + errors = [] + for name, cfg in MODELS.items(): + for key in ("base_url", "model", "api_key"): + if not cfg.get(key): + errors.append(f" [{name}] missing or None: '{key}'") + if errors: + print("Invalid model configuration:") + for e in errors: + print(e) + return + + # Resolve dataset path and enforce it stays within the benchmark directory + dataset_path = Path(args.dataset).resolve() + try: + dataset_path.relative_to(HERE) + except ValueError: + print(f"Error: --dataset path must be inside {HERE}") + return + + # Resolve prompt path(s) + if args.prompt: + prompt_path = Path(args.prompt).resolve() + try: + prompt_path.relative_to(HERE) + except ValueError: + print(f"Error: --prompt path must be inside {HERE}") + return + prompt_paths = [prompt_path] + else: + prompt_paths = sorted((HERE / "prompts").glob("*.txt")) + if not prompt_paths: + print(f"No prompt files found in {HERE / 'prompts'}") + return + + with dataset_path.open() as f: + dataset: list[dict] = json.load(f) + print(f"Loaded {len(dataset)} test cases from {dataset_path.name}") + print(f"Found {len(prompt_paths)} prompt(s): {', '.join(p.name for p in prompt_paths)}") + print(f"Evaluating {len(MODELS)} 
model(s): {', '.join(MODELS)}") + print(f"Judge model: {judge_cfg['model']}") + + # Run each prompt × each model + output_prompts: list[dict] = [] + for prompt_path in prompt_paths: + prompt_template = prompt_path.read_text() + prompt_rel = str(prompt_path.relative_to(HERE)) + sep = "-" * 72 + print(f"\n{sep}") + print(f"PROMPT: {prompt_path.name}") + print(sep) + + prompt_reports: list[ModelReport] = [] + for model_name, model_cfg in MODELS.items(): + report = await run_eval_for_model( + model_name=model_name, + model_cfg=model_cfg, + judge_cfg=judge_cfg, + dataset=dataset, + prompt_template=prompt_template, + prompt_path=prompt_rel, + dataset_path=str(dataset_path.relative_to(HERE)), + ) + print_model_summary(report) + prompt_reports.append(report) + + output_prompts.append( + { + "prompt": prompt_rel, + "models": [asdict(r) for r in prompt_reports], + } + ) + + # Optionally persist results + if args.output: + output_path = Path(args.output) + output_path.parent.mkdir(parents=True, exist_ok=True) + output = { + "dataset": str(dataset_path.relative_to(HERE)), + "judge_model": judge_cfg["model"], + "prompts": output_prompts, + } + with output_path.open("w") as f: + json.dump(output, f, indent=2) + print(f"\nResults written to {args.output}") + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/benchmarks/prompt_eval/eval_temporal_filter_generation.py b/benchmarks/prompt_eval/eval_temporal_filter_generation.py new file mode 100644 index 00000000..1f11e169 --- /dev/null +++ b/benchmarks/prompt_eval/eval_temporal_filter_generation.py @@ -0,0 +1,617 @@ +""" +Prompt evaluation script for temporal filter generation (v2). 
+ +Dataset schema (temporal_filter.json): + { + "id": int, + "messages": [{"role": ..., "content": ...}, ...], + "query_with_temporal_filter": bool + } + +Pipeline output schema mirrors `openrag/components/pipeline.py` and +`eval_query_decomposition.py`: a `SearchQueries` object containing +`Query` items, each optionally carrying a `temporal_filters` list of +`TemporalPredicate` items. + +Metrics: + - filter_detection_accuracy : (TP + TN) / N + - filter_detection_precision : TP / (TP + FP) + - filter_detection_recall : TP / (TP + FN) + - filter_correctness : LLM-as-judge verdict on the generated filter + (evaluated only on TP cases — filter expected + AND generated). Ratio of cases where the judge + marks the filter correct. + +Positive class = a filter IS expected / was generated. + TP: expected=True, generated=True + FP: expected=False, generated=True + FN: expected=True, generated=False + TN: expected=False, generated=False + +Judge: + Called only on TP cases. Receives the chat history, the generated + `SearchQueries` JSON, and the current date. Returns a single boolean + verdict covering the whole generated output (all sub-queries, all + temporal filters considered together). + +Usage: + uv run python eval_temporal_filter_generation.py [OPTIONS] + +Options: + --dataset PATH Path to the dataset JSON file + (default: datasets/temporal_filter.json) + --prompt PATH Path to a specific prompt template file. + If omitted, all *.txt files in ./prompts/ are evaluated. + --output PATH Write JSON results to this file.
+ +Required environment (candidate models — semicolon-separated): + BASE_URLS, API_KEYS, MODELS + +Optional environment (LLM-as-judge; defaults to the first candidate model +if unset): + JUDGE_BASE_URL, JUDGE_API_KEY, JUDGE_MODEL +""" + +from __future__ import annotations + +import argparse +import asyncio +import json +import os +from dataclasses import asdict, dataclass, field +from datetime import datetime +from pathlib import Path +from typing import Literal + +from dotenv import load_dotenv +from langchain_openai import ChatOpenAI +from pydantic import BaseModel, Field +from tqdm.asyncio import tqdm + +load_dotenv() + +# --------------------------------------------------------------------------- +# Models to evaluate — configured via .env +# --------------------------------------------------------------------------- + + +def _parse_env_list(key: str) -> list[str]: + return [v for v in os.environ.get(key, "").split(";") if v.strip()] + + +def _build_models() -> dict[str, dict]: + base_urls = _parse_env_list("BASE_URLS") + api_keys = _parse_env_list("API_KEYS") + models = _parse_env_list("MODELS") + if not (base_urls and api_keys and models): + return {} + if not (len(base_urls) == len(api_keys) == len(models)): + raise ValueError( + f"BASE_URLS ({len(base_urls)}), API_KEYS ({len(api_keys)}), and MODELS ({len(models)}) " + "must have the same number of semicolon-separated entries." 
+ ) + return { + model: {"base_url": base_url, "api_key": api_key, "model": model} + for base_url, api_key, model in zip(base_urls, api_keys, models) + } + + +MODELS: dict[str, dict] = _build_models() + + +def _judge_config() -> dict | None: + base_url = os.environ.get("JUDGE_BASE_URL") + api_key = os.environ.get("JUDGE_API_KEY") + model = os.environ.get("JUDGE_MODEL") + if base_url and api_key and model: + return {"base_url": base_url, "api_key": api_key, "model": model} + if MODELS: + first = next(iter(MODELS.values())) + return dict(first) + return None + + +# --------------------------------------------------------------------------- +# Reference date — pinned so relative-date expressions in the dataset +# resolve deterministically across reruns. Gold labels were built against +# 2026-04-19; change this only if you regenerate the gold. +# --------------------------------------------------------------------------- + +DATASET_CURRENT_DATE = datetime(2026, 4, 19).strftime("%A, %B %d, %Y") + + +# --------------------------------------------------------------------------- +# Pydantic models — mirrors openrag/components/pipeline.py +# --------------------------------------------------------------------------- + + +class TemporalPredicate(BaseModel): + field: Literal["created_at"] = Field(default="created_at") + operator: Literal["==", "!=", ">", "<", ">=", "<="] + value: str = Field(description='ISO 8601 datetime with timezone, e.g. "2026-03-15T00:00:00+00:00".') + + +class Query(BaseModel): + query: str = Field(description="A semantically enriched, descriptive query for vector similarity search.") + temporal_filters: list[TemporalPredicate] | None = Field( + default=None, + description="Date predicates on created_at, AND-combined. 
Null when no creation-date restriction.", + ) + + +class SearchQueries(BaseModel): + """Search queries for semantic retrieval.""" + + query_list: list[Query] = Field(..., description="Search sub-queries to retrieve relevant documents.") + + +class FilterJudgment(BaseModel): + """LLM-as-judge verdict on the generated temporal filter(s).""" + + correct: bool = Field( + description=( + "True iff the generated temporal_filters, taken as a whole across all sub-queries, " + "correctly represent the time constraint implied by the user's last message. " + "For relative expressions (e.g. 'last week', 'past N days') allow reasonable " + "interpretations. Second-level precision is not required." + ) + ) + reasoning: str | None = Field( + default=None, + description="Only set when correct=false. One or two sentences naming the defect. Null when correct=true.", + ) + + +# --------------------------------------------------------------------------- +# Result dataclasses +# --------------------------------------------------------------------------- + + +@dataclass +class CaseResult: + id: int + expected_filter: bool + generated_filter: bool + detection_correct: bool # expected == generated + judge_verdict: bool | None # only set on TP cases + judge_reasoning: str | None + generated_queries: list[dict] # [{"query": str, "temporal_filters": list | None}] + error: str | None = None + + +@dataclass +class ModelReport: + model_name: str + timestamp: str + prompt_path: str + dataset_path: str + judge_model: str + total: int = 0 + errors: int = 0 + # Confusion matrix on filter detection + tp: int = 0 + fp: int = 0 + fn: int = 0 + tn: int = 0 + # Detection metrics + filter_detection_accuracy: float = 0.0 + filter_detection_precision: float = 0.0 + filter_detection_recall: float = 0.0 + filter_detection_f1: float = 0.0 + # Judge metrics — computed over TP cases only + judge_total: int = 0 + judge_correct: int = 0 + filter_correctness: float = 0.0 + cases: list[dict] = 
field(default_factory=list) + + +# --------------------------------------------------------------------------- +# Core evaluation logic +# --------------------------------------------------------------------------- + + +def build_llm_messages(prompt: str, messages: list[dict]) -> list[dict]: + """Build the two-message list sent to the LLM, mirroring pipeline.py.""" + chat_history = "".join(f"{m['role']}: {m['content']}\n" for m in messages) + return [ + {"role": "system", "content": prompt}, + {"role": "user", "content": f"Here is the chat history: \n{chat_history}\n"}, + ] + + +def _model_kwargs(base_url: str) -> dict: + """Return call-time kwargs; omit vLLM-specific extra_body for OpenAI endpoints.""" + kwargs: dict = {"max_completion_tokens": 512} + if "openai.com" not in base_url: + kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}} + return kwargs + + +def format_prompt(template: str, last_message: str, current_date: str) -> str: + """Fill {current_date} and {query_language} placeholders.""" + try: + from langdetect import detect # type: ignore + + lang = detect(last_message) + except Exception: + lang = "en" + + return template.format(current_date=current_date, query_language=lang) + + +JUDGE_SYSTEM_PROMPT = """You are an impartial evaluator judging whether the temporal filters generated for a user query are correct. + +You will receive: + - The conversation history (role-tagged messages). + - The current date the generator was working with. + - The generator's structured output: a JSON object of the shape + {"query_list": [{"query": str, "temporal_filters": [...] | null}, ...]} + where each `temporal_filters` entry, when present, is a list of predicates + of the form {"field": "created_at", "operator": ">=|>|<=|<|==|!=", "value": "ISO 8601 UTC"}, + AND-combined inside a single sub-query. + +Judge the temporal filters as a whole across all sub-queries — not each sub-query in isolation.
+ +Rules: + - Correct iff the generated temporal_filters together capture the creation-date + constraint implied by the user's last turn (and any necessary context from + earlier turns). + - Relative expressions ("last week", "past N days", "this month") admit + reasonable interpretations (e.g. Monday-to-Monday vs rolling window for + "last week"). Second-level precision is not required. + - Open-ended recency ("past N days/weeks", "this week", "since X") is correctly + represented by a lower-bound-only predicate — do NOT penalise a missing upper + bound in that case. + - Closed intervals ("yesterday", "on [date]", "in [month]", "in [year]", "between X and Y") + require both bounds. + - `before X` is correctly represented by a single `<` or `<=` predicate on X. + - `since X` is correctly represented by a single `>=` or `>` predicate on X. + - For exclusions (e.g. "last year except March"), the split into two + sub-queries with disjoint half-open intervals is correct — a single negated + predicate is not. + - For multi-entity splits that share a time period, the same filter should + appear on every sub-query. + +Return JSON with: + - correct: bool — true iff the generated temporal_filters are correct overall. + - reasoning: only set this field when correct=false; one or two sentences + naming the defect. When correct=true, leave reasoning null. 
+""" + + +def _format_judge_input(messages: list[dict], current_date: str, generated: SearchQueries) -> str: + history = "\n".join(f"{m['role']}: {m['content']}" for m in messages) + generated_json = json.dumps( + {"query_list": [q.model_dump() for q in generated.query_list]}, + ensure_ascii=False, + indent=2, + ) + return f"Current date: {current_date}\n\nChat history:\n{history}\n\nGenerator output:\n{generated_json}\n" + + +async def judge_filter( + messages: list[dict], + current_date: str, + generated: SearchQueries, + judge: ChatOpenAI, + judge_base_url: str, +) -> FilterJudgment: + """Call the judge LLM to decide whether the generated filter(s) are correct.""" + prompt_messages = [ + {"role": "system", "content": JUDGE_SYSTEM_PROMPT}, + {"role": "user", "content": _format_judge_input(messages, current_date, generated)}, + ] + judgment: FilterJudgment = await judge.bind(**_model_kwargs(judge_base_url)).ainvoke(prompt_messages) + if judgment.correct: + judgment.reasoning = None + return judgment + + +async def run_case( + case: dict, + prompt_template: str, + query_generator: ChatOpenAI, + model_base_url: str, + judge: ChatOpenAI, + judge_base_url: str, + current_date: str, +) -> CaseResult: + """Run a single test case and return its result.""" + messages = case["messages"] + last_message = messages[-1]["content"] + prompt = format_prompt(prompt_template, last_message, current_date) + llm_messages = build_llm_messages(prompt, messages) + + expected_filter: bool = bool(case["query_with_temporal_filter"]) + + generated_queries: list[dict] = [] + generated_filter = False + output: SearchQueries | None = None + error: str | None = None + + try: + output = await query_generator.bind(**_model_kwargs(model_base_url)).ainvoke(llm_messages) + generated_queries = [ + { + "query": q.query, + "temporal_filters": ( + [p.model_dump() for p in q.temporal_filters] if q.temporal_filters is not None else None + ), + } + for q in output.query_list + ] + generated_filter = 
any(q.temporal_filters for q in output.query_list) + except Exception as exc: + error = f"generator: {exc}" + + detection_correct = error is None and (expected_filter == generated_filter) + + judge_verdict: bool | None = None + judge_reasoning: str | None = None + # Judge only on TP cases — filter expected AND generated. + if error is None and expected_filter and generated_filter and output is not None: + try: + judgment = await judge_filter(messages, current_date, output, judge, judge_base_url) + judge_verdict = judgment.correct + judge_reasoning = judgment.reasoning + except Exception as exc: + error = f"judge: {exc}" + + return CaseResult( + id=case["id"], + expected_filter=expected_filter, + generated_filter=generated_filter, + detection_correct=detection_correct, + judge_verdict=judge_verdict, + judge_reasoning=judge_reasoning, + generated_queries=generated_queries, + error=error, + ) + + +async def run_eval_for_model( + model_name: str, + model_cfg: dict, + judge_cfg: dict, + dataset: list[dict], + prompt_template: str, + prompt_path: str, + dataset_path: str, + current_date: str, +) -> ModelReport: + """Run all dataset cases for one model, with a tqdm progress bar.""" + model_base_url = model_cfg["base_url"] + query_generator = ChatOpenAI( + base_url=model_base_url, + api_key=model_cfg.get("api_key", "EMPTY"), + model=model_cfg["model"], + temperature=0.1, + ).with_structured_output(SearchQueries, method="function_calling") + + judge_base_url = judge_cfg["base_url"] + judge = ChatOpenAI( + base_url=judge_base_url, + api_key=judge_cfg.get("api_key", "EMPTY"), + model=judge_cfg["model"], + temperature=0.0, + ).with_structured_output(FilterJudgment, method="function_calling") + + tasks = [ + run_case(case, prompt_template, query_generator, model_base_url, judge, judge_base_url, current_date) + for case in dataset + ] + + results: list[CaseResult] = [] + for coro in tqdm( + asyncio.as_completed(tasks), + total=len(tasks), + desc=f"{model_name}", + unit="case", + 
leave=True, + ): + results.append(await coro) + + return build_model_report(results, model_name, prompt_path, dataset_path, judge_cfg["model"]) + + +# --------------------------------------------------------------------------- +# Reporting +# --------------------------------------------------------------------------- + + +def _safe_div(num: float, den: float) -> float: + return num / den if den else 0.0 + + +def build_model_report( + results: list[CaseResult], + model_name: str, + prompt_path: str, + dataset_path: str, + judge_model: str, +) -> ModelReport: + report = ModelReport( + model_name=model_name, + timestamp=datetime.now().isoformat(), + prompt_path=prompt_path, + dataset_path=dataset_path, + judge_model=judge_model, + ) + + for r in results: + report.total += 1 + if r.error: + report.errors += 1 + continue + if r.expected_filter and r.generated_filter: + report.tp += 1 + elif not r.expected_filter and r.generated_filter: + report.fp += 1 + elif r.expected_filter and not r.generated_filter: + report.fn += 1 + else: + report.tn += 1 + if r.judge_verdict is not None: + report.judge_total += 1 + if r.judge_verdict: + report.judge_correct += 1 + + scored = report.total - report.errors + report.filter_detection_accuracy = _safe_div(report.tp + report.tn, scored) + report.filter_detection_precision = _safe_div(report.tp, report.tp + report.fp) + report.filter_detection_recall = _safe_div(report.tp, report.tp + report.fn) + p, rec = report.filter_detection_precision, report.filter_detection_recall + report.filter_detection_f1 = _safe_div(2 * p * rec, p + rec) + report.filter_correctness = _safe_div(report.judge_correct, report.judge_total) + + for r in results: + report.cases.append(asdict(r)) + + return report + + +def print_model_summary(report: ModelReport) -> None: + print() + print(f" Model : {report.model_name} (judge: {report.judge_model})") + print(f" Total : {report.total} Errors: {report.errors}") + print(f" Confusion : TP={report.tp} FP={report.fp} 
FN={report.fn} TN={report.tn}") + print(f" detection accuracy : {report.filter_detection_accuracy:.1%}") + print(f" detection precision : {report.filter_detection_precision:.1%}") + print(f" detection recall : {report.filter_detection_recall:.1%}") + print(f" detection F1 : {report.filter_detection_f1:.1%}") + if report.judge_total: + print( + f" filter correctness : {report.judge_correct:>3}/{report.judge_total:<3} " + f"({report.filter_correctness:.1%}) [judge called on TP only]" + ) + else: + print(" filter correctness : n/a (no TP cases)") + + +# --------------------------------------------------------------------------- +# CLI +# --------------------------------------------------------------------------- + +HERE = Path(__file__).parent + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser( + description="Evaluate temporal filter generation (v2): detection precision/recall + LLM-judged filter correctness on TP cases." + ) + parser.add_argument( + "--dataset", + default=str(HERE / "datasets" / "temporal_filter.json"), + help="Path to the dataset JSON file", + ) + parser.add_argument( + "--prompt", + default=None, + help="Path to a specific prompt template file (default: evaluate all *.txt files in ./prompts/)", + ) + parser.add_argument("--output", default=None, help="Write JSON results to this file") + return parser.parse_args() + + +async def main() -> None: + args = parse_args() + + if not MODELS: + print("No models configured. 
Set BASE_URLS/API_KEYS/MODELS in the environment.") + return + + judge_cfg = _judge_config() + if not judge_cfg: + print("No judge model configured (JUDGE_BASE_URL/JUDGE_API_KEY/JUDGE_MODEL) and no fallback available.") + return + + errors = [] + for name, cfg in MODELS.items(): + for key in ("base_url", "model", "api_key"): + if not cfg.get(key): + errors.append(f" [{name}] missing or None: '{key}'") + if errors: + print("Invalid model configuration:") + for e in errors: + print(e) + return + + dataset_path = Path(args.dataset).resolve() + try: + dataset_path.relative_to(HERE) + except ValueError: + print(f"Error: --dataset path must be inside {HERE}") + return + + if args.prompt: + prompt_path = Path(args.prompt).resolve() + try: + prompt_path.relative_to(HERE) + except ValueError: + print(f"Error: --prompt path must be inside {HERE}") + return + prompt_paths = [prompt_path] + else: + prompt_paths = sorted((HERE / "prompts").glob("*.txt")) + if not prompt_paths: + print(f"No prompt files found in {HERE / 'prompts'}") + return + + with dataset_path.open() as f: + dataset: list[dict] = json.load(f) + + current_date = DATASET_CURRENT_DATE + + print(f"Loaded {len(dataset)} test cases from {dataset_path.name}") + print(f"Found {len(prompt_paths)} prompt(s): {', '.join(p.name for p in prompt_paths)}") + print(f"Evaluating {len(MODELS)} model(s): {', '.join(MODELS)}") + print(f"Judge model : {judge_cfg['model']}") + print(f"Current date: {current_date}") + + output_prompts: list[dict] = [] + for prompt_path in prompt_paths: + prompt_template = prompt_path.read_text() + prompt_rel = str(prompt_path.relative_to(HERE)) + sep = "-" * 72 + print(f"\n{sep}") + print(f"PROMPT: {prompt_path.name}") + print(sep) + + prompt_reports: list[ModelReport] = [] + for model_name, model_cfg in MODELS.items(): + report = await run_eval_for_model( + model_name=model_name, + model_cfg=model_cfg, + judge_cfg=judge_cfg, + dataset=dataset, + prompt_template=prompt_template, + 
prompt_path=prompt_rel, + dataset_path=str(dataset_path.relative_to(HERE)), + current_date=current_date, + ) + print_model_summary(report) + prompt_reports.append(report) + + output_prompts.append( + { + "prompt": prompt_rel, + "models": [asdict(r) for r in prompt_reports], + } + ) + + if args.output: + output_path = Path(args.output) + output_path.parent.mkdir(parents=True, exist_ok=True) + output = { + "dataset": str(dataset_path.relative_to(HERE)), + "judge_model": judge_cfg["model"], + "current_date": current_date, + "prompts": output_prompts, + } + with output_path.open("w") as f: + json.dump(output, f, indent=2) + print(f"\nResults written to {args.output}") + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/benchmarks/prompt_eval/prompts/query_contextualizer_tmpl_v0.txt b/benchmarks/prompt_eval/prompts/query_contextualizer_tmpl_v0.txt new file mode 100644 index 00000000..fccdad4b --- /dev/null +++ b/benchmarks/prompt_eval/prompts/query_contextualizer_tmpl_v0.txt @@ -0,0 +1,39 @@ +From a chat history, reformulate the user's last message into one or more autonomous search queries by integrating relevant context from previous conversations. + +# Task: +- For follow-up questions: Reformulate by replacing pronouns with corresponding nouns and add relevant keywords +- For independent questions: Apply only minimal corrections (grammar, keywords) +- For simple thank you messages: No reformulation necessary +- Enrich queries with prior conversation context only when relevant to the query. + +## Query decomposition guidelines +- Use full, descriptive sentences (not isolated keywords) to improve semantic retrieval. +- Inject previous chat history only when it is directly relevant to the current query, as unnecessary context can degrade search quality. +- Split complex queries into independent sub-queries, especially for comparisons or multi-dimensional questions. + - Distinct time periods + - "What are the sales figures for Product A in Q1 and Q2?" 
should give two sub-queries (one per quarter). + - "Evolution of carbon footprint in the last 2 years" should give two sub-queries (one per year). + + - Distinct entities + - "What are the sales figures for Product A and Product B?" should give two sub-queries (one per product). + + - Unrelated dimensions + - "What are the sales figures for Product A in the US and in Europe?" should give two sub-queries (one per region). + +## Temporal filter rules +For "past N days/hours/weeks/months" and "this week/quarter" queries, use only a lower-bound filter. Add an upper bound only for closed intervals ("between X and Y", "only yesterday", "on [specific date]"). + +# additional information +- Current date: {current_date} + +# Requirements: +- Preserve the original tone and intention +- Do not add information beyond necessary context +- Do not answer the questions, only reformulate them +- Respond in the language of the user's last message : {query_language} + +# Examples: +- User: I'm planning a trip to Italy and I'm interested in historical monuments and local cuisine. +- Assistant: Italy offers a wealth of history and culinary delights. +- User: What are the must-see sites? +Reformulated query: What are the must-see historical monuments and local cuisine restaurants in Italy? diff --git a/benchmarks/prompt_eval/prompts/query_contextualizer_tmpl_v1.txt b/benchmarks/prompt_eval/prompts/query_contextualizer_tmpl_v1.txt new file mode 100644 index 00000000..67a134d3 --- /dev/null +++ b/benchmarks/prompt_eval/prompts/query_contextualizer_tmpl_v1.txt @@ -0,0 +1,64 @@ +Produce a JSON object listing sub-queries derived from the user's last message. + +Output shape (return this JSON object, nothing else): +{{"query_list": [ {{"query": "", "temporal_filters": }} ]}} + +Current date: {current_date} +Language for `query` field: {query_language} +Timestamps: UTC (`+00:00`). Week starts Monday. 
+ +# Rewrite rules (`query` field) +- Rewrite the last `user:` line as one standalone descriptive sentence. +- Use earlier turns only to resolve pronouns or add context directly relevant to the query; do not inject unrelated history. +- For independent questions: minimal changes (grammar, missing keywords). +- Greetings / thanks: copy verbatim, `temporal_filters: null`. +- Do not answer. Only reformulate. + +# Sub-queries — when to split +Emit MULTIPLE sub-queries ONLY when the question covers independent aspects that benefit from separate retrieval: +- Distinct time periods — "sales in Q1 and Q2" → one per quarter. +- Distinct entities — "Product A and Product B" → one per product. +- Unrelated dimensions — "sales in US and in Europe" → one per region. +- Exclusions — "last year except March" → one per remaining range. +- Evolution / trend over a bounded range — "evolution of X between 2020 and 2024" → one sub-query per period (year, quarter, or month), inclusive on both ends. Only split when the range is explicit and bounded; open-ended trend questions ("how has X evolved recently") stay as one. +Otherwise emit ONE sub-query. Do not split for synonyms or rephrasings of the same question. + +# Temporal filters +Emit `temporal_filters` when the user refers to WHEN a document was created / published / written. Leave null if the date only describes the topic. +Each predicate: `field="created_at"`, `operator` ∈ {{`>=`, `>`, `<=`, `<`}}, `value` = ISO 8601 UTC. +Predicates are AND-combined. Always use half-open intervals: `>= start` AND `< next_start`. + +Date resolution (vs Current date): +- today / now → `[today, tomorrow)` +- yesterday → `[yesterday, today)` +- this / last week → Monday-to-Monday +- this / last month → 1st-to-1st +- this / last year → Jan 1-to-Jan 1 +- past N days/weeks/months/years → `[Current − N, tomorrow)` +- recent / latest → past 90 days (~3 months, NOT a year): `[Current − 90 days, tomorrow)` +- since X → ONE predicate only: `>= X`. 
Do NOT add an upper bound. +- before X → one predicate `< X` +- bare MONTH (no year) → nearest past occurrence +- in YEAR → `[Jan 1 YEAR, Jan 1 YEAR+1)` + +For exclusions, split into two sub-queries covering each remaining range. Never negate with a single predicate. + +# Examples (Current date = Wednesday, April 15, 2026) + +User: "Summary of meeting notes uploaded in the past month" +{{"query_list":[{{"query":"Summary of meeting notes uploaded in the past month","temporal_filters":[{{"field":"created_at","operator":">=","value":"2026-03-15T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2026-04-16T00:00:00+00:00"}}]}}]}} + +User: "Sales figures for Product A and Product B" +{{"query_list":[{{"query":"Sales figures for Product A","temporal_filters":null}},{{"query":"Sales figures for Product B","temporal_filters":null}}]}} + +User: "Documents from last year except March" +{{"query_list":[{{"query":"Documents from January or February 2025","temporal_filters":[{{"field":"created_at","operator":">=","value":"2025-01-01T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2025-03-01T00:00:00+00:00"}}]}},{{"query":"Documents from April to December 2025","temporal_filters":[{{"field":"created_at","operator":">=","value":"2025-04-01T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2026-01-01T00:00:00+00:00"}}]}}]}} + +User: "Q3 2024 reporting template" +{{"query_list":[{{"query":"Q3 2024 reporting template","temporal_filters":null}}]}} + +User: "Evolution of the Department of Justice budget between 2020 and 2022" +{{"query_list":[{{"query":"Department of Justice budget in 2020","temporal_filters":null}},{{"query":"Department of Justice budget in 2021","temporal_filters":null}},{{"query":"Department of Justice budget in 2022","temporal_filters":null}}]}} + +User: "Latest safety bulletins" +{{"query_list":[{{"query":"Latest safety 
bulletins","temporal_filters":[{{"field":"created_at","operator":">=","value":"2026-01-15T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2026-04-16T00:00:00+00:00"}}]}}]}} \ No newline at end of file diff --git a/benchmarks/prompt_eval/pyproject.toml b/benchmarks/prompt_eval/pyproject.toml new file mode 100644 index 00000000..f93cc7e6 --- /dev/null +++ b/benchmarks/prompt_eval/pyproject.toml @@ -0,0 +1,13 @@ +[project] +name = "prompt-eval" +version = "0.1.0" +description = "Prompt evaluation benchmarks for query reformulation, decomposition, and temporal filter generation" +readme = "README.md" +requires-python = ">=3.12" +dependencies = [ + "langchain-openai>=1.1.12", + "openai>=2.31.0", + "langdetect>=1.0.9", + "tqdm>=4.66", + "python-dotenv>=1.2.2", +] diff --git a/benchmarks/prompt_eval/results/result_filter_generation.json b/benchmarks/prompt_eval/results/result_filter_generation.json new file mode 100644 index 00000000..d099f57f --- /dev/null +++ b/benchmarks/prompt_eval/results/result_filter_generation.json @@ -0,0 +1,3214 @@ +{ + "dataset": "datasets/temporal_filter.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "current_date": "Sunday, April 19, 2026", + "prompts": [ + { + "prompt": "prompts/query_contextualizer_tmpl_v0.txt", + "models": [ + { + "model_name": "Mistral-Small-3.1-24B-Instruct-2503", + "timestamp": "2026-04-21T16:29:21.494096", + "prompt_path": "prompts/query_contextualizer_tmpl_v0.txt", + "dataset_path": "datasets/temporal_filter.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 40, + "errors": 0, + "tp": 9, + "fp": 0, + "fn": 11, + "tn": 20, + "filter_detection_accuracy": 0.725, + "filter_detection_precision": 1.0, + "filter_detection_recall": 0.45, + "filter_detection_f1": 0.6206896551724138, + "judge_total": 9, + "judge_correct": 7, + "filter_correctness": 0.7777777777777778, + "cases": [ + { + "id": 24, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, 
+ "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + {
"query": "Ciao!", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 9, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show recent SRE incident reports", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 20, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Latest safety bulletins", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 21, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Q3 2024 revenue report", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 5, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Contracts drafted in March 2026", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 39, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "How to reset the password on the corporate VPN?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 38, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What was the outcome of the Nuremberg trials?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 25, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": 
"Thanks, that's very helpful.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 12, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Audit logs recorded in 2024", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 31, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What is the current minimum wage in Germany?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 40, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are best practices for REST API versioning?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 33, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Summary of the Paris Olympics 2024", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 13, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Slide decks produced in February 2026", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 36, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the side effects of ibuprofen?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 32, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": 
null, + "generated_queries": [ + { + "query": "Explain how OAuth 2.0 works", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 23, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What is the standard onboarding procedure for new engineers?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 3, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Code reviews posted since April 13, 2026", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 6, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What court decisions were issued in 2025?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 35, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What happened in the 2016 US presidential election?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 27, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What was the carbon tax rate in France in 2024?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 22, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What is the 2025 strategic roadmap?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 15, + "expected_filter": true, + 
"generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Design specs authored on January 15, 2026", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 30, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the findings in the 2024 annual sustainability report?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 29, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the key clauses in the 2023 French labor reform?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 37, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the main feature differences between Python 3.11 and Python 3.12?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 34, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the effects of climate change on Arctic sea ice between 2010 and 2020?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 7, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Lab results from April 12, 2026 to April 19, 2026", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 18, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": 
null, + "generated_queries": [ + { + "query": "Commits pushed since 2026-04-13T00:00:00+00:00", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 17, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Emails written from the Paris office in March 2026", + "temporal_filters": null + }, + { + "query": "Emails written from the Berlin office in March 2026", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 26, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What was the role of subprime mortgages in the 2008 financial crisis?", + "temporal_filters": null + }, + { + "query": "What caused the 2020 economic downturn?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 8, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Patient records updated in the past 30 days", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 10, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Legal briefs submitted since January 1, 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 28, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Trends in AI research from 2020 to 2021", + 
"temporal_filters": null + }, + { + "query": "Trends in AI research from 2022 to 2023", + "temporal_filters": null + }, + { + "query": "Trends in AI research from 2024 to 2025", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 1, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show design specs authored on April 19, 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 2, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the incident reports filed on April 18, 2026?", + "temporal_filters": [ + { + "field": "created_at", + "operator": "==", + "value": "2026-04-18T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 16, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "incident reports for Checkout service filed on April 18, 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": "==", + "value": "2026-04-18T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 11, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": false, + "judge_reasoning": "The filter uses an arbitrary date (March 15) instead of the actual cutoff (March 1, 2026). 
'Before March 2026' should exclude all records created on or after March 1, 2026, not March 15.", + "generated_queries": [ + { + "query": "Invoices issued before March 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": "<", + "value": "2026-03-15T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 19, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show me the Q1 2026 financial filings submitted last week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-12T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 4, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": false, + "judge_reasoning": "The temporal filter uses a fixed date range from April 12 to April 19, 2024, which is not 'last week' relative to the current date (April 19, 2026). 
It should cover the period from April 12 to April 19, 2026.", + "generated_queries": [ + { + "query": "Show the quarterly filings submitted last week in the regulatory portal", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2024-04-12T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<=", + "value": "2024-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 14, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show audit logs from 2025-01-01 to 2025-02-29", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<=", + "value": "2025-02-29T23:59:59+00:00" + } + ] + }, + { + "query": "Show audit logs from 2025-04-01 to 2025-12-31", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-04-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<=", + "value": "2025-12-31T23:59:59+00:00" + } + ] + } + ], + "error": null + } + ] + }, + { + "model_name": "Qwen3-VL-8B-Instruct-FP8", + "timestamp": "2026-04-21T16:29:26.382727", + "prompt_path": "prompts/query_contextualizer_tmpl_v0.txt", + "dataset_path": "datasets/temporal_filter.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 40, + "errors": 0, + "tp": 4, + "fp": 0, + "fn": 16, + "tn": 20, + "filter_detection_accuracy": 0.6, + "filter_detection_precision": 1.0, + "filter_detection_recall": 0.2, + "filter_detection_f1": 0.33333333333333337, + "judge_total": 4, + "judge_correct": 3, + "filter_correctness": 0.75, + "cases": [ + { + "id": 22, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What is the 2025 strategic roadmap?", + 
"temporal_filters": null + } + ], + "error": null + }, + { + "id": 37, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the main feature differences between Python 3.11 and Python 3.12?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 18, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "List all commits pushed to the repository since last Monday, April 14, 2026.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 25, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [], + "error": null + }, + { + "id": 39, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "How to reset password for corporate VPN", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 20, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Latest safety bulletins", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 8, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What patient records were updated in the past 30 days?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 2, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What 
incident reports were filed yesterday?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 6, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What court decisions were issued last year?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 5, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the key terms and conditions included in the contracts drafted last month?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 4, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show me the quarterly filings filed last week.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 9, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Recent SRE incident reports", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 13, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Slide decks produced in February", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 30, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the key findings in the 2024 annual sustainability report?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 27, + "expected_filter": false, + "generated_filter": false, + 
"detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What was the carbon tax rate in France in 2024?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 1, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show me design specs authored today.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 32, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "How does OAuth 2.0 work?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 11, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Invoices issued before March 2026", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 33, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Summary of the Paris Olympics 2024", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 23, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What is the standard onboarding procedure for new engineers?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 40, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the best practices for versioning REST APIs?", + "temporal_filters": null + } + ], + "error": null + }, + { 
+ "id": 31, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What is the current minimum wage in Germany as of April 2026?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 12, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the audit logs recorded in 2024?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 36, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the common and serious side effects of ibuprofen?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 21, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the revenue figures for Q3 2024?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 26, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the key causes and economic impacts of the 2020 economic downturn?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 24, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the must-see historical monuments and local cuisine restaurants in Italy?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 29, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + 
"judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the key clauses in the 2023 French labor reform?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 38, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the key outcomes and historical significance of the Nuremberg trials?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 15, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Design specs authored on January 15, 2026.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 28, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Summarize key trends in AI research from 2020 to 2025.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 19, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show me the Q1 2026 financial filings submitted last week.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 16, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show me the incident reports filed for the Checkout service in production that were filed yesterday.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 10, + "expected_filter": true, + "generated_filter": false, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { 
+ "query": "What legal briefs have been submitted since January 1, 2026?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 35, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the key events, candidates, and outcomes of the 2016 US presidential election?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 34, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the effects of climate change on Arctic sea ice between 2010 and 2020?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 7, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Lab results from the past 7 days", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-12T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 3, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Code reviews posted this week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-12T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 14, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": false, + "judge_reasoning": "The filter incorrectly excludes March by using a single closed interval, whereas the user's request requires splitting into two sub-queries: one for March and one for the rest of the year, with disjoint time ranges.", + "generated_queries": [ + { + "query": "Show me audit logs from last 
year except March.", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-04-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 17, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Emails written last month from the Paris office", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-15T00:00:00+00:00" + } + ] + }, + { + "query": "Emails written last month from the Berlin office", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-15T00:00:00+00:00" + } + ] + } + ], + "error": null + } + ] + } + ] + }, + { + "prompt": "prompts/query_contextualizer_tmpl_v1.txt", + "models": [ + { + "model_name": "Mistral-Small-3.1-24B-Instruct-2503", + "timestamp": "2026-04-21T16:29:38.593355", + "prompt_path": "prompts/query_contextualizer_tmpl_v1.txt", + "dataset_path": "datasets/temporal_filter.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 40, + "errors": 0, + "tp": 20, + "fp": 0, + "fn": 0, + "tn": 20, + "filter_detection_accuracy": 1.0, + "filter_detection_precision": 1.0, + "filter_detection_recall": 1.0, + "filter_detection_f1": 1.0, + "judge_total": 20, + "judge_correct": 19, + "filter_correctness": 0.95, + "cases": [ + { + "id": 24, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Hello!", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 22, + "expected_filter": false, + 
"generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "2025 strategic roadmap", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 25, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Thanks, that's very helpful.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 31, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Current minimum wage in Germany", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 40, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Best practices for REST API versioning", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 21, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Q3 2024 revenue report", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 32, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Explain how OAuth 2.0 works", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 27, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Carbon tax rate in France in 2024", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 38, + "expected_filter": false, + 
"generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What was the outcome of the Nuremberg trials?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 33, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Summary of the Paris Olympics 2024", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 39, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "How do I reset my password on the corporate VPN?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 29, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Key clauses in the 2023 French labor reform", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 23, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What is the standard onboarding procedure for new engineers?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 36, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the side effects of ibuprofen?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 35, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What happened in the 2016 US presidential 
election", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 30, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What were the findings in the 2024 annual sustainability report", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 37, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Compare the main feature differences between Python 3.11 and 3.12", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 26, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "The role of subprime mortgages in the 2008 financial crisis", + "temporal_filters": null + }, + { + "query": "The 2020 economic downturn", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 18, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Commits pushed since last Monday", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-13T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 11, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Invoices issued before March 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": "<", + "value": "2026-03-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 10, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": 
null, + "generated_queries": [ + { + "query": "Legal briefs submitted since January 1, 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 1, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Design specs authored today", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-20T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 5, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Contracts drafted last month", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 20, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Latest safety bulletins", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-12-20T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 2, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Incident reports filed yesterday", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-18T00:00:00+00:00" + }, + { + "field": 
"created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 3, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Code reviews posted this week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-13T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-20T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 9, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Recent SRE incident reports", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-11-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 4, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Quarterly filings filed last week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-13T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 6, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Court decisions issued last year", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-01-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-01-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 7, + 
"expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Lab results from the past 7 days", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-12T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 12, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Audit logs recorded in 2024", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2024-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2025-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 8, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Patient records updated in the past 30 days", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 16, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Incident reports filed for Checkout service flakiness in production yesterday", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-18T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 15, + "expected_filter": true, + "generated_filter": true, + 
"detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Design specs authored on January 15, 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-01-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-01-16T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 19, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Show me the Q1 2026 financial filings submitted last week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-13T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-16T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 28, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Trends in AI research in 2020", + "temporal_filters": null + }, + { + "query": "Trends in AI research in 2021", + "temporal_filters": null + }, + { + "query": "Trends in AI research in 2022", + "temporal_filters": null + }, + { + "query": "Trends in AI research in 2023", + "temporal_filters": null + }, + { + "query": "Trends in AI research in 2024", + "temporal_filters": null + }, + { + "query": "Trends in AI research in 2025", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 13, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": false, + "judge_reasoning": "The query asks for slide decks produced in February, but the current date is April 19, 2026. February 2026 has not yet occurred, so no such slide decks can exist. 
The temporal filter incorrectly assumes February 2026 is in the past, which is invalid.", + "generated_queries": [ + { + "query": "Slide decks produced in February 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-02-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-03-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 17, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Emails written last month from the Paris office", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-01T00:00:00+00:00" + } + ] + }, + { + "query": "Emails written last month from the Berlin office", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 14, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Audit logs from January or February 2025", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2025-03-01T00:00:00+00:00" + } + ] + }, + { + "query": "Audit logs from April to December 2025", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-04-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 34, + "expected_filter": false, + "generated_filter": 
false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Effects of climate change on Arctic sea ice in 2010", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2011", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2012", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2013", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2014", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2015", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2016", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2017", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2018", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2019", + "temporal_filters": null + }, + { + "query": "Effects of climate change on Arctic sea ice in 2020", + "temporal_filters": null + } + ], + "error": null + } + ] + }, + { + "model_name": "Qwen3-VL-8B-Instruct-FP8", + "timestamp": "2026-04-21T16:29:51.509525", + "prompt_path": "prompts/query_contextualizer_tmpl_v1.txt", + "dataset_path": "datasets/temporal_filter.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 40, + "errors": 1, + "tp": 19, + "fp": 2, + "fn": 0, + "tn": 18, + "filter_detection_accuracy": 0.9487179487179487, + "filter_detection_precision": 0.9047619047619048, + "filter_detection_recall": 1.0, + "filter_detection_f1": 0.9500000000000001, + "judge_total": 19, + "judge_correct": 17, + "filter_correctness": 0.8947368421052632, + "cases": [ + { + "id": 22, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": 
null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "2025 strategic roadmap", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 31, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Current minimum wage in Germany", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 23, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Standard onboarding procedure for new engineers", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 27, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Carbon tax rate in France in 2024", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 32, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Explain how OAuth 2.0 works", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 24, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Hello!", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 21, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Q3 2024 revenue report", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 25, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": 
null, + "generated_queries": [ + { + "query": "Thanks, that's very helpful.", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 39, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "How to reset password for corporate VPN", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 40, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Best practices for REST API versioning", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 36, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What are the side effects of ibuprofen?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 33, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Summary of the Paris Olympics 2024", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 35, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "What happened in the 2016 US presidential election?", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 30, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Findings in the 2024 annual sustainability report", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 37, + "expected_filter": false, + "generated_filter": false, + 
"detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Compare the main feature differences between Python 3.11 and Python 3.12", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 38, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Outcome of the Nuremberg trials", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 26, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "The role of subprime mortgages in the 2008 financial crisis", + "temporal_filters": null + }, + { + "query": "The role of subprime mortgages in the 2020 economic downturn", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 29, + "expected_filter": false, + "generated_filter": false, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Key clauses in the 2023 French labor reform", + "temporal_filters": null + } + ], + "error": null + }, + { + "id": 10, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Legal briefs submitted since January 1, 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 3, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Code reviews posted this week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-13T00:00:00+00:00" + }, + { + 
"field": "created_at", + "operator": "<", + "value": "2026-04-20T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 4, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Quarterly filings filed last week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-12T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 34, + "expected_filter": false, + "generated_filter": true, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Effects of climate change on Arctic sea ice between 2010 and 2020", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2010-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2021-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 28, + "expected_filter": false, + "generated_filter": true, + "detection_correct": false, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Trends in AI research from 2020 to 2025", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2020-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 16, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Incident reports filed yesterday for Checkout service", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-18T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": 
"2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 9, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Recent SRE incident reports", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 12, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Audit logs recorded in 2024", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2024-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2025-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 13, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Slide decks produced in February", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-02-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-03-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 2, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Incident reports filed yesterday", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-18T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 6, + "expected_filter": true, + "generated_filter": 
true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Court decisions issued last year", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 19, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Q1 financial filings submitted last week", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-12T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 15, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Design specs authored on January 15, 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-01-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-01-16T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 11, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Invoices issued before March 2026", + "temporal_filters": [ + { + "field": "created_at", + "operator": "<", + "value": "2026-03-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 1, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "design specs authored today", + "temporal_filters": [ + { + 
"field": "created_at", + "operator": ">=", + "value": "2026-04-19T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-20T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 7, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Lab results from the past 7 days", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-12T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 18, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": false, + "judge_reasoning": "The filter incorrectly includes an upper bound for 'since last Monday', which implies a closed interval. 'Since last Monday' should only have a lower bound (>=) to capture all commits from that point forward, without an upper limit.", + "generated_queries": [ + { + "query": "Commits pushed since last Monday", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-13T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-20T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 20, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": false, + "judge_reasoning": "The filter incorrectly excludes records created on April 16, 2026, which is the current date. 
Since the user asked for 'Latest safety bulletins', the filter should include records up to and including today.", + "generated_queries": [ + { + "query": "Latest safety bulletins", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-01-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-16T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 5, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Contracts drafted last month", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-16T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 14, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Audit logs from January or February last year", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-01-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2025-03-01T00:00:00+00:00" + } + ] + }, + { + "query": "Audit logs from April to December last year", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2025-04-01T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-01-01T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 17, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": true, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Emails written last month from the Paris office", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": 
"2026-03-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-16T00:00:00+00:00" + } + ] + }, + { + "query": "Emails written last month from the Berlin office", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-03-15T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-16T00:00:00+00:00" + } + ] + } + ], + "error": null + }, + { + "id": 8, + "expected_filter": true, + "generated_filter": true, + "detection_correct": true, + "judge_verdict": null, + "judge_reasoning": null, + "generated_queries": [ + { + "query": "Patient records updated in the past 30 days", + "temporal_filters": [ + { + "field": "created_at", + "operator": ">=", + "value": "2026-04-00T00:00:00+00:00" + }, + { + "field": "created_at", + "operator": "<", + "value": "2026-04-19T00:00:00+00:00" + } + ] + } + ], + "error": "judge: 'NoneType' object has no attribute 'correct'" + } + ] + } + ] + } + ] +} \ No newline at end of file diff --git a/benchmarks/prompt_eval/results/result_query_decomposition.json b/benchmarks/prompt_eval/results/result_query_decomposition.json new file mode 100644 index 00000000..999df2a0 --- /dev/null +++ b/benchmarks/prompt_eval/results/result_query_decomposition.json @@ -0,0 +1,6163 @@ +{ + "dataset": "datasets/query_decomposition.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "prompts": [ + { + "prompt": "prompts/query_contextualizer_tmpl_v0.txt", + "models": [ + { + "model_name": "Mistral-Small-3.1-24B-Instruct-2503", + "timestamp": "2026-04-19T16:24:20.289415", + "prompt_path": "prompts/query_contextualizer_tmpl_v0.txt", + "dataset_path": "datasets/query_decomposition.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 80, + "errors": 0, + "count_match_passed": 66, + "count_match_accuracy": 0.825, + "semantic_coverage_passed": 74, + "semantic_coverage_accuracy": 0.925, + "by_difficulty": { + "1": { + "total": 20, + "errors": 0, + 
"count_match_passed": 19, + "semantic_coverage_passed": 20, + "count_match_accuracy": 0.95, + "semantic_coverage_accuracy": 1.0 + }, + "2": { + "total": 40, + "errors": 0, + "count_match_passed": 39, + "semantic_coverage_passed": 38, + "count_match_accuracy": 0.975, + "semantic_coverage_accuracy": 0.95 + }, + "3": { + "total": 20, + "errors": 0, + "count_match_passed": 8, + "semantic_coverage_passed": 16, + "count_match_accuracy": 0.4, + "semantic_coverage_accuracy": 0.8 + } + }, + "cases": [ + { + "id": 3, + "difficulty": 1, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Statute of limitations for breach of contract in California" + ], + "generated_queries": [ + "Statute of limitations for breach of contract in California" + ], + "error": null + }, + { + "id": 5, + "difficulty": 1, + "domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average mutation rate of human mitochondrial DNA" + ], + "generated_queries": [ + "What is the average mutation rate of human mitochondrial DNA?" 
+ ], + "error": null + }, + { + "id": 13, + "difficulty": 1, + "domain": "agriculture", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Optimal irrigation frequency for winter wheat during the tillering stage" + ], + "generated_queries": [ + "Optimal irrigation frequency for winter wheat during the tillering stage" + ], + "error": null + }, + { + "id": 14, + "difficulty": 1, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Capacity factor of the wind farm in Normandy" + ], + "generated_queries": [ + "What is the capacity factor of the wind farm in Normandy?" + ], + "error": null + }, + { + "id": 9, + "difficulty": 1, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average rental yield for residential properties in Lyon" + ], + "generated_queries": [ + "What is the average rental yield for residential properties in Lyon?" + ], + "error": null + }, + { + "id": 1, + "difficulty": 1, + "domain": "finance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q3 2024" + ], + "generated_queries": [ + "What was the total revenue in Q3 2024?" 
+ ], + "error": null + }, + { + "id": 19, + "difficulty": 1, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Loss ratio for the commercial property insurance line" + ], + "generated_queries": [ + "What is the loss ratio for the commercial property line of the company?" + ], + "error": null + }, + { + "id": 67, + "difficulty": 3, + "domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Known interactions between metformin and ibuprofen" + ], + "generated_queries": [ + "What are the known interactions between metformin and ibuprofen?" + ], + "error": null + }, + { + "id": 16, + "difficulty": 1, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current carbon tax rate per ton of CO2 in France" + ], + "generated_queries": [ + "What is the carbon tax rate per ton of CO2 in France?" + ], + "error": null + }, + { + "id": 18, + "difficulty": 1, + "domain": "telecommunications", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Packet loss rate on the fiber backbone between Paris and Marseille" + ], + "generated_queries": [ + "What is the packet loss rate on the fiber backbone between Paris and Marseille?" 
+ ], + "error": null + }, + { + "id": 12, + "difficulty": 1, + "domain": "logistics", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current lead time for ocean freight from Shanghai to Rotterdam" + ], + "generated_queries": [ + "What is the current lead time for ocean freight from Shanghai to Rotterdam?" + ], + "error": null + }, + { + "id": 20, + "difficulty": 1, + "domain": "aviation", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Next C-check due date for aircraft with registration F-GKXA" + ], + "generated_queries": [ + "When is the next C-check due for aircraft registration F-GKXA?" + ], + "error": null + }, + { + "id": 10, + "difficulty": 1, + "domain": "technology", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended steps for migrating a MySQL database to PostgreSQL" + ], + "generated_queries": [ + "What are the recommended steps for migrating a MySQL database to PostgreSQL?" + ], + "error": null + }, + { + "id": 64, + "difficulty": 3, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Differences between trade secrets and patents in terms of protection duration and disclosure requirements" + ], + "generated_queries": [ + "How do trade secrets and patents differ in terms of protection duration and disclosure requirements?" 
+ ], + "error": null + }, + { + "id": 11, + "difficulty": 1, + "domain": "environment", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total energy consumption of the Paris headquarters in 2024" + ], + "generated_queries": [ + "What was the total energy consumption of the Paris headquarters in 2024?" + ], + "error": null + }, + { + "id": 6, + "difficulty": 1, + "domain": "HR", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Parental leave entitlement for employees in the French office" + ], + "generated_queries": [ + "What is the parental leave entitlement for employees in the French office in Paris?" + ], + "error": null + }, + { + "id": 4, + "difficulty": 1, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Maximum rainfall intensity for a 10-year return period in Toulouse" + ], + "generated_queries": [ + "What is the maximum rainfall intensity for a 10-year return period in Toulouse?" + ], + "error": null + }, + { + "id": 17, + "difficulty": 1, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Safety stock level for the premium wireless headphones" + ], + "generated_queries": [ + "What is the safety stock level for the new line of premium wireless headphones launched last quarter?" 
+ ], + "error": null + }, + { + "id": 8, + "difficulty": 1, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average engagement rate for sponsored Instagram posts targeting the 25-35 age group" + ], + "generated_queries": [ + "What is the average engagement rate for sponsored posts on Instagram targeting the 25-35 age group?" + ], + "error": null + }, + { + "id": 23, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Data privacy requirements in the European Union", + "Data privacy requirements in Brazil" + ], + "generated_queries": [ + "What are the data privacy requirements in the EU?", + "What are the data privacy requirements in Brazil?" + ], + "error": null + }, + { + "id": 52, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of flood insurance claims in 2023", + "Number of flood insurance claims in 2024" + ], + "generated_queries": [ + "Number of flood insurance claims in 2023", + "Number of flood insurance claims in 2024" + ], + "error": null + }, + { + "id": 7, + "difficulty": 1, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students who graduated from the computer science program in the 2024-2025 academic year" + ], + "generated_queries": [ + "How many students graduated from the computer science program in the 2024-2025 academic year?" 
+ ], + "error": null + }, + { + "id": 73, + "difficulty": 3, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Application of differentiated instruction in elementary and middle school classrooms" + ], + "generated_queries": [ + "How is differentiated instruction applied in elementary school classrooms?", + "How is differentiated instruction applied in middle school classrooms?" + ], + "error": null + }, + { + "id": 75, + "difficulty": 3, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Positioning strategy for new enterprise software targeting both IT decision-makers and end users" + ], + "generated_queries": [ + "How should enterprise software be positioned to IT decision-makers?", + "How should enterprise software be positioned to end users?" + ], + "error": null + }, + { + "id": 69, + "difficulty": 3, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Thermal conductivity and fire resistance ratings of ceramic fiber insulation" + ], + "generated_queries": [ + "What is the thermal conductivity rating of ceramic fiber insulation?", + "What is the fire resistance rating of ceramic fiber insulation?" 
+ ], + "error": null + }, + { + "id": 79, + "difficulty": 3, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Combined effect of rising interest rates and increasing housing supply on residential property investment returns" + ], + "generated_queries": [ + "How do rising interest rates affect residential property investment returns?", + "How does increasing housing supply affect residential property investment returns?" + ], + "error": null + }, + { + "id": 29, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average commercial lease rate in downtown Manhattan", + "Average commercial lease rate in the Miami financial district" + ], + "generated_queries": [ + "What are the average commercial lease rates in downtown Manhattan", + "What are the average commercial lease rates in the Miami financial district" + ], + "error": null + }, + { + "id": 53, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Monthly churn rate for the Premium plan", + "Monthly churn rate for the Basic plan" + ], + "generated_queries": [ + "What is the monthly churn rate for the Premium subscription tier?", + "What is the monthly churn rate for the Basic subscription tier?" 
+ ], + "error": null + }, + { + "id": 27, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Student outcome results for project-based learning", + "Student outcome results for traditional lecture-based instruction" + ], + "generated_queries": [ + "What are the student outcome results for project-based learning?", + "What are the student outcome results for traditional lecture-based instruction?" + ], + "error": null + }, + { + "id": 26, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Employee satisfaction score for the London office", + "Employee satisfaction score for the Singapore office" + ], + "generated_queries": [ + "What are the employee satisfaction scores for the London office?", + "What are the employee satisfaction scores for the Singapore office?" + ], + "error": null + }, + { + "id": 62, + "difficulty": 3, + "domain": "healthcare", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Standard discharge procedure for elderly patients", + "Standard discharge procedure for pediatric patients" + ], + "generated_queries": [ + "What is the standard discharge procedure for elderly patients?", + "What is the standard discharge procedure for pediatric patients?" 
+ ], + "error": null + }, + { + "id": 58, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Annual water consumption at our manufacturing operations", + "Annual water consumption at our data center operations" + ], + "generated_queries": [ + "What is the annual water consumption at the manufacturing operations?", + "What is the annual water consumption at the data center operations?" + ], + "error": null + }, + { + "id": 32, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average delivery time for shipments to Southeast Asia", + "Average delivery time for shipments to Western Europe" + ], + "generated_queries": [ + "What are the average delivery times for shipments to Southeast Asia?", + "What are the average delivery times for shipments to Western Europe?" + ], + "error": null + }, + { + "id": 35, + "difficulty": 2, + "domain": "manufacturing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall defect rate for Assembly Line A in Q2 2026", + "Overall defect rate for Assembly Line B in Q2 2026" + ], + "generated_queries": [ + "What are the overall defect rates for Assembly Line A this quarter?", + "What are the overall defect rates for Assembly Line B this quarter?" 
+ ], + "error": null + }, + { + "id": 49, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average soil nitrogen levels in our soybean fields", + "Average soil nitrogen levels in our cotton fields" + ], + "generated_queries": [ + "What are the average soil nitrogen levels in the soybean fields?", + "What are the average soil nitrogen levels in the cotton fields?" + ], + "error": null + }, + { + "id": 74, + "difficulty": 3, + "domain": "aviation", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'EASA certification requirements for commercial drone operations' is not covered, as the generated sub-query only compares EASA and FAA requirements without addressing EASA requirements specifically.", + "expected_queries": [ + "EASA certification requirements for commercial drone operations", + "FAA certification requirements for commercial drone operations" + ], + "generated_queries": [ + "What are the differences between EASA and FAA certification requirements for commercial drone operations?" + ], + "error": null + }, + { + "id": 77, + "difficulty": 3, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended treatments for aphids in corn crops", + "Recommended treatments for rootworm in corn crops" + ], + "generated_queries": [ + "What are the recommended treatments for aphids in corn crops?", + "What are the recommended treatments for rootworm in corn crops?" 
+ ], + "error": null + }, + { + "id": 39, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average claim resolution time for auto insurance claims", + "Average claim resolution time for home insurance claims" + ], + "generated_queries": [ + "What is the average claim resolution time for auto insurance claims?", + "What is the average claim resolution time for home insurance claims?" + ], + "error": null + }, + { + "id": 34, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Solar energy generation at our farm in summer 2024", + "Solar energy generation at our farm in winter 2024" + ], + "generated_queries": [ + "Solar energy generation at our farm in summer 2024", + "Solar energy generation at our farm in winter 2024" + ], + "error": null + }, + { + "id": 15, + "difficulty": 1, + "domain": "manufacturing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Rework procedure for solder joint failures on Line C" + ], + "generated_queries": [ + "What is the rework procedure for solder joint failures on Line C when batches exceed the acceptable defect threshold of 0.5% of joints per batch?" 
+ ], + "error": null + }, + { + "id": 46, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students enrolled in the fall 2024 semester", + "Number of students enrolled in the spring 2025 semester" + ], + "generated_queries": [ + "How many students enrolled in the fall 2024 semester", + "How many students enrolled in the spring 2025 semester" + ], + "error": null + }, + { + "id": 36, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current rent control measures in Berlin", + "Current rent control measures in New York City" + ], + "generated_queries": [ + "What are the rent control measures currently in place in Berlin?", + "What are the rent control measures currently in place in New York City?" + ], + "error": null + }, + { + "id": 33, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Per-hectare yield for wheat in the 2025 harvest season", + "Per-hectare yield for corn in the 2025 harvest season" + ], + "generated_queries": [ + "What were the per-hectare yields for wheat this harvest season?", + "What were the per-hectare yields for corn this harvest season?" 
+ ], + "error": null + }, + { + "id": 47, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Net Promoter Score for the ProLine series", + "Net Promoter Score for the EcoLine series" + ], + "generated_queries": [ + "What is the Net Promoter Score for the ProLine series?", + "What is the Net Promoter Score for the EcoLine series?" + ], + "error": null + }, + { + "id": 24, + "difficulty": 2, + "domain": "engineering", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Fatigue resistance rating for carbon steel under cyclic loading", + "Fatigue resistance rating for stainless steel under cyclic loading" + ], + "generated_queries": [ + "What is the fatigue resistance rating for carbon steel under cyclic loading?", + "What is the fatigue resistance rating for stainless steel under cyclic loading?" + ], + "error": null + }, + { + "id": 72, + "difficulty": 3, + "domain": "manufacturing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Root causes of surface bubbling on the automotive paint line", + "Root causes of adhesion failure on the automotive paint line" + ], + "generated_queries": [ + "What are the root causes of surface bubbling on our automotive paint line?", + "What are the root causes of adhesion failure on our automotive paint line?" 
+ ], + "error": null + }, + { + "id": 66, + "difficulty": 3, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Onboarding process for sales new hires", + "Onboarding process for customer success new hires" + ], + "generated_queries": [ + "What does the onboarding process look like for sales new hires?", + "What does the onboarding process look like for customer success new hires?" + ], + "error": null + }, + { + "id": 25, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average Arctic sea ice extent in summer 2015", + "Average Arctic sea ice extent in summer 2020" + ], + "generated_queries": [ + "What was the average Arctic sea ice extent in summer 2015", + "What was the average Arctic sea ice extent in summer 2020" + ], + "error": null + }, + { + "id": 51, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Revenue per square foot for East Coast stores in Q1 2026", + "Revenue per square foot for West Coast stores in Q1 2026" + ], + "generated_queries": [ + "What was the revenue per square foot for our East Coast stores last quarter?", + "What was the revenue per square foot for our West Coast stores last quarter?" 
+ ], + "error": null + }, + { + "id": 22, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q1 2024", + "Total revenue in Q2 2024" + ], + "generated_queries": [ + "What was the total revenue in Q1 of 2024?", + "What was the total revenue in Q2 of 2024?" + ], + "error": null + }, + { + "id": 54, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current minimum wage in France", + "Current minimum wage in Japan", + "Current minimum wage in Australia" + ], + "generated_queries": [ + "What is the current minimum wage in France?", + "What is the current minimum wage in Japan?", + "What is the current minimum wage in Australia?" + ], + "error": null + }, + { + "id": 38, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average 5G download speeds in urban areas on our network", + "Average 5G download speeds in rural areas on our network" + ], + "generated_queries": [ + "What are the average 5G download speeds in urban areas on our network?", + "What are the average 5G download speeds in rural areas on our network?" 
+ ], + "error": null + }, + { + "id": 56, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for sickle cell disease", + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for beta-thalassemia" + ], + "generated_queries": [ + "What are the latest clinical trial results for CRISPR treatments for sickle cell disease?", + "What are the latest clinical trial results for CRISPR treatments for beta-thalassemia?" + ], + "error": null + }, + { + "id": 71, + "difficulty": 3, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Short-term and long-term effects of the Finnish UBI pilot on employment and well-being" + ], + "generated_queries": [ + "What were the short-term effects of the Finnish UBI pilot on employment and well-being?", + "What were the long-term effects of the Finnish UBI pilot on employment and well-being?" + ], + "error": null + }, + { + "id": 31, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total CO2 emissions for the main campus in Q1 2026", + "Total CO2 emissions for the main campus in Q3 2026" + ], + "generated_queries": [ + "What were the total CO2 emissions for the main campus in Q1 of this year?", + "What were the total CO2 emissions for the main campus in Q3 of this year?" 
+ ], + "error": null + }, + { + "id": 21, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Returns on the S&P 500 index fund in 2025", + "Returns on the emerging markets fund in 2025" + ], + "generated_queries": [ + "What was the return of the S&P 500 index fund in 2025?", + "What was the return of the emerging markets fund in 2025?" + ], + "error": null + }, + { + "id": 68, + "difficulty": 3, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 3, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Typical comparison of product return rates between summer and winter seasons" + ], + "generated_queries": [ + "How do product return rates typically compare between the summer and winter seasons?", + "What are the average product return rates in January?", + "What are the average product return rates in summer months?" 
+ ], + "error": null + }, + { + "id": 30, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about serverless pricing and cold start latency for AWS Lambda is not fully covered, as the generated sub-query only compares AWS Lambda with Google Cloud Functions without focusing specifically on AWS Lambda alone.", + "expected_queries": [ + "Serverless pricing and cold start latency for AWS Lambda", + "Serverless pricing and cold start latency for Google Cloud Functions" + ], + "generated_queries": [ + "Recommended way to manage secrets across environments in Kubernetes", + "Comparison of serverless pricing and cold start latency of AWS Lambda and Google Cloud Functions" + ], + "error": null + }, + { + "id": 2, + "difficulty": 1, + "domain": "healthcare", + "n_expected_queries": 1, + "n_generated_queries": 3, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended flu vaccination schedule for adults over 65" + ], + "generated_queries": [ + "What is the current hospital policy on visitor hours?", + "What is the minimum staffing ratio required for the night shift?", + "What is the recommended flu vaccination schedule for adults over 65?" 
+ ], + "error": null + }, + { + "id": 78, + "difficulty": 3, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Coverage and throughput characteristics of sub-6 GHz bands for 5G deployment", + "Coverage and throughput characteristics of mmWave bands for 5G deployment" + ], + "generated_queries": [ + "What are the coverage trade-offs between sub-6 GHz and mmWave bands for 5G deployment?", + "What are the throughput trade-offs between sub-6 GHz and mmWave bands for 5G deployment?" + ], + "error": null + }, + { + "id": 43, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Percentage of electricity from renewables at the Texas plant", + "Percentage of electricity from renewables at the Ohio plant", + "Percentage of electricity from renewables at the Oregon plant" + ], + "generated_queries": [ + "What percentage of electricity comes from renewables at the Texas plant?", + "What percentage of electricity comes from renewables at the Ohio plant?", + "What percentage of electricity comes from renewables at the Oregon plant?" 
+ ], + "error": null + }, + { + "id": 28, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Click-through rate for the summer campaign on Facebook", + "Click-through rate for the summer campaign on Instagram", + "Click-through rate for the summer campaign on Google Ads" + ], + "generated_queries": [ + "What was the click-through rate for the summer campaign on Facebook?", + "What was the click-through rate for the summer campaign on Instagram?", + "What was the click-through rate for the summer campaign on Google Ads?" + ], + "error": null + }, + { + "id": 76, + "difficulty": 3, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 3, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Joint effect of flood zone classification and building age on commercial property insurance premiums" + ], + "generated_queries": [ + "How does flood zone classification affect commercial property insurance premiums?", + "How does building age affect commercial property insurance premiums?", + "How do flood zone classification and building age jointly affect commercial property insurance premiums?" 
+ ], + "error": null + }, + { + "id": 63, + "difficulty": 3, + "domain": "technology", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-queries about the main architectural features of Kafka and RabbitMQ are not covered by any generated sub-query, as the generated query focuses on gRPC vs REST and Kafka vs RabbitMQ differences, not the specific features of each system.", + "expected_queries": [ + "Main architectural features of Kafka", + "Main architectural features of RabbitMQ" + ], + "generated_queries": [ + "What is gRPC and what are the main differences between gRPC and REST?", + "What are the main architectural differences between Kafka and RabbitMQ?" + ], + "error": null + }, + { + "id": 37, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total online sales figures for Black Friday", + "Total online sales figures for Cyber Monday", + "Total online sales figures for Christmas week" + ], + "generated_queries": [ + "What were the total online sales figures for Black Friday in 2025?", + "What were the total online sales figures for Cyber Monday in 2025?", + "What were the total online sales figures for Christmas week in 2025?" 
+ ], + "error": null + }, + { + "id": 80, + "difficulty": 3, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 4, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Comparison of lithium-ion battery systems and pumped hydro storage in terms of scalability and round-trip efficiency" + ], + "generated_queries": [ + "What is the scalability of lithium-ion battery systems?", + "What is the round-trip efficiency of lithium-ion battery systems?", + "What is the scalability of pumped hydro storage?", + "What is the round-trip efficiency of pumped hydro storage?" + ], + "error": null + }, + { + "id": 42, + "difficulty": 2, + "domain": "healthcare", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average ER wait time at the Boston hospital", + "Average ER wait time at the Philadelphia hospital", + "Cardiology appointment lead time at the Boston hospital", + "Cardiology appointment lead time at the Philadelphia hospital" + ], + "generated_queries": [ + "What are the average ER wait times at the Boston hospital?", + "What are the average ER wait times at the Philadelphia hospital?", + "What are the cardiology appointment lead times at the Boston hospital?", + "What are the cardiology appointment lead times at the Philadelphia hospital?" 
+ ], + "error": null + }, + { + "id": 61, + "difficulty": 3, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Operating costs in 2024' is not covered by the generated sub-query, as it does not explicitly target the year 2024.", + "expected_queries": [ + "Operating costs in 2024", + "Operating costs in 2025" + ], + "generated_queries": [ + "Evolution of the company's operating costs over the past two years" + ], + "error": null + }, + { + "id": 70, + "difficulty": 3, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 4, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Cost and lead time for air freight transatlantic shipments", + "Cost and lead time for sea freight transatlantic shipments" + ], + "generated_queries": [ + "What is the cost of air freight for transatlantic shipments?", + "What is the cost of sea freight for transatlantic shipments?", + "What is the lead time of air freight for transatlantic shipments?", + "What is the lead time of sea freight for transatlantic shipments?" 
+ ], + "error": null + }, + { + "id": 65, + "difficulty": 3, + "domain": "environment", + "n_expected_queries": 5, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query for Amazon deforestation rates in 2025 is not covered by the generated sub-query, which only refers to a general trend over the last five years without specifying the exact year 2025.", + "expected_queries": [ + "Amazon deforestation rates in 2021", + "Amazon deforestation rates in 2022", + "Amazon deforestation rates in 2023", + "Amazon deforestation rates in 2024", + "Amazon deforestation rates in 2025" + ], + "generated_queries": [ + "Trend in Amazon deforestation rates over the last five years" + ], + "error": null + }, + { + "id": 55, + "difficulty": 2, + "domain": "aviation", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Load factor for the New York-London route", + "Load factor for the Los Angeles-Tokyo route", + "Fuel cost per seat-mile for the New York-London route", + "Fuel cost per seat-mile for the Los Angeles-Tokyo route" + ], + "generated_queries": [ + "What is the load factor for the New York-London route?", + "What is the fuel cost per seat-mile for the New York-London route?", + "What is the load factor for the Los Angeles-Tokyo route?", + "What is the fuel cost per seat-mile for the Los Angeles-Tokyo route?" 
+ ], + "error": null + }, + { + "id": 57, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Voluntary turnover rate for engineering in 2023", + "Voluntary turnover rate for engineering in 2024", + "Voluntary turnover rate for marketing in 2023", + "Voluntary turnover rate for marketing in 2024" + ], + "generated_queries": [ + "What were the voluntary turnover rates for engineering in 2023?", + "What were the voluntary turnover rates for engineering in 2024?", + "What were the voluntary turnover rates for marketing in 2023?", + "What were the voluntary turnover rates for marketing in 2024?" + ], + "error": null + }, + { + "id": 41, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Returns of the Growth Fund in Q3 2024", + "Returns of the Growth Fund in Q4 2024", + "Returns of the Value Fund in Q3 2024", + "Returns of the Value Fund in Q4 2024" + ], + "generated_queries": [ + "What were the returns of the Growth Fund in Q3 of 2024?", + "What were the returns of the Growth Fund in Q4 of 2024?", + "What were the returns of the Value Fund in Q3 of 2024?", + "What were the returns of the Value Fund in Q4 of 2024?" 
+ ], + "error": null + }, + { + "id": 48, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Daily order processing volume at the Memphis hub in March 2025", + "Daily order processing volume at the Memphis hub in April 2025", + "Daily order processing volume at the Rotterdam hub in March 2025", + "Daily order processing volume at the Rotterdam hub in April 2025" + ], + "generated_queries": [ + "What was the daily order processing volume at the Memphis hub in March 2025?", + "What was the daily order processing volume at the Memphis hub in April 2025?", + "What was the daily order processing volume at the Rotterdam hub in March 2025?", + "What was the daily order processing volume at the Rotterdam hub in April 2025?" + ], + "error": null + }, + { + "id": 45, + "difficulty": 2, + "domain": "manufacturing", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall equipment effectiveness for the Shenzhen factory in H1 2024", + "Overall equipment effectiveness for the Shenzhen factory in H2 2024", + "Overall equipment effectiveness for the Munich factory in H1 2024", + "Overall equipment effectiveness for the Munich factory in H2 2024" + ], + "generated_queries": [ + "What was the overall equipment effectiveness for the Shenzhen factory in H1 of 2024?", + "What was the overall equipment effectiveness for the Munich factory in H1 of 2024?", + "What was the overall equipment effectiveness for the Shenzhen factory in H2 of 2024?", + "What was the overall equipment effectiveness for the Munich factory in H2 of 2024?" 
+ ], + "error": null + }, + { + "id": 44, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Antitrust cases filed against Google between 2023 and 2026", + "Antitrust cases filed against Apple between 2023 and 2026" + ], + "generated_queries": [ + "What antitrust cases have been filed against Google in the last three years?", + "What antitrust cases have been filed against Apple in the last three years?" + ], + "error": null + }, + { + "id": 60, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query for financial documents from January or February 2025 is not covered, as the generated sub-query refers to 'last year' without specifying the exact months or confirming they include January and February.", + "expected_queries": [ + "Financial documents from January or February 2025", + "Financial documents from April to December 2025" + ], + "generated_queries": [ + "Show the financial documents from last year except for those filed in March" + ], + "error": null + }, + { + "id": 40, + "difficulty": 2, + "domain": "aviation", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "On-time departure rate in January 2025", + "On-time departure rate in February 2025" + ], + "generated_queries": [ + "What was the on-time departure rate in January 2025?", + "What was the on-time departure rate in February 2025?" 
+ ], + "error": null + }, + { + "id": 50, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "API uptime percentage for January 2025", + "API uptime percentage for February 2025", + "API uptime percentage for March 2025" + ], + "generated_queries": [ + "What was the API uptime percentage for January 2025?", + "What was the API uptime percentage for February 2025?", + "What was the API uptime percentage for March 2025?" + ], + "error": null + }, + { + "id": 59, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Median home sale price in the Austin metro area in Q1 2025", + "Median home sale price in the Austin metro area in Q2 2025", + "Median home sale price in the Austin metro area in Q3 2025" + ], + "generated_queries": [ + "What was the median home sale price in the Austin metro area in Q1 of 2025?", + "What was the median home sale price in the Austin metro area in Q2 of 2025?", + "What was the median home sale price in the Austin metro area in Q3 of 2025?" 
+ ], + "error": null + } + ] + }, + { + "model_name": "Qwen3-VL-8B-Instruct-FP8", + "timestamp": "2026-04-19T16:24:30.905311", + "prompt_path": "prompts/query_contextualizer_tmpl_v0.txt", + "dataset_path": "datasets/query_decomposition.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 80, + "errors": 0, + "count_match_passed": 58, + "count_match_accuracy": 0.725, + "semantic_coverage_passed": 69, + "semantic_coverage_accuracy": 0.8625, + "by_difficulty": { + "1": { + "total": 20, + "errors": 0, + "count_match_passed": 20, + "semantic_coverage_passed": 20, + "count_match_accuracy": 1.0, + "semantic_coverage_accuracy": 1.0 + }, + "2": { + "total": 40, + "errors": 0, + "count_match_passed": 28, + "semantic_coverage_passed": 35, + "count_match_accuracy": 0.7, + "semantic_coverage_accuracy": 0.875 + }, + "3": { + "total": 20, + "errors": 0, + "count_match_passed": 10, + "semantic_coverage_passed": 14, + "count_match_accuracy": 0.5, + "semantic_coverage_accuracy": 0.7 + } + }, + "cases": [ + { + "id": 79, + "difficulty": 3, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Combined effect of rising interest rates and increasing housing supply on residential property investment returns" + ], + "generated_queries": [ + "How do rising interest rates and increasing housing supply together affect residential property investment returns?" + ], + "error": null + }, + { + "id": 15, + "difficulty": 1, + "domain": "manufacturing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Rework procedure for solder joint failures on Line C" + ], + "generated_queries": [ + "What is the rework procedure for solder joint failures on Line C?" 
+ ], + "error": null + }, + { + "id": 14, + "difficulty": 1, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Capacity factor of the wind farm in Normandy" + ], + "generated_queries": [ + "What is the capacity factor of the wind farm located in Normandy?" + ], + "error": null + }, + { + "id": 2, + "difficulty": 1, + "domain": "healthcare", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended flu vaccination schedule for adults over 65" + ], + "generated_queries": [ + "What is the recommended flu vaccination schedule for adults over 65 years old?" + ], + "error": null + }, + { + "id": 16, + "difficulty": 1, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current carbon tax rate per ton of CO2 in France" + ], + "generated_queries": [ + "What is the current carbon tax rate per ton of CO2 in France?" + ], + "error": null + }, + { + "id": 76, + "difficulty": 3, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Joint effect of flood zone classification and building age on commercial property insurance premiums" + ], + "generated_queries": [ + "How do flood zone classification and building age jointly affect commercial property insurance premiums?" 
+ ], + "error": null + }, + { + "id": 20, + "difficulty": 1, + "domain": "aviation", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Next C-check due date for aircraft with registration F-GKXA" + ], + "generated_queries": [ + "When is the next C-check due for the aircraft with registration F-GKXA?" + ], + "error": null + }, + { + "id": 9, + "difficulty": 1, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average rental yield for residential properties in Lyon" + ], + "generated_queries": [ + "What is the average rental yield for residential properties in Lyon?" + ], + "error": null + }, + { + "id": 69, + "difficulty": 3, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Thermal conductivity and fire resistance ratings of ceramic fiber insulation" + ], + "generated_queries": [ + "What are the thermal conductivity and fire resistance ratings of ceramic fiber insulation?" + ], + "error": null + }, + { + "id": 3, + "difficulty": 1, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Statute of limitations for breach of contract in California" + ], + "generated_queries": [ + "What is the statute of limitations for breach of contract in California?" 
+ ], + "error": null + }, + { + "id": 5, + "difficulty": 1, + "domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average mutation rate of human mitochondrial DNA" + ], + "generated_queries": [ + "What is the average mutation rate of human mitochondrial DNA?" + ], + "error": null + }, + { + "id": 11, + "difficulty": 1, + "domain": "environment", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total energy consumption of the Paris headquarters in 2024" + ], + "generated_queries": [ + "What was the total energy consumption of the Paris headquarters in 2024?" + ], + "error": null + }, + { + "id": 10, + "difficulty": 1, + "domain": "technology", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended steps for migrating a MySQL database to PostgreSQL" + ], + "generated_queries": [ + "What are the recommended steps for migrating a MySQL database to PostgreSQL?" + ], + "error": null + }, + { + "id": 66, + "difficulty": 3, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Onboarding process for sales new hires", + "Onboarding process for customer success new hires" + ], + "generated_queries": [ + "What is the onboarding process for new hires in sales and customer success roles?" 
+ ], + "error": null + }, + { + "id": 67, + "difficulty": 3, + "domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Known interactions between metformin and ibuprofen" + ], + "generated_queries": [ + "What are the known drug interactions between metformin and ibuprofen?" + ], + "error": null + }, + { + "id": 64, + "difficulty": 3, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Differences between trade secrets and patents in terms of protection duration and disclosure requirements" + ], + "generated_queries": [ + "How do trade secrets and patents differ in terms of protection duration and disclosure requirements?" + ], + "error": null + }, + { + "id": 37, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 3, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total online sales figures for Black Friday", + "Total online sales figures for Cyber Monday", + "Total online sales figures for Christmas week" + ], + "generated_queries": [ + "What were the total online sales figures for Black Friday, Cyber Monday, and Christmas week?" 
+ ], + "error": null + }, + { + "id": 30, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Serverless pricing and cold start latency for AWS Lambda", + "Serverless pricing and cold start latency for Google Cloud Functions" + ], + "generated_queries": [ + "Compare the serverless pricing models and cold start latency performance of AWS Lambda and Google Cloud Functions." + ], + "error": null + }, + { + "id": 75, + "difficulty": 3, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Positioning strategy for new enterprise software targeting both IT decision-makers and end users" + ], + "generated_queries": [ + "How should we position our new enterprise software to appeal to both IT decision-makers and end users?" + ], + "error": null + }, + { + "id": 1, + "difficulty": 1, + "domain": "finance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q3 2024" + ], + "generated_queries": [ + "What was the total revenue for the company in Q3 2024?" + ], + "error": null + }, + { + "id": 8, + "difficulty": 1, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average engagement rate for sponsored Instagram posts targeting the 25-35 age group" + ], + "generated_queries": [ + "What is the average engagement rate for sponsored posts on Instagram targeting the 25-35 age group?" 
+ ], + "error": null + }, + { + "id": 80, + "difficulty": 3, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Comparison of lithium-ion battery systems and pumped hydro storage in terms of scalability and round-trip efficiency" + ], + "generated_queries": [ + "Compare lithium-ion battery systems with pumped hydro storage in terms of scalability and round-trip efficiency." + ], + "error": null + }, + { + "id": 13, + "difficulty": 1, + "domain": "agriculture", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Optimal irrigation frequency for winter wheat during the tillering stage" + ], + "generated_queries": [ + "What is the optimal irrigation frequency for winter wheat during the tillering stage?" + ], + "error": null + }, + { + "id": 19, + "difficulty": 1, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Loss ratio for the commercial property insurance line" + ], + "generated_queries": [ + "What is the loss ratio for the commercial property line as reported in the annual report?" 
+ ], + "error": null + }, + { + "id": 78, + "difficulty": 3, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Coverage and throughput characteristics of sub-6 GHz bands for 5G deployment", + "Coverage and throughput characteristics of mmWave bands for 5G deployment" + ], + "generated_queries": [ + "What are the coverage and throughput trade-offs between sub-6 GHz and mmWave frequency bands for 5G network deployment?" + ], + "error": null + }, + { + "id": 32, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average delivery time for shipments to Southeast Asia", + "Average delivery time for shipments to Western Europe" + ], + "generated_queries": [ + "What are the average delivery times for shipments to Southeast Asia?", + "What are the average delivery times for shipments to Western Europe?" + ], + "error": null + }, + { + "id": 77, + "difficulty": 3, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended treatments for aphids in corn crops", + "Recommended treatments for rootworm in corn crops" + ], + "generated_queries": [ + "What are the recommended treatments for aphids in corn crops?", + "What are the recommended treatments for rootworm in corn crops?" 
+ ], + "error": null + }, + { + "id": 73, + "difficulty": 3, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Application of differentiated instruction in elementary and middle school classrooms" + ], + "generated_queries": [ + "How is differentiated instruction applied in elementary school classrooms?", + "How is differentiated instruction applied in middle school classrooms?" + ], + "error": null + }, + { + "id": 63, + "difficulty": 3, + "domain": "technology", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about the main architectural features of Kafka is not covered, as the generated sub-query focuses on differences between Kafka and RabbitMQ rather than detailing Kafka's features alone.", + "expected_queries": [ + "Main architectural features of Kafka", + "Main architectural features of RabbitMQ" + ], + "generated_queries": [ + "What are the main architectural differences between Apache Kafka and RabbitMQ?" + ], + "error": null + }, + { + "id": 54, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current minimum wage in France", + "Current minimum wage in Japan", + "Current minimum wage in Australia" + ], + "generated_queries": [ + "What is the current minimum wage in France?", + "What is the current minimum wage in Japan?", + "What is the current minimum wage in Australia?" 
+ ], + "error": null + }, + { + "id": 26, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Employee satisfaction score for the London office", + "Employee satisfaction score for the Singapore office" + ], + "generated_queries": [ + "What are the employee satisfaction scores for the London office?", + "What are the employee satisfaction scores for the Singapore office?" + ], + "error": null + }, + { + "id": 29, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average commercial lease rate in downtown Manhattan", + "Average commercial lease rate in the Miami financial district" + ], + "generated_queries": [ + "What are the average commercial lease rates in downtown Manhattan?", + "What are the average commercial lease rates in the Miami financial district?" + ], + "error": null + }, + { + "id": 49, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Average soil nitrogen levels in our cotton fields' is not covered by the generated sub-query, which only compares soybean and cotton fields without isolating cotton field data.", + "expected_queries": [ + "Average soil nitrogen levels in our soybean fields", + "Average soil nitrogen levels in our cotton fields" + ], + "generated_queries": [ + "What are the average soil nitrogen levels in soybean fields compared to cotton fields?" 
+ ], + "error": null + }, + { + "id": 62, + "difficulty": 3, + "domain": "healthcare", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Standard discharge procedure for elderly patients", + "Standard discharge procedure for pediatric patients" + ], + "generated_queries": [ + "What is the standard discharge procedure for elderly patients in a hospital setting?", + "What is the standard discharge procedure for pediatric patients in a hospital setting?" + ], + "error": null + }, + { + "id": 4, + "difficulty": 1, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Maximum rainfall intensity for a 10-year return period in Toulouse" + ], + "generated_queries": [ + "What is the maximum rainfall intensity for a 10-year return period in Toulouse, France?" + ], + "error": null + }, + { + "id": 17, + "difficulty": 1, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Safety stock level for the premium wireless headphones" + ], + "generated_queries": [ + "What is the safety stock level for the new line of premium wireless headphones launched last quarter?" 
+ ], + "error": null + }, + { + "id": 18, + "difficulty": 1, + "domain": "telecommunications", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Packet loss rate on the fiber backbone between Paris and Marseille" + ], + "generated_queries": [ + "What is the packet loss rate on the fiber backbone between Paris and Marseille?" + ], + "error": null + }, + { + "id": 38, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average 5G download speeds in urban areas on our network", + "Average 5G download speeds in rural areas on our network" + ], + "generated_queries": [ + "What are the average 5G download speeds in urban areas on our network?", + "What are the average 5G download speeds in rural areas on our network?" + ], + "error": null + }, + { + "id": 12, + "difficulty": 1, + "domain": "logistics", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current lead time for ocean freight from Shanghai to Rotterdam" + ], + "generated_queries": [ + "What is the current lead time for ocean freight shipments from Shanghai to Rotterdam?" 
+ ], + "error": null + }, + { + "id": 23, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Data privacy requirements in the European Union", + "Data privacy requirements in Brazil" + ], + "generated_queries": [ + "What are the data privacy requirements in the European Union?", + "What are the data privacy requirements in Brazil?" + ], + "error": null + }, + { + "id": 27, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Student outcome results for project-based learning", + "Student outcome results for traditional lecture-based instruction" + ], + "generated_queries": [ + "What are the student outcome results for project-based learning compared to traditional lecture-based instruction?" + ], + "error": null + }, + { + "id": 42, + "difficulty": 2, + "domain": "healthcare", + "n_expected_queries": 4, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average ER wait time at the Boston hospital", + "Average ER wait time at the Philadelphia hospital", + "Cardiology appointment lead time at the Boston hospital", + "Cardiology appointment lead time at the Philadelphia hospital" + ], + "generated_queries": [ + "What are the average emergency room wait times and cardiology appointment lead times at the Boston hospital?", + "What are the average emergency room wait times and cardiology appointment lead times at the Philadelphia hospital?" 
+ ], + "error": null + }, + { + "id": 28, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 3, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Click-through rate for the summer campaign on Facebook", + "Click-through rate for the summer campaign on Instagram", + "Click-through rate for the summer campaign on Google Ads" + ], + "generated_queries": [ + "What was the click-through rate for the summer campaign on Facebook, Instagram, and Google Ads?" + ], + "error": null + }, + { + "id": 55, + "difficulty": 2, + "domain": "aviation", + "n_expected_queries": 4, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Load factor for the New York-London route", + "Load factor for the Los Angeles-Tokyo route", + "Fuel cost per seat-mile for the New York-London route", + "Fuel cost per seat-mile for the Los Angeles-Tokyo route" + ], + "generated_queries": [ + "What are the load factor and fuel cost per seat-mile for the New York-London route?", + "What are the load factor and fuel cost per seat-mile for the Los Angeles-Tokyo route?" + ], + "error": null + }, + { + "id": 6, + "difficulty": 1, + "domain": "HR", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Parental leave entitlement for employees in the French office" + ], + "generated_queries": [ + "What is the parental leave entitlement for employees in the French office of the company, considering national labor frameworks in France?" 
+ ], + "error": null + }, + { + "id": 34, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Solar energy generation at our farm in summer 2024' is not covered by any generated sub-query, as the generated query only compares summer and winter data without isolating either season.", + "expected_queries": [ + "Solar energy generation at our farm in summer 2024", + "Solar energy generation at our farm in winter 2024" + ], + "generated_queries": [ + "Compare solar energy generation at the farm between summer 2024 and winter 2024." + ], + "error": null + }, + { + "id": 7, + "difficulty": 1, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students who graduated from the computer science program in the 2024-2025 academic year" + ], + "generated_queries": [ + "How many students graduated from the computer science program in the 2024-2025 academic year?" + ], + "error": null + }, + { + "id": 65, + "difficulty": 3, + "domain": "environment", + "n_expected_queries": 5, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Amazon deforestation rates in 2021' is not covered by the generated sub-query, as it does not specify a particular year but rather asks for a trend over a period.", + "expected_queries": [ + "Amazon deforestation rates in 2021", + "Amazon deforestation rates in 2022", + "Amazon deforestation rates in 2023", + "Amazon deforestation rates in 2024", + "Amazon deforestation rates in 2025" + ], + "generated_queries": [ + "What has been the trend in Amazon deforestation rates over the last five years?" 
+ ], + "error": null + }, + { + "id": 39, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average claim resolution time for auto insurance claims", + "Average claim resolution time for home insurance claims" + ], + "generated_queries": [ + "What is the average claim resolution time for auto insurance claims?", + "What is the average claim resolution time for home insurance claims?" + ], + "error": null + }, + { + "id": 72, + "difficulty": 3, + "domain": "manufacturing", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about root causes of adhesion failure is not covered by the generated sub-query, which combines both surface bubbling and adhesion failure into a single query without addressing them separately.", + "expected_queries": [ + "Root causes of surface bubbling on the automotive paint line", + "Root causes of adhesion failure on the automotive paint line" + ], + "generated_queries": [ + "What are the root causes of surface bubbling and adhesion failure defects in the automotive paint line?" + ], + "error": null + }, + { + "id": 68, + "difficulty": 3, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Typical comparison of product return rates between summer and winter seasons" + ], + "generated_queries": [ + "How do product return rates typically compare between the summer and winter seasons in retail?" 
+ ], + "error": null + }, + { + "id": 43, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Percentage of electricity from renewables at the Texas plant", + "Percentage of electricity from renewables at the Ohio plant", + "Percentage of electricity from renewables at the Oregon plant" + ], + "generated_queries": [ + "What percentage of electricity comes from renewable sources at the Texas plant?", + "What percentage of electricity comes from renewable sources at the Ohio plant?", + "What percentage of electricity comes from renewable sources at the Oregon plant?" + ], + "error": null + }, + { + "id": 70, + "difficulty": 3, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about the cost and lead time for air freight transatlantic shipments is not covered, nor is the one for sea freight transatlantic shipments. The generated sub-query only compares the two, without addressing each individually.", + "expected_queries": [ + "Cost and lead time for air freight transatlantic shipments", + "Cost and lead time for sea freight transatlantic shipments" + ], + "generated_queries": [ + "What are the cost and lead time differences between air freight and sea freight for transatlantic shipments?" 
+ ], + "error": null + }, + { + "id": 44, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about antitrust cases filed against Apple between 2023 and 2026 is not covered, as the generated sub-query does not specifically target Apple or isolate its cases.", + "expected_queries": [ + "Antitrust cases filed against Google between 2023 and 2026", + "Antitrust cases filed against Apple between 2023 and 2026" + ], + "generated_queries": [ + "What antitrust cases have been filed against Google and Apple in the tech sector in the EU and the US in the past three years?" + ], + "error": null + }, + { + "id": 58, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Annual water consumption at our manufacturing operations' is not covered, as the generated sub-query only compares the two operations rather than isolating the manufacturing one.", + "expected_queries": [ + "Annual water consumption at our manufacturing operations", + "Annual water consumption at our data center operations" + ], + "generated_queries": [ + "What is the annual water consumption at manufacturing operations compared to data center operations?" 
+ ], + "error": null + }, + { + "id": 53, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Monthly churn rate for the Premium plan", + "Monthly churn rate for the Basic plan" + ], + "generated_queries": [ + "What is the monthly churn rate for the Premium subscription plan?", + "What is the monthly churn rate for the Basic subscription plan?" + ], + "error": null + }, + { + "id": 61, + "difficulty": 3, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Operating costs in 2024' is not explicitly covered, as the generated query only asks for the past two years without specifying individual years.", + "expected_queries": [ + "Operating costs in 2024", + "Operating costs in 2025" + ], + "generated_queries": [ + "What are the operating costs for the company over the past two years?" + ], + "error": null + }, + { + "id": 24, + "difficulty": 2, + "domain": "engineering", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Fatigue resistance rating for carbon steel under cyclic loading", + "Fatigue resistance rating for stainless steel under cyclic loading" + ], + "generated_queries": [ + "What are the fatigue resistance ratings for carbon steel under cyclic loading for structural beams in bridge projects?", + "What are the fatigue resistance ratings for stainless steel under cyclic loading for structural beams in bridge projects?" 
+ ], + "error": null + }, + { + "id": 74, + "difficulty": 3, + "domain": "aviation", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about EASA certification requirements for commercial drone operations is not covered, as the generated sub-query only compares EASA and FAA requirements without focusing on EASA alone.", + "expected_queries": [ + "EASA certification requirements for commercial drone operations", + "FAA certification requirements for commercial drone operations" + ], + "generated_queries": [ + "What are the key differences in certification requirements for commercial drone operations between EASA and FAA?" + ], + "error": null + }, + { + "id": 60, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Financial documents from January or February 2025", + "Financial documents from April to December 2025" + ], + "generated_queries": [ + "financial documents filed in the last year excluding those filed in March" + ], + "error": null + }, + { + "id": 33, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Per-hectare yield for wheat in the 2025 harvest season", + "Per-hectare yield for corn in the 2025 harvest season" + ], + "generated_queries": [ + "What were the per-hectare yields for wheat this harvest season?", + "What were the per-hectare yields for corn this harvest season?" 
+ ], + "error": null + }, + { + "id": 36, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current rent control measures in Berlin", + "Current rent control measures in New York City" + ], + "generated_queries": [ + "What rent control measures are currently in place in Berlin, Germany?", + "What rent control measures are currently in place in New York City, USA?" + ], + "error": null + }, + { + "id": 35, + "difficulty": 2, + "domain": "manufacturing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall defect rate for Assembly Line A in Q2 2026", + "Overall defect rate for Assembly Line B in Q2 2026" + ], + "generated_queries": [ + "What are the overall defect rates for Assembly Line A this quarter?", + "What are the overall defect rates for Assembly Line B this quarter?" + ], + "error": null + }, + { + "id": 51, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Revenue per square foot for East Coast stores in Q1 2026", + "Revenue per square foot for West Coast stores in Q1 2026" + ], + "generated_queries": [ + "What was the revenue per square foot for East Coast stores last quarter?", + "What was the revenue per square foot for West Coast stores last quarter?" 
+ ], + "error": null + }, + { + "id": 56, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for sickle cell disease", + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for beta-thalassemia" + ], + "generated_queries": [ + "What are the latest clinical trial results for CRISPR-based therapies for sickle cell disease as of April 2026?", + "What are the latest clinical trial results for CRISPR-based therapies for beta-thalassemia as of April 2026?" + ], + "error": null + }, + { + "id": 21, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Returns on the S&P 500 index fund in 2025", + "Returns on the emerging markets fund in 2025" + ], + "generated_queries": [ + "What were the annual returns for the S&P 500 index fund last year?", + "What were the annual returns for the emerging markets fund last year?" + ], + "error": null + }, + { + "id": 47, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Net Promoter Score for the ProLine series", + "Net Promoter Score for the EcoLine series" + ], + "generated_queries": [ + "What are the Net Promoter Scores for the ProLine series this year?", + "What are the Net Promoter Scores for the EcoLine series this year?" 
+ ], + "error": null + }, + { + "id": 31, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total CO2 emissions for the main campus in Q1 2026", + "Total CO2 emissions for the main campus in Q3 2026" + ], + "generated_queries": [ + "What were the total CO2 emissions for the main campus in Q1 of this year?", + "What were the total CO2 emissions for the main campus in Q3 of this year?" + ], + "error": null + }, + { + "id": 71, + "difficulty": 3, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Short-term and long-term effects of the Finnish UBI pilot on employment and well-being" + ], + "generated_queries": [ + "What were the short-term effects of the Finnish UBI pilot on employment and well-being?", + "What were the long-term effects of the Finnish UBI pilot on employment and well-being?" + ], + "error": null + }, + { + "id": 46, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students enrolled in the fall 2024 semester", + "Number of students enrolled in the spring 2025 semester" + ], + "generated_queries": [ + "How many students enrolled in the fall 2024 semester?", + "How many students enrolled in the spring 2025 semester?" 
+ ], + "error": null + }, + { + "id": 22, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q1 2024", + "Total revenue in Q2 2024" + ], + "generated_queries": [ + "What was the total revenue for Q1 2024?", + "What was the total revenue for Q2 2024?" + ], + "error": null + }, + { + "id": 25, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average Arctic sea ice extent in summer 2015", + "Average Arctic sea ice extent in summer 2020" + ], + "generated_queries": [ + "What was the average Arctic sea ice extent in summer 2015?", + "What was the average Arctic sea ice extent in summer 2020?" + ], + "error": null + }, + { + "id": 40, + "difficulty": 2, + "domain": "aviation", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "On-time departure rate in January 2025", + "On-time departure rate in February 2025" + ], + "generated_queries": [ + "What was the on-time departure rate for the airline in January 2025?", + "What was the on-time departure rate for the airline in February 2025?" 
+ ], + "error": null + }, + { + "id": 41, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 4, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Returns of the Growth Fund in Q3 2024", + "Returns of the Growth Fund in Q4 2024", + "Returns of the Value Fund in Q3 2024", + "Returns of the Value Fund in Q4 2024" + ], + "generated_queries": [ + "What were the returns of the Growth Fund in Q3 and Q4 of 2024?", + "What were the returns of the Value Fund in Q3 and Q4 of 2024?" + ], + "error": null + }, + { + "id": 52, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Number of flood insurance claims in 2023' is not covered by any generated sub-query.", + "expected_queries": [ + "Number of flood insurance claims in 2023", + "Number of flood insurance claims in 2024" + ], + "generated_queries": [ + "Compare the number of flood insurance claims between 2023 and 2024", + "Compare the number of flood insurance claims between 2024 and 2023" + ], + "error": null + }, + { + "id": 50, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "API uptime percentage for January 2025", + "API uptime percentage for February 2025", + "API uptime percentage for March 2025" + ], + "generated_queries": [ + "What was the API uptime percentage for January 2025?", + "What was the API uptime percentage for February 2025?", + "What was the API uptime percentage for March 2025?" 
+ ], + "error": null + }, + { + "id": 59, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Median home sale price in the Austin metro area in Q1 2025", + "Median home sale price in the Austin metro area in Q2 2025", + "Median home sale price in the Austin metro area in Q3 2025" + ], + "generated_queries": [ + "What was the median home sale price in the Austin metro area in Q1 of 2025?", + "What was the median home sale price in the Austin metro area in Q2 of 2025?", + "What was the median home sale price in the Austin metro area in Q3 of 2025?" + ], + "error": null + }, + { + "id": 57, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Voluntary turnover rate for engineering in 2023", + "Voluntary turnover rate for engineering in 2024", + "Voluntary turnover rate for marketing in 2023", + "Voluntary turnover rate for marketing in 2024" + ], + "generated_queries": [ + "What were the voluntary turnover rates for engineering in 2023?", + "What were the voluntary turnover rates for engineering in 2024?", + "What were the voluntary turnover rates for marketing in 2023?", + "What were the voluntary turnover rates for marketing in 2024?" 
+ ], + "error": null + }, + { + "id": 48, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Daily order processing volume at the Memphis hub in March 2025", + "Daily order processing volume at the Memphis hub in April 2025", + "Daily order processing volume at the Rotterdam hub in March 2025", + "Daily order processing volume at the Rotterdam hub in April 2025" + ], + "generated_queries": [ + "What was the daily order processing volume at the Memphis hub in March 2025?", + "What was the daily order processing volume at the Memphis hub in April 2025?", + "What was the daily order processing volume at the Rotterdam hub in March 2025?", + "What was the daily order processing volume at the Rotterdam hub in April 2025?" + ], + "error": null + }, + { + "id": 45, + "difficulty": 2, + "domain": "manufacturing", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall equipment effectiveness for the Shenzhen factory in H1 2024", + "Overall equipment effectiveness for the Shenzhen factory in H2 2024", + "Overall equipment effectiveness for the Munich factory in H1 2024", + "Overall equipment effectiveness for the Munich factory in H2 2024" + ], + "generated_queries": [ + "What was the overall equipment effectiveness for the Shenzhen factory in H1 of 2024?", + "What was the overall equipment effectiveness for the Shenzhen factory in H2 of 2024?", + "What was the overall equipment effectiveness for the Munich factory in H1 of 2024?", + "What was the overall equipment effectiveness for the Munich factory in H2 of 2024?" 
+ ], + "error": null + } + ] + } + ] + }, + { + "prompt": "prompts/query_contextualizer_tmpl_v1.txt", + "models": [ + { + "model_name": "Mistral-Small-3.1-24B-Instruct-2503", + "timestamp": "2026-04-19T16:24:45.948773", + "prompt_path": "prompts/query_contextualizer_tmpl_v1.txt", + "dataset_path": "datasets/query_decomposition.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 80, + "errors": 0, + "count_match_passed": 69, + "count_match_accuracy": 0.8625, + "semantic_coverage_passed": 76, + "semantic_coverage_accuracy": 0.95, + "by_difficulty": { + "1": { + "total": 20, + "errors": 0, + "count_match_passed": 16, + "semantic_coverage_passed": 20, + "count_match_accuracy": 0.8, + "semantic_coverage_accuracy": 1.0 + }, + "2": { + "total": 40, + "errors": 0, + "count_match_passed": 38, + "semantic_coverage_passed": 38, + "count_match_accuracy": 0.95, + "semantic_coverage_accuracy": 0.95 + }, + "3": { + "total": 20, + "errors": 0, + "count_match_passed": 15, + "semantic_coverage_passed": 18, + "count_match_accuracy": 0.75, + "semantic_coverage_accuracy": 0.9 + } + }, + "cases": [ + { + "id": 14, + "difficulty": 1, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Capacity factor of the wind farm in Normandy" + ], + "generated_queries": [ + "Capacity factor of the wind farm in Normandy" + ], + "error": null + }, + { + "id": 3, + "difficulty": 1, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Statute of limitations for breach of contract in California" + ], + "generated_queries": [ + "Statute of limitations for breach of contract in California" + ], + "error": null + }, + { + "id": 1, + "difficulty": 1, + "domain": "finance", + 
"n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q3 2024" + ], + "generated_queries": [ + "Total revenue for Q3 2024" + ], + "error": null + }, + { + "id": 67, + "difficulty": 3, + "domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Known interactions between metformin and ibuprofen" + ], + "generated_queries": [ + "Known interactions between metformin and ibuprofen" + ], + "error": null + }, + { + "id": 16, + "difficulty": 1, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current carbon tax rate per ton of CO2 in France" + ], + "generated_queries": [ + "Carbon tax rate per ton of CO2 in France" + ], + "error": null + }, + { + "id": 19, + "difficulty": 1, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Loss ratio for the commercial property insurance line" + ], + "generated_queries": [ + "What is the loss ratio for our commercial property line" + ], + "error": null + }, + { + "id": 12, + "difficulty": 1, + "domain": "logistics", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current lead time for ocean freight from Shanghai to Rotterdam" + ], + "generated_queries": [ + "Current lead time for ocean freight from Shanghai to Rotterdam" + ], + "error": null + }, + { + "id": 20, 
+ "difficulty": 1, + "domain": "aviation", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Next C-check due date for aircraft with registration F-GKXA" + ], + "generated_queries": [ + "Next C-check due date for aircraft registration F-GKXA" + ], + "error": null + }, + { + "id": 69, + "difficulty": 3, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Thermal conductivity and fire resistance ratings of ceramic fiber insulation" + ], + "generated_queries": [ + "Thermal conductivity and fire resistance ratings of ceramic fiber insulation" + ], + "error": null + }, + { + "id": 7, + "difficulty": 1, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students who graduated from the computer science program in the 2024-2025 academic year" + ], + "generated_queries": [ + "Number of students who graduated from the computer science program last year" + ], + "error": null + }, + { + "id": 10, + "difficulty": 1, + "domain": "technology", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended steps for migrating a MySQL database to PostgreSQL" + ], + "generated_queries": [ + "Recommended steps for migrating a MySQL database to PostgreSQL" + ], + "error": null + }, + { + "id": 13, + "difficulty": 1, + "domain": "agriculture", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": 
true, + "coverage_reasoning": null, + "expected_queries": [ + "Optimal irrigation frequency for winter wheat during the tillering stage" + ], + "generated_queries": [ + "Optimal irrigation frequency for winter wheat during the tillering stage" + ], + "error": null + }, + { + "id": 9, + "difficulty": 1, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average rental yield for residential properties in Lyon" + ], + "generated_queries": [ + "What is the average rental yield for residential properties in Lyon?" + ], + "error": null + }, + { + "id": 6, + "difficulty": 1, + "domain": "HR", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Parental leave entitlement for employees in the French office" + ], + "generated_queries": [ + "Parental leave entitlement for employees in our French office" + ], + "error": null + }, + { + "id": 4, + "difficulty": 1, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Maximum rainfall intensity for a 10-year return period in Toulouse" + ], + "generated_queries": [ + "Maximum rainfall intensity for a 10-year return period in Toulouse" + ], + "error": null + }, + { + "id": 75, + "difficulty": 3, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Positioning strategy for new enterprise software targeting both IT decision-makers and end users" + ], + "generated_queries": [ + "Positioning new enterprise 
software for IT decision-makers and end users" + ], + "error": null + }, + { + "id": 68, + "difficulty": 3, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Typical comparison of product return rates between summer and winter seasons" + ], + "generated_queries": [ + "How do product return rates typically compare between the summer and winter seasons?" + ], + "error": null + }, + { + "id": 79, + "difficulty": 3, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Combined effect of rising interest rates and increasing housing supply on residential property investment returns" + ], + "generated_queries": [ + "Impact of rising interest rates and increasing housing supply on residential property investment returns" + ], + "error": null + }, + { + "id": 76, + "difficulty": 3, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Joint effect of flood zone classification and building age on commercial property insurance premiums" + ], + "generated_queries": [ + "How do flood zone classification and building age jointly affect commercial property insurance premiums" + ], + "error": null + }, + { + "id": 80, + "difficulty": 3, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Comparison of lithium-ion battery systems and pumped hydro storage in terms of scalability and round-trip efficiency" + ], + "generated_queries": [ + 
"Scalability and round-trip efficiency of lithium-ion battery systems compared to pumped hydro storage" + ], + "error": null + }, + { + "id": 18, + "difficulty": 1, + "domain": "telecommunications", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Packet loss rate on the fiber backbone between Paris and Marseille" + ], + "generated_queries": [ + "What is the packet loss rate on our fiber backbone between Paris and Marseille?" + ], + "error": null + }, + { + "id": 66, + "difficulty": 3, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Onboarding process for sales new hires", + "Onboarding process for customer success new hires" + ], + "generated_queries": [ + "What does the onboarding process look like for sales and customer success new hires?" 
+ ], + "error": null + }, + { + "id": 17, + "difficulty": 1, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Safety stock level for the premium wireless headphones" + ], + "generated_queries": [ + "What is the safety stock level for the new line of premium wireless headphones launched last quarter" + ], + "error": null + }, + { + "id": 64, + "difficulty": 3, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Differences between trade secrets and patents in terms of protection duration and disclosure requirements" + ], + "generated_queries": [ + "How do trade secrets and patents differ in terms of protection duration and disclosure requirements?" + ], + "error": null + }, + { + "id": 8, + "difficulty": 1, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average engagement rate for sponsored Instagram posts targeting the 25-35 age group" + ], + "generated_queries": [ + "Average engagement rate for sponsored posts on Instagram targeting the 25-35 age group" + ], + "error": null + }, + { + "id": 78, + "difficulty": 3, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Coverage and throughput characteristics of sub-6 GHz bands for 5G deployment", + "Coverage and throughput characteristics of mmWave bands for 5G deployment" + ], + "generated_queries": [ + "Coverage and throughput trade-offs between sub-6 GHz and 
mmWave bands for 5G deployment" + ], + "error": null + }, + { + "id": 23, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Data privacy requirements in the European Union", + "Data privacy requirements in Brazil" + ], + "generated_queries": [ + "Data privacy requirements in the EU", + "Data privacy requirements in Brazil" + ], + "error": null + }, + { + "id": 62, + "difficulty": 3, + "domain": "healthcare", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Standard discharge procedure for elderly patients", + "Standard discharge procedure for pediatric patients" + ], + "generated_queries": [ + "Standard discharge procedure for elderly patients", + "Standard discharge procedure for pediatric patients" + ], + "error": null + }, + { + "id": 49, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average soil nitrogen levels in our soybean fields", + "Average soil nitrogen levels in our cotton fields" + ], + "generated_queries": [ + "Average soil nitrogen levels in soybean fields", + "Average soil nitrogen levels in cotton fields" + ], + "error": null + }, + { + "id": 53, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Monthly churn rate for the Premium plan", + "Monthly churn rate for the Basic plan" + ], + "generated_queries": [ + "Monthly churn rate for the Premium plan", 
+ "Monthly churn rate for the Basic plan" + ], + "error": null + }, + { + "id": 58, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Annual water consumption at our manufacturing operations", + "Annual water consumption at our data center operations" + ], + "generated_queries": [ + "Annual water consumption at manufacturing operations", + "Annual water consumption at data center operations" + ], + "error": null + }, + { + "id": 26, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Employee satisfaction score for the London office", + "Employee satisfaction score for the Singapore office" + ], + "generated_queries": [ + "Employee satisfaction scores for the London office", + "Employee satisfaction scores for the Singapore office" + ], + "error": null + }, + { + "id": 32, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average delivery time for shipments to Southeast Asia", + "Average delivery time for shipments to Western Europe" + ], + "generated_queries": [ + "Average delivery times for shipments to Southeast Asia", + "Average delivery times for shipments to Western Europe" + ], + "error": null + }, + { + "id": 39, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average claim resolution time for auto insurance claims", + "Average 
claim resolution time for home insurance claims" + ], + "generated_queries": [ + "Average claim resolution time for auto insurance claims", + "Average claim resolution time for home insurance claims" + ], + "error": null + }, + { + "id": 73, + "difficulty": 3, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Application of differentiated instruction in elementary and middle school classrooms" + ], + "generated_queries": [ + "Application of differentiated instruction in elementary school classrooms", + "Application of differentiated instruction in middle school classrooms" + ], + "error": null + }, + { + "id": 29, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average commercial lease rate in downtown Manhattan", + "Average commercial lease rate in the Miami financial district" + ], + "generated_queries": [ + "Average commercial lease rates in downtown Manhattan", + "Average commercial lease rates in the Miami financial district" + ], + "error": null + }, + { + "id": 27, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Student outcome results for project-based learning", + "Student outcome results for traditional lecture-based instruction" + ], + "generated_queries": [ + "Student outcome results for project-based learning", + "Student outcome results for traditional lecture-based instruction" + ], + "error": null + }, + { + "id": 74, + "difficulty": 3, + "domain": "aviation", + "n_expected_queries": 2, + "n_generated_queries": 1, 
+ "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about EASA certification requirements for commercial drone operations is not covered, as the generated sub-query only compares EASA and FAA requirements rather than addressing EASA requirements specifically.", + "expected_queries": [ + "EASA certification requirements for commercial drone operations", + "FAA certification requirements for commercial drone operations" + ], + "generated_queries": [ + "How do EASA and FAA certification requirements differ for commercial drone operations?" + ], + "error": null + }, + { + "id": 47, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Net Promoter Score for the ProLine series", + "Net Promoter Score for the EcoLine series" + ], + "generated_queries": [ + "Net Promoter Score for the ProLine series", + "Net Promoter Score for the EcoLine series" + ], + "error": null + }, + { + "id": 77, + "difficulty": 3, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended treatments for aphids in corn crops", + "Recommended treatments for rootworm in corn crops" + ], + "generated_queries": [ + "Recommended treatments for aphids in corn crops", + "Recommended treatments for rootworm in corn crops" + ], + "error": null + }, + { + "id": 36, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current rent control measures in Berlin", + "Current rent control measures in New 
York City" + ], + "generated_queries": [ + "Rent control measures currently in place in Berlin", + "Rent control measures currently in place in New York City" + ], + "error": null + }, + { + "id": 5, + "difficulty": 1, + "domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average mutation rate of human mitochondrial DNA" + ], + "generated_queries": [ + "CRISPR-Cas9 is primarily used for in gene editing", + "Average mutation rate of human mitochondrial DNA" + ], + "error": null + }, + { + "id": 33, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Per-hectare yield for wheat in the 2025 harvest season", + "Per-hectare yield for corn in the 2025 harvest season" + ], + "generated_queries": [ + "Per-hectare yields for wheat this harvest season", + "Per-hectare yields for corn this harvest season" + ], + "error": null + }, + { + "id": 24, + "difficulty": 2, + "domain": "engineering", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Fatigue resistance rating for carbon steel under cyclic loading", + "Fatigue resistance rating for stainless steel under cyclic loading" + ], + "generated_queries": [ + "Fatigue resistance ratings for carbon steel under cyclic loading", + "Fatigue resistance ratings for stainless steel under cyclic loading" + ], + "error": null + }, + { + "id": 11, + "difficulty": 1, + "domain": "environment", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + 
"expected_queries": [ + "Total energy consumption of the Paris headquarters in 2024" + ], + "generated_queries": [ + "Current headcount at the Paris HQ", + "Total energy consumption of the Paris headquarters in 2024" + ], + "error": null + }, + { + "id": 72, + "difficulty": 3, + "domain": "manufacturing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Root causes of surface bubbling on the automotive paint line", + "Root causes of adhesion failure on the automotive paint line" + ], + "generated_queries": [ + "Root causes of surface bubbling on our automotive paint line", + "Root causes of adhesion failure on our automotive paint line" + ], + "error": null + }, + { + "id": 40, + "difficulty": 2, + "domain": "aviation", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "On-time departure rate in January 2025", + "On-time departure rate in February 2025" + ], + "generated_queries": [ + "On-time departure rate in January 2025", + "On-time departure rate in February 2025" + ], + "error": null + }, + { + "id": 61, + "difficulty": 3, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Operating costs in 2024", + "Operating costs in 2025" + ], + "generated_queries": [ + "Evolution of operating costs in 2024", + "Evolution of operating costs in 2025" + ], + "error": null + }, + { + "id": 52, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of 
flood insurance claims in 2023", + "Number of flood insurance claims in 2024" + ], + "generated_queries": [ + "Number of flood insurance claims in 2023", + "Number of flood insurance claims in 2024" + ], + "error": null + }, + { + "id": 38, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average 5G download speeds in urban areas on our network", + "Average 5G download speeds in rural areas on our network" + ], + "generated_queries": [ + "Average 5G download speeds in urban areas on our network", + "Average 5G download speeds in rural areas on our network" + ], + "error": null + }, + { + "id": 25, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average Arctic sea ice extent in summer 2015", + "Average Arctic sea ice extent in summer 2020" + ], + "generated_queries": [ + "Average Arctic sea ice extent in summer 2015", + "Average Arctic sea ice extent in summer 2020" + ], + "error": null + }, + { + "id": 15, + "difficulty": 1, + "domain": "manufacturing", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Rework procedure for solder joint failures on Line C" + ], + "generated_queries": [ + "What is the acceptable defect threshold for solder joint inspections on Line C", + "What is the rework procedure for solder joint failures on Line C" + ], + "error": null + }, + { + "id": 54, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": 
true, + "coverage_reasoning": null, + "expected_queries": [ + "Current minimum wage in France", + "Current minimum wage in Japan", + "Current minimum wage in Australia" + ], + "generated_queries": [ + "Current minimum wage in France", + "Current minimum wage in Japan", + "Current minimum wage in Australia" + ], + "error": null + }, + { + "id": 34, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Solar energy generation at our farm in summer 2024", + "Solar energy generation at our farm in winter 2024" + ], + "generated_queries": [ + "Solar energy generation at our farm in summer 2024", + "Solar energy generation at our farm in winter 2024" + ], + "error": null + }, + { + "id": 71, + "difficulty": 3, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Short-term and long-term effects of the Finnish UBI pilot on employment and well-being" + ], + "generated_queries": [ + "Short-term effects of the Finnish UBI pilot on employment and well-being", + "Long-term effects of the Finnish UBI pilot on employment and well-being" + ], + "error": null + }, + { + "id": 55, + "difficulty": 2, + "domain": "aviation", + "n_expected_queries": 4, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Load factor for the New York-London route", + "Load factor for the Los Angeles-Tokyo route", + "Fuel cost per seat-mile for the New York-London route", + "Fuel cost per seat-mile for the Los Angeles-Tokyo route" + ], + "generated_queries": [ + "Load factor and fuel cost per seat-mile for New York-London route", + "Load factor and 
fuel cost per seat-mile for Los Angeles-Tokyo route" + ], + "error": null + }, + { + "id": 70, + "difficulty": 3, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Cost and lead time for air freight transatlantic shipments", + "Cost and lead time for sea freight transatlantic shipments" + ], + "generated_queries": [ + "Cost differences between air freight and sea freight for transatlantic shipments", + "Lead time differences between air freight and sea freight for transatlantic shipments" + ], + "error": null + }, + { + "id": 2, + "difficulty": 1, + "domain": "healthcare", + "n_expected_queries": 1, + "n_generated_queries": 3, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended flu vaccination schedule for adults over 65" + ], + "generated_queries": [ + "Current hospital policy on visitor hours", + "Minimum staffing ratio required for the night shift", + "Recommended flu vaccination schedule for adults over 65" + ], + "error": null + }, + { + "id": 43, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Percentage of electricity from renewables at the Texas plant", + "Percentage of electricity from renewables at the Ohio plant", + "Percentage of electricity from renewables at the Oregon plant" + ], + "generated_queries": [ + "Percentage of electricity from renewables at Texas plants", + "Percentage of electricity from renewables at Ohio plants", + "Percentage of electricity from renewables at Oregon plants" + ], + "error": null + }, + { + "id": 30, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 2, + 
"n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about serverless pricing and cold start latency for AWS Lambda is not fully covered, as the generated sub-query only compares AWS Lambda with Google Cloud Functions without focusing specifically on AWS Lambda alone.", + "expected_queries": [ + "Serverless pricing and cold start latency for AWS Lambda", + "Serverless pricing and cold start latency for Google Cloud Functions" + ], + "generated_queries": [ + "Recommended way to manage secrets across environments in Kubernetes", + "Comparison of serverless pricing and cold start latency of AWS Lambda and Google Cloud Functions" + ], + "error": null + }, + { + "id": 63, + "difficulty": 3, + "domain": "technology", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-queries about the main architectural features of Kafka and RabbitMQ are not covered by any generated sub-query, as the generated one focuses on gRPC vs REST and architectural differences between Kafka and RabbitMQ without detailing their individual features.", + "expected_queries": [ + "Main architectural features of Kafka", + "Main architectural features of RabbitMQ" + ], + "generated_queries": [ + "Explain what gRPC is and where it's usually a better choice than REST", + "Main architectural differences between Kafka and RabbitMQ" + ], + "error": null + }, + { + "id": 50, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "API uptime percentage for January 2025", + "API uptime percentage for February 2025", + "API uptime percentage for March 2025" + ], + "generated_queries": [ + "API uptime percentage 
for January 2025", + "API uptime percentage for February 2025", + "API uptime percentage for March 2025" + ], + "error": null + }, + { + "id": 42, + "difficulty": 2, + "domain": "healthcare", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average ER wait time at the Boston hospital", + "Average ER wait time at the Philadelphia hospital", + "Cardiology appointment lead time at the Boston hospital", + "Cardiology appointment lead time at the Philadelphia hospital" + ], + "generated_queries": [ + "Average ER wait times at our Boston hospital", + "Average ER wait times at our Philadelphia hospital", + "Cardiology appointment lead times at our Boston hospital", + "Cardiology appointment lead times at our Philadelphia hospital" + ], + "error": null + }, + { + "id": 59, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Median home sale price in the Austin metro area in Q1 2025", + "Median home sale price in the Austin metro area in Q2 2025", + "Median home sale price in the Austin metro area in Q3 2025" + ], + "generated_queries": [ + "Median home sale price in the Austin metro area in Q1 2025", + "Median home sale price in the Austin metro area in Q2 2025", + "Median home sale price in the Austin metro area in Q3 2025" + ], + "error": null + }, + { + "id": 41, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Returns of the Growth Fund in Q3 2024", + "Returns of the Growth Fund in Q4 2024", + "Returns of the Value Fund in Q3 2024", + "Returns of the Value Fund in 
Q4 2024" + ], + "generated_queries": [ + "Returns of the Growth Fund in Q3 2024", + "Returns of the Growth Fund in Q4 2024", + "Returns of the Value Fund in Q3 2024", + "Returns of the Value Fund in Q4 2024" + ], + "error": null + }, + { + "id": 57, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Voluntary turnover rate for engineering in 2023", + "Voluntary turnover rate for engineering in 2024", + "Voluntary turnover rate for marketing in 2023", + "Voluntary turnover rate for marketing in 2024" + ], + "generated_queries": [ + "Voluntary turnover rates for engineering in 2023", + "Voluntary turnover rates for engineering in 2024", + "Voluntary turnover rates for marketing in 2023", + "Voluntary turnover rates for marketing in 2024" + ], + "error": null + }, + { + "id": 22, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 4, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q1 2024", + "Total revenue in Q2 2024" + ], + "generated_queries": [ + "Total revenue in Q1 2024", + "Total revenue in Q2 2024", + "Total revenue in Q3 2024", + "Total revenue in Q4 2024" + ], + "error": null + }, + { + "id": 48, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Daily order processing volume at the Memphis hub in March 2025", + "Daily order processing volume at the Memphis hub in April 2025", + "Daily order processing volume at the Rotterdam hub in March 2025", + "Daily order processing volume at the Rotterdam hub in April 2025" + ], + "generated_queries": [ + "Daily 
order processing volume at the Memphis hub in March 2025", + "Daily order processing volume at the Rotterdam hub in March 2025", + "Daily order processing volume at the Memphis hub in April 2025", + "Daily order processing volume at the Rotterdam hub in April 2025" + ], + "error": null + }, + { + "id": 45, + "difficulty": 2, + "domain": "manufacturing", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall equipment effectiveness for the Shenzhen factory in H1 2024", + "Overall equipment effectiveness for the Shenzhen factory in H2 2024", + "Overall equipment effectiveness for the Munich factory in H1 2024", + "Overall equipment effectiveness for the Munich factory in H2 2024" + ], + "generated_queries": [ + "Overall equipment effectiveness for the Shenzhen factory in H1 2024", + "Overall equipment effectiveness for the Shenzhen factory in H2 2024", + "Overall equipment effectiveness for the Munich factory in H1 2024", + "Overall equipment effectiveness for the Munich factory in H2 2024" + ], + "error": null + }, + { + "id": 65, + "difficulty": 3, + "domain": "environment", + "n_expected_queries": 5, + "n_generated_queries": 5, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Amazon deforestation rates in 2021", + "Amazon deforestation rates in 2022", + "Amazon deforestation rates in 2023", + "Amazon deforestation rates in 2024", + "Amazon deforestation rates in 2025" + ], + "generated_queries": [ + "Trend in Amazon deforestation rates in 2021", + "Trend in Amazon deforestation rates in 2022", + "Trend in Amazon deforestation rates in 2023", + "Trend in Amazon deforestation rates in 2024", + "Trend in Amazon deforestation rates in 2025" + ], + "error": null + }, + { + "id": 35, + "difficulty": 2, + "domain": "manufacturing", + 
"n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall defect rate for Assembly Line A in Q2 2026", + "Overall defect rate for Assembly Line B in Q2 2026" + ], + "generated_queries": [ + "Overall defect rates for Assembly Line A this quarter", + "Overall defect rates for Assembly Line B this quarter" + ], + "error": null + }, + { + "id": 51, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Revenue per square foot for East Coast stores in Q1 2026", + "Revenue per square foot for West Coast stores in Q1 2026" + ], + "generated_queries": [ + "Revenue per square foot for East Coast stores last quarter", + "Revenue per square foot for West Coast stores last quarter" + ], + "error": null + }, + { + "id": 56, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for sickle cell disease", + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for beta-thalassemia" + ], + "generated_queries": [ + "Latest clinical trial results for CRISPR treatments for sickle cell disease", + "Latest clinical trial results for CRISPR treatments for beta-thalassemia" + ], + "error": null + }, + { + "id": 60, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Financial documents from January or 
February 2025", + "Financial documents from April to December 2025" + ], + "generated_queries": [ + "Financial documents filed in January or February 2025", + "Financial documents filed in April to December 2025" + ], + "error": null + }, + { + "id": 44, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Antitrust cases filed against Google between 2023 and 2026", + "Antitrust cases filed against Apple between 2023 and 2026" + ], + "generated_queries": [ + "Antitrust cases filed against Google in the past three years", + "Antitrust cases filed against Apple in the past three years" + ], + "error": null + }, + { + "id": 46, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students enrolled in the fall 2024 semester", + "Number of students enrolled in the spring 2025 semester" + ], + "generated_queries": [ + "Number of students enrolled in the fall 2024 semester", + "Number of students enrolled in the spring 2025 semester" + ], + "error": null + }, + { + "id": 31, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total CO2 emissions for the main campus in Q1 2026", + "Total CO2 emissions for the main campus in Q3 2026" + ], + "generated_queries": [ + "Total CO2 emissions for the main campus in Q1 2026", + "Total CO2 emissions for the main campus in Q3 2026" + ], + "error": null + }, + { + "id": 21, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + 
"decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The generated sub-queries refer to 'last year,' which is ambiguous and does not specifically target 2025 as required by the expected sub-queries.", + "expected_queries": [ + "Returns on the S&P 500 index fund in 2025", + "Returns on the emerging markets fund in 2025" + ], + "generated_queries": [ + "Returns of the S&P 500 index fund last year", + "Returns of the emerging markets fund last year" + ], + "error": null + }, + { + "id": 28, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Click-through rate for the summer campaign on Facebook", + "Click-through rate for the summer campaign on Instagram", + "Click-through rate for the summer campaign on Google Ads" + ], + "generated_queries": [ + "Click-through rate for our summer campaign on Facebook", + "Click-through rate for our summer campaign on Instagram", + "Click-through rate for our summer campaign on Google Ads" + ], + "error": null + }, + { + "id": 37, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total online sales figures for Black Friday", + "Total online sales figures for Cyber Monday", + "Total online sales figures for Christmas week" + ], + "generated_queries": [ + "Total online sales figures for Black Friday 2025", + "Total online sales figures for Cyber Monday 2025", + "Total online sales figures for Christmas week 2025" + ], + "error": null + } + ] + }, + { + "model_name": "Qwen3-VL-8B-Instruct-FP8", + "timestamp": "2026-04-19T16:24:56.702790", + "prompt_path": "prompts/query_contextualizer_tmpl_v1.txt", + "dataset_path": 
"datasets/query_decomposition.json", + "judge_model": "Qwen3-VL-8B-Instruct-FP8", + "total": 80, + "errors": 0, + "count_match_passed": 66, + "count_match_accuracy": 0.825, + "semantic_coverage_passed": 71, + "semantic_coverage_accuracy": 0.8875, + "by_difficulty": { + "1": { + "total": 20, + "errors": 0, + "count_match_passed": 20, + "semantic_coverage_passed": 20, + "count_match_accuracy": 1.0, + "semantic_coverage_accuracy": 1.0 + }, + "2": { + "total": 40, + "errors": 0, + "count_match_passed": 35, + "semantic_coverage_passed": 36, + "count_match_accuracy": 0.875, + "semantic_coverage_accuracy": 0.9 + }, + "3": { + "total": 20, + "errors": 0, + "count_match_passed": 11, + "semantic_coverage_passed": 15, + "count_match_accuracy": 0.55, + "semantic_coverage_accuracy": 0.75 + } + }, + "cases": [ + { + "id": 15, + "difficulty": 1, + "domain": "manufacturing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Rework procedure for solder joint failures on Line C" + ], + "generated_queries": [ + "Rework procedure for solder joint failures on Line C" + ], + "error": null + }, + { + "id": 16, + "difficulty": 1, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current carbon tax rate per ton of CO2 in France" + ], + "generated_queries": [ + "Carbon tax rate per ton of CO2 in France" + ], + "error": null + }, + { + "id": 18, + "difficulty": 1, + "domain": "telecommunications", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Packet loss rate on the fiber backbone between Paris and Marseille" + ], + "generated_queries": [ + 
"Packet loss rate on the fiber backbone between Paris and Marseille" + ], + "error": null + }, + { + "id": 3, + "difficulty": 1, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Statute of limitations for breach of contract in California" + ], + "generated_queries": [ + "Statute of limitations for breach of contract in California" + ], + "error": null + }, + { + "id": 79, + "difficulty": 3, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Combined effect of rising interest rates and increasing housing supply on residential property investment returns" + ], + "generated_queries": [ + "How do rising interest rates and increasing housing supply together affect residential property investment returns?" 
+ ], + "error": null + }, + { + "id": 5, + "difficulty": 1, + "domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average mutation rate of human mitochondrial DNA" + ], + "generated_queries": [ + "Average mutation rate of human mitochondrial DNA" + ], + "error": null + }, + { + "id": 6, + "difficulty": 1, + "domain": "HR", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Parental leave entitlement for employees in the French office" + ], + "generated_queries": [ + "Parental leave entitlement for employees in the French office" + ], + "error": null + }, + { + "id": 19, + "difficulty": 1, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Loss ratio for the commercial property insurance line" + ], + "generated_queries": [ + "Loss ratio for commercial property line" + ], + "error": null + }, + { + "id": 14, + "difficulty": 1, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Capacity factor of the wind farm in Normandy" + ], + "generated_queries": [ + "Capacity factor of the wind farm in Normandy" + ], + "error": null + }, + { + "id": 72, + "difficulty": 3, + "domain": "manufacturing", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Root causes of surface bubbling on the automotive paint line", + "Root 
causes of adhesion failure on the automotive paint line" + ], + "generated_queries": [ + "Root causes of surface bubbling and adhesion failure in the automotive paint line" + ], + "error": null + }, + { + "id": 7, + "difficulty": 1, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students who graduated from the computer science program in the 2024-2025 academic year" + ], + "generated_queries": [ + "Number of students who graduated from the computer science program last year" + ], + "error": null + }, + { + "id": 10, + "difficulty": 1, + "domain": "technology", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended steps for migrating a MySQL database to PostgreSQL" + ], + "generated_queries": [ + "Recommended steps for migrating a MySQL database to PostgreSQL" + ], + "error": null + }, + { + "id": 9, + "difficulty": 1, + "domain": "real_estate", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average rental yield for residential properties in Lyon" + ], + "generated_queries": [ + "Average rental yield for residential properties in Lyon" + ], + "error": null + }, + { + "id": 77, + "difficulty": 3, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended treatments for aphids in corn crops", + "Recommended treatments for rootworm in corn crops" + ], + "generated_queries": [ + "Recommended treatments for aphids in corn crops", + 
"Recommended treatments for rootworm in corn crops" + ], + "error": null + }, + { + "id": 4, + "difficulty": 1, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Maximum rainfall intensity for a 10-year return period in Toulouse" + ], + "generated_queries": [ + "Maximum rainfall intensity for a 10-year return period in Toulouse" + ], + "error": null + }, + { + "id": 68, + "difficulty": 3, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Typical comparison of product return rates between summer and winter seasons" + ], + "generated_queries": [ + "Comparison of product return rates between summer and winter seasons" + ], + "error": null + }, + { + "id": 17, + "difficulty": 1, + "domain": "retail", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Safety stock level for the premium wireless headphones" + ], + "generated_queries": [ + "Safety stock level for the new line of premium wireless headphones" + ], + "error": null + }, + { + "id": 32, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average delivery time for shipments to Southeast Asia", + "Average delivery time for shipments to Western Europe" + ], + "generated_queries": [ + "Average delivery times for shipments to Southeast Asia", + "Average delivery times for shipments to Western Europe" + ], + "error": null + }, + { + "id": 67, + "difficulty": 3, + 
"domain": "science", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Known interactions between metformin and ibuprofen" + ], + "generated_queries": [ + "Known interactions between metformin and ibuprofen" + ], + "error": null + }, + { + "id": 2, + "difficulty": 1, + "domain": "healthcare", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Recommended flu vaccination schedule for adults over 65" + ], + "generated_queries": [ + "Recommended flu vaccination schedule for adults over 65" + ], + "error": null + }, + { + "id": 12, + "difficulty": 1, + "domain": "logistics", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current lead time for ocean freight from Shanghai to Rotterdam" + ], + "generated_queries": [ + "Current lead time for ocean freight from Shanghai to Rotterdam" + ], + "error": null + }, + { + "id": 75, + "difficulty": 3, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Positioning strategy for new enterprise software targeting both IT decision-makers and end users" + ], + "generated_queries": [ + "How to position new enterprise software for IT decision-makers and end users" + ], + "error": null + }, + { + "id": 13, + "difficulty": 1, + "domain": "agriculture", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Optimal irrigation 
frequency for winter wheat during the tillering stage" + ], + "generated_queries": [ + "Optimal irrigation frequency for winter wheat during the tillering stage" + ], + "error": null + }, + { + "id": 69, + "difficulty": 3, + "domain": "engineering", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Thermal conductivity and fire resistance ratings of ceramic fiber insulation" + ], + "generated_queries": [ + "Thermal conductivity and fire resistance ratings of ceramic fiber insulation" + ], + "error": null + }, + { + "id": 64, + "difficulty": 3, + "domain": "legal", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Differences between trade secrets and patents in terms of protection duration and disclosure requirements" + ], + "generated_queries": [ + "Differences between trade secrets and patents regarding protection duration and disclosure requirements" + ], + "error": null + }, + { + "id": 35, + "difficulty": 2, + "domain": "manufacturing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall defect rate for Assembly Line A in Q2 2026", + "Overall defect rate for Assembly Line B in Q2 2026" + ], + "generated_queries": [ + "Overall defect rates for Assembly Line A this quarter", + "Overall defect rates for Assembly Line B this quarter" + ], + "error": null + }, + { + "id": 27, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Student outcome results for 
project-based learning", + "Student outcome results for traditional lecture-based instruction" + ], + "generated_queries": [ + "Student outcome results comparing project-based learning to traditional lecture-based instruction" + ], + "error": null + }, + { + "id": 20, + "difficulty": 1, + "domain": "aviation", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Next C-check due date for aircraft with registration F-GKXA" + ], + "generated_queries": [ + "When is the next C-check due for aircraft registration F-GKXA?" + ], + "error": null + }, + { + "id": 58, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Annual water consumption at our data center operations' is not covered by any generated sub-query.", + "expected_queries": [ + "Annual water consumption at our manufacturing operations", + "Annual water consumption at our data center operations" + ], + "generated_queries": [ + "Annual water consumption at manufacturing operations versus data center operations" + ], + "error": null + }, + { + "id": 76, + "difficulty": 3, + "domain": "insurance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Joint effect of flood zone classification and building age on commercial property insurance premiums" + ], + "generated_queries": [ + "How do flood zone classification and building age jointly affect commercial property insurance premiums?" 
+ ], + "error": null + }, + { + "id": 30, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Serverless pricing and cold start latency for AWS Lambda", + "Serverless pricing and cold start latency for Google Cloud Functions" + ], + "generated_queries": [ + "Compare serverless pricing and cold start latency of AWS Lambda and Google Cloud Functions" + ], + "error": null + }, + { + "id": 80, + "difficulty": 3, + "domain": "energy", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Comparison of lithium-ion battery systems and pumped hydro storage in terms of scalability and round-trip efficiency" + ], + "generated_queries": [ + "Comparison of lithium-ion battery systems and pumped hydro storage regarding scalability and round-trip efficiency" + ], + "error": null + }, + { + "id": 8, + "difficulty": 1, + "domain": "marketing", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average engagement rate for sponsored Instagram posts targeting the 25-35 age group" + ], + "generated_queries": [ + "Average engagement rate for sponsored posts on Instagram targeting the 25-35 age group" + ], + "error": null + }, + { + "id": 78, + "difficulty": 3, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Coverage and throughput characteristics of sub-6 GHz bands for 5G deployment", + "Coverage and throughput characteristics of mmWave bands for 5G 
deployment" + ], + "generated_queries": [ + "Coverage and throughput trade-offs between sub-6 GHz and mmWave bands for 5G deployment" + ], + "error": null + }, + { + "id": 62, + "difficulty": 3, + "domain": "healthcare", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Standard discharge procedure for elderly patients", + "Standard discharge procedure for pediatric patients" + ], + "generated_queries": [ + "Standard discharge procedure for elderly patients", + "Standard discharge procedure for pediatric patients" + ], + "error": null + }, + { + "id": 36, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current rent control measures in Berlin", + "Current rent control measures in New York City" + ], + "generated_queries": [ + "Rent control measures currently in place in Berlin", + "Rent control measures currently in place in New York City" + ], + "error": null + }, + { + "id": 24, + "difficulty": 2, + "domain": "engineering", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Fatigue resistance rating for carbon steel under cyclic loading", + "Fatigue resistance rating for stainless steel under cyclic loading" + ], + "generated_queries": [ + "Fatigue resistance ratings for carbon steel under cyclic loading", + "Fatigue resistance ratings for stainless steel under cyclic loading" + ], + "error": null + }, + { + "id": 23, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + 
"coverage_reasoning": null, + "expected_queries": [ + "Data privacy requirements in the European Union", + "Data privacy requirements in Brazil" + ], + "generated_queries": [ + "Data privacy requirements in the EU", + "Data privacy requirements in Brazil" + ], + "error": null + }, + { + "id": 26, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Employee satisfaction score for the London office", + "Employee satisfaction score for the Singapore office" + ], + "generated_queries": [ + "Employee satisfaction scores for the London office", + "Employee satisfaction scores for the Singapore office" + ], + "error": null + }, + { + "id": 66, + "difficulty": 3, + "domain": "HR", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Onboarding process for sales new hires", + "Onboarding process for customer success new hires" + ], + "generated_queries": [ + "Onboarding process for sales new hires", + "Onboarding process for customer success new hires" + ], + "error": null + }, + { + "id": 53, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Monthly churn rate for the Premium plan", + "Monthly churn rate for the Basic plan" + ], + "generated_queries": [ + "Monthly churn rate for the Premium subscription plan", + "Monthly churn rate for the Basic subscription plan" + ], + "error": null + }, + { + "id": 63, + "difficulty": 3, + "domain": "technology", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + 
"decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about the main architectural features of Kafka is not covered, as the generated sub-query focuses on differences rather than detailing Kafka's features alone.", + "expected_queries": [ + "Main architectural features of Kafka", + "Main architectural features of RabbitMQ" + ], + "generated_queries": [ + "Main architectural differences between Kafka and RabbitMQ" + ], + "error": null + }, + { + "id": 29, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average commercial lease rate in downtown Manhattan", + "Average commercial lease rate in the Miami financial district" + ], + "generated_queries": [ + "Average commercial lease rates in downtown Manhattan", + "Average commercial lease rates in the Miami financial district" + ], + "error": null + }, + { + "id": 47, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Net Promoter Score for the ProLine series", + "Net Promoter Score for the EcoLine series" + ], + "generated_queries": [ + "Net Promoter Score for the ProLine series", + "Net Promoter Score for the EcoLine series" + ], + "error": null + }, + { + "id": 49, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average soil nitrogen levels in our soybean fields", + "Average soil nitrogen levels in our cotton fields" + ], + "generated_queries": [ + "Average soil nitrogen levels in soybean fields", + "Average soil nitrogen levels 
in cotton fields" + ], + "error": null + }, + { + "id": 73, + "difficulty": 3, + "domain": "education", + "n_expected_queries": 1, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Application of differentiated instruction in elementary and middle school classrooms" + ], + "generated_queries": [ + "How is differentiated instruction applied in elementary school classrooms?", + "How is differentiated instruction applied in middle school classrooms?" + ], + "error": null + }, + { + "id": 39, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average claim resolution time for auto insurance claims", + "Average claim resolution time for home insurance claims" + ], + "generated_queries": [ + "Average claim resolution time for auto insurance claims", + "Average claim resolution time for home insurance claims" + ], + "error": null + }, + { + "id": 38, + "difficulty": 2, + "domain": "telecommunications", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average 5G download speeds in urban areas on our network", + "Average 5G download speeds in rural areas on our network" + ], + "generated_queries": [ + "Average 5G download speeds in urban areas on our network", + "Average 5G download speeds in rural areas on our network" + ], + "error": null + }, + { + "id": 33, + "difficulty": 2, + "domain": "agriculture", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Per-hectare yield for wheat in the 2025 
harvest season", + "Per-hectare yield for corn in the 2025 harvest season" + ], + "generated_queries": [ + "Per-hectare yields for wheat this harvest season", + "Per-hectare yields for corn this harvest season" + ], + "error": null + }, + { + "id": 54, + "difficulty": 2, + "domain": "public_policy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Current minimum wage in France", + "Current minimum wage in Japan", + "Current minimum wage in Australia" + ], + "generated_queries": [ + "Current minimum wage in France", + "Current minimum wage in Japan", + "Current minimum wage in Australia" + ], + "error": null + }, + { + "id": 37, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total online sales figures for Black Friday", + "Total online sales figures for Cyber Monday", + "Total online sales figures for Christmas week" + ], + "generated_queries": [ + "Total online sales figures for Black Friday", + "Total online sales figures for Cyber Monday", + "Total online sales figures for Christmas week" + ], + "error": null + }, + { + "id": 52, + "difficulty": 2, + "domain": "insurance", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Number of flood insurance claims in 2023' is not covered by any generated sub-query.", + "expected_queries": [ + "Number of flood insurance claims in 2023", + "Number of flood insurance claims in 2024" + ], + "generated_queries": [ + "Comparison of flood insurance claims between 2023 and 2024" + ], + "error": null + }, + { + "id": 55, + "difficulty": 2, + "domain": "aviation", + 
"n_expected_queries": 4, + "n_generated_queries": 2, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Load factor for the New York-London route", + "Load factor for the Los Angeles-Tokyo route", + "Fuel cost per seat-mile for the New York-London route", + "Fuel cost per seat-mile for the Los Angeles-Tokyo route" + ], + "generated_queries": [ + "Load factor and fuel cost per seat-mile for the New York-London route", + "Load factor and fuel cost per seat-mile for the Los Angeles-Tokyo route" + ], + "error": null + }, + { + "id": 28, + "difficulty": 2, + "domain": "marketing", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Click-through rate for the summer campaign on Facebook", + "Click-through rate for the summer campaign on Instagram", + "Click-through rate for the summer campaign on Google Ads" + ], + "generated_queries": [ + "Click-through rate for summer campaign on Facebook", + "Click-through rate for summer campaign on Instagram", + "Click-through rate for summer campaign on Google Ads" + ], + "error": null + }, + { + "id": 43, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Percentage of electricity from renewables at the Texas plant", + "Percentage of electricity from renewables at the Ohio plant", + "Percentage of electricity from renewables at the Oregon plant" + ], + "generated_queries": [ + "Percentage of electricity from renewables at Texas plant", + "Percentage of electricity from renewables at Ohio plant", + "Percentage of electricity from renewables at Oregon plant" + ], + "error": null + }, + { + "id": 74, + "difficulty": 3, + "domain": 
"aviation", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about EASA certification requirements for commercial drone operations is not covered, as the generated sub-query only compares EASA and FAA requirements rather than detailing EASA's specific requirements.", + "expected_queries": [ + "EASA certification requirements for commercial drone operations", + "FAA certification requirements for commercial drone operations" + ], + "generated_queries": [ + "Differences in EASA and FAA certification requirements for commercial drone operations" + ], + "error": null + }, + { + "id": 70, + "difficulty": 3, + "domain": "logistics", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query about cost and lead time for air freight transatlantic shipments is not covered, as the generated sub-query only compares air and sea freight without isolating air freight specifics.", + "expected_queries": [ + "Cost and lead time for air freight transatlantic shipments", + "Cost and lead time for sea freight transatlantic shipments" + ], + "generated_queries": [ + "Cost and lead time differences between air freight and sea freight for transatlantic shipments" + ], + "error": null + }, + { + "id": 42, + "difficulty": 2, + "domain": "healthcare", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average ER wait time at the Boston hospital", + "Average ER wait time at the Philadelphia hospital", + "Cardiology appointment lead time at the Boston hospital", + "Cardiology appointment lead time at the Philadelphia hospital" + ], + "generated_queries": [ + "Average ER wait times at Boston 
hospital", + "Average ER wait times at Philadelphia hospital", + "Cardiology appointment lead times at Boston hospital", + "Cardiology appointment lead times at Philadelphia hospital" + ], + "error": null + }, + { + "id": 21, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The generated sub-queries refer to 'last year,' which is ambiguous and does not specifically target 2025 as required by the expected sub-queries.", + "expected_queries": [ + "Returns on the S&P 500 index fund in 2025", + "Returns on the emerging markets fund in 2025" + ], + "generated_queries": [ + "Returns of the S&P 500 index fund last year", + "Returns of the emerging markets fund last year" + ], + "error": null + }, + { + "id": 71, + "difficulty": 3, + "domain": "public_policy", + "n_expected_queries": 1, + "n_generated_queries": 4, + "decomposition_count_match": false, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Short-term and long-term effects of the Finnish UBI pilot on employment and well-being" + ], + "generated_queries": [ + "Short-term effects of the Finnish UBI pilot on employment", + "Short-term effects of the Finnish UBI pilot on well-being", + "Long-term effects of the Finnish UBI pilot on employment", + "Long-term effects of the Finnish UBI pilot on well-being" + ], + "error": null + }, + { + "id": 1, + "difficulty": 1, + "domain": "finance", + "n_expected_queries": 1, + "n_generated_queries": 1, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q3 2024" + ], + "generated_queries": [ + "Total revenue in Q3 2024" + ], + "error": null + }, + { + "id": 11, + "difficulty": 1, + "domain": "environment", + "n_expected_queries": 1, + "n_generated_queries": 1, + 
"decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total energy consumption of the Paris headquarters in 2024" + ], + "generated_queries": [ + "Total energy consumption of the Paris headquarters in 2024" + ], + "error": null + }, + { + "id": 65, + "difficulty": 3, + "domain": "environment", + "n_expected_queries": 5, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query for Amazon deforestation rates in 2025 is not covered by the generated sub-query, as it does not specify a particular year but rather a general trend over the last five years.", + "expected_queries": [ + "Amazon deforestation rates in 2021", + "Amazon deforestation rates in 2022", + "Amazon deforestation rates in 2023", + "Amazon deforestation rates in 2024", + "Amazon deforestation rates in 2025" + ], + "generated_queries": [ + "Trend in Amazon deforestation rates over the last five years" + ], + "error": null + }, + { + "id": 61, + "difficulty": 3, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 1, + "decomposition_count_match": false, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query 'Operating costs in 2024' is not covered by the generated sub-query 'Evolution of operating costs over the past two years', as it does not explicitly target the specific year 2024.", + "expected_queries": [ + "Operating costs in 2024", + "Operating costs in 2025" + ], + "generated_queries": [ + "Evolution of operating costs over the past two years" + ], + "error": null + }, + { + "id": 34, + "difficulty": 2, + "domain": "energy", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Solar energy generation at our farm in summer 
2024", + "Solar energy generation at our farm in winter 2024" + ], + "generated_queries": [ + "Solar energy generation at the farm in summer 2024", + "Solar energy generation at the farm in winter 2024" + ], + "error": null + }, + { + "id": 46, + "difficulty": 2, + "domain": "education", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Number of students enrolled in the fall 2024 semester", + "Number of students enrolled in the spring 2025 semester" + ], + "generated_queries": [ + "Number of students enrolled in the fall 2024 semester", + "Number of students enrolled in the spring 2025 semester" + ], + "error": null + }, + { + "id": 31, + "difficulty": 2, + "domain": "environment", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total CO2 emissions for the main campus in Q1 2026", + "Total CO2 emissions for the main campus in Q3 2026" + ], + "generated_queries": [ + "Total CO2 emissions for the main campus in Q1 of this year", + "Total CO2 emissions for the main campus in Q3 of this year" + ], + "error": null + }, + { + "id": 51, + "difficulty": 2, + "domain": "retail", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Revenue per square foot for East Coast stores in Q1 2026", + "Revenue per square foot for West Coast stores in Q1 2026" + ], + "generated_queries": [ + "Revenue per square foot for East Coast stores last quarter", + "Revenue per square foot for West Coast stores last quarter" + ], + "error": null + }, + { + "id": 22, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + 
"decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Total revenue in Q1 2024", + "Total revenue in Q2 2024" + ], + "generated_queries": [ + "Total revenue for Q1 2024", + "Total revenue for Q2 2024" + ], + "error": null + }, + { + "id": 44, + "difficulty": 2, + "domain": "legal", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Antitrust cases filed against Google between 2023 and 2026", + "Antitrust cases filed against Apple between 2023 and 2026" + ], + "generated_queries": [ + "Antitrust cases filed against Google in the last three years", + "Antitrust cases filed against Apple in the last three years" + ], + "error": null + }, + { + "id": 56, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for sickle cell disease", + "Clinical trial results from the past 90 days (since 2026-01-17) for CRISPR treatments for beta-thalassemia" + ], + "generated_queries": [ + "Latest clinical trial results for CRISPR treatments for sickle cell disease", + "Latest clinical trial results for CRISPR treatments for beta-thalassemia" + ], + "error": null + }, + { + "id": 25, + "difficulty": 2, + "domain": "science", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Average Arctic sea ice extent in summer 2015", + "Average Arctic sea ice extent in summer 2020" + ], + "generated_queries": [ + "Average Arctic sea ice extent in summer 2015", + "Average Arctic sea 
ice extent in summer 2020" + ], + "error": null + }, + { + "id": 40, + "difficulty": 2, + "domain": "aviation", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "On-time departure rate in January 2025", + "On-time departure rate in February 2025" + ], + "generated_queries": [ + "On-time departure rate for January 2025", + "On-time departure rate for February 2025" + ], + "error": null + }, + { + "id": 60, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 2, + "n_generated_queries": 2, + "decomposition_count_match": true, + "decomposition_semantic_coverage": false, + "coverage_reasoning": "The expected sub-query for financial documents from January or February 2025 is not covered, as the generated sub-query refers to 'last year' which is ambiguous and does not specifically match 2025.", + "expected_queries": [ + "Financial documents from January or February 2025", + "Financial documents from April to December 2025" + ], + "generated_queries": [ + "Financial documents filed in January or February last year", + "Financial documents filed from April to December last year" + ], + "error": null + }, + { + "id": 50, + "difficulty": 2, + "domain": "technology", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "API uptime percentage for January 2025", + "API uptime percentage for February 2025", + "API uptime percentage for March 2025" + ], + "generated_queries": [ + "API uptime percentage for January 2025", + "API uptime percentage for February 2025", + "API uptime percentage for March 2025" + ], + "error": null + }, + { + "id": 59, + "difficulty": 2, + "domain": "real_estate", + "n_expected_queries": 3, + "n_generated_queries": 3, + "decomposition_count_match": true, + 
"decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Median home sale price in the Austin metro area in Q1 2025", + "Median home sale price in the Austin metro area in Q2 2025", + "Median home sale price in the Austin metro area in Q3 2025" + ], + "generated_queries": [ + "Median home sale price in the Austin metro area in Q1 2025", + "Median home sale price in the Austin metro area in Q2 2025", + "Median home sale price in the Austin metro area in Q3 2025" + ], + "error": null + }, + { + "id": 57, + "difficulty": 2, + "domain": "HR", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Voluntary turnover rate for engineering in 2023", + "Voluntary turnover rate for engineering in 2024", + "Voluntary turnover rate for marketing in 2023", + "Voluntary turnover rate for marketing in 2024" + ], + "generated_queries": [ + "Voluntary turnover rates for engineering in 2023", + "Voluntary turnover rates for engineering in 2024", + "Voluntary turnover rates for marketing in 2023", + "Voluntary turnover rates for marketing in 2024" + ], + "error": null + }, + { + "id": 41, + "difficulty": 2, + "domain": "finance", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Returns of the Growth Fund in Q3 2024", + "Returns of the Growth Fund in Q4 2024", + "Returns of the Value Fund in Q3 2024", + "Returns of the Value Fund in Q4 2024" + ], + "generated_queries": [ + "Returns of the Growth Fund in Q3 2024", + "Returns of the Growth Fund in Q4 2024", + "Returns of the Value Fund in Q3 2024", + "Returns of the Value Fund in Q4 2024" + ], + "error": null + }, + { + "id": 45, + "difficulty": 2, + "domain": "manufacturing", + "n_expected_queries": 4, + 
"n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Overall equipment effectiveness for the Shenzhen factory in H1 2024", + "Overall equipment effectiveness for the Shenzhen factory in H2 2024", + "Overall equipment effectiveness for the Munich factory in H1 2024", + "Overall equipment effectiveness for the Munich factory in H2 2024" + ], + "generated_queries": [ + "Overall equipment effectiveness for the Shenzhen factory in H1 2024", + "Overall equipment effectiveness for the Shenzhen factory in H2 2024", + "Overall equipment effectiveness for the Munich factory in H1 2024", + "Overall equipment effectiveness for the Munich factory in H2 2024" + ], + "error": null + }, + { + "id": 48, + "difficulty": 2, + "domain": "logistics", + "n_expected_queries": 4, + "n_generated_queries": 4, + "decomposition_count_match": true, + "decomposition_semantic_coverage": true, + "coverage_reasoning": null, + "expected_queries": [ + "Daily order processing volume at the Memphis hub in March 2025", + "Daily order processing volume at the Memphis hub in April 2025", + "Daily order processing volume at the Rotterdam hub in March 2025", + "Daily order processing volume at the Rotterdam hub in April 2025" + ], + "generated_queries": [ + "Daily order processing volume at the Memphis hub in March 2025", + "Daily order processing volume at the Memphis hub in April 2025", + "Daily order processing volume at the Rotterdam hub in March 2025", + "Daily order processing volume at the Rotterdam hub in April 2025" + ], + "error": null + } + ] + } + ] + } + ] +} \ No newline at end of file diff --git a/benchmarks/prompt_eval/uv.lock b/benchmarks/prompt_eval/uv.lock new file mode 100644 index 00000000..bd925206 --- /dev/null +++ b/benchmarks/prompt_eval/uv.lock @@ -0,0 +1,958 @@ +version = 1 +revision = 3 +requires-python = ">=3.12" + +[[package]] +name = "annotated-types" +version = 
"0.7.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" }, +] + +[[package]] +name = "anyio" +version = "4.13.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "idna" }, + { name = "typing-extensions", marker = "python_full_version < '3.13'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/19/14/2c5dd9f512b66549ae92767a9c7b330ae88e1932ca57876909410251fe13/anyio-4.13.0.tar.gz", hash = "sha256:334b70e641fd2221c1505b3890c69882fe4a2df910cba14d97019b90b24439dc", size = 231622, upload-time = "2026-03-24T12:59:09.671Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/da/42/e921fccf5015463e32a3cf6ee7f980a6ed0f395ceeaa45060b61d86486c2/anyio-4.13.0-py3-none-any.whl", hash = "sha256:08b310f9e24a9594186fd75b4f73f4a4152069e3853f1ed8bfbf58369f4ad708", size = 114353, upload-time = "2026-03-24T12:59:08.246Z" }, +] + +[[package]] +name = "certifi" +version = "2026.2.25" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/af/2d/7bf41579a8986e348fa033a31cdd0e4121114f6bce2457e8876010b092dd/certifi-2026.2.25.tar.gz", hash = "sha256:e887ab5cee78ea814d3472169153c2d12cd43b14bd03329a39a9c6e2e80bfba7", size = 155029, upload-time = "2026-02-25T02:54:17.342Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/9a/3c/c17fb3ca2d9c3acff52e30b309f538586f9f5b9c9cf454f3845fc9af4881/certifi-2026.2.25-py3-none-any.whl", hash = "sha256:027692e4402ad994f1c42e52a4997a9763c646b73e4096e4d5d6db8af1d6f0fa", size = 153684, upload-time = "2026-02-25T02:54:15.766Z" }, +] + +[[package]] +name = "charset-normalizer" +version = "3.4.7" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/e7/a1/67fe25fac3c7642725500a3f6cfe5821ad557c3abb11c9d20d12c7008d3e/charset_normalizer-3.4.7.tar.gz", hash = "sha256:ae89db9e5f98a11a4bf50407d4363e7b09b31e55bc117b4f7d80aab97ba009e5", size = 144271, upload-time = "2026-04-02T09:28:39.342Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0c/eb/4fc8d0a7110eb5fc9cc161723a34a8a6c200ce3b4fbf681bc86feee22308/charset_normalizer-3.4.7-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:eca9705049ad3c7345d574e3510665cb2cf844c2f2dcfe675332677f081cbd46", size = 311328, upload-time = "2026-04-02T09:26:24.331Z" }, + { url = "https://files.pythonhosted.org/packages/f8/e3/0fadc706008ac9d7b9b5be6dc767c05f9d3e5df51744ce4cc9605de7b9f4/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6178f72c5508bfc5fd446a5905e698c6212932f25bcdd4b47a757a50605a90e2", size = 208061, upload-time = "2026-04-02T09:26:25.568Z" }, + { url = "https://files.pythonhosted.org/packages/42/f0/3dd1045c47f4a4604df85ec18ad093912ae1344ac706993aff91d38773a2/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:e1421b502d83040e6d7fb2fb18dff63957f720da3d77b2fbd3187ceb63755d7b", size = 229031, upload-time = "2026-04-02T09:26:26.865Z" }, + { url = 
"https://files.pythonhosted.org/packages/dc/67/675a46eb016118a2fbde5a277a5d15f4f69d5f3f5f338e5ee2f8948fcf43/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:edac0f1ab77644605be2cbba52e6b7f630731fc42b34cb0f634be1a6eface56a", size = 225239, upload-time = "2026-04-02T09:26:28.044Z" }, + { url = "https://files.pythonhosted.org/packages/4b/f8/d0118a2f5f23b02cd166fa385c60f9b0d4f9194f574e2b31cef350ad7223/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5649fd1c7bade02f320a462fdefd0b4bd3ce036065836d4f42e0de958038e116", size = 216589, upload-time = "2026-04-02T09:26:29.239Z" }, + { url = "https://files.pythonhosted.org/packages/b1/f1/6d2b0b261b6c4ceef0fcb0d17a01cc5bc53586c2d4796fa04b5c540bc13d/charset_normalizer-3.4.7-cp312-cp312-manylinux_2_31_armv7l.whl", hash = "sha256:203104ed3e428044fd943bc4bf45fa73c0730391f9621e37fe39ecf477b128cb", size = 202733, upload-time = "2026-04-02T09:26:30.5Z" }, + { url = "https://files.pythonhosted.org/packages/6f/c0/7b1f943f7e87cc3db9626ba17807d042c38645f0a1d4415c7a14afb5591f/charset_normalizer-3.4.7-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:298930cec56029e05497a76988377cbd7457ba864beeea92ad7e844fe74cd1f1", size = 212652, upload-time = "2026-04-02T09:26:31.709Z" }, + { url = "https://files.pythonhosted.org/packages/38/dd/5a9ab159fe45c6e72079398f277b7d2b523e7f716acc489726115a910097/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:708838739abf24b2ceb208d0e22403dd018faeef86ddac04319a62ae884c4f15", size = 211229, upload-time = "2026-04-02T09:26:33.282Z" }, + { url = "https://files.pythonhosted.org/packages/d5/ff/531a1cad5ca855d1c1a8b69cb71abfd6d85c0291580146fda7c82857caa1/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:0f7eb884681e3938906ed0434f20c63046eacd0111c4ba96f27b76084cd679f5", size = 203552, 
upload-time = "2026-04-02T09:26:34.845Z" }, + { url = "https://files.pythonhosted.org/packages/c1/4c/a5fb52d528a8ca41f7598cb619409ece30a169fbdf9cdce592e53b46c3a6/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:4dc1e73c36828f982bfe79fadf5919923f8a6f4df2860804db9a98c48824ce8d", size = 230806, upload-time = "2026-04-02T09:26:36.152Z" }, + { url = "https://files.pythonhosted.org/packages/59/7a/071feed8124111a32b316b33ae4de83d36923039ef8cf48120266844285b/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:aed52fea0513bac0ccde438c188c8a471c4e0f457c2dd20cdbf6ea7a450046c7", size = 212316, upload-time = "2026-04-02T09:26:37.672Z" }, + { url = "https://files.pythonhosted.org/packages/fd/35/f7dba3994312d7ba508e041eaac39a36b120f32d4c8662b8814dab876431/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:fea24543955a6a729c45a73fe90e08c743f0b3334bbf3201e6c4bc1b0c7fa464", size = 227274, upload-time = "2026-04-02T09:26:38.93Z" }, + { url = "https://files.pythonhosted.org/packages/8a/2d/a572df5c9204ab7688ec1edc895a73ebded3b023bb07364710b05dd1c9be/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:bb6d88045545b26da47aa879dd4a89a71d1dce0f0e549b1abcb31dfe4a8eac49", size = 218468, upload-time = "2026-04-02T09:26:40.17Z" }, + { url = "https://files.pythonhosted.org/packages/86/eb/890922a8b03a568ca2f336c36585a4713c55d4d67bf0f0c78924be6315ca/charset_normalizer-3.4.7-cp312-cp312-win32.whl", hash = "sha256:2257141f39fe65a3fdf38aeccae4b953e5f3b3324f4ff0daf9f15b8518666a2c", size = 148460, upload-time = "2026-04-02T09:26:41.416Z" }, + { url = "https://files.pythonhosted.org/packages/35/d9/0e7dffa06c5ab081f75b1b786f0aefc88365825dfcd0ac544bdb7b2b6853/charset_normalizer-3.4.7-cp312-cp312-win_amd64.whl", hash = "sha256:5ed6ab538499c8644b8a3e18debabcd7ce684f3fa91cf867521a7a0279cab2d6", size = 159330, upload-time = "2026-04-02T09:26:42.554Z" }, + { url = 
"https://files.pythonhosted.org/packages/9e/5d/481bcc2a7c88ea6b0878c299547843b2521ccbc40980cb406267088bc701/charset_normalizer-3.4.7-cp312-cp312-win_arm64.whl", hash = "sha256:56be790f86bfb2c98fb742ce566dfb4816e5a83384616ab59c49e0604d49c51d", size = 147828, upload-time = "2026-04-02T09:26:44.075Z" }, + { url = "https://files.pythonhosted.org/packages/c1/3b/66777e39d3ae1ddc77ee606be4ec6d8cbd4c801f65e5a1b6f2b11b8346dd/charset_normalizer-3.4.7-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:f496c9c3cc02230093d8330875c4c3cdfc3b73612a5fd921c65d39cbcef08063", size = 309627, upload-time = "2026-04-02T09:26:45.198Z" }, + { url = "https://files.pythonhosted.org/packages/2e/4e/b7f84e617b4854ade48a1b7915c8ccfadeba444d2a18c291f696e37f0d3b/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0ea948db76d31190bf08bd371623927ee1339d5f2a0b4b1b4a4439a65298703c", size = 207008, upload-time = "2026-04-02T09:26:46.824Z" }, + { url = "https://files.pythonhosted.org/packages/c4/bb/ec73c0257c9e11b268f018f068f5d00aa0ef8c8b09f7753ebd5f2880e248/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a277ab8928b9f299723bc1a2dabb1265911b1a76341f90a510368ca44ad9ab66", size = 228303, upload-time = "2026-04-02T09:26:48.397Z" }, + { url = "https://files.pythonhosted.org/packages/85/fb/32d1f5033484494619f701e719429c69b766bfc4dbc61aa9e9c8c166528b/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:3bec022aec2c514d9cf199522a802bd007cd588ab17ab2525f20f9c34d067c18", size = 224282, upload-time = "2026-04-02T09:26:49.684Z" }, + { url = "https://files.pythonhosted.org/packages/fa/07/330e3a0dda4c404d6da83b327270906e9654a24f6c546dc886a0eb0ffb23/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = 
"sha256:e044c39e41b92c845bc815e5ae4230804e8e7bc29e399b0437d64222d92809dd", size = 215595, upload-time = "2026-04-02T09:26:50.915Z" }, + { url = "https://files.pythonhosted.org/packages/e3/7c/fc890655786e423f02556e0216d4b8c6bcb6bdfa890160dc66bf52dee468/charset_normalizer-3.4.7-cp313-cp313-manylinux_2_31_armv7l.whl", hash = "sha256:f495a1652cf3fbab2eb0639776dad966c2fb874d79d87ca07f9d5f059b8bd215", size = 201986, upload-time = "2026-04-02T09:26:52.197Z" }, + { url = "https://files.pythonhosted.org/packages/d8/97/bfb18b3db2aed3b90cf54dc292ad79fdd5ad65c4eae454099475cbeadd0d/charset_normalizer-3.4.7-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:e712b419df8ba5e42b226c510472b37bd57b38e897d3eca5e8cfd410a29fa859", size = 211711, upload-time = "2026-04-02T09:26:53.49Z" }, + { url = "https://files.pythonhosted.org/packages/6f/a5/a581c13798546a7fd557c82614a5c65a13df2157e9ad6373166d2a3e645d/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:7804338df6fcc08105c7745f1502ba68d900f45fd770d5bdd5288ddccb8a42d8", size = 210036, upload-time = "2026-04-02T09:26:54.975Z" }, + { url = "https://files.pythonhosted.org/packages/8c/bf/b3ab5bcb478e4193d517644b0fb2bf5497fbceeaa7a1bc0f4d5b50953861/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:481551899c856c704d58119b5025793fa6730adda3571971af568f66d2424bb5", size = 202998, upload-time = "2026-04-02T09:26:56.303Z" }, + { url = "https://files.pythonhosted.org/packages/e7/4e/23efd79b65d314fa320ec6017b4b5834d5c12a58ba4610aa353af2e2f577/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:f59099f9b66f0d7145115e6f80dd8b1d847176df89b234a5a6b3f00437aa0832", size = 230056, upload-time = "2026-04-02T09:26:57.554Z" }, + { url = "https://files.pythonhosted.org/packages/b9/9f/1e1941bc3f0e01df116e68dc37a55c4d249df5e6fa77f008841aef68264f/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_riscv64.whl", hash = 
"sha256:f59ad4c0e8f6bba240a9bb85504faa1ab438237199d4cce5f622761507b8f6a6", size = 211537, upload-time = "2026-04-02T09:26:58.843Z" }, + { url = "https://files.pythonhosted.org/packages/80/0f/088cbb3020d44428964a6c97fe1edfb1b9550396bf6d278330281e8b709c/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:3dedcc22d73ec993f42055eff4fcfed9318d1eeb9a6606c55892a26964964e48", size = 226176, upload-time = "2026-04-02T09:27:00.437Z" }, + { url = "https://files.pythonhosted.org/packages/6a/9f/130394f9bbe06f4f63e22641d32fc9b202b7e251c9aef4db044324dac493/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:64f02c6841d7d83f832cd97ccf8eb8a906d06eb95d5276069175c696b024b60a", size = 217723, upload-time = "2026-04-02T09:27:02.021Z" }, + { url = "https://files.pythonhosted.org/packages/73/55/c469897448a06e49f8fa03f6caae97074fde823f432a98f979cc42b90e69/charset_normalizer-3.4.7-cp313-cp313-win32.whl", hash = "sha256:4042d5c8f957e15221d423ba781e85d553722fc4113f523f2feb7b188cc34c5e", size = 148085, upload-time = "2026-04-02T09:27:03.192Z" }, + { url = "https://files.pythonhosted.org/packages/5d/78/1b74c5bbb3f99b77a1715c91b3e0b5bdb6fe302d95ace4f5b1bec37b0167/charset_normalizer-3.4.7-cp313-cp313-win_amd64.whl", hash = "sha256:3946fa46a0cf3e4c8cb1cc52f56bb536310d34f25f01ca9b6c16afa767dab110", size = 158819, upload-time = "2026-04-02T09:27:04.454Z" }, + { url = "https://files.pythonhosted.org/packages/68/86/46bd42279d323deb8687c4a5a811fd548cb7d1de10cf6535d099877a9a9f/charset_normalizer-3.4.7-cp313-cp313-win_arm64.whl", hash = "sha256:80d04837f55fc81da168b98de4f4b797ef007fc8a79ab71c6ec9bc4dd662b15b", size = 147915, upload-time = "2026-04-02T09:27:05.971Z" }, + { url = "https://files.pythonhosted.org/packages/97/c8/c67cb8c70e19ef1960b97b22ed2a1567711de46c4ddf19799923adc836c2/charset_normalizer-3.4.7-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:c36c333c39be2dbca264d7803333c896ab8fa7d4d6f0ab7edb7dfd7aea6e98c0", size = 309234, 
upload-time = "2026-04-02T09:27:07.194Z" }, + { url = "https://files.pythonhosted.org/packages/99/85/c091fdee33f20de70d6c8b522743b6f831a2f1cd3ff86de4c6a827c48a76/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1c2aed2e5e41f24ea8ef1590b8e848a79b56f3a5564a65ceec43c9d692dc7d8a", size = 208042, upload-time = "2026-04-02T09:27:08.749Z" }, + { url = "https://files.pythonhosted.org/packages/87/1c/ab2ce611b984d2fd5d86a5a8a19c1ae26acac6bad967da4967562c75114d/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:54523e136b8948060c0fa0bc7b1b50c32c186f2fceee897a495406bb6e311d2b", size = 228706, upload-time = "2026-04-02T09:27:09.951Z" }, + { url = "https://files.pythonhosted.org/packages/a8/29/2b1d2cb00bf085f59d29eb773ce58ec2d325430f8c216804a0a5cd83cbca/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:715479b9a2802ecac752a3b0efa2b0b60285cf962ee38414211abdfccc233b41", size = 224727, upload-time = "2026-04-02T09:27:11.175Z" }, + { url = "https://files.pythonhosted.org/packages/47/5c/032c2d5a07fe4d4855fea851209cca2b6f03ebeb6d4e3afdb3358386a684/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bd6c2a1c7573c64738d716488d2cdd3c00e340e4835707d8fdb8dc1a66ef164e", size = 215882, upload-time = "2026-04-02T09:27:12.446Z" }, + { url = "https://files.pythonhosted.org/packages/2c/c2/356065d5a8b78ed04499cae5f339f091946a6a74f91e03476c33f0ab7100/charset_normalizer-3.4.7-cp314-cp314-manylinux_2_31_armv7l.whl", hash = "sha256:c45e9440fb78f8ddabcf714b68f936737a121355bf59f3907f4e17721b9d1aae", size = 200860, upload-time = "2026-04-02T09:27:13.721Z" }, + { url = 
"https://files.pythonhosted.org/packages/0c/cd/a32a84217ced5039f53b29f460962abb2d4420def55afabe45b1c3c7483d/charset_normalizer-3.4.7-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3534e7dcbdcf757da6b85a0bbf5b6868786d5982dd959b065e65481644817a18", size = 211564, upload-time = "2026-04-02T09:27:15.272Z" }, + { url = "https://files.pythonhosted.org/packages/44/86/58e6f13ce26cc3b8f4a36b94a0f22ae2f00a72534520f4ae6857c4b81f89/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:e8ac484bf18ce6975760921bb6148041faa8fef0547200386ea0b52b5d27bf7b", size = 211276, upload-time = "2026-04-02T09:27:16.834Z" }, + { url = "https://files.pythonhosted.org/packages/8f/fe/d17c32dc72e17e155e06883efa84514ca375f8a528ba2546bee73fc4df81/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:a5fe03b42827c13cdccd08e6c0247b6a6d4b5e3cdc53fd1749f5896adcdc2356", size = 201238, upload-time = "2026-04-02T09:27:18.229Z" }, + { url = "https://files.pythonhosted.org/packages/6a/29/f33daa50b06525a237451cdb6c69da366c381a3dadcd833fa5676bc468b3/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:2d6eb928e13016cea4f1f21d1e10c1cebd5a421bc57ddf5b1142ae3f86824fab", size = 230189, upload-time = "2026-04-02T09:27:19.445Z" }, + { url = "https://files.pythonhosted.org/packages/b6/6e/52c84015394a6a0bdcd435210a7e944c5f94ea1055f5cc5d56c5fe368e7b/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:e74327fb75de8986940def6e8dee4f127cc9752bee7355bb323cc5b2659b6d46", size = 211352, upload-time = "2026-04-02T09:27:20.79Z" }, + { url = "https://files.pythonhosted.org/packages/8c/d7/4353be581b373033fb9198bf1da3cf8f09c1082561e8e922aa7b39bf9fe8/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:d6038d37043bced98a66e68d3aa2b6a35505dc01328cd65217cefe82f25def44", size = 227024, upload-time = "2026-04-02T09:27:22.063Z" }, + { url = 
"https://files.pythonhosted.org/packages/30/45/99d18aa925bd1740098ccd3060e238e21115fffbfdcb8f3ece837d0ace6c/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7579e913a5339fb8fa133f6bbcfd8e6749696206cf05acdbdca71a1b436d8e72", size = 217869, upload-time = "2026-04-02T09:27:23.486Z" }, + { url = "https://files.pythonhosted.org/packages/5c/05/5ee478aa53f4bb7996482153d4bfe1b89e0f087f0ab6b294fcf92d595873/charset_normalizer-3.4.7-cp314-cp314-win32.whl", hash = "sha256:5b77459df20e08151cd6f8b9ef8ef1f961ef73d85c21a555c7eed5b79410ec10", size = 148541, upload-time = "2026-04-02T09:27:25.146Z" }, + { url = "https://files.pythonhosted.org/packages/48/77/72dcb0921b2ce86420b2d79d454c7022bf5be40202a2a07906b9f2a35c97/charset_normalizer-3.4.7-cp314-cp314-win_amd64.whl", hash = "sha256:92a0a01ead5e668468e952e4238cccd7c537364eb7d851ab144ab6627dbbe12f", size = 159634, upload-time = "2026-04-02T09:27:26.642Z" }, + { url = "https://files.pythonhosted.org/packages/c6/a3/c2369911cd72f02386e4e340770f6e158c7980267da16af8f668217abaa0/charset_normalizer-3.4.7-cp314-cp314-win_arm64.whl", hash = "sha256:67f6279d125ca0046a7fd386d01b311c6363844deac3e5b069b514ba3e63c246", size = 148384, upload-time = "2026-04-02T09:27:28.271Z" }, + { url = "https://files.pythonhosted.org/packages/94/09/7e8a7f73d24dba1f0035fbbf014d2c36828fc1bf9c88f84093e57d315935/charset_normalizer-3.4.7-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:effc3f449787117233702311a1b7d8f59cba9ced946ba727bdc329ec69028e24", size = 330133, upload-time = "2026-04-02T09:27:29.474Z" }, + { url = "https://files.pythonhosted.org/packages/8d/da/96975ddb11f8e977f706f45cddd8540fd8242f71ecdb5d18a80723dcf62c/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fbccdc05410c9ee21bbf16a35f4c1d16123dcdeb8a1d38f33654fa21d0234f79", size = 216257, upload-time = "2026-04-02T09:27:30.793Z" }, + { url = 
"https://files.pythonhosted.org/packages/e5/e8/1d63bf8ef2d388e95c64b2098f45f84758f6d102a087552da1485912637b/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:733784b6d6def852c814bce5f318d25da2ee65dd4839a0718641c696e09a2960", size = 234851, upload-time = "2026-04-02T09:27:32.44Z" }, + { url = "https://files.pythonhosted.org/packages/9b/40/e5ff04233e70da2681fa43969ad6f66ca5611d7e669be0246c4c7aaf6dc8/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a89c23ef8d2c6b27fd200a42aa4ac72786e7c60d40efdc76e6011260b6e949c4", size = 233393, upload-time = "2026-04-02T09:27:34.03Z" }, + { url = "https://files.pythonhosted.org/packages/be/c1/06c6c49d5a5450f76899992f1ee40b41d076aee9279b49cf9974d2f313d5/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6c114670c45346afedc0d947faf3c7f701051d2518b943679c8ff88befe14f8e", size = 223251, upload-time = "2026-04-02T09:27:35.369Z" }, + { url = "https://files.pythonhosted.org/packages/2b/9f/f2ff16fb050946169e3e1f82134d107e5d4ae72647ec8a1b1446c148480f/charset_normalizer-3.4.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:a180c5e59792af262bf263b21a3c49353f25945d8d9f70628e73de370d55e1e1", size = 206609, upload-time = "2026-04-02T09:27:36.661Z" }, + { url = "https://files.pythonhosted.org/packages/69/d5/a527c0cd8d64d2eab7459784fb4169a0ac76e5a6fc5237337982fd61347e/charset_normalizer-3.4.7-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3c9a494bc5ec77d43cea229c4f6db1e4d8fe7e1bbffa8b6f0f0032430ff8ab44", size = 220014, upload-time = "2026-04-02T09:27:38.019Z" }, + { url = "https://files.pythonhosted.org/packages/7e/80/8a7b8104a3e203074dc9aa2c613d4b726c0e136bad1cc734594b02867972/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = 
"sha256:8d828b6667a32a728a1ad1d93957cdf37489c57b97ae6c4de2860fa749b8fc1e", size = 218979, upload-time = "2026-04-02T09:27:39.37Z" }, + { url = "https://files.pythonhosted.org/packages/02/9a/b759b503d507f375b2b5c153e4d2ee0a75aa215b7f2489cf314f4541f2c0/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:cf1493cd8607bec4d8a7b9b004e699fcf8f9103a9284cc94962cb73d20f9d4a3", size = 209238, upload-time = "2026-04-02T09:27:40.722Z" }, + { url = "https://files.pythonhosted.org/packages/c2/4e/0f3f5d47b86bdb79256e7290b26ac847a2832d9a4033f7eb2cd4bcf4bb5b/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:0c96c3b819b5c3e9e165495db84d41914d6894d55181d2d108cc1a69bfc9cce0", size = 236110, upload-time = "2026-04-02T09:27:42.33Z" }, + { url = "https://files.pythonhosted.org/packages/96/23/bce28734eb3ed2c91dcf93abeb8a5cf393a7b2749725030bb630e554fdd8/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:752a45dc4a6934060b3b0dab47e04edc3326575f82be64bc4fc293914566503e", size = 219824, upload-time = "2026-04-02T09:27:43.924Z" }, + { url = "https://files.pythonhosted.org/packages/2c/6f/6e897c6984cc4d41af319b077f2f600fc8214eb2fe2d6bcb79141b882400/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:8778f0c7a52e56f75d12dae53ae320fae900a8b9b4164b981b9c5ce059cd1fcb", size = 233103, upload-time = "2026-04-02T09:27:45.348Z" }, + { url = "https://files.pythonhosted.org/packages/76/22/ef7bd0fe480a0ae9b656189ec00744b60933f68b4f42a7bb06589f6f576a/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:ce3412fbe1e31eb81ea42f4169ed94861c56e643189e1e75f0041f3fe7020abe", size = 225194, upload-time = "2026-04-02T09:27:46.706Z" }, + { url = "https://files.pythonhosted.org/packages/c5/a7/0e0ab3e0b5bc1219bd80a6a0d4d72ca74d9250cb2382b7c699c147e06017/charset_normalizer-3.4.7-cp314-cp314t-win32.whl", hash = 
"sha256:c03a41a8784091e67a39648f70c5f97b5b6a37f216896d44d2cdcb82615339a0", size = 159827, upload-time = "2026-04-02T09:27:48.053Z" }, + { url = "https://files.pythonhosted.org/packages/7a/1d/29d32e0fb40864b1f878c7f5a0b343ae676c6e2b271a2d55cc3a152391da/charset_normalizer-3.4.7-cp314-cp314t-win_amd64.whl", hash = "sha256:03853ed82eeebbce3c2abfdbc98c96dc205f32a79627688ac9a27370ea61a49c", size = 174168, upload-time = "2026-04-02T09:27:49.795Z" }, + { url = "https://files.pythonhosted.org/packages/de/32/d92444ad05c7a6e41fb2036749777c163baf7a0301a040cb672d6b2b1ae9/charset_normalizer-3.4.7-cp314-cp314t-win_arm64.whl", hash = "sha256:c35abb8bfff0185efac5878da64c45dafd2b37fb0383add1be155a763c1f083d", size = 153018, upload-time = "2026-04-02T09:27:51.116Z" }, + { url = "https://files.pythonhosted.org/packages/db/8f/61959034484a4a7c527811f4721e75d02d653a35afb0b6054474d8185d4c/charset_normalizer-3.4.7-py3-none-any.whl", hash = "sha256:3dce51d0f5e7951f8bb4900c257dad282f49190fdbebecd4ba99bcc41fef404d", size = 61958, upload-time = "2026-04-02T09:28:37.794Z" }, +] + +[[package]] +name = "colorama" +version = "0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" }, +] + +[[package]] +name = "distro" +version = "1.9.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/fc/f8/98eea607f65de6527f8a2e8885fc8015d3e6f5775df186e443e0964a11c3/distro-1.9.0.tar.gz", hash = "sha256:2fa77c6fd8940f116ee1d6b94a2f90b13b5ea8d019b98bc8bafdcabcdd9bdbed", size = 60722, upload-time = "2023-12-24T09:54:32.31Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl", hash = "sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2", size = 20277, upload-time = "2023-12-24T09:54:30.421Z" }, +] + +[[package]] +name = "h11" +version = "0.16.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/01/ee/02a2c011bdab74c6fb3c75474d40b3052059d95df7e73351460c8588d963/h11-0.16.0.tar.gz", hash = "sha256:4e35b956cf45792e4caa5885e69fba00bdbc6ffafbfa020300e549b208ee5ff1", size = 101250, upload-time = "2025-04-24T03:35:25.427Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" }, +] + +[[package]] +name = "httpcore" +version = "1.0.9" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "certifi" }, + { name = "h11" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/06/94/82699a10bca87a5556c9c59b5963f2d039dbd239f25bc2a63907a05a14cb/httpcore-1.0.9.tar.gz", hash = "sha256:6e34463af53fd2ab5d807f399a9b45ea31c3dfa2276f15a2c3f00afff6e176e8", size = 85484, upload-time = "2025-04-24T22:06:22.219Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = 
"2025-04-24T22:06:20.566Z" }, +] + +[[package]] +name = "httpx" +version = "0.28.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "anyio" }, + { name = "certifi" }, + { name = "httpcore" }, + { name = "idna" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b1/df/48c586a5fe32a0f01324ee087459e112ebb7224f646c0b5023f5e79e9956/httpx-0.28.1.tar.gz", hash = "sha256:75e98c5f16b0f35b567856f597f06ff2270a374470a5c2392242528e3e3e42fc", size = 141406, upload-time = "2024-12-06T15:37:23.222Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517, upload-time = "2024-12-06T15:37:21.509Z" }, +] + +[[package]] +name = "idna" +version = "3.11" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/6f/6d/0703ccc57f3a7233505399edb88de3cbd678da106337b9fcde432b65ed60/idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902", size = 194582, upload-time = "2025-10-12T14:55:20.501Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" }, +] + +[[package]] +name = "jiter" +version = "0.13.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/0d/5e/4ec91646aee381d01cdb9974e30882c9cd3b8c5d1079d6b5ff4af522439a/jiter-0.13.0.tar.gz", hash = "sha256:f2839f9c2c7e2dffc1bc5929a510e14ce0a946be9365fd1219e7ef342dae14f4", size = 164847, upload-time = "2026-02-02T12:37:56.441Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/2e/30/7687e4f87086829955013ca12a9233523349767f69653ebc27036313def9/jiter-0.13.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:0a2bd69fc1d902e89925fc34d1da51b2128019423d7b339a45d9e99c894e0663", size = 307958, upload-time = "2026-02-02T12:35:57.165Z" }, + { url = "https://files.pythonhosted.org/packages/c3/27/e57f9a783246ed95481e6749cc5002a8a767a73177a83c63ea71f0528b90/jiter-0.13.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f917a04240ef31898182f76a332f508f2cc4b57d2b4d7ad2dbfebbfe167eb505", size = 318597, upload-time = "2026-02-02T12:35:58.591Z" }, + { url = "https://files.pythonhosted.org/packages/cf/52/e5719a60ac5d4d7c5995461a94ad5ef962a37c8bf5b088390e6fad59b2ff/jiter-0.13.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c1e2b199f446d3e82246b4fd9236d7cb502dc2222b18698ba0d986d2fecc6152", size = 348821, upload-time = "2026-02-02T12:36:00.093Z" }, + { url = "https://files.pythonhosted.org/packages/61/db/c1efc32b8ba4c740ab3fc2d037d8753f67685f475e26b9d6536a4322bcdd/jiter-0.13.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:04670992b576fa65bd056dbac0c39fe8bd67681c380cb2b48efa885711d9d726", size = 364163, upload-time = "2026-02-02T12:36:01.937Z" }, + { url = "https://files.pythonhosted.org/packages/55/8a/fb75556236047c8806995671a18e4a0ad646ed255276f51a20f32dceaeec/jiter-0.13.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:5a1aff1fbdb803a376d4d22a8f63f8e7ccbce0b4890c26cc7af9e501ab339ef0", size = 483709, upload-time = "2026-02-02T12:36:03.41Z" }, + { url = "https://files.pythonhosted.org/packages/7e/16/43512e6ee863875693a8e6f6d532e19d650779d6ba9a81593ae40a9088ff/jiter-0.13.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:3b3fb8c2053acaef8580809ac1d1f7481a0a0bdc012fd7f5d8b18fb696a5a089", size = 370480, upload-time = "2026-02-02T12:36:04.791Z" }, + { url = 
"https://files.pythonhosted.org/packages/f8/4c/09b93e30e984a187bc8aaa3510e1ec8dcbdcd71ca05d2f56aac0492453aa/jiter-0.13.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bdaba7d87e66f26a2c45d8cbadcbfc4bf7884182317907baf39cfe9775bb4d93", size = 360735, upload-time = "2026-02-02T12:36:06.994Z" }, + { url = "https://files.pythonhosted.org/packages/1a/1b/46c5e349019874ec5dfa508c14c37e29864ea108d376ae26d90bee238cd7/jiter-0.13.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:7b88d649135aca526da172e48083da915ec086b54e8e73a425ba50999468cc08", size = 391814, upload-time = "2026-02-02T12:36:08.368Z" }, + { url = "https://files.pythonhosted.org/packages/15/9e/26184760e85baee7162ad37b7912797d2077718476bf91517641c92b3639/jiter-0.13.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:e404ea551d35438013c64b4f357b0474c7abf9f781c06d44fcaf7a14c69ff9e2", size = 513990, upload-time = "2026-02-02T12:36:09.993Z" }, + { url = "https://files.pythonhosted.org/packages/e9/34/2c9355247d6debad57a0a15e76ab1566ab799388042743656e566b3b7de1/jiter-0.13.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:1f4748aad1b4a93c8bdd70f604d0f748cdc0e8744c5547798acfa52f10e79228", size = 548021, upload-time = "2026-02-02T12:36:11.376Z" }, + { url = "https://files.pythonhosted.org/packages/ac/4a/9f2c23255d04a834398b9c2e0e665382116911dc4d06b795710503cdad25/jiter-0.13.0-cp312-cp312-win32.whl", hash = "sha256:0bf670e3b1445fc4d31612199f1744f67f889ee1bbae703c4b54dc097e5dd394", size = 203024, upload-time = "2026-02-02T12:36:12.682Z" }, + { url = "https://files.pythonhosted.org/packages/09/ee/f0ae675a957ae5a8f160be3e87acea6b11dc7b89f6b7ab057e77b2d2b13a/jiter-0.13.0-cp312-cp312-win_amd64.whl", hash = "sha256:15db60e121e11fe186c0b15236bd5d18381b9ddacdcf4e659feb96fc6c969c92", size = 205424, upload-time = "2026-02-02T12:36:13.93Z" }, + { url = 
"https://files.pythonhosted.org/packages/1b/02/ae611edf913d3cbf02c97cdb90374af2082c48d7190d74c1111dde08bcdd/jiter-0.13.0-cp312-cp312-win_arm64.whl", hash = "sha256:41f92313d17989102f3cb5dd533a02787cdb99454d494344b0361355da52fcb9", size = 186818, upload-time = "2026-02-02T12:36:15.308Z" }, + { url = "https://files.pythonhosted.org/packages/91/9c/7ee5a6ff4b9991e1a45263bfc46731634c4a2bde27dfda6c8251df2d958c/jiter-0.13.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1f8a55b848cbabf97d861495cd65f1e5c590246fabca8b48e1747c4dfc8f85bf", size = 306897, upload-time = "2026-02-02T12:36:16.748Z" }, + { url = "https://files.pythonhosted.org/packages/7c/02/be5b870d1d2be5dd6a91bdfb90f248fbb7dcbd21338f092c6b89817c3dbf/jiter-0.13.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f556aa591c00f2c45eb1b89f68f52441a016034d18b65da60e2d2875bbbf344a", size = 317507, upload-time = "2026-02-02T12:36:18.351Z" }, + { url = "https://files.pythonhosted.org/packages/da/92/b25d2ec333615f5f284f3a4024f7ce68cfa0604c322c6808b2344c7f5d2b/jiter-0.13.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f7e1d61da332ec412350463891923f960c3073cf1aae93b538f0bb4c8cd46efb", size = 350560, upload-time = "2026-02-02T12:36:19.746Z" }, + { url = "https://files.pythonhosted.org/packages/be/ec/74dcb99fef0aca9fbe56b303bf79f6bd839010cb18ad41000bf6cc71eec0/jiter-0.13.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3097d665a27bc96fd9bbf7f86178037db139f319f785e4757ce7ccbf390db6c2", size = 363232, upload-time = "2026-02-02T12:36:21.243Z" }, + { url = "https://files.pythonhosted.org/packages/1b/37/f17375e0bb2f6a812d4dd92d7616e41917f740f3e71343627da9db2824ce/jiter-0.13.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9d01ecc3a8cbdb6f25a37bd500510550b64ddf9f7d64a107d92f3ccb25035d0f", size = 483727, upload-time = "2026-02-02T12:36:22.688Z" }, + { url = 
"https://files.pythonhosted.org/packages/77/d2/a71160a5ae1a1e66c1395b37ef77da67513b0adba73b993a27fbe47eb048/jiter-0.13.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:ed9bbc30f5d60a3bdf63ae76beb3f9db280d7f195dfcfa61af792d6ce912d159", size = 370799, upload-time = "2026-02-02T12:36:24.106Z" }, + { url = "https://files.pythonhosted.org/packages/01/99/ed5e478ff0eb4e8aa5fd998f9d69603c9fd3f32de3bd16c2b1194f68361c/jiter-0.13.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:98fbafb6e88256f4454de33c1f40203d09fc33ed19162a68b3b257b29ca7f663", size = 359120, upload-time = "2026-02-02T12:36:25.519Z" }, + { url = "https://files.pythonhosted.org/packages/16/be/7ffd08203277a813f732ba897352797fa9493faf8dc7995b31f3d9cb9488/jiter-0.13.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:5467696f6b827f1116556cb0db620440380434591e93ecee7fd14d1a491b6daa", size = 390664, upload-time = "2026-02-02T12:36:26.866Z" }, + { url = "https://files.pythonhosted.org/packages/d1/84/e0787856196d6d346264d6dcccb01f741e5f0bd014c1d9a2ebe149caf4f3/jiter-0.13.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:2d08c9475d48b92892583df9da592a0e2ac49bcd41fae1fec4f39ba6cf107820", size = 513543, upload-time = "2026-02-02T12:36:28.217Z" }, + { url = "https://files.pythonhosted.org/packages/65/50/ecbd258181c4313cf79bca6c88fb63207d04d5bf5e4f65174114d072aa55/jiter-0.13.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:aed40e099404721d7fcaf5b89bd3b4568a4666358bcac7b6b15c09fb6252ab68", size = 547262, upload-time = "2026-02-02T12:36:29.678Z" }, + { url = "https://files.pythonhosted.org/packages/27/da/68f38d12e7111d2016cd198161b36e1f042bd115c169255bcb7ec823a3bf/jiter-0.13.0-cp313-cp313-win32.whl", hash = "sha256:36ebfbcffafb146d0e6ffb3e74d51e03d9c35ce7c625c8066cdbfc7b953bdc72", size = 200630, upload-time = "2026-02-02T12:36:31.808Z" }, + { url = 
"https://files.pythonhosted.org/packages/25/65/3bd1a972c9a08ecd22eb3b08a95d1941ebe6938aea620c246cf426ae09c2/jiter-0.13.0-cp313-cp313-win_amd64.whl", hash = "sha256:8d76029f077379374cf0dbc78dbe45b38dec4a2eb78b08b5194ce836b2517afc", size = 202602, upload-time = "2026-02-02T12:36:33.679Z" }, + { url = "https://files.pythonhosted.org/packages/15/fe/13bd3678a311aa67686bb303654792c48206a112068f8b0b21426eb6851e/jiter-0.13.0-cp313-cp313-win_arm64.whl", hash = "sha256:bb7613e1a427cfcb6ea4544f9ac566b93d5bf67e0d48c787eca673ff9c9dff2b", size = 185939, upload-time = "2026-02-02T12:36:35.065Z" }, + { url = "https://files.pythonhosted.org/packages/49/19/a929ec002ad3228bc97ca01dbb14f7632fffdc84a95ec92ceaf4145688ae/jiter-0.13.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:fa476ab5dd49f3bf3a168e05f89358c75a17608dbabb080ef65f96b27c19ab10", size = 316616, upload-time = "2026-02-02T12:36:36.579Z" }, + { url = "https://files.pythonhosted.org/packages/52/56/d19a9a194afa37c1728831e5fb81b7722c3de18a3109e8f282bfc23e587a/jiter-0.13.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ade8cb6ff5632a62b7dbd4757d8c5573f7a2e9ae285d6b5b841707d8363205ef", size = 346850, upload-time = "2026-02-02T12:36:38.058Z" }, + { url = "https://files.pythonhosted.org/packages/36/4a/94e831c6bf287754a8a019cb966ed39ff8be6ab78cadecf08df3bb02d505/jiter-0.13.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9950290340acc1adaded363edd94baebcee7dabdfa8bee4790794cd5cfad2af6", size = 358551, upload-time = "2026-02-02T12:36:39.417Z" }, + { url = "https://files.pythonhosted.org/packages/a2/ec/a4c72c822695fa80e55d2b4142b73f0012035d9fcf90eccc56bc060db37c/jiter-0.13.0-cp313-cp313t-win_amd64.whl", hash = "sha256:2b4972c6df33731aac0742b64fd0d18e0a69bc7d6e03108ce7d40c85fd9e3e6d", size = 201950, upload-time = "2026-02-02T12:36:40.791Z" }, + { url = 
"https://files.pythonhosted.org/packages/b6/00/393553ec27b824fbc29047e9c7cd4a3951d7fbe4a76743f17e44034fa4e4/jiter-0.13.0-cp313-cp313t-win_arm64.whl", hash = "sha256:701a1e77d1e593c1b435315ff625fd071f0998c5f02792038a5ca98899261b7d", size = 185852, upload-time = "2026-02-02T12:36:42.077Z" }, + { url = "https://files.pythonhosted.org/packages/6e/f5/f1997e987211f6f9bd71b8083047b316208b4aca0b529bb5f8c96c89ef3e/jiter-0.13.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:cc5223ab19fe25e2f0bf2643204ad7318896fe3729bf12fde41b77bfc4fafff0", size = 308804, upload-time = "2026-02-02T12:36:43.496Z" }, + { url = "https://files.pythonhosted.org/packages/cd/8f/5482a7677731fd44881f0204981ce2d7175db271f82cba2085dd2212e095/jiter-0.13.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:9776ebe51713acf438fd9b4405fcd86893ae5d03487546dae7f34993217f8a91", size = 318787, upload-time = "2026-02-02T12:36:45.071Z" }, + { url = "https://files.pythonhosted.org/packages/f3/b9/7257ac59778f1cd025b26a23c5520a36a424f7f1b068f2442a5b499b7464/jiter-0.13.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:879e768938e7b49b5e90b7e3fecc0dbec01b8cb89595861fb39a8967c5220d09", size = 353880, upload-time = "2026-02-02T12:36:47.365Z" }, + { url = "https://files.pythonhosted.org/packages/c3/87/719eec4a3f0841dad99e3d3604ee4cba36af4419a76f3cb0b8e2e691ad67/jiter-0.13.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:682161a67adea11e3aae9038c06c8b4a9a71023228767477d683f69903ebc607", size = 366702, upload-time = "2026-02-02T12:36:48.871Z" }, + { url = "https://files.pythonhosted.org/packages/d2/65/415f0a75cf6921e43365a1bc227c565cb949caca8b7532776e430cbaa530/jiter-0.13.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a13b68cd1cd8cc9de8f244ebae18ccb3e4067ad205220ef324c39181e23bbf66", size = 486319, upload-time = "2026-02-02T12:36:53.006Z" }, + { url = 
"https://files.pythonhosted.org/packages/54/a2/9e12b48e82c6bbc6081fd81abf915e1443add1b13d8fc586e1d90bb02bb8/jiter-0.13.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:87ce0f14c6c08892b610686ae8be350bf368467b6acd5085a5b65441e2bf36d2", size = 372289, upload-time = "2026-02-02T12:36:54.593Z" }, + { url = "https://files.pythonhosted.org/packages/4e/c1/e4693f107a1789a239c759a432e9afc592366f04e901470c2af89cfd28e1/jiter-0.13.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0c365005b05505a90d1c47856420980d0237adf82f70c4aff7aebd3c1cc143ad", size = 360165, upload-time = "2026-02-02T12:36:56.112Z" }, + { url = "https://files.pythonhosted.org/packages/17/08/91b9ea976c1c758240614bd88442681a87672eebc3d9a6dde476874e706b/jiter-0.13.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1317fdffd16f5873e46ce27d0e0f7f4f90f0cdf1d86bf6abeaea9f63ca2c401d", size = 389634, upload-time = "2026-02-02T12:36:57.495Z" }, + { url = "https://files.pythonhosted.org/packages/18/23/58325ef99390d6d40427ed6005bf1ad54f2577866594bcf13ce55675f87d/jiter-0.13.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:c05b450d37ba0c9e21c77fef1f205f56bcee2330bddca68d344baebfc55ae0df", size = 514933, upload-time = "2026-02-02T12:36:58.909Z" }, + { url = "https://files.pythonhosted.org/packages/5b/25/69f1120c7c395fd276c3996bb8adefa9c6b84c12bb7111e5c6ccdcd8526d/jiter-0.13.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:775e10de3849d0631a97c603f996f518159272db00fdda0a780f81752255ee9d", size = 548842, upload-time = "2026-02-02T12:37:00.433Z" }, + { url = "https://files.pythonhosted.org/packages/18/05/981c9669d86850c5fbb0d9e62bba144787f9fba84546ba43d624ee27ef29/jiter-0.13.0-cp314-cp314-win32.whl", hash = "sha256:632bf7c1d28421c00dd8bbb8a3bac5663e1f57d5cd5ed962bce3c73bf62608e6", size = 202108, upload-time = "2026-02-02T12:37:01.718Z" }, + { url = 
"https://files.pythonhosted.org/packages/8d/96/cdcf54dd0b0341db7d25413229888a346c7130bd20820530905fdb65727b/jiter-0.13.0-cp314-cp314-win_amd64.whl", hash = "sha256:f22ef501c3f87ede88f23f9b11e608581c14f04db59b6a801f354397ae13739f", size = 204027, upload-time = "2026-02-02T12:37:03.075Z" }, + { url = "https://files.pythonhosted.org/packages/fb/f9/724bcaaab7a3cd727031fe4f6995cb86c4bd344909177c186699c8dec51a/jiter-0.13.0-cp314-cp314-win_arm64.whl", hash = "sha256:07b75fe09a4ee8e0c606200622e571e44943f47254f95e2436c8bdcaceb36d7d", size = 187199, upload-time = "2026-02-02T12:37:04.414Z" }, + { url = "https://files.pythonhosted.org/packages/62/92/1661d8b9fd6a3d7a2d89831db26fe3c1509a287d83ad7838831c7b7a5c7e/jiter-0.13.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:964538479359059a35fb400e769295d4b315ae61e4105396d355a12f7fef09f0", size = 318423, upload-time = "2026-02-02T12:37:05.806Z" }, + { url = "https://files.pythonhosted.org/packages/4f/3b/f77d342a54d4ebcd128e520fc58ec2f5b30a423b0fd26acdfc0c6fef8e26/jiter-0.13.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e104da1db1c0991b3eaed391ccd650ae8d947eab1480c733e5a3fb28d4313e40", size = 351438, upload-time = "2026-02-02T12:37:07.189Z" }, + { url = "https://files.pythonhosted.org/packages/76/b3/ba9a69f0e4209bd3331470c723c2f5509e6f0482e416b612431a5061ed71/jiter-0.13.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:0e3a5f0cde8ff433b8e88e41aa40131455420fb3649a3c7abdda6145f8cb7202", size = 364774, upload-time = "2026-02-02T12:37:08.579Z" }, + { url = "https://files.pythonhosted.org/packages/b3/16/6cdb31fa342932602458dbb631bfbd47f601e03d2e4950740e0b2100b570/jiter-0.13.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:57aab48f40be1db920a582b30b116fe2435d184f77f0e4226f546794cedd9cf0", size = 487238, upload-time = "2026-02-02T12:37:10.066Z" }, + { url = 
"https://files.pythonhosted.org/packages/ed/b1/956cc7abaca8d95c13aa8d6c9b3f3797241c246cd6e792934cc4c8b250d2/jiter-0.13.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:7772115877c53f62beeb8fd853cab692dbc04374ef623b30f997959a4c0e7e95", size = 372892, upload-time = "2026-02-02T12:37:11.656Z" }, + { url = "https://files.pythonhosted.org/packages/26/c4/97ecde8b1e74f67b8598c57c6fccf6df86ea7861ed29da84629cdbba76c4/jiter-0.13.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1211427574b17b633cfceba5040de8081e5abf114f7a7602f73d2e16f9fdaa59", size = 360309, upload-time = "2026-02-02T12:37:13.244Z" }, + { url = "https://files.pythonhosted.org/packages/4b/d7/eabe3cf46715854ccc80be2cd78dd4c36aedeb30751dbf85a1d08c14373c/jiter-0.13.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:7beae3a3d3b5212d3a55d2961db3c292e02e302feb43fce6a3f7a31b90ea6dfe", size = 389607, upload-time = "2026-02-02T12:37:14.881Z" }, + { url = "https://files.pythonhosted.org/packages/df/2d/03963fc0804e6109b82decfb9974eb92df3797fe7222428cae12f8ccaa0c/jiter-0.13.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:e5562a0f0e90a6223b704163ea28e831bd3a9faa3512a711f031611e6b06c939", size = 514986, upload-time = "2026-02-02T12:37:16.326Z" }, + { url = "https://files.pythonhosted.org/packages/f6/6c/8c83b45eb3eb1c1e18d841fe30b4b5bc5619d781267ca9bc03e005d8fd0a/jiter-0.13.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:6c26a424569a59140fb51160a56df13f438a2b0967365e987889186d5fc2f6f9", size = 548756, upload-time = "2026-02-02T12:37:17.736Z" }, + { url = "https://files.pythonhosted.org/packages/47/66/eea81dfff765ed66c68fd2ed8c96245109e13c896c2a5015c7839c92367e/jiter-0.13.0-cp314-cp314t-win32.whl", hash = "sha256:24dc96eca9f84da4131cdf87a95e6ce36765c3b156fc9ae33280873b1c32d5f6", size = 201196, upload-time = "2026-02-02T12:37:19.101Z" }, + { url = 
"https://files.pythonhosted.org/packages/ff/32/4ac9c7a76402f8f00d00842a7f6b83b284d0cf7c1e9d4227bc95aa6d17fa/jiter-0.13.0-cp314-cp314t-win_amd64.whl", hash = "sha256:0a8d76c7524087272c8ae913f5d9d608bd839154b62c4322ef65723d2e5bb0b8", size = 204215, upload-time = "2026-02-02T12:37:20.495Z" }, + { url = "https://files.pythonhosted.org/packages/f9/8e/7def204fea9f9be8b3c21a6f2dd6c020cf56c7d5ff753e0e23ed7f9ea57e/jiter-0.13.0-cp314-cp314t-win_arm64.whl", hash = "sha256:2c26cf47e2cad140fa23b6d58d435a7c0161f5c514284802f25e87fddfe11024", size = 187152, upload-time = "2026-02-02T12:37:22.124Z" }, + { url = "https://files.pythonhosted.org/packages/80/60/e50fa45dd7e2eae049f0ce964663849e897300433921198aef94b6ffa23a/jiter-0.13.0-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:3d744a6061afba08dd7ae375dcde870cffb14429b7477e10f67e9e6d68772a0a", size = 305169, upload-time = "2026-02-02T12:37:50.376Z" }, + { url = "https://files.pythonhosted.org/packages/d2/73/a009f41c5eed71c49bec53036c4b33555afcdee70682a18c6f66e396c039/jiter-0.13.0-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:ff732bd0a0e778f43d5009840f20b935e79087b4dc65bd36f1cd0f9b04b8ff7f", size = 303808, upload-time = "2026-02-02T12:37:52.092Z" }, + { url = "https://files.pythonhosted.org/packages/c4/10/528b439290763bff3d939268085d03382471b442f212dca4ff5f12802d43/jiter-0.13.0-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ab44b178f7981fcaea7e0a5df20e773c663d06ffda0198f1a524e91b2fde7e59", size = 337384, upload-time = "2026-02-02T12:37:53.582Z" }, + { url = "https://files.pythonhosted.org/packages/67/8a/a342b2f0251f3dac4ca17618265d93bf244a2a4d089126e81e4c1056ac50/jiter-0.13.0-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7bb00b6d26db67a05fe3e12c76edc75f32077fb51deed13822dc648fa373bc19", size = 343768, upload-time = "2026-02-02T12:37:55.055Z" }, +] + +[[package]] +name = "jsonpatch" 
+version = "1.33" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "jsonpointer" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/42/78/18813351fe5d63acad16aec57f94ec2b70a09e53ca98145589e185423873/jsonpatch-1.33.tar.gz", hash = "sha256:9fcd4009c41e6d12348b4a0ff2563ba56a2923a7dfee731d004e212e1ee5030c", size = 21699, upload-time = "2023-06-26T12:07:29.144Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/73/07/02e16ed01e04a374e644b575638ec7987ae846d25ad97bcc9945a3ee4b0e/jsonpatch-1.33-py2.py3-none-any.whl", hash = "sha256:0ae28c0cd062bbd8b8ecc26d7d164fbbea9652a1a3693f3b956c1eae5145dade", size = 12898, upload-time = "2023-06-16T21:01:28.466Z" }, +] + +[[package]] +name = "jsonpointer" +version = "3.1.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/18/c7/af399a2e7a67fd18d63c40c5e62d3af4e67b836a2107468b6a5ea24c4304/jsonpointer-3.1.1.tar.gz", hash = "sha256:0b801c7db33a904024f6004d526dcc53bbb8a4a0f4e32bfd10beadf60adf1900", size = 9068, upload-time = "2026-03-23T22:32:32.458Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/9e/6a/a83720e953b1682d2d109d3c2dbb0bc9bf28cc1cbc205be4ef4be5da709d/jsonpointer-3.1.1-py3-none-any.whl", hash = "sha256:8ff8b95779d071ba472cf5bc913028df06031797532f08a7d5b602d8b2a488ca", size = 7659, upload-time = "2026-03-23T22:32:31.568Z" }, +] + +[[package]] +name = "langchain-core" +version = "1.2.28" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "jsonpatch" }, + { name = "langsmith" }, + { name = "packaging" }, + { name = "pydantic" }, + { name = "pyyaml" }, + { name = "tenacity" }, + { name = "typing-extensions" }, + { name = "uuid-utils" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/f8/a4/317a1a3ac1df33a64adb3670bf88bbe3b3d5baa274db6863a979db472897/langchain_core-1.2.28.tar.gz", hash = 
"sha256:271a3d8bd618f795fdeba112b0753980457fc90537c46a0c11998516a74dc2cb", size = 846119, upload-time = "2026-04-08T18:19:34.867Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a8/92/32f785f077c7e898da97064f113c73fbd9ad55d1e2169cf3a391b183dedb/langchain_core-1.2.28-py3-none-any.whl", hash = "sha256:80764232581eaf8057bcefa71dbf8adc1f6a28d257ebd8b95ba9b8b452e8c6ac", size = 508727, upload-time = "2026-04-08T18:19:32.823Z" }, +] + +[[package]] +name = "langchain-openai" +version = "1.1.12" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "langchain-core" }, + { name = "openai" }, + { name = "tiktoken" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/cc/fd/7dee16e882c4c1577d48db174d85aa3a0ee09ba61eb6a5d41650285ca80c/langchain_openai-1.1.12.tar.gz", hash = "sha256:ccf5ef02c896f6807b4d0e51aaf678a72ce81ae41201cae8d65e11eeff9ecb79", size = 1114119, upload-time = "2026-03-23T18:59:19.211Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6e/a6/68fb22e3604015e6f546fa1d3677d24378b482855ae74710cbf4aec44132/langchain_openai-1.1.12-py3-none-any.whl", hash = "sha256:da71ca3f2d18c16f7a2443cc306aa195ad2a07054335ac9b0626dcae02b6a0c5", size = 88487, upload-time = "2026-03-23T18:59:17.978Z" }, +] + +[[package]] +name = "langdetect" +version = "1.0.9" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "six" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz", hash = "sha256:cbc1fef89f8d062739774bd51eda3da3274006b3661d199c2655f6b3f6d605a0", size = 981474, upload-time = "2021-05-07T07:54:13.562Z" } + +[[package]] +name = "langsmith" +version = "0.7.29" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "httpx" }, + { name = "orjson", marker = "platform_python_implementation != 'PyPy'" }, + { name = "packaging" }, + { name = "pydantic" }, + { name = 
"requests" }, + { name = "requests-toolbelt" }, + { name = "uuid-utils" }, + { name = "xxhash" }, + { name = "zstandard" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/eb/b3/b9b2218483400c9c0f84ea781ec4fc92a9afb51c3f16d2b6369356990d47/langsmith-0.7.29.tar.gz", hash = "sha256:bcec464be00b35cdf0ed0087ef9b1f40889fe1017066f11136a02aa0276cedf5", size = 1094512, upload-time = "2026-04-09T03:17:12.961Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/aa/11/8189be47b5d5a64ecd7e19c81ad3fd9cd9f0bf6778abc5ff177db90ebb3d/langsmith-0.7.29-py3-none-any.whl", hash = "sha256:ec61cdca1f2e2add48742f97a4ee1d6894c968ef3d5a50122289dac56170978c", size = 367655, upload-time = "2026-04-09T03:17:10.944Z" }, +] + +[[package]] +name = "openai" +version = "2.31.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "anyio" }, + { name = "distro" }, + { name = "httpx" }, + { name = "jiter" }, + { name = "pydantic" }, + { name = "sniffio" }, + { name = "tqdm" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/94/fe/64b3d035780b3188f86c4f6f1bc202e7bb74757ef028802112273b9dcacf/openai-2.31.0.tar.gz", hash = "sha256:43ca59a88fc973ad1848d86b98d7fac207e265ebbd1828b5e4bdfc85f79427a5", size = 684772, upload-time = "2026-04-08T21:01:41.797Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/66/bc/a8f7c3aa03452fedbb9af8be83e959adba96a6b4a35e416faffcc959c568/openai-2.31.0-py3-none-any.whl", hash = "sha256:44e1344d87e56a493d649b17e2fac519d1368cbb0745f59f1957c4c26de50a0a", size = 1153479, upload-time = "2026-04-08T21:01:39.217Z" }, +] + +[[package]] +name = "orjson" +version = "3.11.8" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/9d/1b/2024d06792d0779f9dbc51531b61c24f76c75b9f4ce05e6f3377a1814cea/orjson-3.11.8.tar.gz", hash = "sha256:96163d9cdc5a202703e9ad1b9ae757d5f0ca62f4fa0cc93d1f27b0e180cc404e", size = 5603832, 
upload-time = "2026-03-31T16:16:27.878Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/01/f6/8d58b32ab32d9215973a1688aebd098252ee8af1766c0e4e36e7831f0295/orjson-3.11.8-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:1cd0b77e77c95758f8e1100139844e99f3ccc87e71e6fc8e1c027e55807c549f", size = 229233, upload-time = "2026-03-31T16:15:12.762Z" }, + { url = "https://files.pythonhosted.org/packages/a9/8b/2ffe35e71f6b92622e8ea4607bf33ecf7dfb51b3619dcfabfd36cbe2d0a5/orjson-3.11.8-cp312-cp312-macosx_15_0_arm64.whl", hash = "sha256:6a3d159d5ffa0e3961f353c4b036540996bf8b9697ccc38261c0eac1fd3347a6", size = 128772, upload-time = "2026-03-31T16:15:14.237Z" }, + { url = "https://files.pythonhosted.org/packages/27/d2/1f8682ae50d5c6897a563cb96bc106da8c9cb5b7b6e81a52e4cc086679b9/orjson-3.11.8-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:76070a76e9c5ae661e2d9848f216980d8d533e0f8143e6ed462807b242e3c5e8", size = 131946, upload-time = "2026-03-31T16:15:15.607Z" }, + { url = "https://files.pythonhosted.org/packages/52/4b/5500f76f0eece84226e0689cb48dcde081104c2fa6e2483d17ca13685ffb/orjson-3.11.8-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:54153d21520a71a4c82a0dbb4523e468941d549d221dc173de0f019678cf3813", size = 130368, upload-time = "2026-03-31T16:15:17.066Z" }, + { url = "https://files.pythonhosted.org/packages/da/4e/58b927e08fbe9840e6c920d9e299b051ea667463b1f39a56e668669f8508/orjson-3.11.8-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:469ac2125611b7c5741a0b3798cd9e5786cbad6345f9f400c77212be89563bec", size = 135540, upload-time = "2026-03-31T16:15:18.404Z" }, + { url = "https://files.pythonhosted.org/packages/56/7c/ba7cb871cba1bcd5cd02ee34f98d894c6cea96353ad87466e5aef2429c60/orjson-3.11.8-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:14778ffd0f6896aa613951a7fbf4690229aa7a543cb2bfbe9f358e08aafa9546", size 
= 146877, upload-time = "2026-03-31T16:15:19.833Z" }, + { url = "https://files.pythonhosted.org/packages/0b/5d/eb9c25fc1386696c6a342cd361c306452c75e0b55e86ad602dd4827a7fd7/orjson-3.11.8-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:ea56a955056a6d6c550cf18b3348656a9d9a4f02e2d0c02cabf3c73f1055d506", size = 132837, upload-time = "2026-03-31T16:15:21.282Z" }, + { url = "https://files.pythonhosted.org/packages/37/87/5ddeb7fc1fbd9004aeccab08426f34c81a5b4c25c7061281862b015fce2b/orjson-3.11.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:53a0f57e59a530d18a142f4d4ba6dfc708dc5fdedce45e98ff06b44930a2a48f", size = 133624, upload-time = "2026-03-31T16:15:22.641Z" }, + { url = "https://files.pythonhosted.org/packages/22/09/90048793db94ee4b2fcec4ac8e5ddb077367637d6650be896b3494b79bb7/orjson-3.11.8-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:9b48e274f8824567d74e2158199e269597edf00823a1b12b63d48462bbf5123e", size = 141904, upload-time = "2026-03-31T16:15:24.435Z" }, + { url = "https://files.pythonhosted.org/packages/c0/cf/eb284847487821a5d415e54149a6449ba9bfc5872ce63ab7be41b8ec401c/orjson-3.11.8-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:3f262401086a3960586af06c054609365e98407151f5ea24a62893a40d80dbbb", size = 423742, upload-time = "2026-03-31T16:15:26.155Z" }, + { url = "https://files.pythonhosted.org/packages/44/09/e12423d327071c851c13e76936f144a96adacfc037394dec35ac3fc8d1e8/orjson-3.11.8-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:8e8c6218b614badf8e229b697865df4301afa74b791b6c9ade01d19a9953a942", size = 147806, upload-time = "2026-03-31T16:15:27.909Z" }, + { url = "https://files.pythonhosted.org/packages/b3/6d/37c2589ba864e582ffe7611643314785c6afb1f83c701654ef05daa8fcc7/orjson-3.11.8-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:093d489fa039ddade2db541097dbb484999fcc65fc2b0ff9819141e2ab364f25", size = 136485, upload-time = "2026-03-31T16:15:29.749Z" }, + { url = 
"https://files.pythonhosted.org/packages/be/c9/135194a02ab76b04ed9a10f68624b7ebd238bbe55548878b11ff15a0f352/orjson-3.11.8-cp312-cp312-win32.whl", hash = "sha256:e0950ed1bcb9893f4293fd5c5a7ee10934fbf82c4101c70be360db23ce24b7d2", size = 131966, upload-time = "2026-03-31T16:15:31.687Z" }, + { url = "https://files.pythonhosted.org/packages/ed/9a/9796f8fbe3cf30ce9cb696748dbb535e5c87be4bf4fe2e9ca498ef1fa8cf/orjson-3.11.8-cp312-cp312-win_amd64.whl", hash = "sha256:3cf17c141617b88ced4536b2135c552490f07799f6ad565948ea07bef0dcb9a6", size = 127441, upload-time = "2026-03-31T16:15:33.333Z" }, + { url = "https://files.pythonhosted.org/packages/cc/47/5aaf54524a7a4a0dd09dd778f3fa65dd2108290615b652e23d944152bc8e/orjson-3.11.8-cp312-cp312-win_arm64.whl", hash = "sha256:48854463b0572cc87dac7d981aa72ed8bf6deedc0511853dc76b8bbd5482d36d", size = 127364, upload-time = "2026-03-31T16:15:34.748Z" }, + { url = "https://files.pythonhosted.org/packages/66/7f/95fba509bb2305fab0073558f1e8c3a2ec4b2afe58ed9fcb7d3b8beafe94/orjson-3.11.8-cp313-cp313-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:3f23426851d98478c8970da5991f84784a76682213cd50eb73a1da56b95239dc", size = 229180, upload-time = "2026-03-31T16:15:36.426Z" }, + { url = "https://files.pythonhosted.org/packages/f6/9d/b237215c743ca073697d759b5503abd2cb8a0d7b9c9e21f524bcf176ab66/orjson-3.11.8-cp313-cp313-macosx_15_0_arm64.whl", hash = "sha256:ebaed4cef74a045b83e23537b52ef19a367c7e3f536751e355a2a394f8648559", size = 128754, upload-time = "2026-03-31T16:15:38.049Z" }, + { url = "https://files.pythonhosted.org/packages/42/3d/27d65b6d11e63f133781425f132807aef793ed25075fec686fc8e46dd528/orjson-3.11.8-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:97c8f5d3b62380b70c36ffacb2a356b7c6becec86099b177f73851ba095ef623", size = 131877, upload-time = "2026-03-31T16:15:39.484Z" }, + { url = 
"https://files.pythonhosted.org/packages/dd/cc/faee30cd8f00421999e40ef0eba7332e3a625ce91a58200a2f52c7fef235/orjson-3.11.8-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:436c4922968a619fb7fef1ccd4b8b3a76c13b67d607073914d675026e911a65c", size = 130361, upload-time = "2026-03-31T16:15:41.274Z" }, + { url = "https://files.pythonhosted.org/packages/5c/bb/a6c55896197f97b6d4b4e7c7fd77e7235517c34f5d6ad5aadd43c54c6d7c/orjson-3.11.8-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:1ab359aff0436d80bfe8a23b46b5fea69f1e18aaf1760a709b4787f1318b317f", size = 135521, upload-time = "2026-03-31T16:15:42.758Z" }, + { url = "https://files.pythonhosted.org/packages/9c/7c/ca3a3525aa32ff636ebb1778e77e3587b016ab2edb1b618b36ba96f8f2c0/orjson-3.11.8-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f89b6d0b3a8d81e1929d3ab3d92bbc225688bd80a770c49432543928fe09ac55", size = 146862, upload-time = "2026-03-31T16:15:44.341Z" }, + { url = "https://files.pythonhosted.org/packages/3c/0c/18a9d7f18b5edd37344d1fd5be17e94dc652c67826ab749c6e5948a78112/orjson-3.11.8-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:29c009e7a2ca9ad0ed1376ce20dd692146a5d9fe4310848904b6b4fee5c5c137", size = 132847, upload-time = "2026-03-31T16:15:46.368Z" }, + { url = "https://files.pythonhosted.org/packages/23/91/7e722f352ad67ca573cee44de2a58fb810d0f4eb4e33276c6a557979fd8a/orjson-3.11.8-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:705b895b781b3e395c067129d8551655642dfe9437273211d5404e87ac752b53", size = 133637, upload-time = "2026-03-31T16:15:48.123Z" }, + { url = "https://files.pythonhosted.org/packages/af/04/32845ce13ac5bd1046ddb02ac9432ba856cc35f6d74dde95864fe0ad5523/orjson-3.11.8-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:88006eda83858a9fdf73985ce3804e885c2befb2f506c9a3723cdeb5a2880e3e", size = 141906, upload-time = "2026-03-31T16:15:49.626Z" }, + { url = 
"https://files.pythonhosted.org/packages/02/5e/c551387ddf2d7106d9039369862245c85738b828844d13b99ccb8d61fd06/orjson-3.11.8-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:55120759e61309af7fcf9e961c6f6af3dde5921cdb3ee863ef63fd9db126cae6", size = 423722, upload-time = "2026-03-31T16:15:51.176Z" }, + { url = "https://files.pythonhosted.org/packages/00/a3/ecfe62434096f8a794d4976728cb59bcfc4a643977f21c2040545d37eb4c/orjson-3.11.8-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:98bdc6cb889d19bed01de46e67574a2eab61f5cc6b768ed50e8ac68e9d6ffab6", size = 147801, upload-time = "2026-03-31T16:15:52.939Z" }, + { url = "https://files.pythonhosted.org/packages/18/6d/0dce10b9f6643fdc59d99333871a38fa5a769d8e2fc34a18e5d2bfdee900/orjson-3.11.8-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:708c95f925a43ab9f34625e45dcdadf09ec8a6e7b664a938f2f8d5650f6c090b", size = 136460, upload-time = "2026-03-31T16:15:54.431Z" }, + { url = "https://files.pythonhosted.org/packages/01/d6/6dde4f31842d87099238f1f07b459d24edc1a774d20687187443ab044191/orjson-3.11.8-cp313-cp313-win32.whl", hash = "sha256:01c4e5a6695dc09098f2e6468a251bc4671c50922d4d745aff1a0a33a0cf5b8d", size = 131956, upload-time = "2026-03-31T16:15:56.081Z" }, + { url = "https://files.pythonhosted.org/packages/c1/f9/4e494a56e013db957fb77186b818b916d4695b8fa2aa612364974160e91b/orjson-3.11.8-cp313-cp313-win_amd64.whl", hash = "sha256:c154a35dd1330707450bb4d4e7dd1f17fa6f42267a40c1e8a1daa5e13719b4b8", size = 127410, upload-time = "2026-03-31T16:15:57.54Z" }, + { url = "https://files.pythonhosted.org/packages/57/7f/803203d00d6edb6e9e7eef421d4e1adbb5ea973e40b3533f3cfd9aeb374e/orjson-3.11.8-cp313-cp313-win_arm64.whl", hash = "sha256:4861bde57f4d253ab041e374f44023460e60e71efaa121f3c5f0ed457c3a701e", size = 127338, upload-time = "2026-03-31T16:15:59.106Z" }, + { url = 
"https://files.pythonhosted.org/packages/6d/35/b01910c3d6b85dc882442afe5060cbf719c7d1fc85749294beda23d17873/orjson-3.11.8-cp314-cp314-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:ec795530a73c269a55130498842aaa762e4a939f6ce481a7e986eeaa790e9da4", size = 229171, upload-time = "2026-03-31T16:16:00.651Z" }, + { url = "https://files.pythonhosted.org/packages/c2/56/c9ec97bd11240abef39b9e5d99a15462809c45f677420fd148a6c5e6295e/orjson-3.11.8-cp314-cp314-macosx_15_0_arm64.whl", hash = "sha256:c492a0e011c0f9066e9ceaa896fbc5b068c54d365fea5f3444b697ee01bc8625", size = 128746, upload-time = "2026-03-31T16:16:02.673Z" }, + { url = "https://files.pythonhosted.org/packages/3b/e4/66d4f30a90de45e2f0cbd9623588e8ae71eef7679dbe2ae954ed6d66a41f/orjson-3.11.8-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:883206d55b1bd5f5679ad5e6ddd3d1a5e3cac5190482927fdb8c78fb699193b5", size = 131867, upload-time = "2026-03-31T16:16:04.342Z" }, + { url = "https://files.pythonhosted.org/packages/19/30/2a645fc9286b928675e43fa2a3a16fb7b6764aa78cc719dc82141e00f30b/orjson-3.11.8-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5774c1fdcc98b2259800b683b19599c133baeb11d60033e2095fd9d4667b82db", size = 124664, upload-time = "2026-03-31T16:16:05.837Z" }, + { url = "https://files.pythonhosted.org/packages/db/44/77b9a86d84a28d52ba3316d77737f6514e17118119ade3f91b639e859029/orjson-3.11.8-cp314-cp314-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8ac7381c83dd3d4a6347e6635950aa448f54e7b8406a27c7ecb4a37e9f1ae08b", size = 129701, upload-time = "2026-03-31T16:16:07.407Z" }, + { url = "https://files.pythonhosted.org/packages/b3/ea/eff3d9bfe47e9bc6969c9181c58d9f71237f923f9c86a2d2f490cd898c82/orjson-3.11.8-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:14439063aebcb92401c11afc68ee4e407258d2752e62d748b6942dad20d2a70d", size = 141202, upload-time = "2026-03-31T16:16:09.48Z" }, + { url = 
"https://files.pythonhosted.org/packages/52/c8/90d4b4c60c84d62068d0cf9e4d8f0a4e05e76971d133ac0c60d818d4db20/orjson-3.11.8-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:fa72e71977bff96567b0f500fc5bfd2fdf915f34052c782a4c6ebbdaa97aa858", size = 127194, upload-time = "2026-03-31T16:16:11.02Z" }, + { url = "https://files.pythonhosted.org/packages/8d/c7/ea9e08d1f0ba981adffb629811148b44774d935171e7b3d780ae43c4c254/orjson-3.11.8-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7679bc2f01bb0d219758f1a5f87bb7c8a81c0a186824a393b366876b4948e14f", size = 133639, upload-time = "2026-03-31T16:16:13.434Z" }, + { url = "https://files.pythonhosted.org/packages/6c/8c/ddbbfd6ba59453c8fc7fe1d0e5983895864e264c37481b2a791db635f046/orjson-3.11.8-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:14f7b8fcb35ef403b42fa5ecfa4ed032332a91f3dc7368fbce4184d59e1eae0d", size = 141914, upload-time = "2026-03-31T16:16:14.955Z" }, + { url = "https://files.pythonhosted.org/packages/4e/31/dbfbefec9df060d34ef4962cd0afcb6fa7a9ec65884cb78f04a7859526c3/orjson-3.11.8-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:c2bdf7b2facc80b5e34f48a2d557727d5c5c57a8a450de122ae81fa26a81c1bc", size = 423800, upload-time = "2026-03-31T16:16:16.594Z" }, + { url = "https://files.pythonhosted.org/packages/87/cf/f74e9ae9803d4ab46b163494adba636c6d7ea955af5cc23b8aaa94cfd528/orjson-3.11.8-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:ccd7ba1b0605813a0715171d39ec4c314cb97a9c85893c2c5c0c3a3729df38bf", size = 147837, upload-time = "2026-03-31T16:16:18.585Z" }, + { url = "https://files.pythonhosted.org/packages/64/e6/9214f017b5db85e84e68602792f742e5dc5249e963503d1b356bee611e01/orjson-3.11.8-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:cdbc8c9c02463fef4d3c53a9ba3336d05496ec8e1f1c53326a1e4acc11f5c600", size = 136441, upload-time = "2026-03-31T16:16:20.151Z" }, + { url = 
"https://files.pythonhosted.org/packages/24/dd/3590348818f58f837a75fb969b04cdf187ae197e14d60b5e5a794a38b79d/orjson-3.11.8-cp314-cp314-win32.whl", hash = "sha256:0b57f67710a8cd459e4e54eb96d5f77f3624eba0c661ba19a525807e42eccade", size = 131983, upload-time = "2026-03-31T16:16:21.823Z" }, + { url = "https://files.pythonhosted.org/packages/3f/0f/b6cb692116e05d058f31ceee819c70f097fa9167c82f67fabe7516289abc/orjson-3.11.8-cp314-cp314-win_amd64.whl", hash = "sha256:735e2262363dcbe05c35e3a8869898022af78f89dde9e256924dc02e99fe69ca", size = 127396, upload-time = "2026-03-31T16:16:23.685Z" }, + { url = "https://files.pythonhosted.org/packages/c0/d1/facb5b5051fabb0ef9d26c6544d87ef19a939a9a001198655d0d891062dd/orjson-3.11.8-cp314-cp314-win_arm64.whl", hash = "sha256:6ccdea2c213cf9f3d9490cbd5d427693c870753df41e6cb375bd79bcbafc8817", size = 127330, upload-time = "2026-03-31T16:16:25.496Z" }, +] + +[[package]] +name = "packaging" +version = "26.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/65/ee/299d360cdc32edc7d2cf530f3accf79c4fca01e96ffc950d8a52213bd8e4/packaging-26.0.tar.gz", hash = "sha256:00243ae351a257117b6a241061796684b084ed1c516a08c48a3f7e147a9d80b4", size = 143416, upload-time = "2026-01-21T20:50:39.064Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" }, +] + +[[package]] +name = "prompt-eval" +version = "0.1.0" +source = { virtual = "." 
} +dependencies = [ + { name = "langchain-openai" }, + { name = "langdetect" }, + { name = "openai" }, + { name = "python-dotenv" }, + { name = "tqdm" }, +] + +[package.metadata] +requires-dist = [ + { name = "langchain-openai", specifier = ">=1.1.12" }, + { name = "langdetect", specifier = ">=1.0.9" }, + { name = "openai", specifier = ">=2.31.0" }, + { name = "python-dotenv", specifier = ">=1.2.2" }, + { name = "tqdm", specifier = ">=4.66" }, +] + +[[package]] +name = "pydantic" +version = "2.12.5" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "annotated-types" }, + { name = "pydantic-core" }, + { name = "typing-extensions" }, + { name = "typing-inspection" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/69/44/36f1a6e523abc58ae5f928898e4aca2e0ea509b5aa6f6f392a5d882be928/pydantic-2.12.5.tar.gz", hash = "sha256:4d351024c75c0f085a9febbb665ce8c0c6ec5d30e903bdb6394b7ede26aebb49", size = 821591, upload-time = "2025-11-26T15:11:46.471Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5a/87/b70ad306ebb6f9b585f114d0ac2137d792b48be34d732d60e597c2f8465a/pydantic-2.12.5-py3-none-any.whl", hash = "sha256:e561593fccf61e8a20fc46dfc2dfe075b8be7d0188df33f221ad1f0139180f9d", size = 463580, upload-time = "2025-11-26T15:11:44.605Z" }, +] + +[[package]] +name = "pydantic-core" +version = "2.41.5" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/71/70/23b021c950c2addd24ec408e9ab05d59b035b39d97cdc1130e1bce647bb6/pydantic_core-2.41.5.tar.gz", hash = "sha256:08daa51ea16ad373ffd5e7606252cc32f07bc72b28284b6bc9c6df804816476e", size = 460952, upload-time = "2025-11-04T13:43:49.098Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5f/5d/5f6c63eebb5afee93bcaae4ce9a898f3373ca23df3ccaef086d0233a35a7/pydantic_core-2.41.5-cp312-cp312-macosx_10_12_x86_64.whl", hash = 
"sha256:f41a7489d32336dbf2199c8c0a215390a751c5b014c2c1c5366e817202e9cdf7", size = 2110990, upload-time = "2025-11-04T13:39:58.079Z" }, + { url = "https://files.pythonhosted.org/packages/aa/32/9c2e8ccb57c01111e0fd091f236c7b371c1bccea0fa85247ac55b1e2b6b6/pydantic_core-2.41.5-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:070259a8818988b9a84a449a2a7337c7f430a22acc0859c6b110aa7212a6d9c0", size = 1896003, upload-time = "2025-11-04T13:39:59.956Z" }, + { url = "https://files.pythonhosted.org/packages/68/b8/a01b53cb0e59139fbc9e4fda3e9724ede8de279097179be4ff31f1abb65a/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e96cea19e34778f8d59fe40775a7a574d95816eb150850a85a7a4c8f4b94ac69", size = 1919200, upload-time = "2025-11-04T13:40:02.241Z" }, + { url = "https://files.pythonhosted.org/packages/38/de/8c36b5198a29bdaade07b5985e80a233a5ac27137846f3bc2d3b40a47360/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ed2e99c456e3fadd05c991f8f437ef902e00eedf34320ba2b0842bd1c3ca3a75", size = 2052578, upload-time = "2025-11-04T13:40:04.401Z" }, + { url = "https://files.pythonhosted.org/packages/00/b5/0e8e4b5b081eac6cb3dbb7e60a65907549a1ce035a724368c330112adfdd/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:65840751b72fbfd82c3c640cff9284545342a4f1eb1586ad0636955b261b0b05", size = 2208504, upload-time = "2025-11-04T13:40:06.072Z" }, + { url = "https://files.pythonhosted.org/packages/77/56/87a61aad59c7c5b9dc8caad5a41a5545cba3810c3e828708b3d7404f6cef/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e536c98a7626a98feb2d3eaf75944ef6f3dbee447e1f841eae16f2f0a72d8ddc", size = 2335816, upload-time = "2025-11-04T13:40:07.835Z" }, + { url = 
"https://files.pythonhosted.org/packages/0d/76/941cc9f73529988688a665a5c0ecff1112b3d95ab48f81db5f7606f522d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eceb81a8d74f9267ef4081e246ffd6d129da5d87e37a77c9bde550cb04870c1c", size = 2075366, upload-time = "2025-11-04T13:40:09.804Z" }, + { url = "https://files.pythonhosted.org/packages/d3/43/ebef01f69baa07a482844faaa0a591bad1ef129253ffd0cdaa9d8a7f72d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d38548150c39b74aeeb0ce8ee1d8e82696f4a4e16ddc6de7b1d8823f7de4b9b5", size = 2171698, upload-time = "2025-11-04T13:40:12.004Z" }, + { url = "https://files.pythonhosted.org/packages/b1/87/41f3202e4193e3bacfc2c065fab7706ebe81af46a83d3e27605029c1f5a6/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:c23e27686783f60290e36827f9c626e63154b82b116d7fe9adba1fda36da706c", size = 2132603, upload-time = "2025-11-04T13:40:13.868Z" }, + { url = "https://files.pythonhosted.org/packages/49/7d/4c00df99cb12070b6bccdef4a195255e6020a550d572768d92cc54dba91a/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:482c982f814460eabe1d3bb0adfdc583387bd4691ef00b90575ca0d2b6fe2294", size = 2329591, upload-time = "2025-11-04T13:40:15.672Z" }, + { url = "https://files.pythonhosted.org/packages/cc/6a/ebf4b1d65d458f3cda6a7335d141305dfa19bdc61140a884d165a8a1bbc7/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:bfea2a5f0b4d8d43adf9d7b8bf019fb46fdd10a2e5cde477fbcb9d1fa08c68e1", size = 2319068, upload-time = "2025-11-04T13:40:17.532Z" }, + { url = "https://files.pythonhosted.org/packages/49/3b/774f2b5cd4192d5ab75870ce4381fd89cf218af999515baf07e7206753f0/pydantic_core-2.41.5-cp312-cp312-win32.whl", hash = "sha256:b74557b16e390ec12dca509bce9264c3bbd128f8a2c376eaa68003d7f327276d", size = 1985908, upload-time = "2025-11-04T13:40:19.309Z" }, + { url = 
"https://files.pythonhosted.org/packages/86/45/00173a033c801cacf67c190fef088789394feaf88a98a7035b0e40d53dc9/pydantic_core-2.41.5-cp312-cp312-win_amd64.whl", hash = "sha256:1962293292865bca8e54702b08a4f26da73adc83dd1fcf26fbc875b35d81c815", size = 2020145, upload-time = "2025-11-04T13:40:21.548Z" }, + { url = "https://files.pythonhosted.org/packages/f9/22/91fbc821fa6d261b376a3f73809f907cec5ca6025642c463d3488aad22fb/pydantic_core-2.41.5-cp312-cp312-win_arm64.whl", hash = "sha256:1746d4a3d9a794cacae06a5eaaccb4b8643a131d45fbc9af23e353dc0a5ba5c3", size = 1976179, upload-time = "2025-11-04T13:40:23.393Z" }, + { url = "https://files.pythonhosted.org/packages/87/06/8806241ff1f70d9939f9af039c6c35f2360cf16e93c2ca76f184e76b1564/pydantic_core-2.41.5-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:941103c9be18ac8daf7b7adca8228f8ed6bb7a1849020f643b3a14d15b1924d9", size = 2120403, upload-time = "2025-11-04T13:40:25.248Z" }, + { url = "https://files.pythonhosted.org/packages/94/02/abfa0e0bda67faa65fef1c84971c7e45928e108fe24333c81f3bfe35d5f5/pydantic_core-2.41.5-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:112e305c3314f40c93998e567879e887a3160bb8689ef3d2c04b6cc62c33ac34", size = 1896206, upload-time = "2025-11-04T13:40:27.099Z" }, + { url = "https://files.pythonhosted.org/packages/15/df/a4c740c0943e93e6500f9eb23f4ca7ec9bf71b19e608ae5b579678c8d02f/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0cbaad15cb0c90aa221d43c00e77bb33c93e8d36e0bf74760cd00e732d10a6a0", size = 1919307, upload-time = "2025-11-04T13:40:29.806Z" }, + { url = "https://files.pythonhosted.org/packages/9a/e3/6324802931ae1d123528988e0e86587c2072ac2e5394b4bc2bc34b61ff6e/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:03ca43e12fab6023fc79d28ca6b39b05f794ad08ec2feccc59a339b02f2b3d33", size = 2063258, upload-time = "2025-11-04T13:40:33.544Z" }, + { url = 
"https://files.pythonhosted.org/packages/c9/d4/2230d7151d4957dd79c3044ea26346c148c98fbf0ee6ebd41056f2d62ab5/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:dc799088c08fa04e43144b164feb0c13f9a0bc40503f8df3e9fde58a3c0c101e", size = 2214917, upload-time = "2025-11-04T13:40:35.479Z" }, + { url = "https://files.pythonhosted.org/packages/e6/9f/eaac5df17a3672fef0081b6c1bb0b82b33ee89aa5cec0d7b05f52fd4a1fa/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:97aeba56665b4c3235a0e52b2c2f5ae9cd071b8a8310ad27bddb3f7fb30e9aa2", size = 2332186, upload-time = "2025-11-04T13:40:37.436Z" }, + { url = "https://files.pythonhosted.org/packages/cf/4e/35a80cae583a37cf15604b44240e45c05e04e86f9cfd766623149297e971/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:406bf18d345822d6c21366031003612b9c77b3e29ffdb0f612367352aab7d586", size = 2073164, upload-time = "2025-11-04T13:40:40.289Z" }, + { url = "https://files.pythonhosted.org/packages/bf/e3/f6e262673c6140dd3305d144d032f7bd5f7497d3871c1428521f19f9efa2/pydantic_core-2.41.5-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:b93590ae81f7010dbe380cdeab6f515902ebcbefe0b9327cc4804d74e93ae69d", size = 2179146, upload-time = "2025-11-04T13:40:42.809Z" }, + { url = "https://files.pythonhosted.org/packages/75/c7/20bd7fc05f0c6ea2056a4565c6f36f8968c0924f19b7d97bbfea55780e73/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:01a3d0ab748ee531f4ea6c3e48ad9dac84ddba4b0d82291f87248f2f9de8d740", size = 2137788, upload-time = "2025-11-04T13:40:44.752Z" }, + { url = "https://files.pythonhosted.org/packages/3a/8d/34318ef985c45196e004bc46c6eab2eda437e744c124ef0dbe1ff2c9d06b/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:6561e94ba9dacc9c61bce40e2d6bdc3bfaa0259d3ff36ace3b1e6901936d2e3e", size = 2340133, upload-time = "2025-11-04T13:40:46.66Z" }, + { url = 
"https://files.pythonhosted.org/packages/9c/59/013626bf8c78a5a5d9350d12e7697d3d4de951a75565496abd40ccd46bee/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:915c3d10f81bec3a74fbd4faebe8391013ba61e5a1a8d48c4455b923bdda7858", size = 2324852, upload-time = "2025-11-04T13:40:48.575Z" }, + { url = "https://files.pythonhosted.org/packages/1a/d9/c248c103856f807ef70c18a4f986693a46a8ffe1602e5d361485da502d20/pydantic_core-2.41.5-cp313-cp313-win32.whl", hash = "sha256:650ae77860b45cfa6e2cdafc42618ceafab3a2d9a3811fcfbd3bbf8ac3c40d36", size = 1994679, upload-time = "2025-11-04T13:40:50.619Z" }, + { url = "https://files.pythonhosted.org/packages/9e/8b/341991b158ddab181cff136acd2552c9f35bd30380422a639c0671e99a91/pydantic_core-2.41.5-cp313-cp313-win_amd64.whl", hash = "sha256:79ec52ec461e99e13791ec6508c722742ad745571f234ea6255bed38c6480f11", size = 2019766, upload-time = "2025-11-04T13:40:52.631Z" }, + { url = "https://files.pythonhosted.org/packages/73/7d/f2f9db34af103bea3e09735bb40b021788a5e834c81eedb541991badf8f5/pydantic_core-2.41.5-cp313-cp313-win_arm64.whl", hash = "sha256:3f84d5c1b4ab906093bdc1ff10484838aca54ef08de4afa9de0f5f14d69639cd", size = 1981005, upload-time = "2025-11-04T13:40:54.734Z" }, + { url = "https://files.pythonhosted.org/packages/ea/28/46b7c5c9635ae96ea0fbb779e271a38129df2550f763937659ee6c5dbc65/pydantic_core-2.41.5-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:3f37a19d7ebcdd20b96485056ba9e8b304e27d9904d233d7b1015db320e51f0a", size = 2119622, upload-time = "2025-11-04T13:40:56.68Z" }, + { url = "https://files.pythonhosted.org/packages/74/1a/145646e5687e8d9a1e8d09acb278c8535ebe9e972e1f162ed338a622f193/pydantic_core-2.41.5-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:1d1d9764366c73f996edd17abb6d9d7649a7eb690006ab6adbda117717099b14", size = 1891725, upload-time = "2025-11-04T13:40:58.807Z" }, + { url = 
"https://files.pythonhosted.org/packages/23/04/e89c29e267b8060b40dca97bfc64a19b2a3cf99018167ea1677d96368273/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:25e1c2af0fce638d5f1988b686f3b3ea8cd7de5f244ca147c777769e798a9cd1", size = 1915040, upload-time = "2025-11-04T13:41:00.853Z" }, + { url = "https://files.pythonhosted.org/packages/84/a3/15a82ac7bd97992a82257f777b3583d3e84bdb06ba6858f745daa2ec8a85/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:506d766a8727beef16b7adaeb8ee6217c64fc813646b424d0804d67c16eddb66", size = 2063691, upload-time = "2025-11-04T13:41:03.504Z" }, + { url = "https://files.pythonhosted.org/packages/74/9b/0046701313c6ef08c0c1cf0e028c67c770a4e1275ca73131563c5f2a310a/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:4819fa52133c9aa3c387b3328f25c1facc356491e6135b459f1de698ff64d869", size = 2213897, upload-time = "2025-11-04T13:41:05.804Z" }, + { url = "https://files.pythonhosted.org/packages/8a/cd/6bac76ecd1b27e75a95ca3a9a559c643b3afcd2dd62086d4b7a32a18b169/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2b761d210c9ea91feda40d25b4efe82a1707da2ef62901466a42492c028553a2", size = 2333302, upload-time = "2025-11-04T13:41:07.809Z" }, + { url = "https://files.pythonhosted.org/packages/4c/d2/ef2074dc020dd6e109611a8be4449b98cd25e1b9b8a303c2f0fca2f2bcf7/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:22f0fb8c1c583a3b6f24df2470833b40207e907b90c928cc8d3594b76f874375", size = 2064877, upload-time = "2025-11-04T13:41:09.827Z" }, + { url = "https://files.pythonhosted.org/packages/18/66/e9db17a9a763d72f03de903883c057b2592c09509ccfe468187f2a2eef29/pydantic_core-2.41.5-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2782c870e99878c634505236d81e5443092fba820f0373997ff75f90f68cd553", size = 2180680, 
upload-time = "2025-11-04T13:41:12.379Z" }, + { url = "https://files.pythonhosted.org/packages/d3/9e/3ce66cebb929f3ced22be85d4c2399b8e85b622db77dad36b73c5387f8f8/pydantic_core-2.41.5-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:0177272f88ab8312479336e1d777f6b124537d47f2123f89cb37e0accea97f90", size = 2138960, upload-time = "2025-11-04T13:41:14.627Z" }, + { url = "https://files.pythonhosted.org/packages/a6/62/205a998f4327d2079326b01abee48e502ea739d174f0a89295c481a2272e/pydantic_core-2.41.5-cp314-cp314-musllinux_1_1_armv7l.whl", hash = "sha256:63510af5e38f8955b8ee5687740d6ebf7c2a0886d15a6d65c32814613681bc07", size = 2339102, upload-time = "2025-11-04T13:41:16.868Z" }, + { url = "https://files.pythonhosted.org/packages/3c/0d/f05e79471e889d74d3d88f5bd20d0ed189ad94c2423d81ff8d0000aab4ff/pydantic_core-2.41.5-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:e56ba91f47764cc14f1daacd723e3e82d1a89d783f0f5afe9c364b8bb491ccdb", size = 2326039, upload-time = "2025-11-04T13:41:18.934Z" }, + { url = "https://files.pythonhosted.org/packages/ec/e1/e08a6208bb100da7e0c4b288eed624a703f4d129bde2da475721a80cab32/pydantic_core-2.41.5-cp314-cp314-win32.whl", hash = "sha256:aec5cf2fd867b4ff45b9959f8b20ea3993fc93e63c7363fe6851424c8a7e7c23", size = 1995126, upload-time = "2025-11-04T13:41:21.418Z" }, + { url = "https://files.pythonhosted.org/packages/48/5d/56ba7b24e9557f99c9237e29f5c09913c81eeb2f3217e40e922353668092/pydantic_core-2.41.5-cp314-cp314-win_amd64.whl", hash = "sha256:8e7c86f27c585ef37c35e56a96363ab8de4e549a95512445b85c96d3e2f7c1bf", size = 2015489, upload-time = "2025-11-04T13:41:24.076Z" }, + { url = "https://files.pythonhosted.org/packages/4e/bb/f7a190991ec9e3e0ba22e4993d8755bbc4a32925c0b5b42775c03e8148f9/pydantic_core-2.41.5-cp314-cp314-win_arm64.whl", hash = "sha256:e672ba74fbc2dc8eea59fb6d4aed6845e6905fc2a8afe93175d94a83ba2a01a0", size = 1977288, upload-time = "2025-11-04T13:41:26.33Z" }, + { url = 
"https://files.pythonhosted.org/packages/92/ed/77542d0c51538e32e15afe7899d79efce4b81eee631d99850edc2f5e9349/pydantic_core-2.41.5-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:8566def80554c3faa0e65ac30ab0932b9e3a5cd7f8323764303d468e5c37595a", size = 2120255, upload-time = "2025-11-04T13:41:28.569Z" }, + { url = "https://files.pythonhosted.org/packages/bb/3d/6913dde84d5be21e284439676168b28d8bbba5600d838b9dca99de0fad71/pydantic_core-2.41.5-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:b80aa5095cd3109962a298ce14110ae16b8c1aece8b72f9dafe81cf597ad80b3", size = 1863760, upload-time = "2025-11-04T13:41:31.055Z" }, + { url = "https://files.pythonhosted.org/packages/5a/f0/e5e6b99d4191da102f2b0eb9687aaa7f5bea5d9964071a84effc3e40f997/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3006c3dd9ba34b0c094c544c6006cc79e87d8612999f1a5d43b769b89181f23c", size = 1878092, upload-time = "2025-11-04T13:41:33.21Z" }, + { url = "https://files.pythonhosted.org/packages/71/48/36fb760642d568925953bcc8116455513d6e34c4beaa37544118c36aba6d/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:72f6c8b11857a856bcfa48c86f5368439f74453563f951e473514579d44aa612", size = 2053385, upload-time = "2025-11-04T13:41:35.508Z" }, + { url = "https://files.pythonhosted.org/packages/20/25/92dc684dd8eb75a234bc1c764b4210cf2646479d54b47bf46061657292a8/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:5cb1b2f9742240e4bb26b652a5aeb840aa4b417c7748b6f8387927bc6e45e40d", size = 2218832, upload-time = "2025-11-04T13:41:37.732Z" }, + { url = "https://files.pythonhosted.org/packages/e2/09/f53e0b05023d3e30357d82eb35835d0f6340ca344720a4599cd663dca599/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:bd3d54f38609ff308209bd43acea66061494157703364ae40c951f83ba99a1a9", size = 2327585, upload-time = "2025-11-04T13:41:40Z" }, + 
{ url = "https://files.pythonhosted.org/packages/aa/4e/2ae1aa85d6af35a39b236b1b1641de73f5a6ac4d5a7509f77b814885760c/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2ff4321e56e879ee8d2a879501c8e469414d948f4aba74a2d4593184eb326660", size = 2041078, upload-time = "2025-11-04T13:41:42.323Z" }, + { url = "https://files.pythonhosted.org/packages/cd/13/2e215f17f0ef326fc72afe94776edb77525142c693767fc347ed6288728d/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d0d2568a8c11bf8225044aa94409e21da0cb09dcdafe9ecd10250b2baad531a9", size = 2173914, upload-time = "2025-11-04T13:41:45.221Z" }, + { url = "https://files.pythonhosted.org/packages/02/7a/f999a6dcbcd0e5660bc348a3991c8915ce6599f4f2c6ac22f01d7a10816c/pydantic_core-2.41.5-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:a39455728aabd58ceabb03c90e12f71fd30fa69615760a075b9fec596456ccc3", size = 2129560, upload-time = "2025-11-04T13:41:47.474Z" }, + { url = "https://files.pythonhosted.org/packages/3a/b1/6c990ac65e3b4c079a4fb9f5b05f5b013afa0f4ed6780a3dd236d2cbdc64/pydantic_core-2.41.5-cp314-cp314t-musllinux_1_1_armv7l.whl", hash = "sha256:239edca560d05757817c13dc17c50766136d21f7cd0fac50295499ae24f90fdf", size = 2329244, upload-time = "2025-11-04T13:41:49.992Z" }, + { url = "https://files.pythonhosted.org/packages/d9/02/3c562f3a51afd4d88fff8dffb1771b30cfdfd79befd9883ee094f5b6c0d8/pydantic_core-2.41.5-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:2a5e06546e19f24c6a96a129142a75cee553cc018ffee48a460059b1185f4470", size = 2331955, upload-time = "2025-11-04T13:41:54.079Z" }, + { url = "https://files.pythonhosted.org/packages/5c/96/5fb7d8c3c17bc8c62fdb031c47d77a1af698f1d7a406b0f79aaa1338f9ad/pydantic_core-2.41.5-cp314-cp314t-win32.whl", hash = "sha256:b4ececa40ac28afa90871c2cc2b9ffd2ff0bf749380fbdf57d165fd23da353aa", size = 1988906, upload-time = "2025-11-04T13:41:56.606Z" }, + { url = 
"https://files.pythonhosted.org/packages/22/ed/182129d83032702912c2e2d8bbe33c036f342cc735737064668585dac28f/pydantic_core-2.41.5-cp314-cp314t-win_amd64.whl", hash = "sha256:80aa89cad80b32a912a65332f64a4450ed00966111b6615ca6816153d3585a8c", size = 1981607, upload-time = "2025-11-04T13:41:58.889Z" }, + { url = "https://files.pythonhosted.org/packages/9f/ed/068e41660b832bb0b1aa5b58011dea2a3fe0ba7861ff38c4d4904c1c1a99/pydantic_core-2.41.5-cp314-cp314t-win_arm64.whl", hash = "sha256:35b44f37a3199f771c3eaa53051bc8a70cd7b54f333531c59e29fd4db5d15008", size = 1974769, upload-time = "2025-11-04T13:42:01.186Z" }, + { url = "https://files.pythonhosted.org/packages/09/32/59b0c7e63e277fa7911c2fc70ccfb45ce4b98991e7ef37110663437005af/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:7da7087d756b19037bc2c06edc6c170eeef3c3bafcb8f532ff17d64dc427adfd", size = 2110495, upload-time = "2025-11-04T13:42:49.689Z" }, + { url = "https://files.pythonhosted.org/packages/aa/81/05e400037eaf55ad400bcd318c05bb345b57e708887f07ddb2d20e3f0e98/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:aabf5777b5c8ca26f7824cb4a120a740c9588ed58df9b2d196ce92fba42ff8dc", size = 1915388, upload-time = "2025-11-04T13:42:52.215Z" }, + { url = "https://files.pythonhosted.org/packages/6e/0d/e3549b2399f71d56476b77dbf3cf8937cec5cd70536bdc0e374a421d0599/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c007fe8a43d43b3969e8469004e9845944f1a80e6acd47c150856bb87f230c56", size = 1942879, upload-time = "2025-11-04T13:42:56.483Z" }, + { url = "https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017, upload-time = 
"2025-11-04T13:42:59.471Z" }, +] + +[[package]] +name = "python-dotenv" +version = "1.2.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/82/ed/0301aeeac3e5353ef3d94b6ec08bbcabd04a72018415dcb29e588514bba8/python_dotenv-1.2.2.tar.gz", hash = "sha256:2c371a91fbd7ba082c2c1dc1f8bf89ca22564a087c2c287cd9b662adde799cf3", size = 50135, upload-time = "2026-03-01T16:00:26.196Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0b/d7/1959b9648791274998a9c3526f6d0ec8fd2233e4d4acce81bbae76b44b2a/python_dotenv-1.2.2-py3-none-any.whl", hash = "sha256:1d8214789a24de455a8b8bd8ae6fe3c6b69a5e3d64aa8a8e5d68e694bbcb285a", size = 22101, upload-time = "2026-03-01T16:00:25.09Z" }, +] + +[[package]] +name = "pyyaml" +version = "6.0.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/05/8e/961c0007c59b8dd7729d542c61a4d537767a59645b82a0b521206e1e25c2/pyyaml-6.0.3.tar.gz", hash = "sha256:d76623373421df22fb4cf8817020cbb7ef15c725b9d5e45f17e189bfc384190f", size = 130960, upload-time = "2025-09-25T21:33:16.546Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/33/422b98d2195232ca1826284a76852ad5a86fe23e31b009c9886b2d0fb8b2/pyyaml-6.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7f047e29dcae44602496db43be01ad42fc6f1cc0d8cd6c83d342306c32270196", size = 182063, upload-time = "2025-09-25T21:32:11.445Z" }, + { url = "https://files.pythonhosted.org/packages/89/a0/6cf41a19a1f2f3feab0e9c0b74134aa2ce6849093d5517a0c550fe37a648/pyyaml-6.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fc09d0aa354569bc501d4e787133afc08552722d3ab34836a80547331bb5d4a0", size = 173973, upload-time = "2025-09-25T21:32:12.492Z" }, + { url = "https://files.pythonhosted.org/packages/ed/23/7a778b6bd0b9a8039df8b1b1d80e2e2ad78aa04171592c8a5c43a56a6af4/pyyaml-6.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = 
"sha256:9149cad251584d5fb4981be1ecde53a1ca46c891a79788c0df828d2f166bda28", size = 775116, upload-time = "2025-09-25T21:32:13.652Z" }, + { url = "https://files.pythonhosted.org/packages/65/30/d7353c338e12baef4ecc1b09e877c1970bd3382789c159b4f89d6a70dc09/pyyaml-6.0.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5fdec68f91a0c6739b380c83b951e2c72ac0197ace422360e6d5a959d8d97b2c", size = 844011, upload-time = "2025-09-25T21:32:15.21Z" }, + { url = "https://files.pythonhosted.org/packages/8b/9d/b3589d3877982d4f2329302ef98a8026e7f4443c765c46cfecc8858c6b4b/pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ba1cc08a7ccde2d2ec775841541641e4548226580ab850948cbfda66a1befcdc", size = 807870, upload-time = "2025-09-25T21:32:16.431Z" }, + { url = "https://files.pythonhosted.org/packages/05/c0/b3be26a015601b822b97d9149ff8cb5ead58c66f981e04fedf4e762f4bd4/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8dc52c23056b9ddd46818a57b78404882310fb473d63f17b07d5c40421e47f8e", size = 761089, upload-time = "2025-09-25T21:32:17.56Z" }, + { url = "https://files.pythonhosted.org/packages/be/8e/98435a21d1d4b46590d5459a22d88128103f8da4c2d4cb8f14f2a96504e1/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:41715c910c881bc081f1e8872880d3c650acf13dfa8214bad49ed4cede7c34ea", size = 790181, upload-time = "2025-09-25T21:32:18.834Z" }, + { url = "https://files.pythonhosted.org/packages/74/93/7baea19427dcfbe1e5a372d81473250b379f04b1bd3c4c5ff825e2327202/pyyaml-6.0.3-cp312-cp312-win32.whl", hash = "sha256:96b533f0e99f6579b3d4d4995707cf36df9100d67e0c8303a0c55b27b5f99bc5", size = 137658, upload-time = "2025-09-25T21:32:20.209Z" }, + { url = "https://files.pythonhosted.org/packages/86/bf/899e81e4cce32febab4fb42bb97dcdf66bc135272882d1987881a4b519e9/pyyaml-6.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:5fcd34e47f6e0b794d17de1b4ff496c00986e1c83f7ab2fb8fcfe9616ff7477b", 
size = 154003, upload-time = "2025-09-25T21:32:21.167Z" }, + { url = "https://files.pythonhosted.org/packages/1a/08/67bd04656199bbb51dbed1439b7f27601dfb576fb864099c7ef0c3e55531/pyyaml-6.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:64386e5e707d03a7e172c0701abfb7e10f0fb753ee1d773128192742712a98fd", size = 140344, upload-time = "2025-09-25T21:32:22.617Z" }, + { url = "https://files.pythonhosted.org/packages/d1/11/0fd08f8192109f7169db964b5707a2f1e8b745d4e239b784a5a1dd80d1db/pyyaml-6.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8da9669d359f02c0b91ccc01cac4a67f16afec0dac22c2ad09f46bee0697eba8", size = 181669, upload-time = "2025-09-25T21:32:23.673Z" }, + { url = "https://files.pythonhosted.org/packages/b1/16/95309993f1d3748cd644e02e38b75d50cbc0d9561d21f390a76242ce073f/pyyaml-6.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:2283a07e2c21a2aa78d9c4442724ec1eb15f5e42a723b99cb3d822d48f5f7ad1", size = 173252, upload-time = "2025-09-25T21:32:25.149Z" }, + { url = "https://files.pythonhosted.org/packages/50/31/b20f376d3f810b9b2371e72ef5adb33879b25edb7a6d072cb7ca0c486398/pyyaml-6.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ee2922902c45ae8ccada2c5b501ab86c36525b883eff4255313a253a3160861c", size = 767081, upload-time = "2025-09-25T21:32:26.575Z" }, + { url = "https://files.pythonhosted.org/packages/49/1e/a55ca81e949270d5d4432fbbd19dfea5321eda7c41a849d443dc92fd1ff7/pyyaml-6.0.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a33284e20b78bd4a18c8c2282d549d10bc8408a2a7ff57653c0cf0b9be0afce5", size = 841159, upload-time = "2025-09-25T21:32:27.727Z" }, + { url = "https://files.pythonhosted.org/packages/74/27/e5b8f34d02d9995b80abcef563ea1f8b56d20134d8f4e5e81733b1feceb2/pyyaml-6.0.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0f29edc409a6392443abf94b9cf89ce99889a1dd5376d94316ae5145dfedd5d6", size = 801626, 
upload-time = "2025-09-25T21:32:28.878Z" }, + { url = "https://files.pythonhosted.org/packages/f9/11/ba845c23988798f40e52ba45f34849aa8a1f2d4af4b798588010792ebad6/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f7057c9a337546edc7973c0d3ba84ddcdf0daa14533c2065749c9075001090e6", size = 753613, upload-time = "2025-09-25T21:32:30.178Z" }, + { url = "https://files.pythonhosted.org/packages/3d/e0/7966e1a7bfc0a45bf0a7fb6b98ea03fc9b8d84fa7f2229e9659680b69ee3/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:eda16858a3cab07b80edaf74336ece1f986ba330fdb8ee0d6c0d68fe82bc96be", size = 794115, upload-time = "2025-09-25T21:32:31.353Z" }, + { url = "https://files.pythonhosted.org/packages/de/94/980b50a6531b3019e45ddeada0626d45fa85cbe22300844a7983285bed3b/pyyaml-6.0.3-cp313-cp313-win32.whl", hash = "sha256:d0eae10f8159e8fdad514efdc92d74fd8d682c933a6dd088030f3834bc8e6b26", size = 137427, upload-time = "2025-09-25T21:32:32.58Z" }, + { url = "https://files.pythonhosted.org/packages/97/c9/39d5b874e8b28845e4ec2202b5da735d0199dbe5b8fb85f91398814a9a46/pyyaml-6.0.3-cp313-cp313-win_amd64.whl", hash = "sha256:79005a0d97d5ddabfeeea4cf676af11e647e41d81c9a7722a193022accdb6b7c", size = 154090, upload-time = "2025-09-25T21:32:33.659Z" }, + { url = "https://files.pythonhosted.org/packages/73/e8/2bdf3ca2090f68bb3d75b44da7bbc71843b19c9f2b9cb9b0f4ab7a5a4329/pyyaml-6.0.3-cp313-cp313-win_arm64.whl", hash = "sha256:5498cd1645aa724a7c71c8f378eb29ebe23da2fc0d7a08071d89469bf1d2defb", size = 140246, upload-time = "2025-09-25T21:32:34.663Z" }, + { url = "https://files.pythonhosted.org/packages/9d/8c/f4bd7f6465179953d3ac9bc44ac1a8a3e6122cf8ada906b4f96c60172d43/pyyaml-6.0.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:8d1fab6bb153a416f9aeb4b8763bc0f22a5586065f86f7664fc23339fc1c1fac", size = 181814, upload-time = "2025-09-25T21:32:35.712Z" }, + { url = 
"https://files.pythonhosted.org/packages/bd/9c/4d95bb87eb2063d20db7b60faa3840c1b18025517ae857371c4dd55a6b3a/pyyaml-6.0.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:34d5fcd24b8445fadc33f9cf348c1047101756fd760b4dacb5c3e99755703310", size = 173809, upload-time = "2025-09-25T21:32:36.789Z" }, + { url = "https://files.pythonhosted.org/packages/92/b5/47e807c2623074914e29dabd16cbbdd4bf5e9b2db9f8090fa64411fc5382/pyyaml-6.0.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:501a031947e3a9025ed4405a168e6ef5ae3126c59f90ce0cd6f2bfc477be31b7", size = 766454, upload-time = "2025-09-25T21:32:37.966Z" }, + { url = "https://files.pythonhosted.org/packages/02/9e/e5e9b168be58564121efb3de6859c452fccde0ab093d8438905899a3a483/pyyaml-6.0.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:b3bc83488de33889877a0f2543ade9f70c67d66d9ebb4ac959502e12de895788", size = 836355, upload-time = "2025-09-25T21:32:39.178Z" }, + { url = "https://files.pythonhosted.org/packages/88/f9/16491d7ed2a919954993e48aa941b200f38040928474c9e85ea9e64222c3/pyyaml-6.0.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c458b6d084f9b935061bc36216e8a69a7e293a2f1e68bf956dcd9e6cbcd143f5", size = 794175, upload-time = "2025-09-25T21:32:40.865Z" }, + { url = "https://files.pythonhosted.org/packages/dd/3f/5989debef34dc6397317802b527dbbafb2b4760878a53d4166579111411e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:7c6610def4f163542a622a73fb39f534f8c101d690126992300bf3207eab9764", size = 755228, upload-time = "2025-09-25T21:32:42.084Z" }, + { url = "https://files.pythonhosted.org/packages/d7/ce/af88a49043cd2e265be63d083fc75b27b6ed062f5f9fd6cdc223ad62f03e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:5190d403f121660ce8d1d2c1bb2ef1bd05b5f68533fc5c2ea899bd15f4399b35", size = 789194, upload-time = "2025-09-25T21:32:43.362Z" }, + { url = 
"https://files.pythonhosted.org/packages/23/20/bb6982b26a40bb43951265ba29d4c246ef0ff59c9fdcdf0ed04e0687de4d/pyyaml-6.0.3-cp314-cp314-win_amd64.whl", hash = "sha256:4a2e8cebe2ff6ab7d1050ecd59c25d4c8bd7e6f400f5f82b96557ac0abafd0ac", size = 156429, upload-time = "2025-09-25T21:32:57.844Z" }, + { url = "https://files.pythonhosted.org/packages/f4/f4/a4541072bb9422c8a883ab55255f918fa378ecf083f5b85e87fc2b4eda1b/pyyaml-6.0.3-cp314-cp314-win_arm64.whl", hash = "sha256:93dda82c9c22deb0a405ea4dc5f2d0cda384168e466364dec6255b293923b2f3", size = 143912, upload-time = "2025-09-25T21:32:59.247Z" }, + { url = "https://files.pythonhosted.org/packages/7c/f9/07dd09ae774e4616edf6cda684ee78f97777bdd15847253637a6f052a62f/pyyaml-6.0.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:02893d100e99e03eda1c8fd5c441d8c60103fd175728e23e431db1b589cf5ab3", size = 189108, upload-time = "2025-09-25T21:32:44.377Z" }, + { url = "https://files.pythonhosted.org/packages/4e/78/8d08c9fb7ce09ad8c38ad533c1191cf27f7ae1effe5bb9400a46d9437fcf/pyyaml-6.0.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c1ff362665ae507275af2853520967820d9124984e0f7466736aea23d8611fba", size = 183641, upload-time = "2025-09-25T21:32:45.407Z" }, + { url = "https://files.pythonhosted.org/packages/7b/5b/3babb19104a46945cf816d047db2788bcaf8c94527a805610b0289a01c6b/pyyaml-6.0.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6adc77889b628398debc7b65c073bcb99c4a0237b248cacaf3fe8a557563ef6c", size = 831901, upload-time = "2025-09-25T21:32:48.83Z" }, + { url = "https://files.pythonhosted.org/packages/8b/cc/dff0684d8dc44da4d22a13f35f073d558c268780ce3c6ba1b87055bb0b87/pyyaml-6.0.3-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a80cb027f6b349846a3bf6d73b5e95e782175e52f22108cfa17876aaeff93702", size = 861132, upload-time = "2025-09-25T21:32:50.149Z" }, + { url = 
"https://files.pythonhosted.org/packages/b1/5e/f77dc6b9036943e285ba76b49e118d9ea929885becb0a29ba8a7c75e29fe/pyyaml-6.0.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00c4bdeba853cc34e7dd471f16b4114f4162dc03e6b7afcc2128711f0eca823c", size = 839261, upload-time = "2025-09-25T21:32:51.808Z" }, + { url = "https://files.pythonhosted.org/packages/ce/88/a9db1376aa2a228197c58b37302f284b5617f56a5d959fd1763fb1675ce6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:66e1674c3ef6f541c35191caae2d429b967b99e02040f5ba928632d9a7f0f065", size = 805272, upload-time = "2025-09-25T21:32:52.941Z" }, + { url = "https://files.pythonhosted.org/packages/da/92/1446574745d74df0c92e6aa4a7b0b3130706a4142b2d1a5869f2eaa423c6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:16249ee61e95f858e83976573de0f5b2893b3677ba71c9dd36b9cf8be9ac6d65", size = 829923, upload-time = "2025-09-25T21:32:54.537Z" }, + { url = "https://files.pythonhosted.org/packages/f0/7a/1c7270340330e575b92f397352af856a8c06f230aa3e76f86b39d01b416a/pyyaml-6.0.3-cp314-cp314t-win_amd64.whl", hash = "sha256:4ad1906908f2f5ae4e5a8ddfce73c320c2a1429ec52eafd27138b7f1cbe341c9", size = 174062, upload-time = "2025-09-25T21:32:55.767Z" }, + { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" }, +] + +[[package]] +name = "regex" +version = "2026.4.4" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/cb/0e/3a246dbf05666918bd3664d9d787f84a9108f6f43cc953a077e4a7dfdb7e/regex-2026.4.4.tar.gz", hash = "sha256:e08270659717f6973523ce3afbafa53515c4dc5dcad637dc215b6fd50f689423", size = 416000, upload-time = "2026-04-03T20:56:28.155Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/e5/28/b972a4d3df61e1d7bcf1b59fdb3cddef22f88b6be43f161bb41ebc0e4081/regex-2026.4.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:c07ab8794fa929e58d97a0e1796b8b76f70943fa39df225ac9964615cf1f9d52", size = 490434, upload-time = "2026-04-03T20:53:40.219Z" }, + { url = "https://files.pythonhosted.org/packages/84/20/30041446cf6dc3e0eab344fc62770e84c23b6b68a3b657821f9f80cb69b4/regex-2026.4.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:2c785939dc023a1ce4ec09599c032cc9933d258a998d16ca6f2b596c010940eb", size = 292061, upload-time = "2026-04-03T20:53:41.862Z" }, + { url = "https://files.pythonhosted.org/packages/62/c8/3baa06d75c98c46d4cc4262b71fd2edb9062b5665e868bca57859dadf93a/regex-2026.4.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1b1ce5c81c9114f1ce2f9288a51a8fd3aeea33a0cc440c415bf02da323aa0a76", size = 289628, upload-time = "2026-04-03T20:53:43.701Z" }, + { url = "https://files.pythonhosted.org/packages/31/87/3accf55634caad8c0acab23f5135ef7d4a21c39f28c55c816ae012931408/regex-2026.4.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:760ef21c17d8e6a4fe8cf406a97cf2806a4df93416ccc82fc98d25b1c20425be", size = 796651, upload-time = "2026-04-03T20:53:45.379Z" }, + { url = "https://files.pythonhosted.org/packages/f6/0c/aaa2c83f34efedbf06f61cb1942c25f6cf1ee3b200f832c4d05f28306c2e/regex-2026.4.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:7088fcdcb604a4417c208e2169715800d28838fefd7455fbe40416231d1d47c1", size = 865916, upload-time = "2026-04-03T20:53:47.064Z" }, + { url = "https://files.pythonhosted.org/packages/d9/f6/8c6924c865124643e8f37823eca845dc27ac509b2ee58123685e71cd0279/regex-2026.4.4-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:07edca1ba687998968f7db5bc355288d0c6505caa7374f013d27356d93976d13", size = 912287, upload-time = "2026-04-03T20:53:49.422Z" 
}, + { url = "https://files.pythonhosted.org/packages/11/0e/a9f6f81013e0deaf559b25711623864970fe6a098314e374ccb1540a4152/regex-2026.4.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:993f657a7c1c6ec51b5e0ba97c9817d06b84ea5fa8d82e43b9405de0defdc2b9", size = 801126, upload-time = "2026-04-03T20:53:51.096Z" }, + { url = "https://files.pythonhosted.org/packages/71/61/3a0cc8af2dc0c8deb48e644dd2521f173f7e6513c6e195aad9aa8dd77ac5/regex-2026.4.4-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:2b69102a743e7569ebee67e634a69c4cb7e59d6fa2e1aa7d3bdbf3f61435f62d", size = 776788, upload-time = "2026-04-03T20:53:52.889Z" }, + { url = "https://files.pythonhosted.org/packages/64/0b/8bb9cbf21ef7dee58e49b0fdb066a7aded146c823202e16494a36777594f/regex-2026.4.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:6dac006c8b6dda72d86ea3d1333d45147de79a3a3f26f10c1cf9287ca4ca0ac3", size = 785184, upload-time = "2026-04-03T20:53:55.627Z" }, + { url = "https://files.pythonhosted.org/packages/99/c2/d3e80e8137b25ee06c92627de4e4d98b94830e02b3e6f81f3d2e3f504cf5/regex-2026.4.4-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:50a766ee2010d504554bfb5f578ed2e066898aa26411d57e6296230627cdefa0", size = 859913, upload-time = "2026-04-03T20:53:57.249Z" }, + { url = "https://files.pythonhosted.org/packages/bc/e6/9d5d876157d969c804622456ef250017ac7a8f83e0e14f903b9e6df5ce95/regex-2026.4.4-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:9e2f5217648f68e3028c823df58663587c1507a5ba8419f4fdfc8a461be76043", size = 765732, upload-time = "2026-04-03T20:53:59.428Z" }, + { url = "https://files.pythonhosted.org/packages/82/80/b568935b4421388561c8ed42aff77247285d3ae3bb2a6ca22af63bae805e/regex-2026.4.4-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:39d8de85a08e32632974151ba59c6e9140646dcc36c80423962b1c5c0a92e244", size = 852152, upload-time = "2026-04-03T20:54:01.505Z" }, + { url = 
"https://files.pythonhosted.org/packages/39/29/f0f81217e21cd998245da047405366385d5c6072048038a3d33b37a79dc0/regex-2026.4.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:55d9304e0e7178dfb1e106c33edf834097ddf4a890e2f676f6c5118f84390f73", size = 789076, upload-time = "2026-04-03T20:54:03.323Z" }, + { url = "https://files.pythonhosted.org/packages/49/1d/1d957a61976ab9d4e767dd4f9d04b66cc0c41c5e36cf40e2d43688b5ae6f/regex-2026.4.4-cp312-cp312-win32.whl", hash = "sha256:04bb679bc0bde8a7bfb71e991493d47314e7b98380b083df2447cda4b6edb60f", size = 266700, upload-time = "2026-04-03T20:54:05.639Z" }, + { url = "https://files.pythonhosted.org/packages/c5/5c/bf575d396aeb58ea13b06ef2adf624f65b70fafef6950a80fc3da9cae3bc/regex-2026.4.4-cp312-cp312-win_amd64.whl", hash = "sha256:db0ac18435a40a2543dbb3d21e161a6c78e33e8159bd2e009343d224bb03bb1b", size = 277768, upload-time = "2026-04-03T20:54:07.312Z" }, + { url = "https://files.pythonhosted.org/packages/c9/27/049df16ec6a6828ccd72add3c7f54b4df029669bea8e9817df6fff58be90/regex-2026.4.4-cp312-cp312-win_arm64.whl", hash = "sha256:4ce255cc05c1947a12989c6db801c96461947adb7a59990f1360b5983fab4983", size = 270568, upload-time = "2026-04-03T20:54:09.484Z" }, + { url = "https://files.pythonhosted.org/packages/9d/83/c4373bc5f31f2cf4b66f9b7c31005bd87fe66f0dce17701f7db4ee79ee29/regex-2026.4.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:62f5519042c101762509b1d717b45a69c0139d60414b3c604b81328c01bd1943", size = 490273, upload-time = "2026-04-03T20:54:11.202Z" }, + { url = "https://files.pythonhosted.org/packages/46/f8/fe62afbcc3cf4ad4ac9adeaafd98aa747869ae12d3e8e2ac293d0593c435/regex-2026.4.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:3790ba9fb5dd76715a7afe34dbe603ba03f8820764b1dc929dd08106214ed031", size = 291954, upload-time = "2026-04-03T20:54:13.412Z" }, + { url = 
"https://files.pythonhosted.org/packages/5a/92/4712b9fe6a33d232eeb1c189484b80c6c4b8422b90e766e1195d6e758207/regex-2026.4.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:8fae3c6e795d7678963f2170152b0d892cf6aee9ee8afc8c45e6be38d5107fe7", size = 289487, upload-time = "2026-04-03T20:54:15.824Z" }, + { url = "https://files.pythonhosted.org/packages/88/2c/f83b93f85e01168f1070f045a42d4c937b69fdb8dd7ae82d307253f7e36e/regex-2026.4.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:298c3ec2d53225b3bf91142eb9691025bab610e0c0c51592dde149db679b3d17", size = 796646, upload-time = "2026-04-03T20:54:18.229Z" }, + { url = "https://files.pythonhosted.org/packages/df/55/61a2e17bf0c4dc57e11caf8dd11771280d8aaa361785f9e3bc40d653f4a7/regex-2026.4.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:e9638791082eaf5b3ac112c587518ee78e083a11c4b28012d8fe2a0f536dfb17", size = 865904, upload-time = "2026-04-03T20:54:20.019Z" }, + { url = "https://files.pythonhosted.org/packages/45/32/1ac8ed1b5a346b5993a3d256abe0a0f03b0b73c8cc88d928537368ac65b6/regex-2026.4.4-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:ae3e764bd4c5ff55035dc82a8d49acceb42a5298edf6eb2fc4d328ee5dd7afae", size = 912304, upload-time = "2026-04-03T20:54:22.403Z" }, + { url = "https://files.pythonhosted.org/packages/26/47/2ee5c613ab546f0eddebf9905d23e07beb933416b1246c2d8791d01979b4/regex-2026.4.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ffa81f81b80047ba89a3c69ae6a0f78d06f4a42ce5126b0eb2a0a10ad44e0b2e", size = 801126, upload-time = "2026-04-03T20:54:24.308Z" }, + { url = "https://files.pythonhosted.org/packages/75/cd/41dacd129ca9fd20bd7d02f83e0fad83e034ac8a084ec369c90f55ef37e2/regex-2026.4.4-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = 
"sha256:f56ebf9d70305307a707911b88469213630aba821e77de7d603f9d2f0730687d", size = 776772, upload-time = "2026-04-03T20:54:26.319Z" }, + { url = "https://files.pythonhosted.org/packages/89/6d/5af0b588174cb5f46041fa7dd64d3fd5cd2fe51f18766703d1edc387f324/regex-2026.4.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:773d1dfd652bbffb09336abf890bfd64785c7463716bf766d0eb3bc19c8b7f27", size = 785228, upload-time = "2026-04-03T20:54:28.387Z" }, + { url = "https://files.pythonhosted.org/packages/b7/3b/f5a72b7045bd59575fc33bf1345f156fcfd5a8484aea6ad84b12c5a82114/regex-2026.4.4-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:d51d20befd5275d092cdffba57ded05f3c436317ee56466c8928ac32d960edaf", size = 860032, upload-time = "2026-04-03T20:54:30.641Z" }, + { url = "https://files.pythonhosted.org/packages/39/a4/72a317003d6fcd7a573584a85f59f525dfe8f67e355ca74eb6b53d66a5e2/regex-2026.4.4-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:0a51cdb3c1e9161154f976cb2bef9894bc063ac82f31b733087ffb8e880137d0", size = 765714, upload-time = "2026-04-03T20:54:32.789Z" }, + { url = "https://files.pythonhosted.org/packages/25/1e/5672e16f34dbbcb2560cc7e6a2fbb26dfa8b270711e730101da4423d3973/regex-2026.4.4-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:ae5266a82596114e41fb5302140e9630204c1b5f325c770bec654b95dd54b0aa", size = 852078, upload-time = "2026-04-03T20:54:34.546Z" }, + { url = "https://files.pythonhosted.org/packages/f7/0d/c813f0af7c6cc7ed7b9558bac2e5120b60ad0fa48f813e4d4bd55446f214/regex-2026.4.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:c882cd92ec68585e9c1cf36c447ec846c0d94edd706fe59e0c198e65822fd23b", size = 789181, upload-time = "2026-04-03T20:54:36.642Z" }, + { url = "https://files.pythonhosted.org/packages/ea/6d/a344608d1adbd2a95090ddd906cec09a11be0e6517e878d02a5123e0917f/regex-2026.4.4-cp313-cp313-win32.whl", hash = "sha256:05568c4fbf3cb4fa9e28e3af198c40d3237cf6041608a9022285fe567ec3ad62", size = 266690, upload-time = "2026-04-03T20:54:38.343Z" }, 
+ { url = "https://files.pythonhosted.org/packages/31/07/54049f89b46235ca6f45cd6c88668a7050e77d4a15555e47dd40fde75263/regex-2026.4.4-cp313-cp313-win_amd64.whl", hash = "sha256:3384df51ed52db0bea967e21458ab0a414f67cdddfd94401688274e55147bb81", size = 277733, upload-time = "2026-04-03T20:54:40.11Z" }, + { url = "https://files.pythonhosted.org/packages/0e/21/61366a8e20f4d43fb597708cac7f0e2baadb491ecc9549b4980b2be27d16/regex-2026.4.4-cp313-cp313-win_arm64.whl", hash = "sha256:acd38177bd2c8e69a411d6521760806042e244d0ef94e2dd03ecdaa8a3c99427", size = 270565, upload-time = "2026-04-03T20:54:41.883Z" }, + { url = "https://files.pythonhosted.org/packages/f1/1e/3a2b9672433bef02f5d39aa1143ca2c08f311c1d041c464a42be9ae648dc/regex-2026.4.4-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:f94a11a9d05afcfcfa640e096319720a19cc0c9f7768e1a61fceee6a3afc6c7c", size = 494126, upload-time = "2026-04-03T20:54:43.602Z" }, + { url = "https://files.pythonhosted.org/packages/4e/4b/c132a4f4fe18ad3340d89fcb56235132b69559136036b845be3c073142ed/regex-2026.4.4-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:36bcb9d6d1307ab629edc553775baada2aefa5c50ccc0215fbfd2afcfff43141", size = 293882, upload-time = "2026-04-03T20:54:45.41Z" }, + { url = "https://files.pythonhosted.org/packages/f4/5f/eaa38092ce7a023656280f2341dbbd4ad5f05d780a70abba7bb4f4bea54c/regex-2026.4.4-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:261c015b3e2ed0919157046d768774ecde57f03d8fa4ba78d29793447f70e717", size = 292334, upload-time = "2026-04-03T20:54:47.051Z" }, + { url = "https://files.pythonhosted.org/packages/5f/f6/dd38146af1392dac33db7074ab331cec23cced3759167735c42c5460a243/regex-2026.4.4-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c228cf65b4a54583763645dcd73819b3b381ca8b4bb1b349dee1c135f4112c07", size = 811691, upload-time = "2026-04-03T20:54:49.074Z" }, + { url = 
"https://files.pythonhosted.org/packages/7a/f0/dc54c2e69f5eeec50601054998ec3690d5344277e782bd717e49867c1d29/regex-2026.4.4-cp313-cp313t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:dd2630faeb6876fb0c287f664d93ddce4d50cd46c6e88e60378c05c9047e08ca", size = 871227, upload-time = "2026-04-03T20:54:51.035Z" }, + { url = "https://files.pythonhosted.org/packages/a1/af/cb16bd5dc61621e27df919a4449bbb7e5a1034c34d307e0a706e9cc0f3e3/regex-2026.4.4-cp313-cp313t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:6a50ab11b7779b849472337191f3a043e27e17f71555f98d0092fa6d73364520", size = 917435, upload-time = "2026-04-03T20:54:52.994Z" }, + { url = "https://files.pythonhosted.org/packages/5c/71/8b260897f22996b666edd9402861668f45a2ca259f665ac029e6104a2d7d/regex-2026.4.4-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0734f63afe785138549fbe822a8cfeaccd1bae814c5057cc0ed5b9f2de4fc883", size = 816358, upload-time = "2026-04-03T20:54:54.884Z" }, + { url = "https://files.pythonhosted.org/packages/1c/60/775f7f72a510ef238254906c2f3d737fc80b16ca85f07d20e318d2eea894/regex-2026.4.4-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:c4ee50606cb1967db7e523224e05f32089101945f859928e65657a2cbb3d278b", size = 785549, upload-time = "2026-04-03T20:54:57.01Z" }, + { url = "https://files.pythonhosted.org/packages/58/42/34d289b3627c03cf381e44da534a0021664188fa49ba41513da0b4ec6776/regex-2026.4.4-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6c1818f37be3ca02dcb76d63f2c7aaba4b0dc171b579796c6fbe00148dfec6b1", size = 801364, upload-time = "2026-04-03T20:54:58.981Z" }, + { url = "https://files.pythonhosted.org/packages/fc/20/f6ecf319b382a8f1ab529e898b222c3f30600fcede7834733c26279e7465/regex-2026.4.4-cp313-cp313t-musllinux_1_2_ppc64le.whl", hash = "sha256:f5bfc2741d150d0be3e4a0401a5c22b06e60acb9aa4daa46d9e79a6dcd0f135b", size = 866221, upload-time 
= "2026-04-03T20:55:00.88Z" }, + { url = "https://files.pythonhosted.org/packages/92/6a/9f16d3609d549bd96d7a0b2aee1625d7512ba6a03efc01652149ef88e74d/regex-2026.4.4-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:504ffa8a03609a087cad81277a629b6ce884b51a24bd388a7980ad61748618ff", size = 772530, upload-time = "2026-04-03T20:55:03.213Z" }, + { url = "https://files.pythonhosted.org/packages/fa/f6/aa9768bc96a4c361ac96419fbaf2dcdc33970bb813df3ba9b09d5d7b6d96/regex-2026.4.4-cp313-cp313t-musllinux_1_2_s390x.whl", hash = "sha256:70aadc6ff12e4b444586e57fc30771f86253f9f0045b29016b9605b4be5f7dfb", size = 856989, upload-time = "2026-04-03T20:55:05.087Z" }, + { url = "https://files.pythonhosted.org/packages/4d/b4/c671db3556be2473ae3e4bb7a297c518d281452871501221251ea4ecba57/regex-2026.4.4-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:f4f83781191007b6ef43b03debc35435f10cad9b96e16d147efe84a1d48bdde4", size = 803241, upload-time = "2026-04-03T20:55:07.162Z" }, + { url = "https://files.pythonhosted.org/packages/2a/5c/83e3b1d89fa4f6e5a1bc97b4abd4a9a97b3c1ac7854164f694f5f0ba98a0/regex-2026.4.4-cp313-cp313t-win32.whl", hash = "sha256:e014a797de43d1847df957c0a2a8e861d1c17547ee08467d1db2c370b7568baa", size = 269921, upload-time = "2026-04-03T20:55:09.62Z" }, + { url = "https://files.pythonhosted.org/packages/28/07/077c387121f42cdb4d92b1301133c0d93b5709d096d1669ab847dda9fe2e/regex-2026.4.4-cp313-cp313t-win_amd64.whl", hash = "sha256:b15b88b0d52b179712632832c1d6e58e5774f93717849a41096880442da41ab0", size = 281240, upload-time = "2026-04-03T20:55:11.521Z" }, + { url = "https://files.pythonhosted.org/packages/9d/22/ead4a4abc7c59a4d882662aa292ca02c8b617f30b6e163bc1728879e9353/regex-2026.4.4-cp313-cp313t-win_arm64.whl", hash = "sha256:586b89cdadf7d67bf86ae3342a4dcd2b8d70a832d90c18a0ae955105caf34dbe", size = 272440, upload-time = "2026-04-03T20:55:13.365Z" }, + { url = 
"https://files.pythonhosted.org/packages/f0/f5/ed97c2dc47b5fbd4b73c0d7d75f9ebc8eca139f2bbef476bba35f28c0a77/regex-2026.4.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:2da82d643fa698e5e5210e54af90181603d5853cf469f5eedf9bfc8f59b4b8c7", size = 490343, upload-time = "2026-04-03T20:55:15.241Z" }, + { url = "https://files.pythonhosted.org/packages/80/e9/de4828a7385ec166d673a5790ad06ac48cdaa98bc0960108dd4b9cc1aef7/regex-2026.4.4-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:54a1189ad9d9357760557c91103d5e421f0a2dabe68a5cdf9103d0dcf4e00752", size = 291909, upload-time = "2026-04-03T20:55:17.558Z" }, + { url = "https://files.pythonhosted.org/packages/b4/d6/5cfbfc97f3201a4d24b596a77957e092030dcc4205894bc035cedcfce62f/regex-2026.4.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:76d67d5afb1fe402d10a6403bae668d000441e2ab115191a804287d53b772951", size = 289692, upload-time = "2026-04-03T20:55:20.561Z" }, + { url = "https://files.pythonhosted.org/packages/8e/ac/f2212d9fd56fe897e36d0110ba30ba2d247bd6410c5bd98499c7e5a1e1f2/regex-2026.4.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e7cd3e4ee8d80447a83bbc9ab0c8459781fa77087f856c3e740d7763be0df27f", size = 796979, upload-time = "2026-04-03T20:55:22.56Z" }, + { url = "https://files.pythonhosted.org/packages/c9/e3/a016c12675fbac988a60c7e1c16e67823ff0bc016beb27bd7a001dbdabc6/regex-2026.4.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2e19e18c568d2866d8b6a6dfad823db86193503f90823a8f66689315ba28fbe8", size = 866744, upload-time = "2026-04-03T20:55:24.646Z" }, + { url = "https://files.pythonhosted.org/packages/af/a4/0b90ca4cf17adc3cb43de80ec71018c37c88ad64987e8d0d481a95ca60b5/regex-2026.4.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:7698a6f38730fd1385d390d1ed07bb13dce39aa616aca6a6d89bea178464b9a4", size = 911613, upload-time = "2026-04-03T20:55:27.033Z" 
}, + { url = "https://files.pythonhosted.org/packages/8e/3b/2b3dac0b82d41ab43aa87c6ecde63d71189d03fe8854b8ca455a315edac3/regex-2026.4.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:173a66f3651cdb761018078e2d9487f4cf971232c990035ec0eb1cdc6bf929a9", size = 800551, upload-time = "2026-04-03T20:55:29.532Z" }, + { url = "https://files.pythonhosted.org/packages/25/fe/5365eb7aa0e753c4b5957815c321519ecab033c279c60e1b1ae2367fa810/regex-2026.4.4-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:fa7922bbb2cc84fa062d37723f199d4c0cd200245ce269c05db82d904db66b83", size = 776911, upload-time = "2026-04-03T20:55:31.526Z" }, + { url = "https://files.pythonhosted.org/packages/aa/b3/7fb0072156bba065e3b778a7bc7b0a6328212be5dd6a86fd207e0c4f2dab/regex-2026.4.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:59f67cd0a0acaf0e564c20bbd7f767286f23e91e2572c5703bf3e56ea7557edb", size = 785751, upload-time = "2026-04-03T20:55:33.797Z" }, + { url = "https://files.pythonhosted.org/packages/02/1a/9f83677eb699273e56e858f7bd95acdbee376d42f59e8bfca2fd80d79df3/regex-2026.4.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:475e50f3f73f73614f7cba5524d6de49dee269df00272a1b85e3d19f6d498465", size = 860484, upload-time = "2026-04-03T20:55:35.745Z" }, + { url = "https://files.pythonhosted.org/packages/3b/7a/93937507b61cfcff8b4c5857f1b452852b09f741daa9acae15c971d8554e/regex-2026.4.4-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:a1c0c7d67b64d85ac2e1879923bad2f08a08f3004055f2f406ef73c850114bd4", size = 765939, upload-time = "2026-04-03T20:55:37.972Z" }, + { url = "https://files.pythonhosted.org/packages/86/ea/81a7f968a351c6552b1670ead861e2a385be730ee28402233020c67f9e0f/regex-2026.4.4-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:1371c2ccbb744d66ee63631cc9ca12aa233d5749972626b68fe1a649dd98e566", size = 851417, upload-time = "2026-04-03T20:55:39.92Z" }, + { url = 
"https://files.pythonhosted.org/packages/4c/7e/323c18ce4b5b8f44517a36342961a0306e931e499febbd876bb149d900f0/regex-2026.4.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:59968142787042db793348a3f5b918cf24ced1f23247328530e063f89c128a95", size = 789056, upload-time = "2026-04-03T20:55:42.303Z" }, + { url = "https://files.pythonhosted.org/packages/c0/af/e7510f9b11b1913b0cd44eddb784b2d650b2af6515bfce4cffcc5bfd1d38/regex-2026.4.4-cp314-cp314-win32.whl", hash = "sha256:59efe72d37fd5a91e373e5146f187f921f365f4abc1249a5ab446a60f30dd5f8", size = 272130, upload-time = "2026-04-03T20:55:44.995Z" }, + { url = "https://files.pythonhosted.org/packages/9a/51/57dae534c915e2d3a21490e88836fa2ae79dde3b66255ecc0c0a155d2c10/regex-2026.4.4-cp314-cp314-win_amd64.whl", hash = "sha256:e0aab3ff447845049d676827d2ff714aab4f73f340e155b7de7458cf53baa5a4", size = 280992, upload-time = "2026-04-03T20:55:47.316Z" }, + { url = "https://files.pythonhosted.org/packages/0a/5e/abaf9f4c3792e34edb1434f06717fae2b07888d85cb5cec29f9204931bf8/regex-2026.4.4-cp314-cp314-win_arm64.whl", hash = "sha256:a7a5bb6aa0cf62208bb4fa079b0c756734f8ad0e333b425732e8609bd51ee22f", size = 273563, upload-time = "2026-04-03T20:55:49.273Z" }, + { url = "https://files.pythonhosted.org/packages/ff/06/35da85f9f217b9538b99cbb170738993bcc3b23784322decb77619f11502/regex-2026.4.4-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:97850d0638391bdc7d35dc1c1039974dcb921eaafa8cc935ae4d7f272b1d60b3", size = 494191, upload-time = "2026-04-03T20:55:51.258Z" }, + { url = "https://files.pythonhosted.org/packages/54/5b/1bc35f479eef8285c4baf88d8c002023efdeebb7b44a8735b36195486ae7/regex-2026.4.4-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:ee7337f88f2a580679f7bbfe69dc86c043954f9f9c541012f49abc554a962f2e", size = 293877, upload-time = "2026-04-03T20:55:53.214Z" }, + { url = 
"https://files.pythonhosted.org/packages/39/5b/f53b9ad17480b3ddd14c90da04bfb55ac6894b129e5dea87bcaf7d00e336/regex-2026.4.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:7429f4e6192c11d659900c0648ba8776243bf396ab95558b8c51a345afeddde6", size = 292410, upload-time = "2026-04-03T20:55:55.736Z" }, + { url = "https://files.pythonhosted.org/packages/bb/56/52377f59f60a7c51aa4161eecf0b6032c20b461805aca051250da435ffc9/regex-2026.4.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:dc4f10fbd5dd13dcf4265b4cc07d69ca70280742870c97ae10093e3d66000359", size = 811831, upload-time = "2026-04-03T20:55:57.802Z" }, + { url = "https://files.pythonhosted.org/packages/dd/63/8026310bf066f702a9c361f83a8c9658f3fe4edb349f9c1e5d5273b7c40c/regex-2026.4.4-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a152560af4f9742b96f3827090f866eeec5becd4765c8e0d3473d9d280e76a5a", size = 871199, upload-time = "2026-04-03T20:56:00.333Z" }, + { url = "https://files.pythonhosted.org/packages/20/9f/a514bbb00a466dbb506d43f187a04047f7be1505f10a9a15615ead5080ee/regex-2026.4.4-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:54170b3e95339f415d54651f97df3bff7434a663912f9358237941bbf9143f55", size = 917649, upload-time = "2026-04-03T20:56:02.445Z" }, + { url = "https://files.pythonhosted.org/packages/cb/6b/8399f68dd41a2030218839b9b18360d79b86d22b9fab5ef477c7f23ca67c/regex-2026.4.4-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:07f190d65f5a72dcb9cf7106bfc3d21e7a49dd2879eda2207b683f32165e4d99", size = 816388, upload-time = "2026-04-03T20:56:04.595Z" }, + { url = "https://files.pythonhosted.org/packages/1e/9c/103963f47c24339a483b05edd568594c2be486188f688c0170fd504b2948/regex-2026.4.4-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = 
"sha256:9a2741ce5a29d3c84b0b94261ba630ab459a1b847a0d6beca7d62d188175c790", size = 785746, upload-time = "2026-04-03T20:56:07.13Z" }, + { url = "https://files.pythonhosted.org/packages/fa/ee/7f6054c0dec0cee3463c304405e4ff42e27cff05bf36fcb34be549ab17bd/regex-2026.4.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:b26c30df3a28fd9793113dac7385a4deb7294a06c0f760dd2b008bd49a9139bc", size = 801483, upload-time = "2026-04-03T20:56:09.365Z" }, + { url = "https://files.pythonhosted.org/packages/30/c2/51d3d941cf6070dc00c3338ecf138615fc3cce0421c3df6abe97a08af61a/regex-2026.4.4-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:421439d1bee44b19f4583ccf42670ca464ffb90e9fdc38d37f39d1ddd1e44f1f", size = 866331, upload-time = "2026-04-03T20:56:12.039Z" }, + { url = "https://files.pythonhosted.org/packages/16/e8/76d50dcc122ac33927d939f350eebcfe3dbcbda96913e03433fc36de5e63/regex-2026.4.4-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:b40379b53ecbc747fd9bdf4a0ea14eb8188ca1bd0f54f78893a39024b28f4863", size = 772673, upload-time = "2026-04-03T20:56:14.558Z" }, + { url = "https://files.pythonhosted.org/packages/a5/6e/5f6bf75e20ea6873d05ba4ec78378c375cbe08cdec571c83fbb01606e563/regex-2026.4.4-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:08c55c13d2eef54f73eeadc33146fb0baaa49e7335eb1aff6ae1324bf0ddbe4a", size = 857146, upload-time = "2026-04-03T20:56:16.663Z" }, + { url = "https://files.pythonhosted.org/packages/0b/33/3c76d9962949e487ebba353a18e89399f292287204ac8f2f4cfc3a51c233/regex-2026.4.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:9776b85f510062f5a75ef112afe5f494ef1635607bf1cc220c1391e9ac2f5e81", size = 803463, upload-time = "2026-04-03T20:56:18.923Z" }, + { url = "https://files.pythonhosted.org/packages/19/eb/ef32dcd2cb69b69bc0c3e55205bce94a7def48d495358946bc42186dcccc/regex-2026.4.4-cp314-cp314t-win32.whl", hash = "sha256:385edaebde5db5be103577afc8699fea73a0e36a734ba24870be7ffa61119d74", size = 275709, upload-time = 
"2026-04-03T20:56:20.996Z" }, + { url = "https://files.pythonhosted.org/packages/a0/86/c291bf740945acbf35ed7dbebf8e2eea2f3f78041f6bd7cdab80cb274dc0/regex-2026.4.4-cp314-cp314t-win_amd64.whl", hash = "sha256:5d354b18839328927832e2fa5f7c95b7a3ccc39e7a681529e1685898e6436d45", size = 285622, upload-time = "2026-04-03T20:56:23.641Z" }, + { url = "https://files.pythonhosted.org/packages/d5/e7/ec846d560ae6a597115153c02ca6138a7877a1748b2072d9521c10a93e58/regex-2026.4.4-cp314-cp314t-win_arm64.whl", hash = "sha256:af0384cb01a33600c49505c27c6c57ab0b27bf84a74e28524c92ca897ebdac9d", size = 275773, upload-time = "2026-04-03T20:56:26.07Z" }, +] + +[[package]] +name = "requests" +version = "2.33.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "certifi" }, + { name = "charset-normalizer" }, + { name = "idna" }, + { name = "urllib3" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/5f/a4/98b9c7c6428a668bf7e42ebb7c79d576a1c3c1e3ae2d47e674b468388871/requests-2.33.1.tar.gz", hash = "sha256:18817f8c57c6263968bc123d237e3b8b08ac046f5456bd1e307ee8f4250d3517", size = 134120, upload-time = "2026-03-30T16:09:15.531Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d7/8e/7540e8a2036f79a125c1d2ebadf69ed7901608859186c856fa0388ef4197/requests-2.33.1-py3-none-any.whl", hash = "sha256:4e6d1ef462f3626a1f0a0a9c42dd93c63bad33f9f1c1937509b8c5c8718ab56a", size = 64947, upload-time = "2026-03-30T16:09:13.83Z" }, +] + +[[package]] +name = "requests-toolbelt" +version = "1.0.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "requests" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/f3/61/d7545dafb7ac2230c70d38d31cbfe4cc64f7144dc41f6e4e4b78ecd9f5bb/requests-toolbelt-1.0.0.tar.gz", hash = "sha256:7681a0a3d047012b5bdc0ee37d7f8f07ebe76ab08caeccfc3921ce23c88d5bc6", size = 206888, upload-time = "2023-05-01T04:11:33.229Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06", size = 54481, upload-time = "2023-05-01T04:11:28.427Z" }, +] + +[[package]] +name = "six" +version = "1.17.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/94/e7/b2c673351809dca68a0e064b6af791aa332cf192da575fd474ed7d6f16a2/six-1.17.0.tar.gz", hash = "sha256:ff70335d468e7eb6ec65b95b99d3a2836546063f63acc5171de367e834932a81", size = 34031, upload-time = "2024-12-04T17:35:28.174Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" }, +] + +[[package]] +name = "sniffio" +version = "1.3.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/a2/87/a6771e1546d97e7e041b6ae58d80074f81b7d5121207425c964ddf5cfdbd/sniffio-1.3.1.tar.gz", hash = "sha256:f4324edc670a0f49750a81b895f35c3adb843cca46f0530f79fc1babb23789dc", size = 20372, upload-time = "2024-02-25T23:20:04.057Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" }, +] + +[[package]] +name = "tenacity" +version = "9.1.4" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/47/c6/ee486fd809e357697ee8a44d3d69222b344920433d3b6666ccd9b374630c/tenacity-9.1.4.tar.gz", hash = 
"sha256:adb31d4c263f2bd041081ab33b498309a57c77f9acf2db65aadf0898179cf93a", size = 49413, upload-time = "2026-02-07T10:45:33.841Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d7/c1/eb8f9debc45d3b7918a32ab756658a0904732f75e555402972246b0b8e71/tenacity-9.1.4-py3-none-any.whl", hash = "sha256:6095a360c919085f28c6527de529e76a06ad89b23659fa881ae0649b867a9d55", size = 28926, upload-time = "2026-02-07T10:45:32.24Z" }, +] + +[[package]] +name = "tiktoken" +version = "0.12.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "regex" }, + { name = "requests" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/7d/ab/4d017d0f76ec3171d469d80fc03dfbb4e48a4bcaddaa831b31d526f05edc/tiktoken-0.12.0.tar.gz", hash = "sha256:b18ba7ee2b093863978fcb14f74b3707cdc8d4d4d3836853ce7ec60772139931", size = 37806, upload-time = "2025-10-06T20:22:45.419Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a4/85/be65d39d6b647c79800fd9d29241d081d4eeb06271f383bb87200d74cf76/tiktoken-0.12.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b97f74aca0d78a1ff21b8cd9e9925714c15a9236d6ceacf5c7327c117e6e21e8", size = 1050728, upload-time = "2025-10-06T20:21:52.756Z" }, + { url = "https://files.pythonhosted.org/packages/4a/42/6573e9129bc55c9bf7300b3a35bef2c6b9117018acca0dc760ac2d93dffe/tiktoken-0.12.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2b90f5ad190a4bb7c3eb30c5fa32e1e182ca1ca79f05e49b448438c3e225a49b", size = 994049, upload-time = "2025-10-06T20:21:53.782Z" }, + { url = "https://files.pythonhosted.org/packages/66/c5/ed88504d2f4a5fd6856990b230b56d85a777feab84e6129af0822f5d0f70/tiktoken-0.12.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:65b26c7a780e2139e73acc193e5c63ac754021f160df919add909c1492c0fb37", size = 1129008, upload-time = "2025-10-06T20:21:54.832Z" }, + { url = 
"https://files.pythonhosted.org/packages/f4/90/3dae6cc5436137ebd38944d396b5849e167896fc2073da643a49f372dc4f/tiktoken-0.12.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:edde1ec917dfd21c1f2f8046b86348b0f54a2c0547f68149d8600859598769ad", size = 1152665, upload-time = "2025-10-06T20:21:56.129Z" }, + { url = "https://files.pythonhosted.org/packages/a3/fe/26df24ce53ffde419a42f5f53d755b995c9318908288c17ec3f3448313a3/tiktoken-0.12.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:35a2f8ddd3824608b3d650a000c1ef71f730d0c56486845705a8248da00f9fe5", size = 1194230, upload-time = "2025-10-06T20:21:57.546Z" }, + { url = "https://files.pythonhosted.org/packages/20/cc/b064cae1a0e9fac84b0d2c46b89f4e57051a5f41324e385d10225a984c24/tiktoken-0.12.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:83d16643edb7fa2c99eff2ab7733508aae1eebb03d5dfc46f5565862810f24e3", size = 1254688, upload-time = "2025-10-06T20:21:58.619Z" }, + { url = "https://files.pythonhosted.org/packages/81/10/b8523105c590c5b8349f2587e2fdfe51a69544bd5a76295fc20f2374f470/tiktoken-0.12.0-cp312-cp312-win_amd64.whl", hash = "sha256:ffc5288f34a8bc02e1ea7047b8d041104791d2ddbf42d1e5fa07822cbffe16bd", size = 878694, upload-time = "2025-10-06T20:21:59.876Z" }, + { url = "https://files.pythonhosted.org/packages/00/61/441588ee21e6b5cdf59d6870f86beb9789e532ee9718c251b391b70c68d6/tiktoken-0.12.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:775c2c55de2310cc1bc9a3ad8826761cbdc87770e586fd7b6da7d4589e13dab3", size = 1050802, upload-time = "2025-10-06T20:22:00.96Z" }, + { url = "https://files.pythonhosted.org/packages/1f/05/dcf94486d5c5c8d34496abe271ac76c5b785507c8eae71b3708f1ad9b45a/tiktoken-0.12.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a01b12f69052fbe4b080a2cfb867c4de12c704b56178edf1d1d7b273561db160", size = 993995, upload-time = "2025-10-06T20:22:02.788Z" }, + { url = 
"https://files.pythonhosted.org/packages/a0/70/5163fe5359b943f8db9946b62f19be2305de8c3d78a16f629d4165e2f40e/tiktoken-0.12.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:01d99484dc93b129cd0964f9d34eee953f2737301f18b3c7257bf368d7615baa", size = 1128948, upload-time = "2025-10-06T20:22:03.814Z" }, + { url = "https://files.pythonhosted.org/packages/0c/da/c028aa0babf77315e1cef357d4d768800c5f8a6de04d0eac0f377cb619fa/tiktoken-0.12.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:4a1a4fcd021f022bfc81904a911d3df0f6543b9e7627b51411da75ff2fe7a1be", size = 1151986, upload-time = "2025-10-06T20:22:05.173Z" }, + { url = "https://files.pythonhosted.org/packages/a0/5a/886b108b766aa53e295f7216b509be95eb7d60b166049ce2c58416b25f2a/tiktoken-0.12.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:981a81e39812d57031efdc9ec59fa32b2a5a5524d20d4776574c4b4bd2e9014a", size = 1194222, upload-time = "2025-10-06T20:22:06.265Z" }, + { url = "https://files.pythonhosted.org/packages/f4/f8/4db272048397636ac7a078d22773dd2795b1becee7bc4922fe6207288d57/tiktoken-0.12.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9baf52f84a3f42eef3ff4e754a0db79a13a27921b457ca9832cf944c6be4f8f3", size = 1255097, upload-time = "2025-10-06T20:22:07.403Z" }, + { url = "https://files.pythonhosted.org/packages/8e/32/45d02e2e0ea2be3a9ed22afc47d93741247e75018aac967b713b2941f8ea/tiktoken-0.12.0-cp313-cp313-win_amd64.whl", hash = "sha256:b8a0cd0c789a61f31bf44851defbd609e8dd1e2c8589c614cc1060940ef1f697", size = 879117, upload-time = "2025-10-06T20:22:08.418Z" }, + { url = "https://files.pythonhosted.org/packages/ce/76/994fc868f88e016e6d05b0da5ac24582a14c47893f4474c3e9744283f1d5/tiktoken-0.12.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:d5f89ea5680066b68bcb797ae85219c72916c922ef0fcdd3480c7d2315ffff16", size = 1050309, upload-time = "2025-10-06T20:22:10.939Z" }, + { url = 
"https://files.pythonhosted.org/packages/f6/b8/57ef1456504c43a849821920d582a738a461b76a047f352f18c0b26c6516/tiktoken-0.12.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:b4e7ed1c6a7a8a60a3230965bdedba8cc58f68926b835e519341413370e0399a", size = 993712, upload-time = "2025-10-06T20:22:12.115Z" }, + { url = "https://files.pythonhosted.org/packages/72/90/13da56f664286ffbae9dbcfadcc625439142675845baa62715e49b87b68b/tiktoken-0.12.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:fc530a28591a2d74bce821d10b418b26a094bf33839e69042a6e86ddb7a7fb27", size = 1128725, upload-time = "2025-10-06T20:22:13.541Z" }, + { url = "https://files.pythonhosted.org/packages/05/df/4f80030d44682235bdaecd7346c90f67ae87ec8f3df4a3442cb53834f7e4/tiktoken-0.12.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:06a9f4f49884139013b138920a4c393aa6556b2f8f536345f11819389c703ebb", size = 1151875, upload-time = "2025-10-06T20:22:14.559Z" }, + { url = "https://files.pythonhosted.org/packages/22/1f/ae535223a8c4ef4c0c1192e3f9b82da660be9eb66b9279e95c99288e9dab/tiktoken-0.12.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:04f0e6a985d95913cabc96a741c5ffec525a2c72e9df086ff17ebe35985c800e", size = 1194451, upload-time = "2025-10-06T20:22:15.545Z" }, + { url = "https://files.pythonhosted.org/packages/78/a7/f8ead382fce0243cb625c4f266e66c27f65ae65ee9e77f59ea1653b6d730/tiktoken-0.12.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:0ee8f9ae00c41770b5f9b0bb1235474768884ae157de3beb5439ca0fd70f3e25", size = 1253794, upload-time = "2025-10-06T20:22:16.624Z" }, + { url = "https://files.pythonhosted.org/packages/93/e0/6cc82a562bc6365785a3ff0af27a2a092d57c47d7a81d9e2295d8c36f011/tiktoken-0.12.0-cp313-cp313t-win_amd64.whl", hash = "sha256:dc2dd125a62cb2b3d858484d6c614d136b5b848976794edfb63688d539b8b93f", size = 878777, upload-time = "2025-10-06T20:22:18.036Z" }, + { url = 
"https://files.pythonhosted.org/packages/72/05/3abc1db5d2c9aadc4d2c76fa5640134e475e58d9fbb82b5c535dc0de9b01/tiktoken-0.12.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:a90388128df3b3abeb2bfd1895b0681412a8d7dc644142519e6f0a97c2111646", size = 1050188, upload-time = "2025-10-06T20:22:19.563Z" }, + { url = "https://files.pythonhosted.org/packages/e3/7b/50c2f060412202d6c95f32b20755c7a6273543b125c0985d6fa9465105af/tiktoken-0.12.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:da900aa0ad52247d8794e307d6446bd3cdea8e192769b56276695d34d2c9aa88", size = 993978, upload-time = "2025-10-06T20:22:20.702Z" }, + { url = "https://files.pythonhosted.org/packages/14/27/bf795595a2b897e271771cd31cb847d479073497344c637966bdf2853da1/tiktoken-0.12.0-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:285ba9d73ea0d6171e7f9407039a290ca77efcdb026be7769dccc01d2c8d7fff", size = 1129271, upload-time = "2025-10-06T20:22:22.06Z" }, + { url = "https://files.pythonhosted.org/packages/f5/de/9341a6d7a8f1b448573bbf3425fa57669ac58258a667eb48a25dfe916d70/tiktoken-0.12.0-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:d186a5c60c6a0213f04a7a802264083dea1bbde92a2d4c7069e1a56630aef830", size = 1151216, upload-time = "2025-10-06T20:22:23.085Z" }, + { url = "https://files.pythonhosted.org/packages/75/0d/881866647b8d1be4d67cb24e50d0c26f9f807f994aa1510cb9ba2fe5f612/tiktoken-0.12.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:604831189bd05480f2b885ecd2d1986dc7686f609de48208ebbbddeea071fc0b", size = 1194860, upload-time = "2025-10-06T20:22:24.602Z" }, + { url = "https://files.pythonhosted.org/packages/b3/1e/b651ec3059474dab649b8d5b69f5c65cd8fcd8918568c1935bd4136c9392/tiktoken-0.12.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:8f317e8530bb3a222547b85a58583238c8f74fd7a7408305f9f63246d1a0958b", size = 1254567, upload-time = "2025-10-06T20:22:25.671Z" }, + { url = 
"https://files.pythonhosted.org/packages/80/57/ce64fd16ac390fafde001268c364d559447ba09b509181b2808622420eec/tiktoken-0.12.0-cp314-cp314-win_amd64.whl", hash = "sha256:399c3dd672a6406719d84442299a490420b458c44d3ae65516302a99675888f3", size = 921067, upload-time = "2025-10-06T20:22:26.753Z" }, + { url = "https://files.pythonhosted.org/packages/ac/a4/72eed53e8976a099539cdd5eb36f241987212c29629d0a52c305173e0a68/tiktoken-0.12.0-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:c2c714c72bc00a38ca969dae79e8266ddec999c7ceccd603cc4f0d04ccd76365", size = 1050473, upload-time = "2025-10-06T20:22:27.775Z" }, + { url = "https://files.pythonhosted.org/packages/e6/d7/0110b8f54c008466b19672c615f2168896b83706a6611ba6e47313dbc6e9/tiktoken-0.12.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:cbb9a3ba275165a2cb0f9a83f5d7025afe6b9d0ab01a22b50f0e74fee2ad253e", size = 993855, upload-time = "2025-10-06T20:22:28.799Z" }, + { url = "https://files.pythonhosted.org/packages/5f/77/4f268c41a3957c418b084dd576ea2fad2e95da0d8e1ab705372892c2ca22/tiktoken-0.12.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:dfdfaa5ffff8993a3af94d1125870b1d27aed7cb97aa7eb8c1cefdbc87dbee63", size = 1129022, upload-time = "2025-10-06T20:22:29.981Z" }, + { url = "https://files.pythonhosted.org/packages/4e/2b/fc46c90fe5028bd094cd6ee25a7db321cb91d45dc87531e2bdbb26b4867a/tiktoken-0.12.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:584c3ad3d0c74f5269906eb8a659c8bfc6144a52895d9261cdaf90a0ae5f4de0", size = 1150736, upload-time = "2025-10-06T20:22:30.996Z" }, + { url = "https://files.pythonhosted.org/packages/28/c0/3c7a39ff68022ddfd7d93f3337ad90389a342f761c4d71de99a3ccc57857/tiktoken-0.12.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:54c891b416a0e36b8e2045b12b33dd66fb34a4fe7965565f1b482da50da3e86a", size = 1194908, upload-time = "2025-10-06T20:22:32.073Z" }, + { url = 
"https://files.pythonhosted.org/packages/ab/0d/c1ad6f4016a3968c048545f5d9b8ffebf577774b2ede3e2e352553b685fe/tiktoken-0.12.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5edb8743b88d5be814b1a8a8854494719080c28faaa1ccbef02e87354fe71ef0", size = 1253706, upload-time = "2025-10-06T20:22:33.385Z" }, + { url = "https://files.pythonhosted.org/packages/af/df/c7891ef9d2712ad774777271d39fdef63941ffba0a9d59b7ad1fd2765e57/tiktoken-0.12.0-cp314-cp314t-win_amd64.whl", hash = "sha256:f61c0aea5565ac82e2ec50a05e02a6c44734e91b51c10510b084ea1b8e633a71", size = 920667, upload-time = "2025-10-06T20:22:34.444Z" }, +] + +[[package]] +name = "tqdm" +version = "4.67.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/09/a9/6ba95a270c6f1fbcd8dac228323f2777d886cb206987444e4bce66338dd4/tqdm-4.67.3.tar.gz", hash = "sha256:7d825f03f89244ef73f1d4ce193cb1774a8179fd96f31d7e1dcde62092b960bb", size = 169598, upload-time = "2026-02-03T17:35:53.048Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" }, +] + +[[package]] +name = "typing-extensions" +version = "4.15.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = 
"sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" }, +] + +[[package]] +name = "typing-inspection" +version = "0.4.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/55/e3/70399cb7dd41c10ac53367ae42139cf4b1ca5f36bb3dc6c9d33acdb43655/typing_inspection-0.4.2.tar.gz", hash = "sha256:ba561c48a67c5958007083d386c3295464928b01faa735ab8547c5692e87f464", size = 75949, upload-time = "2025-10-01T02:14:41.687Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/dc/9b/47798a6c91d8bdb567fe2698fe81e0c6b7cb7ef4d13da4114b41d239f65d/typing_inspection-0.4.2-py3-none-any.whl", hash = "sha256:4ed1cacbdc298c220f1bd249ed5287caa16f34d44ef4e9c3d0cbad5b521545e7", size = 14611, upload-time = "2025-10-01T02:14:40.154Z" }, +] + +[[package]] +name = "urllib3" +version = "2.6.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/c7/24/5f1b3bdffd70275f6661c76461e25f024d5a38a46f04aaca912426a2b1d3/urllib3-2.6.3.tar.gz", hash = "sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed", size = 435556, upload-time = "2026-01-07T16:24:43.925Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/39/08/aaaad47bc4e9dc8c725e68f9d04865dbcb2052843ff09c97b08904852d84/urllib3-2.6.3-py3-none-any.whl", hash = "sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4", size = 131584, upload-time = "2026-01-07T16:24:42.685Z" }, +] + +[[package]] +name = "uuid-utils" +version = "0.14.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/7b/d1/38a573f0c631c062cf42fa1f5d021d4dd3c31fb23e4376e4b56b0c9fbbed/uuid_utils-0.14.1.tar.gz", hash = "sha256:9bfc95f64af80ccf129c604fb6b8ca66c6f256451e32bc4570f760e4309c9b69", size = 22195, upload-time = 
"2026-02-20T22:50:38.833Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/43/b7/add4363039a34506a58457d96d4aa2126061df3a143eb4d042aedd6a2e76/uuid_utils-0.14.1-cp39-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:93a3b5dc798a54a1feb693f2d1cb4cf08258c32ff05ae4929b5f0a2ca624a4f0", size = 604679, upload-time = "2026-02-20T22:50:27.469Z" }, + { url = "https://files.pythonhosted.org/packages/dd/84/d1d0bef50d9e66d31b2019997c741b42274d53dde2e001b7a83e9511c339/uuid_utils-0.14.1-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:ccd65a4b8e83af23eae5e56d88034b2fe7264f465d3e830845f10d1591b81741", size = 309346, upload-time = "2026-02-20T22:50:31.857Z" }, + { url = "https://files.pythonhosted.org/packages/ef/ed/b6d6fd52a6636d7c3eddf97d68da50910bf17cd5ac221992506fb56cf12e/uuid_utils-0.14.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b56b0cacd81583834820588378e432b0696186683b813058b707aedc1e16c4b1", size = 344714, upload-time = "2026-02-20T22:50:42.642Z" }, + { url = "https://files.pythonhosted.org/packages/a8/a7/a19a1719fb626fe0b31882db36056d44fe904dc0cf15b06fdf56b2679cf7/uuid_utils-0.14.1-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:bb3cf14de789097320a3c56bfdfdd51b1225d11d67298afbedee7e84e3837c96", size = 350914, upload-time = "2026-02-20T22:50:36.487Z" }, + { url = "https://files.pythonhosted.org/packages/1d/fc/f6690e667fdc3bb1a73f57951f97497771c56fe23e3d302d7404be394d4f/uuid_utils-0.14.1-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:60e0854a90d67f4b0cc6e54773deb8be618f4c9bad98d3326f081423b5d14fae", size = 482609, upload-time = "2026-02-20T22:50:37.511Z" }, + { url = "https://files.pythonhosted.org/packages/54/6e/dcd3fa031320921a12ec7b4672dea3bd1dd90ddffa363a91831ba834d559/uuid_utils-0.14.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ce6743ba194de3910b5feb1a62590cd2587e33a73ab6af8a01b642ceb5055862", 
size = 345699, upload-time = "2026-02-20T22:50:46.87Z" }, + { url = "https://files.pythonhosted.org/packages/04/28/e5220204b58b44ac0047226a9d016a113fde039280cc8732d9e6da43b39f/uuid_utils-0.14.1-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:043fb58fde6cf1620a6c066382f04f87a8e74feb0f95a585e4ed46f5d44af57b", size = 372205, upload-time = "2026-02-20T22:50:28.438Z" }, + { url = "https://files.pythonhosted.org/packages/c7/d9/3d2eb98af94b8dfffc82b6a33b4dfc87b0a5de2c68a28f6dde0db1f8681b/uuid_utils-0.14.1-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:c915d53f22945e55fe0d3d3b0b87fd965a57f5fd15666fd92d6593a73b1dd297", size = 521836, upload-time = "2026-02-20T22:50:23.057Z" }, + { url = "https://files.pythonhosted.org/packages/a8/15/0eb106cc6fe182f7577bc0ab6e2f0a40be247f35c5e297dbf7bbc460bd02/uuid_utils-0.14.1-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:0972488e3f9b449e83f006ead5a0e0a33ad4a13e4462e865b7c286ab7d7566a3", size = 625260, upload-time = "2026-02-20T22:50:25.949Z" }, + { url = "https://files.pythonhosted.org/packages/3c/17/f539507091334b109e7496830af2f093d9fc8082411eafd3ece58af1f8ba/uuid_utils-0.14.1-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:1c238812ae0c8ffe77d8d447a32c6dfd058ea4631246b08b5a71df586ff08531", size = 587824, upload-time = "2026-02-20T22:50:35.225Z" }, + { url = "https://files.pythonhosted.org/packages/2e/c2/d37a7b2e41f153519367d4db01f0526e0d4b06f1a4a87f1c5dfca5d70a8b/uuid_utils-0.14.1-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:bec8f8ef627af86abf8298e7ec50926627e29b34fa907fcfbedb45aaa72bca43", size = 551407, upload-time = "2026-02-20T22:50:44.915Z" }, + { url = "https://files.pythonhosted.org/packages/65/36/2d24b2cbe78547c6532da33fb8613debd3126eccc33a6374ab788f5e46e9/uuid_utils-0.14.1-cp39-abi3-win32.whl", hash = "sha256:b54d6aa6252d96bac1fdbc80d26ba71bad9f220b2724d692ad2f2310c22ef523", size = 183476, upload-time = "2026-02-20T22:50:32.745Z" }, + { url = 
"https://files.pythonhosted.org/packages/83/92/2d7e90df8b1a69ec4cff33243ce02b7a62f926ef9e2f0eca5a026889cd73/uuid_utils-0.14.1-cp39-abi3-win_amd64.whl", hash = "sha256:fc27638c2ce267a0ce3e06828aff786f91367f093c80625ee21dad0208e0f5ba", size = 187147, upload-time = "2026-02-20T22:50:45.807Z" }, + { url = "https://files.pythonhosted.org/packages/d9/26/529f4beee17e5248e37e0bc17a2761d34c0fa3b1e5729c88adb2065bae6e/uuid_utils-0.14.1-cp39-abi3-win_arm64.whl", hash = "sha256:b04cb49b42afbc4ff8dbc60cf054930afc479d6f4dd7f1ec3bbe5dbfdde06b7a", size = 188132, upload-time = "2026-02-20T22:50:41.718Z" }, +] + +[[package]] +name = "xxhash" +version = "3.6.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/02/84/30869e01909fb37a6cc7e18688ee8bf1e42d57e7e0777636bd47524c43c7/xxhash-3.6.0.tar.gz", hash = "sha256:f0162a78b13a0d7617b2845b90c763339d1f1d82bb04a4b07f4ab535cc5e05d6", size = 85160, upload-time = "2025-10-02T14:37:08.097Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/9a/07/d9412f3d7d462347e4511181dea65e47e0d0e16e26fbee2ea86a2aefb657/xxhash-3.6.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:01362c4331775398e7bb34e3ab403bc9ee9f7c497bc7dee6272114055277dd3c", size = 32744, upload-time = "2025-10-02T14:34:34.622Z" }, + { url = "https://files.pythonhosted.org/packages/79/35/0429ee11d035fc33abe32dca1b2b69e8c18d236547b9a9b72c1929189b9a/xxhash-3.6.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:b7b2df81a23f8cb99656378e72501b2cb41b1827c0f5a86f87d6b06b69f9f204", size = 30816, upload-time = "2025-10-02T14:34:36.043Z" }, + { url = "https://files.pythonhosted.org/packages/b7/f2/57eb99aa0f7d98624c0932c5b9a170e1806406cdbcdb510546634a1359e0/xxhash-3.6.0-cp312-cp312-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:dc94790144e66b14f67b10ac8ed75b39ca47536bf8800eb7c24b50271ea0c490", size = 194035, upload-time = "2025-10-02T14:34:37.354Z" }, + { url = 
"https://files.pythonhosted.org/packages/4c/ed/6224ba353690d73af7a3f1c7cdb1fc1b002e38f783cb991ae338e1eb3d79/xxhash-3.6.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:93f107c673bccf0d592cdba077dedaf52fe7f42dcd7676eba1f6d6f0c3efffd2", size = 212914, upload-time = "2025-10-02T14:34:38.6Z" }, + { url = "https://files.pythonhosted.org/packages/38/86/fb6b6130d8dd6b8942cc17ab4d90e223653a89aa32ad2776f8af7064ed13/xxhash-3.6.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2aa5ee3444c25b69813663c9f8067dcfaa2e126dc55e8dddf40f4d1c25d7effa", size = 212163, upload-time = "2025-10-02T14:34:39.872Z" }, + { url = "https://files.pythonhosted.org/packages/ee/dc/e84875682b0593e884ad73b2d40767b5790d417bde603cceb6878901d647/xxhash-3.6.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:f7f99123f0e1194fa59cc69ad46dbae2e07becec5df50a0509a808f90a0f03f0", size = 445411, upload-time = "2025-10-02T14:34:41.569Z" }, + { url = "https://files.pythonhosted.org/packages/11/4f/426f91b96701ec2f37bb2b8cec664eff4f658a11f3fa9d94f0a887ea6d2b/xxhash-3.6.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:49e03e6fe2cac4a1bc64952dd250cf0dbc5ef4ebb7b8d96bce82e2de163c82a2", size = 193883, upload-time = "2025-10-02T14:34:43.249Z" }, + { url = "https://files.pythonhosted.org/packages/53/5a/ddbb83eee8e28b778eacfc5a85c969673e4023cdeedcfcef61f36731610b/xxhash-3.6.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bd17fede52a17a4f9a7bc4472a5867cb0b160deeb431795c0e4abe158bc784e9", size = 210392, upload-time = "2025-10-02T14:34:45.042Z" }, + { url = "https://files.pythonhosted.org/packages/1e/c2/ff69efd07c8c074ccdf0a4f36fcdd3d27363665bcdf4ba399abebe643465/xxhash-3.6.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:6fb5f5476bef678f69db04f2bd1efbed3030d2aba305b0fc1773645f187d6a4e", size = 197898, upload-time 
= "2025-10-02T14:34:46.302Z" }, + { url = "https://files.pythonhosted.org/packages/58/ca/faa05ac19b3b622c7c9317ac3e23954187516298a091eb02c976d0d3dd45/xxhash-3.6.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:843b52f6d88071f87eba1631b684fcb4b2068cd2180a0224122fe4ef011a9374", size = 210655, upload-time = "2025-10-02T14:34:47.571Z" }, + { url = "https://files.pythonhosted.org/packages/d4/7a/06aa7482345480cc0cb597f5c875b11a82c3953f534394f620b0be2f700c/xxhash-3.6.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:7d14a6cfaf03b1b6f5f9790f76880601ccc7896aff7ab9cd8978a939c1eb7e0d", size = 414001, upload-time = "2025-10-02T14:34:49.273Z" }, + { url = "https://files.pythonhosted.org/packages/23/07/63ffb386cd47029aa2916b3d2f454e6cc5b9f5c5ada3790377d5430084e7/xxhash-3.6.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:418daf3db71e1413cfe211c2f9a528456936645c17f46b5204705581a45390ae", size = 191431, upload-time = "2025-10-02T14:34:50.798Z" }, + { url = "https://files.pythonhosted.org/packages/0f/93/14fde614cadb4ddf5e7cebf8918b7e8fac5ae7861c1875964f17e678205c/xxhash-3.6.0-cp312-cp312-win32.whl", hash = "sha256:50fc255f39428a27299c20e280d6193d8b63b8ef8028995323bf834a026b4fbb", size = 30617, upload-time = "2025-10-02T14:34:51.954Z" }, + { url = "https://files.pythonhosted.org/packages/13/5d/0d125536cbe7565a83d06e43783389ecae0c0f2ed037b48ede185de477c0/xxhash-3.6.0-cp312-cp312-win_amd64.whl", hash = "sha256:c0f2ab8c715630565ab8991b536ecded9416d615538be8ecddce43ccf26cbc7c", size = 31534, upload-time = "2025-10-02T14:34:53.276Z" }, + { url = "https://files.pythonhosted.org/packages/54/85/6ec269b0952ec7e36ba019125982cf11d91256a778c7c3f98a4c5043d283/xxhash-3.6.0-cp312-cp312-win_arm64.whl", hash = "sha256:eae5c13f3bc455a3bbb68bdc513912dc7356de7e2280363ea235f71f54064829", size = 27876, upload-time = "2025-10-02T14:34:54.371Z" }, + { url = 
"https://files.pythonhosted.org/packages/33/76/35d05267ac82f53ae9b0e554da7c5e281ee61f3cad44c743f0fcd354f211/xxhash-3.6.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:599e64ba7f67472481ceb6ee80fa3bd828fd61ba59fb11475572cc5ee52b89ec", size = 32738, upload-time = "2025-10-02T14:34:55.839Z" }, + { url = "https://files.pythonhosted.org/packages/31/a8/3fbce1cd96534a95e35d5120637bf29b0d7f5d8fa2f6374e31b4156dd419/xxhash-3.6.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:7d8b8aaa30fca4f16f0c84a5c8d7ddee0e25250ec2796c973775373257dde8f1", size = 30821, upload-time = "2025-10-02T14:34:57.219Z" }, + { url = "https://files.pythonhosted.org/packages/0c/ea/d387530ca7ecfa183cb358027f1833297c6ac6098223fd14f9782cd0015c/xxhash-3.6.0-cp313-cp313-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:d597acf8506d6e7101a4a44a5e428977a51c0fadbbfd3c39650cca9253f6e5a6", size = 194127, upload-time = "2025-10-02T14:34:59.21Z" }, + { url = "https://files.pythonhosted.org/packages/ba/0c/71435dcb99874b09a43b8d7c54071e600a7481e42b3e3ce1eb5226a5711a/xxhash-3.6.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:858dc935963a33bc33490128edc1c12b0c14d9c7ebaa4e387a7869ecc4f3e263", size = 212975, upload-time = "2025-10-02T14:35:00.816Z" }, + { url = "https://files.pythonhosted.org/packages/84/7a/c2b3d071e4bb4a90b7057228a99b10d51744878f4a8a6dd643c8bd897620/xxhash-3.6.0-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:ba284920194615cb8edf73bf52236ce2e1664ccd4a38fdb543506413529cc546", size = 212241, upload-time = "2025-10-02T14:35:02.207Z" }, + { url = "https://files.pythonhosted.org/packages/81/5f/640b6eac0128e215f177df99eadcd0f1b7c42c274ab6a394a05059694c5a/xxhash-3.6.0-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:4b54219177f6c6674d5378bd862c6aedf64725f70dd29c472eaae154df1a2e89", size = 445471, upload-time = 
"2025-10-02T14:35:03.61Z" }, + { url = "https://files.pythonhosted.org/packages/5e/1e/3c3d3ef071b051cc3abbe3721ffb8365033a172613c04af2da89d5548a87/xxhash-3.6.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:42c36dd7dbad2f5238950c377fcbf6811b1cdb1c444fab447960030cea60504d", size = 193936, upload-time = "2025-10-02T14:35:05.013Z" }, + { url = "https://files.pythonhosted.org/packages/2c/bd/4a5f68381939219abfe1c22a9e3a5854a4f6f6f3c4983a87d255f21f2e5d/xxhash-3.6.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f22927652cba98c44639ffdc7aaf35828dccf679b10b31c4ad72a5b530a18eb7", size = 210440, upload-time = "2025-10-02T14:35:06.239Z" }, + { url = "https://files.pythonhosted.org/packages/eb/37/b80fe3d5cfb9faff01a02121a0f4d565eb7237e9e5fc66e73017e74dcd36/xxhash-3.6.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:b45fad44d9c5c119e9c6fbf2e1c656a46dc68e280275007bbfd3d572b21426db", size = 197990, upload-time = "2025-10-02T14:35:07.735Z" }, + { url = "https://files.pythonhosted.org/packages/d7/fd/2c0a00c97b9e18f72e1f240ad4e8f8a90fd9d408289ba9c7c495ed7dc05c/xxhash-3.6.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:6f2580ffab1a8b68ef2b901cde7e55fa8da5e4be0977c68f78fc80f3c143de42", size = 210689, upload-time = "2025-10-02T14:35:09.438Z" }, + { url = "https://files.pythonhosted.org/packages/93/86/5dd8076a926b9a95db3206aba20d89a7fc14dd5aac16e5c4de4b56033140/xxhash-3.6.0-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:40c391dd3cd041ebc3ffe6f2c862f402e306eb571422e0aa918d8070ba31da11", size = 414068, upload-time = "2025-10-02T14:35:11.162Z" }, + { url = "https://files.pythonhosted.org/packages/af/3c/0bb129170ee8f3650f08e993baee550a09593462a5cddd8e44d0011102b1/xxhash-3.6.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:f205badabde7aafd1a31e8ca2a3e5a763107a71c397c4481d6a804eb5063d8bd", size = 191495, upload-time = "2025-10-02T14:35:12.971Z" }, + { url = 
"https://files.pythonhosted.org/packages/e9/3a/6797e0114c21d1725e2577508e24006fd7ff1d8c0c502d3b52e45c1771d8/xxhash-3.6.0-cp313-cp313-win32.whl", hash = "sha256:2577b276e060b73b73a53042ea5bd5203d3e6347ce0d09f98500f418a9fcf799", size = 30620, upload-time = "2025-10-02T14:35:14.129Z" }, + { url = "https://files.pythonhosted.org/packages/86/15/9bc32671e9a38b413a76d24722a2bf8784a132c043063a8f5152d390b0f9/xxhash-3.6.0-cp313-cp313-win_amd64.whl", hash = "sha256:757320d45d2fbcce8f30c42a6b2f47862967aea7bf458b9625b4bbe7ee390392", size = 31542, upload-time = "2025-10-02T14:35:15.21Z" }, + { url = "https://files.pythonhosted.org/packages/39/c5/cc01e4f6188656e56112d6a8e0dfe298a16934b8c47a247236549a3f7695/xxhash-3.6.0-cp313-cp313-win_arm64.whl", hash = "sha256:457b8f85dec5825eed7b69c11ae86834a018b8e3df5e77783c999663da2f96d6", size = 27880, upload-time = "2025-10-02T14:35:16.315Z" }, + { url = "https://files.pythonhosted.org/packages/f3/30/25e5321c8732759e930c555176d37e24ab84365482d257c3b16362235212/xxhash-3.6.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:a42e633d75cdad6d625434e3468126c73f13f7584545a9cf34e883aa1710e702", size = 32956, upload-time = "2025-10-02T14:35:17.413Z" }, + { url = "https://files.pythonhosted.org/packages/9f/3c/0573299560d7d9f8ab1838f1efc021a280b5ae5ae2e849034ef3dee18810/xxhash-3.6.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:568a6d743219e717b07b4e03b0a828ce593833e498c3b64752e0f5df6bfe84db", size = 31072, upload-time = "2025-10-02T14:35:18.844Z" }, + { url = "https://files.pythonhosted.org/packages/7a/1c/52d83a06e417cd9d4137722693424885cc9878249beb3a7c829e74bf7ce9/xxhash-3.6.0-cp313-cp313t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:bec91b562d8012dae276af8025a55811b875baace6af510412a5e58e3121bc54", size = 196409, upload-time = "2025-10-02T14:35:20.31Z" }, + { url = 
"https://files.pythonhosted.org/packages/e3/8e/c6d158d12a79bbd0b878f8355432075fc82759e356ab5a111463422a239b/xxhash-3.6.0-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:78e7f2f4c521c30ad5e786fdd6bae89d47a32672a80195467b5de0480aa97b1f", size = 215736, upload-time = "2025-10-02T14:35:21.616Z" }, + { url = "https://files.pythonhosted.org/packages/bc/68/c4c80614716345d55071a396cf03d06e34b5f4917a467faf43083c995155/xxhash-3.6.0-cp313-cp313t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:3ed0df1b11a79856df5ffcab572cbd6b9627034c1c748c5566fa79df9048a7c5", size = 214833, upload-time = "2025-10-02T14:35:23.32Z" }, + { url = "https://files.pythonhosted.org/packages/7e/e9/ae27c8ffec8b953efa84c7c4a6c6802c263d587b9fc0d6e7cea64e08c3af/xxhash-3.6.0-cp313-cp313t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:0e4edbfc7d420925b0dd5e792478ed393d6e75ff8fc219a6546fb446b6a417b1", size = 448348, upload-time = "2025-10-02T14:35:25.111Z" }, + { url = "https://files.pythonhosted.org/packages/d7/6b/33e21afb1b5b3f46b74b6bd1913639066af218d704cc0941404ca717fc57/xxhash-3.6.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fba27a198363a7ef87f8c0f6b171ec36b674fe9053742c58dd7e3201c1ab30ee", size = 196070, upload-time = "2025-10-02T14:35:26.586Z" }, + { url = "https://files.pythonhosted.org/packages/96/b6/fcabd337bc5fa624e7203aa0fa7d0c49eed22f72e93229431752bddc83d9/xxhash-3.6.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:794fe9145fe60191c6532fa95063765529770edcdd67b3d537793e8004cabbfd", size = 212907, upload-time = "2025-10-02T14:35:28.087Z" }, + { url = "https://files.pythonhosted.org/packages/4b/d3/9ee6160e644d660fcf176c5825e61411c7f62648728f69c79ba237250143/xxhash-3.6.0-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:6105ef7e62b5ac73a837778efc331a591d8442f8ef5c7e102376506cb4ae2729", size = 200839, 
upload-time = "2025-10-02T14:35:29.857Z" }, + { url = "https://files.pythonhosted.org/packages/0d/98/e8de5baa5109394baf5118f5e72ab21a86387c4f89b0e77ef3e2f6b0327b/xxhash-3.6.0-cp313-cp313t-musllinux_1_2_ppc64le.whl", hash = "sha256:f01375c0e55395b814a679b3eea205db7919ac2af213f4a6682e01220e5fe292", size = 213304, upload-time = "2025-10-02T14:35:31.222Z" }, + { url = "https://files.pythonhosted.org/packages/7b/1d/71056535dec5c3177eeb53e38e3d367dd1d16e024e63b1cee208d572a033/xxhash-3.6.0-cp313-cp313t-musllinux_1_2_s390x.whl", hash = "sha256:d706dca2d24d834a4661619dcacf51a75c16d65985718d6a7d73c1eeeb903ddf", size = 416930, upload-time = "2025-10-02T14:35:32.517Z" }, + { url = "https://files.pythonhosted.org/packages/dc/6c/5cbde9de2cd967c322e651c65c543700b19e7ae3e0aae8ece3469bf9683d/xxhash-3.6.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:5f059d9faeacd49c0215d66f4056e1326c80503f51a1532ca336a385edadd033", size = 193787, upload-time = "2025-10-02T14:35:33.827Z" }, + { url = "https://files.pythonhosted.org/packages/19/fa/0172e350361d61febcea941b0cc541d6e6c8d65d153e85f850a7b256ff8a/xxhash-3.6.0-cp313-cp313t-win32.whl", hash = "sha256:1244460adc3a9be84731d72b8e80625788e5815b68da3da8b83f78115a40a7ec", size = 30916, upload-time = "2025-10-02T14:35:35.107Z" }, + { url = "https://files.pythonhosted.org/packages/ad/e6/e8cf858a2b19d6d45820f072eff1bea413910592ff17157cabc5f1227a16/xxhash-3.6.0-cp313-cp313t-win_amd64.whl", hash = "sha256:b1e420ef35c503869c4064f4a2f2b08ad6431ab7b229a05cce39d74268bca6b8", size = 31799, upload-time = "2025-10-02T14:35:36.165Z" }, + { url = "https://files.pythonhosted.org/packages/56/15/064b197e855bfb7b343210e82490ae672f8bc7cdf3ddb02e92f64304ee8a/xxhash-3.6.0-cp313-cp313t-win_arm64.whl", hash = "sha256:ec44b73a4220623235f67a996c862049f375df3b1052d9899f40a6382c32d746", size = 28044, upload-time = "2025-10-02T14:35:37.195Z" }, + { url = 
"https://files.pythonhosted.org/packages/7e/5e/0138bc4484ea9b897864d59fce9be9086030825bc778b76cb5a33a906d37/xxhash-3.6.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:a40a3d35b204b7cc7643cbcf8c9976d818cb47befcfac8bbefec8038ac363f3e", size = 32754, upload-time = "2025-10-02T14:35:38.245Z" }, + { url = "https://files.pythonhosted.org/packages/18/d7/5dac2eb2ec75fd771957a13e5dda560efb2176d5203f39502a5fc571f899/xxhash-3.6.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:a54844be970d3fc22630b32d515e79a90d0a3ddb2644d8d7402e3c4c8da61405", size = 30846, upload-time = "2025-10-02T14:35:39.6Z" }, + { url = "https://files.pythonhosted.org/packages/fe/71/8bc5be2bb00deb5682e92e8da955ebe5fa982da13a69da5a40a4c8db12fb/xxhash-3.6.0-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:016e9190af8f0a4e3741343777710e3d5717427f175adfdc3e72508f59e2a7f3", size = 194343, upload-time = "2025-10-02T14:35:40.69Z" }, + { url = "https://files.pythonhosted.org/packages/e7/3b/52badfb2aecec2c377ddf1ae75f55db3ba2d321c5e164f14461c90837ef3/xxhash-3.6.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4f6f72232f849eb9d0141e2ebe2677ece15adfd0fa599bc058aad83c714bb2c6", size = 213074, upload-time = "2025-10-02T14:35:42.29Z" }, + { url = "https://files.pythonhosted.org/packages/a2/2b/ae46b4e9b92e537fa30d03dbc19cdae57ed407e9c26d163895e968e3de85/xxhash-3.6.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:63275a8aba7865e44b1813d2177e0f5ea7eadad3dd063a21f7cf9afdc7054063", size = 212388, upload-time = "2025-10-02T14:35:43.929Z" }, + { url = "https://files.pythonhosted.org/packages/f5/80/49f88d3afc724b4ac7fbd664c8452d6db51b49915be48c6982659e0e7942/xxhash-3.6.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:3cd01fa2aa00d8b017c97eb46b9a794fbdca53fc14f845f5a328c71254b0abb7", size = 445614, upload-time = 
"2025-10-02T14:35:45.216Z" }, + { url = "https://files.pythonhosted.org/packages/ed/ba/603ce3961e339413543d8cd44f21f2c80e2a7c5cfe692a7b1f2cccf58f3c/xxhash-3.6.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0226aa89035b62b6a86d3c68df4d7c1f47a342b8683da2b60cedcddb46c4d95b", size = 194024, upload-time = "2025-10-02T14:35:46.959Z" }, + { url = "https://files.pythonhosted.org/packages/78/d1/8e225ff7113bf81545cfdcd79eef124a7b7064a0bba53605ff39590b95c2/xxhash-3.6.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:c6e193e9f56e4ca4923c61238cdaced324f0feac782544eb4c6d55ad5cc99ddd", size = 210541, upload-time = "2025-10-02T14:35:48.301Z" }, + { url = "https://files.pythonhosted.org/packages/6f/58/0f89d149f0bad89def1a8dd38feb50ccdeb643d9797ec84707091d4cb494/xxhash-3.6.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:9176dcaddf4ca963d4deb93866d739a343c01c969231dbe21680e13a5d1a5bf0", size = 198305, upload-time = "2025-10-02T14:35:49.584Z" }, + { url = "https://files.pythonhosted.org/packages/11/38/5eab81580703c4df93feb5f32ff8fa7fe1e2c51c1f183ee4e48d4bb9d3d7/xxhash-3.6.0-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:c1ce4009c97a752e682b897aa99aef84191077a9433eb237774689f14f8ec152", size = 210848, upload-time = "2025-10-02T14:35:50.877Z" }, + { url = "https://files.pythonhosted.org/packages/5e/6b/953dc4b05c3ce678abca756416e4c130d2382f877a9c30a20d08ee6a77c0/xxhash-3.6.0-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:8cb2f4f679b01513b7adbb9b1b2f0f9cdc31b70007eaf9d59d0878809f385b11", size = 414142, upload-time = "2025-10-02T14:35:52.15Z" }, + { url = "https://files.pythonhosted.org/packages/08/a9/238ec0d4e81a10eb5026d4a6972677cbc898ba6c8b9dbaec12ae001b1b35/xxhash-3.6.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:653a91d7c2ab54a92c19ccf43508b6a555440b9be1bc8be553376778be7f20b5", size = 191547, upload-time = "2025-10-02T14:35:53.547Z" }, + { url = 
"https://files.pythonhosted.org/packages/f1/ee/3cf8589e06c2164ac77c3bf0aa127012801128f1feebf2a079272da5737c/xxhash-3.6.0-cp314-cp314-win32.whl", hash = "sha256:a756fe893389483ee8c394d06b5ab765d96e68fbbfe6fde7aa17e11f5720559f", size = 31214, upload-time = "2025-10-02T14:35:54.746Z" }, + { url = "https://files.pythonhosted.org/packages/02/5d/a19552fbc6ad4cb54ff953c3908bbc095f4a921bc569433d791f755186f1/xxhash-3.6.0-cp314-cp314-win_amd64.whl", hash = "sha256:39be8e4e142550ef69629c9cd71b88c90e9a5db703fecbcf265546d9536ca4ad", size = 32290, upload-time = "2025-10-02T14:35:55.791Z" }, + { url = "https://files.pythonhosted.org/packages/b1/11/dafa0643bc30442c887b55baf8e73353a344ee89c1901b5a5c54a6c17d39/xxhash-3.6.0-cp314-cp314-win_arm64.whl", hash = "sha256:25915e6000338999236f1eb68a02a32c3275ac338628a7eaa5a269c401995679", size = 28795, upload-time = "2025-10-02T14:35:57.162Z" }, + { url = "https://files.pythonhosted.org/packages/2c/db/0e99732ed7f64182aef4a6fb145e1a295558deec2a746265dcdec12d191e/xxhash-3.6.0-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:c5294f596a9017ca5a3e3f8884c00b91ab2ad2933cf288f4923c3fd4346cf3d4", size = 32955, upload-time = "2025-10-02T14:35:58.267Z" }, + { url = "https://files.pythonhosted.org/packages/55/f4/2a7c3c68e564a099becfa44bb3d398810cc0ff6749b0d3cb8ccb93f23c14/xxhash-3.6.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1cf9dcc4ab9cff01dfbba78544297a3a01dafd60f3bde4e2bfd016cf7e4ddc67", size = 31072, upload-time = "2025-10-02T14:35:59.382Z" }, + { url = "https://files.pythonhosted.org/packages/c6/d9/72a29cddc7250e8a5819dad5d466facb5dc4c802ce120645630149127e73/xxhash-3.6.0-cp314-cp314t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:01262da8798422d0685f7cef03b2bd3f4f46511b02830861df548d7def4402ad", size = 196579, upload-time = "2025-10-02T14:36:00.838Z" }, + { url = 
"https://files.pythonhosted.org/packages/63/93/b21590e1e381040e2ca305a884d89e1c345b347404f7780f07f2cdd47ef4/xxhash-3.6.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:51a73fb7cb3a3ead9f7a8b583ffd9b8038e277cdb8cb87cf890e88b3456afa0b", size = 215854, upload-time = "2025-10-02T14:36:02.207Z" }, + { url = "https://files.pythonhosted.org/packages/ce/b8/edab8a7d4fa14e924b29be877d54155dcbd8b80be85ea00d2be3413a9ed4/xxhash-3.6.0-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:b9c6df83594f7df8f7f708ce5ebeacfc69f72c9fbaaababf6cf4758eaada0c9b", size = 214965, upload-time = "2025-10-02T14:36:03.507Z" }, + { url = "https://files.pythonhosted.org/packages/27/67/dfa980ac7f0d509d54ea0d5a486d2bb4b80c3f1bb22b66e6a05d3efaf6c0/xxhash-3.6.0-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:627f0af069b0ea56f312fd5189001c24578868643203bca1abbc2c52d3a6f3ca", size = 448484, upload-time = "2025-10-02T14:36:04.828Z" }, + { url = "https://files.pythonhosted.org/packages/8c/63/8ffc2cc97e811c0ca5d00ab36604b3ea6f4254f20b7bc658ca825ce6c954/xxhash-3.6.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:aa912c62f842dfd013c5f21a642c9c10cd9f4c4e943e0af83618b4a404d9091a", size = 196162, upload-time = "2025-10-02T14:36:06.182Z" }, + { url = "https://files.pythonhosted.org/packages/4b/77/07f0e7a3edd11a6097e990f6e5b815b6592459cb16dae990d967693e6ea9/xxhash-3.6.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:b465afd7909db30168ab62afe40b2fcf79eedc0b89a6c0ab3123515dc0df8b99", size = 213007, upload-time = "2025-10-02T14:36:07.733Z" }, + { url = "https://files.pythonhosted.org/packages/ae/d8/bc5fa0d152837117eb0bef6f83f956c509332ce133c91c63ce07ee7c4873/xxhash-3.6.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:a881851cf38b0a70e7c4d3ce81fc7afd86fbc2a024f4cfb2a97cf49ce04b75d3", size = 200956, 
upload-time = "2025-10-02T14:36:09.106Z" }, + { url = "https://files.pythonhosted.org/packages/26/a5/d749334130de9411783873e9b98ecc46688dad5db64ca6e04b02acc8b473/xxhash-3.6.0-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:9b3222c686a919a0f3253cfc12bb118b8b103506612253b5baeaac10d8027cf6", size = 213401, upload-time = "2025-10-02T14:36:10.585Z" }, + { url = "https://files.pythonhosted.org/packages/89/72/abed959c956a4bfc72b58c0384bb7940663c678127538634d896b1195c10/xxhash-3.6.0-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:c5aa639bc113e9286137cec8fadc20e9cd732b2cc385c0b7fa673b84fc1f2a93", size = 417083, upload-time = "2025-10-02T14:36:12.276Z" }, + { url = "https://files.pythonhosted.org/packages/0c/b3/62fd2b586283b7d7d665fb98e266decadf31f058f1cf6c478741f68af0cb/xxhash-3.6.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5c1343d49ac102799905e115aee590183c3921d475356cb24b4de29a4bc56518", size = 193913, upload-time = "2025-10-02T14:36:14.025Z" }, + { url = "https://files.pythonhosted.org/packages/9a/9a/c19c42c5b3f5a4aad748a6d5b4f23df3bed7ee5445accc65a0fb3ff03953/xxhash-3.6.0-cp314-cp314t-win32.whl", hash = "sha256:5851f033c3030dd95c086b4a36a2683c2ff4a799b23af60977188b057e467119", size = 31586, upload-time = "2025-10-02T14:36:15.603Z" }, + { url = "https://files.pythonhosted.org/packages/03/d6/4cc450345be9924fd5dc8c590ceda1db5b43a0a889587b0ae81a95511360/xxhash-3.6.0-cp314-cp314t-win_amd64.whl", hash = "sha256:0444e7967dac37569052d2409b00a8860c2135cff05502df4da80267d384849f", size = 32526, upload-time = "2025-10-02T14:36:16.708Z" }, + { url = "https://files.pythonhosted.org/packages/0f/c9/7243eb3f9eaabd1a88a5a5acadf06df2d83b100c62684b7425c6a11bcaa8/xxhash-3.6.0-cp314-cp314t-win_arm64.whl", hash = "sha256:bb79b1e63f6fd84ec778a4b1916dfe0a7c3fdb986c06addd5db3a0d413819d95", size = 28898, upload-time = "2025-10-02T14:36:17.843Z" }, +] + +[[package]] +name = "zstandard" +version = "0.25.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url 
= "https://files.pythonhosted.org/packages/fd/aa/3e0508d5a5dd96529cdc5a97011299056e14c6505b678fd58938792794b1/zstandard-0.25.0.tar.gz", hash = "sha256:7713e1179d162cf5c7906da876ec2ccb9c3a9dcbdffef0cc7f70c3667a205f0b", size = 711513, upload-time = "2025-09-14T22:15:54.002Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/82/fc/f26eb6ef91ae723a03e16eddb198abcfce2bc5a42e224d44cc8b6765e57e/zstandard-0.25.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7b3c3a3ab9daa3eed242d6ecceead93aebbb8f5f84318d82cee643e019c4b73b", size = 795738, upload-time = "2025-09-14T22:16:56.237Z" }, + { url = "https://files.pythonhosted.org/packages/aa/1c/d920d64b22f8dd028a8b90e2d756e431a5d86194caa78e3819c7bf53b4b3/zstandard-0.25.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:913cbd31a400febff93b564a23e17c3ed2d56c064006f54efec210d586171c00", size = 640436, upload-time = "2025-09-14T22:16:57.774Z" }, + { url = "https://files.pythonhosted.org/packages/53/6c/288c3f0bd9fcfe9ca41e2c2fbfd17b2097f6af57b62a81161941f09afa76/zstandard-0.25.0-cp312-cp312-manylinux2010_i686.manylinux2014_i686.manylinux_2_12_i686.manylinux_2_17_i686.whl", hash = "sha256:011d388c76b11a0c165374ce660ce2c8efa8e5d87f34996aa80f9c0816698b64", size = 5343019, upload-time = "2025-09-14T22:16:59.302Z" }, + { url = "https://files.pythonhosted.org/packages/1e/15/efef5a2f204a64bdb5571e6161d49f7ef0fffdbca953a615efbec045f60f/zstandard-0.25.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:6dffecc361d079bb48d7caef5d673c88c8988d3d33fb74ab95b7ee6da42652ea", size = 5063012, upload-time = "2025-09-14T22:17:01.156Z" }, + { url = "https://files.pythonhosted.org/packages/b7/37/a6ce629ffdb43959e92e87ebdaeebb5ac81c944b6a75c9c47e300f85abdf/zstandard-0.25.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:7149623bba7fdf7e7f24312953bcf73cae103db8cae49f8154dd1eadc8a29ecb", size = 5394148, upload-time = "2025-09-14T22:17:03.091Z" }, + { url = 
"https://files.pythonhosted.org/packages/e3/79/2bf870b3abeb5c070fe2d670a5a8d1057a8270f125ef7676d29ea900f496/zstandard-0.25.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:6a573a35693e03cf1d67799fd01b50ff578515a8aeadd4595d2a7fa9f3ec002a", size = 5451652, upload-time = "2025-09-14T22:17:04.979Z" }, + { url = "https://files.pythonhosted.org/packages/53/60/7be26e610767316c028a2cbedb9a3beabdbe33e2182c373f71a1c0b88f36/zstandard-0.25.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5a56ba0db2d244117ed744dfa8f6f5b366e14148e00de44723413b2f3938a902", size = 5546993, upload-time = "2025-09-14T22:17:06.781Z" }, + { url = "https://files.pythonhosted.org/packages/85/c7/3483ad9ff0662623f3648479b0380d2de5510abf00990468c286c6b04017/zstandard-0.25.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:10ef2a79ab8e2974e2075fb984e5b9806c64134810fac21576f0668e7ea19f8f", size = 5046806, upload-time = "2025-09-14T22:17:08.415Z" }, + { url = "https://files.pythonhosted.org/packages/08/b3/206883dd25b8d1591a1caa44b54c2aad84badccf2f1de9e2d60a446f9a25/zstandard-0.25.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:aaf21ba8fb76d102b696781bddaa0954b782536446083ae3fdaa6f16b25a1c4b", size = 5576659, upload-time = "2025-09-14T22:17:10.164Z" }, + { url = "https://files.pythonhosted.org/packages/9d/31/76c0779101453e6c117b0ff22565865c54f48f8bd807df2b00c2c404b8e0/zstandard-0.25.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:1869da9571d5e94a85a5e8d57e4e8807b175c9e4a6294e3b66fa4efb074d90f6", size = 4953933, upload-time = "2025-09-14T22:17:11.857Z" }, + { url = "https://files.pythonhosted.org/packages/18/e1/97680c664a1bf9a247a280a053d98e251424af51f1b196c6d52f117c9720/zstandard-0.25.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:809c5bcb2c67cd0ed81e9229d227d4ca28f82d0f778fc5fea624a9def3963f91", size = 5268008, upload-time = "2025-09-14T22:17:13.627Z" }, + { url = 
"https://files.pythonhosted.org/packages/1e/73/316e4010de585ac798e154e88fd81bb16afc5c5cb1a72eeb16dd37e8024a/zstandard-0.25.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:f27662e4f7dbf9f9c12391cb37b4c4c3cb90ffbd3b1fb9284dadbbb8935fa708", size = 5433517, upload-time = "2025-09-14T22:17:16.103Z" }, + { url = "https://files.pythonhosted.org/packages/5b/60/dd0f8cfa8129c5a0ce3ea6b7f70be5b33d2618013a161e1ff26c2b39787c/zstandard-0.25.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:99c0c846e6e61718715a3c9437ccc625de26593fea60189567f0118dc9db7512", size = 5814292, upload-time = "2025-09-14T22:17:17.827Z" }, + { url = "https://files.pythonhosted.org/packages/fc/5f/75aafd4b9d11b5407b641b8e41a57864097663699f23e9ad4dbb91dc6bfe/zstandard-0.25.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:474d2596a2dbc241a556e965fb76002c1ce655445e4e3bf38e5477d413165ffa", size = 5360237, upload-time = "2025-09-14T22:17:19.954Z" }, + { url = "https://files.pythonhosted.org/packages/ff/8d/0309daffea4fcac7981021dbf21cdb2e3427a9e76bafbcdbdf5392ff99a4/zstandard-0.25.0-cp312-cp312-win32.whl", hash = "sha256:23ebc8f17a03133b4426bcc04aabd68f8236eb78c3760f12783385171b0fd8bd", size = 436922, upload-time = "2025-09-14T22:17:24.398Z" }, + { url = "https://files.pythonhosted.org/packages/79/3b/fa54d9015f945330510cb5d0b0501e8253c127cca7ebe8ba46a965df18c5/zstandard-0.25.0-cp312-cp312-win_amd64.whl", hash = "sha256:ffef5a74088f1e09947aecf91011136665152e0b4b359c42be3373897fb39b01", size = 506276, upload-time = "2025-09-14T22:17:21.429Z" }, + { url = "https://files.pythonhosted.org/packages/ea/6b/8b51697e5319b1f9ac71087b0af9a40d8a6288ff8025c36486e0c12abcc4/zstandard-0.25.0-cp312-cp312-win_arm64.whl", hash = "sha256:181eb40e0b6a29b3cd2849f825e0fa34397f649170673d385f3598ae17cca2e9", size = 462679, upload-time = "2025-09-14T22:17:23.147Z" }, + { url = 
"https://files.pythonhosted.org/packages/35/0b/8df9c4ad06af91d39e94fa96cc010a24ac4ef1378d3efab9223cc8593d40/zstandard-0.25.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:ec996f12524f88e151c339688c3897194821d7f03081ab35d31d1e12ec975e94", size = 795735, upload-time = "2025-09-14T22:17:26.042Z" }, + { url = "https://files.pythonhosted.org/packages/3f/06/9ae96a3e5dcfd119377ba33d4c42a7d89da1efabd5cb3e366b156c45ff4d/zstandard-0.25.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a1a4ae2dec3993a32247995bdfe367fc3266da832d82f8438c8570f989753de1", size = 640440, upload-time = "2025-09-14T22:17:27.366Z" }, + { url = "https://files.pythonhosted.org/packages/d9/14/933d27204c2bd404229c69f445862454dcc101cd69ef8c6068f15aaec12c/zstandard-0.25.0-cp313-cp313-manylinux2010_i686.manylinux2014_i686.manylinux_2_12_i686.manylinux_2_17_i686.whl", hash = "sha256:e96594a5537722fdfb79951672a2a63aec5ebfb823e7560586f7484819f2a08f", size = 5343070, upload-time = "2025-09-14T22:17:28.896Z" }, + { url = "https://files.pythonhosted.org/packages/6d/db/ddb11011826ed7db9d0e485d13df79b58586bfdec56e5c84a928a9a78c1c/zstandard-0.25.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:bfc4e20784722098822e3eee42b8e576b379ed72cca4a7cb856ae733e62192ea", size = 5063001, upload-time = "2025-09-14T22:17:31.044Z" }, + { url = "https://files.pythonhosted.org/packages/db/00/87466ea3f99599d02a5238498b87bf84a6348290c19571051839ca943777/zstandard-0.25.0-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:457ed498fc58cdc12fc48f7950e02740d4f7ae9493dd4ab2168a47c93c31298e", size = 5394120, upload-time = "2025-09-14T22:17:32.711Z" }, + { url = "https://files.pythonhosted.org/packages/2b/95/fc5531d9c618a679a20ff6c29e2b3ef1d1f4ad66c5e161ae6ff847d102a9/zstandard-0.25.0-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:fd7a5004eb1980d3cefe26b2685bcb0b17989901a70a1040d1ac86f1d898c551", size = 5451230, upload-time = 
"2025-09-14T22:17:34.41Z" }, + { url = "https://files.pythonhosted.org/packages/63/4b/e3678b4e776db00f9f7b2fe58e547e8928ef32727d7a1ff01dea010f3f13/zstandard-0.25.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8e735494da3db08694d26480f1493ad2cf86e99bdd53e8e9771b2752a5c0246a", size = 5547173, upload-time = "2025-09-14T22:17:36.084Z" }, + { url = "https://files.pythonhosted.org/packages/4e/d5/ba05ed95c6b8ec30bd468dfeab20589f2cf709b5c940483e31d991f2ca58/zstandard-0.25.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:3a39c94ad7866160a4a46d772e43311a743c316942037671beb264e395bdd611", size = 5046736, upload-time = "2025-09-14T22:17:37.891Z" }, + { url = "https://files.pythonhosted.org/packages/50/d5/870aa06b3a76c73eced65c044b92286a3c4e00554005ff51962deef28e28/zstandard-0.25.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:172de1f06947577d3a3005416977cce6168f2261284c02080e7ad0185faeced3", size = 5576368, upload-time = "2025-09-14T22:17:40.206Z" }, + { url = "https://files.pythonhosted.org/packages/5d/35/398dc2ffc89d304d59bc12f0fdd931b4ce455bddf7038a0a67733a25f550/zstandard-0.25.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:3c83b0188c852a47cd13ef3bf9209fb0a77fa5374958b8c53aaa699398c6bd7b", size = 4954022, upload-time = "2025-09-14T22:17:41.879Z" }, + { url = "https://files.pythonhosted.org/packages/9a/5c/36ba1e5507d56d2213202ec2b05e8541734af5f2ce378c5d1ceaf4d88dc4/zstandard-0.25.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:1673b7199bbe763365b81a4f3252b8e80f44c9e323fc42940dc8843bfeaf9851", size = 5267889, upload-time = "2025-09-14T22:17:43.577Z" }, + { url = "https://files.pythonhosted.org/packages/70/e8/2ec6b6fb7358b2ec0113ae202647ca7c0e9d15b61c005ae5225ad0995df5/zstandard-0.25.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:0be7622c37c183406f3dbf0cba104118eb16a4ea7359eeb5752f0794882fc250", size = 5433952, upload-time = "2025-09-14T22:17:45.271Z" }, + { url = 
"https://files.pythonhosted.org/packages/7b/01/b5f4d4dbc59ef193e870495c6f1275f5b2928e01ff5a81fecb22a06e22fb/zstandard-0.25.0-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:5f5e4c2a23ca271c218ac025bd7d635597048b366d6f31f420aaeb715239fc98", size = 5814054, upload-time = "2025-09-14T22:17:47.08Z" }, + { url = "https://files.pythonhosted.org/packages/b2/e5/fbd822d5c6f427cf158316d012c5a12f233473c2f9c5fe5ab1ae5d21f3d8/zstandard-0.25.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:4f187a0bb61b35119d1926aee039524d1f93aaf38a9916b8c4b78ac8514a0aaf", size = 5360113, upload-time = "2025-09-14T22:17:48.893Z" }, + { url = "https://files.pythonhosted.org/packages/8e/e0/69a553d2047f9a2c7347caa225bb3a63b6d7704ad74610cb7823baa08ed7/zstandard-0.25.0-cp313-cp313-win32.whl", hash = "sha256:7030defa83eef3e51ff26f0b7bfb229f0204b66fe18e04359ce3474ac33cbc09", size = 436936, upload-time = "2025-09-14T22:17:52.658Z" }, + { url = "https://files.pythonhosted.org/packages/d9/82/b9c06c870f3bd8767c201f1edbdf9e8dc34be5b0fbc5682c4f80fe948475/zstandard-0.25.0-cp313-cp313-win_amd64.whl", hash = "sha256:1f830a0dac88719af0ae43b8b2d6aef487d437036468ef3c2ea59c51f9d55fd5", size = 506232, upload-time = "2025-09-14T22:17:50.402Z" }, + { url = "https://files.pythonhosted.org/packages/d4/57/60c3c01243bb81d381c9916e2a6d9e149ab8627c0c7d7abb2d73384b3c0c/zstandard-0.25.0-cp313-cp313-win_arm64.whl", hash = "sha256:85304a43f4d513f5464ceb938aa02c1e78c2943b29f44a750b48b25ac999a049", size = 462671, upload-time = "2025-09-14T22:17:51.533Z" }, + { url = "https://files.pythonhosted.org/packages/3d/5c/f8923b595b55fe49e30612987ad8bf053aef555c14f05bb659dd5dbe3e8a/zstandard-0.25.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:e29f0cf06974c899b2c188ef7f783607dbef36da4c242eb6c82dcd8b512855e3", size = 795887, upload-time = "2025-09-14T22:17:54.198Z" }, + { url = 
"https://files.pythonhosted.org/packages/8d/09/d0a2a14fc3439c5f874042dca72a79c70a532090b7ba0003be73fee37ae2/zstandard-0.25.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:05df5136bc5a011f33cd25bc9f506e7426c0c9b3f9954f056831ce68f3b6689f", size = 640658, upload-time = "2025-09-14T22:17:55.423Z" }, + { url = "https://files.pythonhosted.org/packages/5d/7c/8b6b71b1ddd517f68ffb55e10834388d4f793c49c6b83effaaa05785b0b4/zstandard-0.25.0-cp314-cp314-manylinux2010_i686.manylinux_2_12_i686.manylinux_2_28_i686.whl", hash = "sha256:f604efd28f239cc21b3adb53eb061e2a205dc164be408e553b41ba2ffe0ca15c", size = 5379849, upload-time = "2025-09-14T22:17:57.372Z" }, + { url = "https://files.pythonhosted.org/packages/a4/86/a48e56320d0a17189ab7a42645387334fba2200e904ee47fc5a26c1fd8ca/zstandard-0.25.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:223415140608d0f0da010499eaa8ccdb9af210a543fac54bce15babbcfc78439", size = 5058095, upload-time = "2025-09-14T22:17:59.498Z" }, + { url = "https://files.pythonhosted.org/packages/f8/ad/eb659984ee2c0a779f9d06dbfe45e2dc39d99ff40a319895df2d3d9a48e5/zstandard-0.25.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2e54296a283f3ab5a26fc9b8b5d4978ea0532f37b231644f367aa588930aa043", size = 5551751, upload-time = "2025-09-14T22:18:01.618Z" }, + { url = "https://files.pythonhosted.org/packages/61/b3/b637faea43677eb7bd42ab204dfb7053bd5c4582bfe6b1baefa80ac0c47b/zstandard-0.25.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:ca54090275939dc8ec5dea2d2afb400e0f83444b2fc24e07df7fdef677110859", size = 6364818, upload-time = "2025-09-14T22:18:03.769Z" }, + { url = "https://files.pythonhosted.org/packages/31/dc/cc50210e11e465c975462439a492516a73300ab8caa8f5e0902544fd748b/zstandard-0.25.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = 
"sha256:e09bb6252b6476d8d56100e8147b803befa9a12cea144bbe629dd508800d1ad0", size = 5560402, upload-time = "2025-09-14T22:18:05.954Z" }, + { url = "https://files.pythonhosted.org/packages/c9/ae/56523ae9c142f0c08efd5e868a6da613ae76614eca1305259c3bf6a0ed43/zstandard-0.25.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:a9ec8c642d1ec73287ae3e726792dd86c96f5681eb8df274a757bf62b750eae7", size = 4955108, upload-time = "2025-09-14T22:18:07.68Z" }, + { url = "https://files.pythonhosted.org/packages/98/cf/c899f2d6df0840d5e384cf4c4121458c72802e8bda19691f3b16619f51e9/zstandard-0.25.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:a4089a10e598eae6393756b036e0f419e8c1d60f44a831520f9af41c14216cf2", size = 5269248, upload-time = "2025-09-14T22:18:09.753Z" }, + { url = "https://files.pythonhosted.org/packages/1b/c0/59e912a531d91e1c192d3085fc0f6fb2852753c301a812d856d857ea03c6/zstandard-0.25.0-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:f67e8f1a324a900e75b5e28ffb152bcac9fbed1cc7b43f99cd90f395c4375344", size = 5430330, upload-time = "2025-09-14T22:18:11.966Z" }, + { url = "https://files.pythonhosted.org/packages/a0/1d/7e31db1240de2df22a58e2ea9a93fc6e38cc29353e660c0272b6735d6669/zstandard-0.25.0-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:9654dbc012d8b06fc3d19cc825af3f7bf8ae242226df5f83936cb39f5fdc846c", size = 5811123, upload-time = "2025-09-14T22:18:13.907Z" }, + { url = "https://files.pythonhosted.org/packages/f6/49/fac46df5ad353d50535e118d6983069df68ca5908d4d65b8c466150a4ff1/zstandard-0.25.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:4203ce3b31aec23012d3a4cf4a2ed64d12fea5269c49aed5e4c3611b938e4088", size = 5359591, upload-time = "2025-09-14T22:18:16.465Z" }, + { url = "https://files.pythonhosted.org/packages/c2/38/f249a2050ad1eea0bb364046153942e34abba95dd5520af199aed86fbb49/zstandard-0.25.0-cp314-cp314-win32.whl", hash = "sha256:da469dc041701583e34de852d8634703550348d5822e66a0c827d39b05365b12", size = 444513, upload-time = 
"2025-09-14T22:18:20.61Z" }, + { url = "https://files.pythonhosted.org/packages/3a/43/241f9615bcf8ba8903b3f0432da069e857fc4fd1783bd26183db53c4804b/zstandard-0.25.0-cp314-cp314-win_amd64.whl", hash = "sha256:c19bcdd826e95671065f8692b5a4aa95c52dc7a02a4c5a0cac46deb879a017a2", size = 516118, upload-time = "2025-09-14T22:18:17.849Z" }, + { url = "https://files.pythonhosted.org/packages/f0/ef/da163ce2450ed4febf6467d77ccb4cd52c4c30ab45624bad26ca0a27260c/zstandard-0.25.0-cp314-cp314-win_arm64.whl", hash = "sha256:d7541afd73985c630bafcd6338d2518ae96060075f9463d7dc14cfb33514383d", size = 476940, upload-time = "2025-09-14T22:18:19.088Z" }, +] diff --git a/conf/config.yaml b/conf/config.yaml index 2c53336e..6a69a491 100644 --- a/conf/config.yaml +++ b/conf/config.yaml @@ -278,7 +278,7 @@ chunker: # --- Retriever --- # Env: RETRIEVER_TYPE, RETRIEVER_TOP_K, SIMILARITY_THRESHOLD, # WITH_SURROUNDING_CHUNKS, INCLUDE_RELATED, INCLUDE_ANCESTORS, -# RELATED_LIMIT, MAX_DEPTH +# RELATED_LIMIT, MAX_DEPTH, RETRIEVER_ALLOW_FILTERLESS_FALLBACK retriever: type: single top_k: 50 @@ -290,6 +290,9 @@ retriever: max_ancestor_depth: 10 k_queries: 3 combine: false + # When a temporally-filtered retrieval returns no documents, re-query + # without the filter. Set to false for strict temporal retrieval. 
+ allow_filterless_fallback: true # --- RAG --- # Env: RAG_MODE diff --git a/openrag/components/pipeline.py b/openrag/components/pipeline.py index 7745f73a..5408d2b4 100644 --- a/openrag/components/pipeline.py +++ b/openrag/components/pipeline.py @@ -2,7 +2,9 @@ import copy from datetime import datetime from enum import Enum +from typing import Literal +import openai import ray from components.prompts import ( QUERY_CONTEXTUALIZER_PROMPT, @@ -14,8 +16,9 @@ from components.websearch import WebSearchFactory from config import load_config from langchain_core.documents.base import Document +from langchain_core.exceptions import OutputParserException from langchain_openai import ChatOpenAI -from pydantic import BaseModel, Field +from pydantic import BaseModel, Field, ValidationError from utils.logger import get_logger from .llm import LLM @@ -34,19 +37,76 @@ class RAGMODE(Enum): CHATBOTRAG = "ChatBotRag" -class SearchQueries(BaseModel): - """Search queries for semantic retrieval.""" +class TemporalPredicate(BaseModel): + """A single constraint on a document's creation date. + + Multiple predicates on the same `Query` are combined with logical AND. + Use two predicates to express a half-open range (e.g. last month): + [{operator: ">=", value: "2026-03-01..."}, {operator: "<", value: "2026-04-01..."}] + """ + + field: Literal["created_at"] = Field( + default="created_at", + description="Document metadata field to filter on. Always `created_at` for now.", + ) + operator: Literal[">", "<", ">=", "<="] = Field( + description="Comparison operator applied to the date field.", + ) + value: str = Field( + description='ISO 8601 datetime with timezone, e.g. "2026-03-15T00:00:00+00:00".', + ) + + +class Query(BaseModel): + """A single vector database search query with optional temporal filters on document creation date. + + Predicates in `temporal_filters` are AND-combined. To express an exclusion + (e.g.
"last year except March"), emit TWO `Query` objects, each with its own + AND-combined predicates covering one side of the gap. + """ + + query: str = Field(description="A semantically enriched, descriptive query for vector similarity search.") + temporal_filters: list[TemporalPredicate] | None = Field( + default=None, + description="Date predicates on `created_at`, AND-combined. Null when no temporal reference in the query.", + ) + + def to_milvus_filter(self) -> str | None: + if not self.temporal_filters: + return None + parts = [] + for p in self.temporal_filters: + try: + datetime.fromisoformat(p.value) + except (TypeError, ValueError): + logger.warning( + "Dropping temporal predicate with non-ISO value", + field=p.field, + operator=p.operator, + value=p.value, + ) + continue + parts.append(f'{p.field} {p.operator} ISO "{p.value}"') + if not parts: + return None + return " and ".join(parts) + + def __str__(self) -> str: + return f"Query: {self.query}, Filter: {self.to_milvus_filter()}" + - query_list: list[str] = Field(..., description="Search sub-queries to retrieve relevant documents.") +class SearchQueries(BaseModel): + query_list: list[Query] = Field(..., description="Search sub-queries to retrieve relevant documents.") def __str__(self) -> str: - return " -- ".join(f"Query: {q}" for q in self.query_list) + return " --- ".join(str(q) for q in self.query_list) class RetrieverPipeline: def __init__(self) -> None: # retriever self.retriever: BaseRetriever = RetrieverFactory.create_retriever(config=config) + self.allow_filterless_fallback = config.retriever.allow_filterless_fallback # reranker self.reranker_enabled = config.reranker.enabled @@ -57,20 +117,36 @@ def __init__(self) -> None: async def retrieve_docs( self, partition: list[str], - query: str, + query: Query, top_k: int | None = None, - filter: str | None = None, filter_params: dict | None = None, ) -> list[Document]: + milvus_filter = query.to_milvus_filter() docs = await self.retriever.retrieve( - 
partition=partition, query=query, filter=filter, filter_params=filter_params + partition=partition, query=query.query, filter=milvus_filter, filter_params=filter_params ) + + # Fallback: drop temporal filter if it wiped out all candidates. + # Gated by `retriever.allow_filterless_fallback` so deployments that + # prefer strict temporal retrieval can opt out (returns no docs + # rather than temporally-incorrect ones). + if not docs and milvus_filter and self.allow_filterless_fallback: + logger.warning( + "Temporal filter dropped: no documents matched, retrying without filter", + query=str(query.query), + filter=milvus_filter, + partition=partition, + ) + docs = await self.retriever.retrieve( + partition=partition, query=query.query, filter=None, filter_params=filter_params + ) + logger.debug("Documents retreived", document_count=len(docs)) if docs: # 1. rerank all the docs if self.reranker_enabled: - docs = await self.reranker.rerank(query=query, documents=docs, top_k=None) + docs = await self.reranker.rerank(query=query.query, documents=docs, top_k=None) logger.debug("Documents reranked", document_count=len(docs)) # 2. 
expand the docs with related documents @@ -89,7 +165,7 @@ async def retrieve_docs( # rerank again after expansion if reranker is enabled if self.reranker_enabled: - docs = await self.reranker.rerank(query=query, documents=docs, top_k=None) + docs = await self.reranker.rerank(query=query.query, documents=docs, top_k=None) logger.debug("Documents after expansion and reranking", document_count=len(docs)) return docs @@ -99,11 +175,10 @@ async def get_relevant_docs( partition: list[str], search_queries: SearchQueries, top_k: int | None = None, - filter: str | None = None, filter_params: dict | None = None, ) -> list[Document]: tasks = [ - self.retrieve_docs(partition=partition, query=q, top_k=top_k, filter=filter, filter_params=filter_params) + self.retrieve_docs(partition=partition, query=q, top_k=top_k, filter_params=filter_params) for q in search_queries.query_list ] results = await asyncio.gather(*tasks) @@ -125,12 +200,20 @@ def __init__(self) -> None: self.max_context_tokens = config.reranker.top_k * config.chunker.chunk_size self.llm_client = LLM(config.llm, logger) - self.query_generator = ChatOpenAI( + + llm = ChatOpenAI( base_url=config.llm.base_url, api_key=config.llm.api_key, model=config.llm.model, temperature=config.llm.temperature, - ).with_structured_output(SearchQueries, method="function_calling") + ) + + primary = llm.with_structured_output(SearchQueries, method="json_schema", strict=True) + fallback = llm.with_structured_output(SearchQueries, method="function_calling", strict=False) + self.query_generator = primary.with_fallbacks( + [fallback], + exceptions_to_handle=(openai.BadRequestError,), + ) self.max_contextualized_query_len = config.rag.max_contextualized_query_len @@ -149,7 +232,7 @@ async def generate_query(self, messages: list[dict]) -> SearchQueries: case RAGMODE.SIMPLERAG: # For SimpleRag, we don't need to contextualize the query as the chat history is not taken into account last_msg = messages[-1] - return 
SearchQueries(query_list=[last_msg["content"]]) + return SearchQueries(query_list=[Query(query=last_msg["content"])]) case RAGMODE.CHATBOTRAG: # Contextualize the query based on the chat history @@ -157,7 +240,8 @@ async def generate_query(self, messages: list[dict]) -> SearchQueries: for m in messages: chat_history += f"{m['role']}: {m['content']}\n" - query_language = detect_language(messages[-1]["content"]) + last_user_query = messages[-1]["content"] + query_language = detect_language(last_user_query) model_kwargs = { "max_completion_tokens": self.max_contextualized_query_len, @@ -165,23 +249,28 @@ async def generate_query(self, messages: list[dict]) -> SearchQueries: } prompt = QUERY_CONTEXTUALIZER_PROMPT.format( query_language=query_language, - current_date=datetime.now().strftime("%Y-%m-%d"), + current_date=datetime.now().strftime("%A, %B %d, %Y, %H:%M:%S"), ) - messages = [ - { - "role": "system", - "content": prompt, - }, - { - "role": "user", - "content": f"Here is the chat history: \n{chat_history}\n", - }, + llm_messages = [ + {"role": "system", "content": prompt}, + {"role": "user", "content": f"Here is the chat history: \n{chat_history}\n"}, ] - # generate queries based on the chat history - output: SearchQueries = await self.query_generator.bind(**model_kwargs).ainvoke(messages) - return output + # Retry once on schema-validation failure; fall back to the raw user query on the second failure. 
+ generator = self.query_generator.bind(**model_kwargs) + for attempt in (1, 2): + try: + return await generator.ainvoke(llm_messages) + except (ValidationError, OutputParserException) as exc: + if attempt == 1: + logger.warning("Query generation schema error — retrying", error=str(exc)) + else: + logger.warning( + "Query generation failed twice — falling back to raw user query", + error=str(exc), + ) + return SearchQueries(query_list=[Query(query=last_user_query)]) async def _prepare_for_chat_completion(self, partition: list[str] | None, payload: dict): messages = payload["messages"] @@ -234,7 +323,7 @@ async def _prepare_for_chat_completion(self, partition: list[str] | None, payloa ) for q in queries.query_list ] - web_tasks = [self.web_search_service.search(q) for q in queries.query_list] + web_tasks = [self.web_search_service.search(q.query) for q in queries.query_list] all_results = await asyncio.gather(*rag_tasks, *web_tasks) n = len(queries.query_list) raw_doc_lists = list(all_results[:n]) @@ -257,7 +346,7 @@ async def _prepare_for_chat_completion(self, partition: list[str] | None, payloa else: # Web-only mode (partition is None): no RAG retrieval. # Run one web search per sub-query concurrently and deduplicate by URL. - raw_web_lists = await asyncio.gather(*[self.web_search_service.search(q) for q in queries.query_list]) + raw_web_lists = await asyncio.gather(*[self.web_search_service.search(q.query) for q in queries.query_list]) seen_urls = set() web_results = [] for result in (r for web_list in raw_web_lists for r in web_list): @@ -271,7 +360,7 @@ async def _prepare_for_chat_completion(self, partition: list[str] | None, payloa return payload, [], [] if use_map_reduce and docs: - docs = await self.map_reduce.map(query=" ".join(queries.query_list), chunks=docs) + docs = await self.map_reduce.map(query=" ".join(q.query for q in queries.query_list), chunks=docs) # 3. 
Format web results first to know actual token usage, then allocate remaining budget to RAG web_formatted = "" @@ -310,7 +399,9 @@ async def _prepare_for_chat_completion(self, partition: list[str] | None, payloa 0, { "role": "system", - "content": prompt.format(context=context), + "content": prompt.format( + context=context, current_date=datetime.now().strftime("%A, %B %d, %Y, %H:%M:%S") + ), }, ) payload["messages"] = messages diff --git a/openrag/config/loader.py b/openrag/config/loader.py index ed506a64..20f6d359 100644 --- a/openrag/config/loader.py +++ b/openrag/config/loader.py @@ -149,6 +149,7 @@ ("INCLUDE_ANCESTORS", "retriever.include_ancestors", bool), ("RELATED_LIMIT", "retriever.related_limit", int), ("MAX_DEPTH", "retriever.max_ancestor_depth", int), + ("RETRIEVER_ALLOW_FILTERLESS_FALLBACK", "retriever.allow_filterless_fallback", bool), # RAG ("RAG_MODE", "rag.mode", str), # WebSearch diff --git a/openrag/config/models.py b/openrag/config/models.py index 738fcfaf..99e958f4 100644 --- a/openrag/config/models.py +++ b/openrag/config/models.py @@ -419,6 +419,7 @@ class _BaseRetrieverConfig(ConfigMixin): include_ancestors: bool = True related_limit: int = 10 max_ancestor_depth: int = 10 + allow_filterless_fallback: bool = True class SingleRetrieverConfig(_BaseRetrieverConfig): diff --git a/prompts/example1/query_contextualizer_tmpl.txt b/prompts/example1/query_contextualizer_tmpl.txt index 121b3c58..67a134d3 100644 --- a/prompts/example1/query_contextualizer_tmpl.txt +++ b/prompts/example1/query_contextualizer_tmpl.txt @@ -1,36 +1,64 @@ -From a chat history, reformulate the user's last message into one or more autonomous search queries by integrating relevant context from previous conversations. 
- -# Task: -- For follow-up questions: Reformulate by replacing pronouns with corresponding nouns and add relevant keywords -- For independent questions: Apply only minimal corrections (grammar, keywords) -- For simple thank you messages: No reformulation necessary -- Enrich queries with prior conversation context only when relevant to the query. - -## Query decomposition guidelines -- Use full, descriptive sentences (not isolated keywords) to improve semantic retrieval. -- Inject previous chat history only when it is directly relevant to the current query, as unnecessary context can degrade search quality. -- Split complex queries into independent sub-queries, especially for comparisons or multi-dimensional questions. - - Distinct time periods - - "What are the sales figures for Product A in Q1 and Q2?" should give two sub-queries (one per quarter). - - "Evolution of carbon footprint in the last 2 years" should give two sub-queries (one per year). - - - Distinct entities - - "What are the sales figures for Product A and Product B?" should give two sub-queries (one per product). - - - Unrelated dimensions - - "What are the sales figures for Product A in the US and in Europe?" should give two sub-queries (one per region). - -# additional information -- Current date: {current_date} - -# Requirements: -- Preserve the original tone and intention -- Do not add information beyond necessary context -- Do not answer the questions, only reformulate them -- Respond in the language of the user's last message : {query_language} - -# Examples: -- User: I'm planning a trip to Italy and I'm interested in historical monuments and local cuisine. -- Assistant: Italy offers a wealth of history and culinary delights. -- User: What are the must-see sites? -Reformulated query: What are the must-see historical monuments and local cuisine restaurants in Italy? +Produce a JSON object listing sub-queries derived from the user's last message. 
+ +Output shape (return this JSON object, nothing else): +{{"query_list": [ {{"query": "<string>", "temporal_filters": <list of predicates or null>}} ]}} + +Current date: {current_date} +Language for `query` field: {query_language} +Timestamps: UTC (`+00:00`). Week starts Monday. + +# Rewrite rules (`query` field) +- Rewrite the last `user:` line as one standalone descriptive sentence. +- Use earlier turns only to resolve pronouns or add context directly relevant to the query; do not inject unrelated history. +- For independent questions: minimal changes (grammar, missing keywords). +- Greetings / thanks: copy verbatim, `temporal_filters: null`. +- Do not answer. Only reformulate. + +# Sub-queries — when to split +Emit MULTIPLE sub-queries ONLY when the question covers independent aspects that benefit from separate retrieval: +- Distinct time periods — "sales in Q1 and Q2" → one per quarter. +- Distinct entities — "Product A and Product B" → one per product. +- Unrelated dimensions — "sales in US and in Europe" → one per region. +- Exclusions — "last year except March" → one per remaining range. +- Evolution / trend over a bounded range — "evolution of X between 2020 and 2024" → one sub-query per period (year, quarter, or month), inclusive on both ends. Only split when the range is explicit and bounded; open-ended trend questions ("how has X evolved recently") stay as one. +Otherwise emit ONE sub-query. Do not split for synonyms or rephrasings of the same question. + +# Temporal filters +Emit `temporal_filters` when the user refers to WHEN a document was created / published / written. Leave null if the date only describes the topic. +Each predicate: `field="created_at"`, `operator` ∈ {{`>=`, `>`, `<=`, `<`}}, `value` = ISO 8601 UTC. +Predicates are AND-combined. Always use half-open intervals: `>= start` AND `< next_start`.
+ +Date resolution (vs Current date): +- today / now → `[today, tomorrow)` +- yesterday → `[yesterday, today)` +- this / last week → Monday-to-Monday +- this / last month → 1st-to-1st +- this / last year → Jan 1-to-Jan 1 +- past N days/weeks/months/years → `[Current − N, tomorrow)` +- recent / latest → past 90 days (~3 months, NOT a year): `[Current − 90 days, tomorrow)` +- since X → ONE predicate only: `>= X`. Do NOT add an upper bound. +- before X → one predicate `< X` +- bare MONTH (no year) → nearest past occurrence +- in YEAR → `[Jan 1 YEAR, Jan 1 YEAR+1)` + +For exclusions, split into two sub-queries covering each remaining range. Never negate with a single predicate. + +# Examples (Current date = Wednesday, April 15, 2026) + +User: "Summary of meeting notes uploaded in the past month" +{{"query_list":[{{"query":"Summary of meeting notes uploaded in the past month","temporal_filters":[{{"field":"created_at","operator":">=","value":"2026-03-15T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2026-04-16T00:00:00+00:00"}}]}}]}} + +User: "Sales figures for Product A and Product B" +{{"query_list":[{{"query":"Sales figures for Product A","temporal_filters":null}},{{"query":"Sales figures for Product B","temporal_filters":null}}]}} + +User: "Documents from last year except March" +{{"query_list":[{{"query":"Documents from January or February 2025","temporal_filters":[{{"field":"created_at","operator":">=","value":"2025-01-01T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2025-03-01T00:00:00+00:00"}}]}},{{"query":"Documents from April to December 2025","temporal_filters":[{{"field":"created_at","operator":">=","value":"2025-04-01T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2026-01-01T00:00:00+00:00"}}]}}]}} + +User: "Q3 2024 reporting template" +{{"query_list":[{{"query":"Q3 2024 reporting template","temporal_filters":null}}]}} + +User: "Evolution of the Department of Justice budget between 2020 and 2022" 
+{{"query_list":[{{"query":"Department of Justice budget in 2020","temporal_filters":null}},{{"query":"Department of Justice budget in 2021","temporal_filters":null}},{{"query":"Department of Justice budget in 2022","temporal_filters":null}}]}} + +User: "Latest safety bulletins" +{{"query_list":[{{"query":"Latest safety bulletins","temporal_filters":[{{"field":"created_at","operator":">=","value":"2026-01-15T00:00:00+00:00"}},{{"field":"created_at","operator":"<","value":"2026-04-16T00:00:00+00:00"}}]}}]}} \ No newline at end of file diff --git a/prompts/example1/spoken_style_answer_tmpl.txt b/prompts/example1/spoken_style_answer_tmpl.txt index 11581179..d2029b56 100644 --- a/prompts/example1/spoken_style_answer_tmpl.txt +++ b/prompts/example1/spoken_style_answer_tmpl.txt @@ -1,6 +1,9 @@ You are an AI assistant designed for **spoken, conversational answers**. Your goal is to give short (1-2 sentences), clear, and accurate explanations, based only on the retrieved documents in `Context`. +# Context +- Current date: {current_date} + ## Rules 1. Use only the provided Context diff --git a/prompts/example1/sys_prompt_tmpl.txt b/prompts/example1/sys_prompt_tmpl.txt index 0602374c..bb0ea23e 100644 --- a/prompts/example1/sys_prompt_tmpl.txt +++ b/prompts/example1/sys_prompt_tmpl.txt @@ -2,6 +2,9 @@ You are an AI conversational assistant specialized in **information retrieval an Your goal is to provide **precise, reliable, and well-structured answers** using **only the retrieved documents** (`Context`). Prioritize **clarity, accuracy, and completeness** in your responses. +# Context +- Current date: {current_date} + ## Rules 1. 
Use only the provided Context diff --git a/pyproject.toml b/pyproject.toml index 4d35e82b..1aaa6f9b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -115,3 +115,6 @@ ignore = [ [tool.ruff.lint.isort] known-first-party = ["openrag"] + +[tool.uv.workspace] +exclude = ["benchmarks/prompt_eval"] diff --git a/tests/api_tests/api_run/mock_vllm.py b/tests/api_tests/api_run/mock_vllm.py index 0b1d2a05..2ce5f1aa 100644 --- a/tests/api_tests/api_run/mock_vllm.py +++ b/tests/api_tests/api_run/mock_vllm.py @@ -224,6 +224,46 @@ async def stream_chat_completion(request: ChatCompletionRequest): yield "data: [DONE]\n\n" +def _resolve_ref(schema: dict, root: dict) -> dict: + """Dereference a $ref against the root schema document.""" + if "$ref" not in schema: + return schema + parts = schema["$ref"].lstrip("#/").split("/") + node = root + for part in parts: + node = node.get(part, {}) + return node + + +def _mock_value(schema: dict, root: dict, user_text: str): + """Recursively build a mock value that satisfies the given JSON Schema node.""" + schema = _resolve_ref(schema, root) + # anyOf / oneOf — nullable fields (union with null) return None to avoid invalid mock values + for combiner in ("anyOf", "oneOf"): + if combiner in schema: + branches = schema[combiner] + has_null = any(s.get("type") == "null" for s in branches) + if has_null: + return None + non_null = [s for s in branches if s.get("type") != "null"] + if non_null: + return _mock_value(non_null[0], root, user_text) + return None + t = schema.get("type") + if t == "object" or "properties" in schema: + return {k: _mock_value(v, root, user_text) for k, v in schema.get("properties", {}).items()} + if t == "array": + items = _resolve_ref(schema.get("items", {}), root) + return [_mock_value(items, root, user_text)] + if t == "string": + return user_text + if t in ("integer", "number"): + return 0 + if t == "boolean": + return False + return None + + def generate_tool_call_response(request: ChatCompletionRequest) -> dict: 
"""Generate a mock tool_calls response for structured output requests.""" tool = request.tools[0] @@ -236,14 +276,9 @@ def generate_tool_call_response(request: ChatCompletionRequest) -> dict: last_user_msg = next((m for m in reversed(request.messages) if m.role == "user"), None) user_text = str(last_user_msg.content)[:100] if last_user_msg else "mock query" - mock_args: dict = {} - for prop_name, prop_schema in properties.items(): - if prop_schema.get("type") == "array": - mock_args[prop_name] = [user_text] - elif prop_schema.get("type") == "string": - mock_args[prop_name] = user_text - else: - mock_args[prop_name] = None + mock_args: dict = { + prop_name: _mock_value(prop_schema, parameters, user_text) for prop_name, prop_schema in properties.items() + } prompt_tokens = sum(count_tokens(str(msg.content)) for msg in request.messages) args_json = json.dumps(mock_args)
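The query contextualizer prompt in this diff states its date-resolution rules in prose: half-open `[start, next_start)` intervals in UTC, and exclusions split into one range per side of the gap. Below is a minimal stdlib-only sketch of a few of those rules. It can be handy for cross-checking gold labels or prompt examples.

All function names here are hypothetical and not part of the PR. The sketch assumes UTC throughout, as the prompt does. `to_filter` only imitates the string shape that `Query.to_milvus_filter` builds from two predicates.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def day_start(moment: datetime) -> datetime:
    """Truncate a timestamp to midnight UTC of the same calendar day."""
    return datetime(moment.year, moment.month, moment.day, tzinfo=UTC)


def past_days(now: datetime, n: int) -> tuple[datetime, datetime]:
    """'past N days' (and 'recent' with n=90) -> [today - N days, tomorrow)."""
    today = day_start(now)
    return today - timedelta(days=n), today + timedelta(days=1)


def last_month(now: datetime) -> tuple[datetime, datetime]:
    """'last month' -> [1st of previous month, 1st of current month)."""
    first_this = datetime(now.year, now.month, 1, tzinfo=UTC)
    # Step back one day into the previous month, then snap to its 1st.
    first_prev = (first_this - timedelta(days=1)).replace(day=1)
    return first_prev, first_this


def in_year(year: int) -> tuple[datetime, datetime]:
    """'in YEAR' -> [Jan 1 YEAR, Jan 1 YEAR+1)."""
    return datetime(year, 1, 1, tzinfo=UTC), datetime(year + 1, 1, 1, tzinfo=UTC)


def year_except_month(year: int, month: int) -> list[tuple[datetime, datetime]]:
    """'YEAR except MONTH' -> up to two ranges, one per side of the gap."""
    start, end = in_year(year)
    gap_start = datetime(year, month, 1, tzinfo=UTC)
    gap_end = end if month == 12 else datetime(year, month + 1, 1, tzinfo=UTC)
    # Drop empty sides, e.g. excluding January leaves nothing before the gap.
    return [(s, e) for s, e in ((start, gap_start), (gap_end, end)) if s < e]


def to_filter(start: datetime, end: datetime, field: str = "created_at") -> str:
    """Render one half-open range as two AND-combined predicates."""
    return f'{field} >= ISO "{start.isoformat()}" and {field} < ISO "{end.isoformat()}"'


now = datetime(2026, 4, 15, 9, 30, tzinfo=UTC)  # the prompt's example "Current date"
# Reproduces the "Latest safety bulletins" example: 2026-01-15 up to 2026-04-16.
print(to_filter(*past_days(now, 90)))
# Reproduces the "last year except March" example: Jan-Feb and Apr-Dec 2025.
for start, end in year_except_month(2025, 3):
    print(to_filter(start, end))
```

Using `< next_start` instead of `<= last_day` avoids guessing an end-of-day timestamp. It also keeps adjacent ranges from overlapping or leaving gaps.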
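The `retrieve_docs` change in `pipeline.py` re-queries without the temporal filter only when two conditions hold: the filtered call returned nothing, and `retriever.allow_filterless_fallback` is true. A minimal stdlib-only sketch of that control flow follows.

`retrieve_with_fallback` and `toy_retrieve` are hypothetical stand-ins for `RetrieverPipeline.retrieve_docs` and `BaseRetriever.retrieve`. Plain strings stand in for LangChain `Document` objects, and reranking and expansion are left out.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical retriever signature: (query text, optional filter expression) -> documents.
Retrieve = Callable[[str, Optional[str]], Awaitable[list[str]]]


async def retrieve_with_fallback(
    retrieve: Retrieve,
    query: str,
    milvus_filter: Optional[str],
    allow_filterless_fallback: bool,
) -> list[str]:
    docs = await retrieve(query, milvus_filter)
    # Re-query only when a filter was applied AND it emptied the result set.
    if not docs and milvus_filter and allow_filterless_fallback:
        docs = await retrieve(query, None)
    return docs


async def toy_retrieve(query: str, milvus_filter: Optional[str]) -> list[str]:
    # Toy index: nothing matches any temporal filter, everything matches without one.
    return [] if milvus_filter else [f"doc about {query}"]


async def demo() -> tuple[list[str], list[str]]:
    flt = 'created_at >= ISO "2026-04-15T00:00:00+00:00"'
    lenient = await retrieve_with_fallback(toy_retrieve, "safety bulletins", flt, True)
    strict = await retrieve_with_fallback(toy_retrieve, "safety bulletins", flt, False)
    return lenient, strict


lenient, strict = asyncio.run(demo())
print(lenient)  # ['doc about safety bulletins']
print(strict)   # []
```

With the flag off, the empty result is returned unchanged. This matches the behaviour the config comment describes as strict temporal retrieval: no documents rather than temporally incorrect ones.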