
feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes#255

Open
galshubeli wants to merge 4 commits into
mainfrom
feat/cypher-aggregation-accuracy

@galshubeli
Collaborator

@galshubeli galshubeli commented May 14, 2026

Summary

Adds a new opt-in retrieval strategy, CypherFirstAggregationStrategy, that routes quantitative/structural questions ("how many", "which X has the most", "BOTH A and B", "are there any X without Y", "what is the average …") through a deterministic Cypher-first path. Non-aggregation questions delegate to MultiPathRetrieval unchanged.

Alongside the strategy, this PR also lands six targeted SDK-level fixes to the pre-existing cypher path (also used by MultiPathRetrieval(enable_cypher=True)): smart LIMIT injection, authoritative framing of the cypher results section, an APOC/GDS/db.* function-call blocklist in the validator, a 0-row label-widen fallback, two new schema-prompt examples, and metadata threading for observability.
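
To make the routing in the summary concrete, here is a minimal sketch of an intent classifier. It is not the code in this PR: the regexes and the `classify_intent` name are assumptions for illustration, and the real classifier may use different patterns. Only the three intent names (numeric_math, aggregation, rag) come from the commit message.

```python
import re

# Hypothetical patterns; the PR's actual classifier is not shown in this description.
_NUMERIC_MATH = re.compile(r"\b(average|mean|median|sum|total)\b", re.IGNORECASE)
_AGGREGATION = re.compile(
    r"\b(how many|which \w+|more \w+ than|both \w+ and \w+|are there any)\b",
    re.IGNORECASE,
)


def classify_intent(question: str) -> str:
    """Return 'numeric_math', 'aggregation', or 'rag' for a question."""
    # Numeric math is checked first so "what is the average ..." is not
    # swallowed by the broader aggregation patterns.
    if _NUMERIC_MATH.search(question):
        return "numeric_math"
    if _AGGREGATION.search(question):
        return "aggregation"
    return "rag"


for q in (
    "What is the average salary at Acme?",
    "Which city has the most employees?",
    "Tell me about Alice's background.",
):
    print(q, "->", classify_intent(q))
```

A pattern as broad as `which \w+` will also catch some non-aggregation questions, which is one reason a real classifier would likely need more shape checks than this sketch has.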

Background

A focused failure-mode investigation on a 56-person / 10-org synthetic benchmark identified seven distinct failure modes on aggregation questions — none of them random one-offs:

  • Cypher results encoded as `" | "`-joined strings lose column names, so the answer-LLM can swap row values
  • Auto LIMIT 25 truncates group-by aggregations silently
  • apoc.text.regexGroups(...)-style function calls slipped past the CALL blocklist
  • Typed-label cypher queries fail when extraction labeled the entity differently
  • Schema prompt had no examples for "BOTH X AND Y" or top-N group-by
  • Empty cypher results were treated identically for positive and negation existentials
  • Roles/projects extracted as free text in chunks weren't queryable as typed nodes

The new strategy directly addresses each one. The 7-question failure-mode benchmark went from 2-5/7 (depending on the pre-fix variant) to 7/7, stable across three runs.
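
The first failure mode in the list is easy to see in a small example. The sketch below contrasts a pipe-joined row with a column-named markdown table. The `to_markdown_table` helper and the column names are hypothetical, not the formatter shipped in this PR.

```python
def to_markdown_table(header: list[str], rows: list[list[object]]) -> str:
    """Render a result set as a markdown table so every value keeps its column name."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical comparison query result: two counts and a boolean.
header = ["acme_count", "initech_count", "acme_is_larger"]
rows = [[10, 7, True]]

# Old encoding: the reader has to guess which number belongs to which org.
pipe_joined = " | ".join(str(cell) for cell in rows[0])

# New encoding: the header row removes the ambiguity.
table = to_markdown_table(header, rows)
print(pipe_joined)
print(table)
```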

What's in this PR (4 commits)

| Commit | Title |
| --- | --- |
| `bb920c6` | improve text-to-Cypher accuracy on aggregation questions |
| `7fa7a1f` | add CypherFirstAggregationStrategy |
| `deaa64d` | pre-merge review fixes for CypherFirstAggregationStrategy |
| `d818eb1` | split CypherFirst into path classes + pluggable extractor |

Each commit has its own detailed body.

Compatibility (cypher-off callers)

The PR is additive for users who don't opt in:

  • CypherFirstAggregationStrategy only runs if passed explicitly as retrieval_strategy=
  • The cypher-path improvements (smart LIMIT, APOC blocklist, etc.) live inside the cypher generation/execution code and only run when enable_cypher=True
  • The "Authoritative Graph Query Results" system-prompt rule is added conditionally — only when the retriever produced a cypher_results section. With cypher off, the base system prompt is byte-identical to main.

Test coverage (`tests/test_facade.py::TestCypherAuthorityRuleInjection`) pins the contract.
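
As an illustration of the conditional injection described above, here is a minimal sketch. The item shape (dicts with `content` and `metadata`), the `section` metadata key, and the placeholder prompt text are assumptions; only the idea of appending the rule when cypher results are present, with the heading marker as a fallback check, comes from this PR.

```python
# Placeholder text, not the SDK's actual system prompt.
BASE_SYSTEM_PROMPT = "Answer using only the provided context."
CYPHER_AUTH_RULE = (
    "Trust the 'Authoritative Graph Query Results' section over passages "
    "on counts and aggregates."
)
HEADING_MARKER = "Authoritative Graph Query Results"


def has_authoritative_cypher_results(items: list[dict]) -> bool:
    """True when any retrieved item is tagged as, or visibly contains, cypher results."""
    for item in items:
        # Primary signal: item metadata (the key name is an assumption).
        if item.get("metadata", {}).get("section") == "cypher_results":
            return True
        # Defensive fallback: the canonical heading appears in the content.
        if HEADING_MARKER in item.get("content", ""):
            return True
    return False


def build_system_prompt(items: list[dict]) -> str:
    """Append the authority rule only when cypher results are present."""
    if has_authoritative_cypher_results(items):
        return f"{BASE_SYSTEM_PROMPT}\n{CYPHER_AUTH_RULE}"
    return BASE_SYSTEM_PROMPT
```

With no cypher results in the retrieved items, `build_system_prompt` returns the base prompt object unchanged, which is the property the contract test is described as pinning.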

Usage

```python
from graphrag_sdk import (
    CypherFirstAggregationStrategy,
    FastCorefResolver,
    GraphExtraction,
    GraphRAG,  # assumed to be exported at the top level alongside the others
    LLMVerifiedResolution,
    SentenceTokenCapChunking,
)

# `conn`, `llm`, `embedder`, and `doc` are assumed to be configured elsewhere.
chunker = SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2)
extractor = GraphExtraction(llm=llm, coref_resolver=FastCorefResolver())
resolver = LLMVerifiedResolution(llm=llm, embedder=embedder)

# Run inside an async function (or a REPL that supports top-level await).
async with GraphRAG(
    connection=conn, llm=llm, embedder=embedder, embedding_dimension=256,
) as rag:
    await rag.ingest(text=doc, chunker=chunker,
                     extractor=extractor, resolver=resolver)
    await rag.finalize()
    # Attach the strategy after construction so it can reuse the stores owned by `rag`.
    rag._retrieval_strategy = CypherFirstAggregationStrategy(
        graph_store=rag._graph_store,
        vector_store=rag._vector_store,
        embedder=embedder,
        llm=llm,
    )
    answer = await rag.completion("Which city has the most employees?")
```

Related PRs

Test plan

  • Unit tests: `tests/test_cypher_first.py` (66 tests), `tests/test_cypher_generation.py` (49 tests), `tests/test_facade.py` (75 tests including R1 contract)
  • End-to-end aggregation benchmark: 7/7 stable across 3 runs (`CypherFirstAggregationStrategy`) and 7/7 single run (`MultiPathRetrieval(enable_cypher=True)`)
  • Full SDK suite: 748 passed, 24 skipped on the cypher branch tip
  • `ruff check` clean on all touched files
  • Reviewer to confirm CI green on this PR

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Cypher-first retrieval strategy for optimized handling of aggregation and numeric queries
    • Enhanced result formatting with markdown table support and improved metadata tracking for graph results
  • Improvements

    • Strengthened Cypher validation with namespace function call detection
    • Implemented intelligent fallback mechanisms when queries return empty results
    • Enhanced prompt support for authoritative graph query results


galshubeli and others added 4 commits May 10, 2026 17:44
…ions

Six localized fixes to the cypher_generation pipeline, identified from a
failure-mode investigation on a 56-person/10-org synthetic corpus where
both vector-only and cypher-enabled retrieval were silently producing
wrong answers on counting, top-N, group-by, and intersection questions.

P0 fixes (ship together):

- Smart LIMIT injection. _sanitize_cypher no longer auto-injects LIMIT on
  pure aggregations (count/sum/avg without group-by) and uses the new
  _DEFAULT_ROW_LIMIT=100 constant otherwise. Paired with raising the
  result_assembly slice cap to 100 plus a truncation sentinel, this stops
  group-by lists ("orgs with >=5 employees", "top-N city") from being
  silently cut at 20 rows.

- Authoritative result framing. The cypher results section is renamed to
  "Authoritative Graph Query Results (deterministic; trust over passages
  on counts and aggregates)" and a matching rule 8 is added to both
  _RAG_SYSTEM_PROMPT variants in api/main.py. Together they stop the LLM
  from contradicting a correct numeric cypher answer when verbose passage
  text mentions a different entity.

- APOC/GDS/db function blocklist. validate_cypher now rejects dotted-
  namespace function calls (apoc.text.regexGroups, gds.*, db.*) which
  FalkorDB silently returns 0 rows for. The error feeds the existing
  retry-with-feedback loop so attempt 2 has a concrete fix-it.
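
A minimal sketch of the blocklist check described in the last bullet, assuming a regex-based scan. The function names and the error wording are hypothetical, and this version does not skip string literals, which a real validator would probably need to handle.

```python
import re
from typing import Optional

# Matches dotted-namespace calls such as apoc.text.regexGroups( or db.labels(
_NAMESPACED_CALL = re.compile(
    r"\b((?:apoc|gds|db)(?:\.\w+)+)\s*\(",
    re.IGNORECASE,
)


def find_blocked_calls(cypher: str) -> list[str]:
    """Return the name of each blocked dotted-namespace function call in the query."""
    return [m.group(1) for m in _NAMESPACED_CALL.finditer(cypher)]


def validate_namespaced_calls(cypher: str) -> Optional[str]:
    """Return an error message for the retry loop, or None when the query is clean."""
    blocked = find_blocked_calls(cypher)
    if blocked:
        return (
            "Unsupported function call(s): " + ", ".join(blocked)
            + ". Use built-in Cypher functions only."
        )
    return None


bad = "MATCH (p:Person) RETURN apoc.text.regexGroups(p.name, 'a') AS g"
print(validate_namespaced_calls(bad))
```

Returning a message instead of raising keeps the sketch easy to plug into a retry-with-feedback loop, where the message becomes the hint for the next generation attempt.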

P1 fixes:

- 0-row label-widen fallback. When a typed-label cypher returns 0 rows
  AND a name predicate is present (label is a routing hint, not the
  filter itself), execute_cypher_retrieval rewrites typed labels to
  __Entity__ and re-runs once. Pure-cypher rewrite, no second LLM call.
  Recovers cases where the extractor labelled an entity differently than
  the schema prompt steered the LLM toward.

- Two new schema-prompt examples replacing redundant ones: a top-N
  group-by ("city with most employees", explicit 2-hop) and a set-
  intersection ("works at BOTH A and B") with a matching rule.
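
The label-widen fallback can be sketched as a pure string rewrite. The regexes below are assumptions for illustration and handle only single-label node patterns; the helper names mirror the ones listed in the unit tests, but this is not the code in this PR.

```python
import re

# A node pattern's label, e.g. "(p:Person" or "(:Person". Relationship
# patterns start with "[" and are therefore left untouched.
_NODE_LABEL = re.compile(r"\(\s*(\w*)\s*:\s*(\w+)")
# A name predicate, e.g. "{name: 'Alice'}" or "p.name = 'Alice'".
_NAME_PREDICATE = re.compile(
    r"\bname\s*(?:[:=]|CONTAINS\b|STARTS\s+WITH\b)", re.IGNORECASE
)


def should_widen_labels(cypher: str, row_count: int) -> bool:
    """Widen only on 0 rows, and only when a name predicate does the filtering."""
    return row_count == 0 and bool(_NAME_PREDICATE.search(cypher))


def widen_typed_labels(cypher: str) -> str:
    """Rewrite every typed node label to the generic __Entity__ label."""
    return _NODE_LABEL.sub(lambda m: f"({m.group(1)}:__Entity__", cypher)


query = "MATCH (p:Person {name: 'Alice'})-[:WORKS_AT]->(o:Organization) RETURN o.name"
if should_widen_labels(query, row_count=0):
    print(widen_typed_labels(query))
```

The gating matters: without a name predicate the label is the only filter, so widening it would change the meaning of the query rather than recover from a labelling mismatch.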

P2: cypher metadata (cypher_fallback, cypher_truncated, cypher_rows)
threaded through assemble_raw_result into RetrieverResult.metadata so
operators can monitor fallback firing rate.

Return-type change: execute_cypher_retrieval now returns a 3-tuple
(facts, entities, metadata) instead of (facts, entities). The function is
internal; its only callers are multi_path.py and tests.

Verification on the 6-question matrix moved 2 questions from wrong to
correct (city group-by, observability existential via label widen) and
preserved the 4 that were already passing. Unit tests added for
_is_pure_aggregation, _split_top_level_commas, _widen_typed_labels,
_should_widen_labels, apoc rejection, and label-widen firing/gating.
49 cypher tests + full SDK suite (629) green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New retrieval strategy that routes quantitative and structural questions
through a deterministic Cypher-first path while delegating non-aggregation
questions to a fallback strategy (default MultiPathRetrieval). Safe to use
as the top-level strategy on GraphRAG.

Implements six mechanisms identified from a failure-mode investigation
where aggregation answers were being silently corrupted by extraction
noise, lossy result formatting, and LLM mistrust of bare cypher numbers:

- Intent classifier routes per question into numeric_math / aggregation /
  rag. Catches "how many", "which X", "more X than Y", "BOTH A and B",
  "are there any", "average / total of NUMBER".

- Multi-candidate cypher generation. K parallel samples per question,
  execute all, pick the highest-row-count result. Beats LLM stochasticity
  on structural interpretation without serial retries.

- Column-named markdown table formatting via result.header. Eliminates
  the "10 | 7 | True" ambiguity that was swapping comparison answers.

- Description+chunk-text fuzzy hybrid for "shared X" / "BOTH A and B"
  shapes when X is a free-text property (role, project) not extracted as
  a typed entity. Single batched cypher, sentence-restricted regex
  extraction, fuzzy intersect (substring or 2-token overlap). Recovers
  cases where graph extraction summarized away the project names.

- Numeric-math sub-path. RETURN raw values, do average / sum / median in
  Python. Avoids LLM-arithmetic errors deterministically.

- Negation-existential empty handling. For "are there any X without Y?"
  an empty cypher result is the definitive "No"; positive existentials
  fall back to vector retrieval since extraction labels are unreliable.
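
The multi-candidate mechanism from the list above can be sketched with `asyncio.gather`. The `best_candidate` name and the `run_query` callable are hypothetical stand-ins for the LLM sampling and graph execution, which this description does not show.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def best_candidate(
    candidates: list[str],
    run_query: Callable[[str], Awaitable[list]],
) -> tuple[Optional[str], list]:
    """Run every candidate concurrently and keep the one that returns the most rows."""
    results = await asyncio.gather(
        *(run_query(c) for c in candidates), return_exceptions=True
    )
    best_query, best_rows = None, []
    for query, rows in zip(candidates, results):
        if isinstance(rows, Exception):
            continue  # a failing candidate simply drops out of the race
        if len(rows) > len(best_rows):
            best_query, best_rows = query, rows
    return best_query, best_rows


async def _demo() -> None:
    # Canned results keyed by query text; "bad" simulates an invalid candidate.
    canned = {"q1": [[1]], "q2": [[1], [2], [3]]}

    async def run_query(query: str) -> list:
        if query == "bad":
            raise ValueError("syntax error")
        return canned.get(query, [])

    print(await best_candidate(["q1", "bad", "q2"], run_query))


asyncio.run(_demo())
```

When every candidate fails or returns no rows, the sketch returns `(None, [])`, which is where the negation-existential and fallback branches described in the last bullet would take over.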

The strategy emits its result under an "Authoritative Graph Query Results"
section heading that rule 8 of the existing _RAG_SYSTEM_PROMPT (added in
bb920c6) is already configured to trust on quantitative questions.

Benchmark: prototype scored 7/7 stably across three runs on the seven-
question failure-mode matrix that previous strategies hit 2-5/7 on. The
SDK port scored 6/7 in an end-to-end check; the remaining failure was
malformed extracted org names ("Glo" / "Initech System") leaking through
into the answer — an extraction-quality issue orthogonal to this strategy.

Public API: CypherFirstAggregationStrategy is exported at the top level
and from graphrag_sdk.retrieval.strategies. Pass it as
retrieval_strategy= on GraphRAG construction to opt in.

Tests: 45 unit tests cover the pure-Python helpers (intent classifier,
shape detectors, role/project regex extractors, fuzzy intersect, markdown
table formatter, numeric coercion). Full SDK suite stays green at
674 passed, 3 skipped.
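
For readers unfamiliar with the numeric-math sub-path, here is a minimal sketch of numeric coercion followed by arithmetic in Python. The helper names and the accepted cell formats are assumptions, not the helpers covered by these tests.

```python
import statistics
from typing import Optional


def coerce_number(cell: object) -> Optional[float]:
    """Return the cell as a float, or None when it is not numeric."""
    if isinstance(cell, bool):
        return None  # bool is an int subclass; do not count True as 1
    if isinstance(cell, (int, float)):
        return float(cell)
    if isinstance(cell, str):
        try:
            return float(cell.replace(",", "").strip())
        except ValueError:
            return None
    return None


def aggregate(rows: list[list[object]], op: str) -> Optional[float]:
    """Compute average, sum, or median over every numeric cell in the rows."""
    values = [
        value
        for row in rows
        for value in (coerce_number(cell) for cell in row)
        if value is not None
    ]
    if not values:
        return None  # the caller would fall back to RAG here
    ops = {"average": statistics.fmean, "sum": sum, "median": statistics.median}
    return ops[op](values)


rows = [[52000], ["61,000"], [None], [True], ["n/a"], [70000.0]]
print(aggregate(rows, "average"), aggregate(rows, "median"))
```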

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nStrategy

Addresses concerns from a pre-merge review of 7fa7a1f. Behavior preserved
on the failure-mode benchmark; the changes are about surface quality,
observability, and test coverage.

R1 — Make the cypher-authority rule opt-in. Rule 8 ("trust the
'Authoritative Graph Query Results' section over passages on counts and
aggregates") used to live inside the base _RAG_SYSTEM_PROMPT and
_RAG_SYSTEM_PROMPT_DELIMITED, so it fired on every completion() call
SDK-wide — including users who never enable cypher retrieval. The rule is
now extracted into a separate constant _CYPHER_AUTH_RULE and appended to
the system prompt only when the retriever produced a cypher_results
section (detected via item metadata or the canonical heading marker as a
defensive fallback). Callers on MultiPathRetrieval without enable_cypher
keep the unchanged 7-rule prompt.

R2 — Add cypher_first_path metadata to every strategy result. The
strategy has five sub-paths plus three RAG-fallback branches; today
operators can't tell which one handled a query from the result alone.
Each result now carries one of seven canonical PATH_* labels:
numeric_math, shared_property_hybrid, cypher_table, negation_empty_no,
rag_fallback, rag_fallback_numeric_fail, rag_fallback_cypher_empty. A
shared _tag_path() helper handles the bookkeeping including the three
delegated-fallback wrappings so the contract is uniform.
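
A minimal sketch of what such a tagging helper could look like. The seven labels and the cypher_first_path key are taken from the paragraph above; the function body and the dict-based metadata are assumptions.

```python
from collections import Counter

PATH_LABELS = frozenset({
    "numeric_math",
    "shared_property_hybrid",
    "cypher_table",
    "negation_empty_no",
    "rag_fallback",
    "rag_fallback_numeric_fail",
    "rag_fallback_cypher_empty",
})


def tag_path(metadata: dict, path: str) -> dict:
    """Return a copy of the metadata with the canonical path label attached."""
    if path not in PATH_LABELS:
        raise ValueError(f"unknown path label: {path!r}")
    tagged = dict(metadata)  # copy so a delegated fallback's metadata is not mutated
    tagged["cypher_first_path"] = path
    return tagged


# Operators can then count which path handled each query.
results = [
    tag_path({"cypher_rows": 3}, "cypher_table"),
    tag_path({}, "rag_fallback"),
    tag_path({"cypher_rows": 1}, "cypher_table"),
]
print(Counter(r["cypher_first_path"] for r in results))
```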

R3 — Document prose-shape and graph-topology assumptions. The shared-
property hybrid was tuned on graphs produced by the SDK's default
GraphExtraction pipeline; custom extractors or domain prose may not
match. The class docstring now has dedicated "Assumptions and known
limits" and "Accuracy ceiling" sections naming each one. Plus a runtime
warning fires when the batched (Org)<-[:RELATES]-(Person) query returns
zero tuples, so operators on different schemas get a fast signal rather
than silent wrong answers.

R4 — Mocked end-to-end routing tests. The 45 existing unit tests cover
pure-Python helpers, not the strategy's branching. Seven new tests use a
mock LLM (returning canned cypher), a mock graph (returning canned
result_sets), and a stub _FakeFallback to assert which sub-path fires
for each intent + graph-state combination. Patterns covered: rag intent
→ fallback, aggregation + rows → cypher_table, numeric → python math,
numeric extraction empty → fallback (with numeric_fail tag), negation
+ empty cypher → No (no delegation), positive-existential + empty cypher
→ fallback (with cypher_empty tag), topology-violation warning.

R10 — ruff check pass. Dropped two unused imports (RetrieverError,
ChatMessage) and sorted the test_cypher_first imports.

All 75 facade + 57 cypher_first + 49 cypher_generation tests pass. Full
SDK suite 686 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…extractor

R5 — Split CypherFirstAggregationStrategy into composable path classes.
The strategy file went from 944 lines with five sub-paths crammed into
the strategy class to a small dispatcher (~80 lines) backed by four
focused _AggregationPath subclasses:

  - _RagDelegationPath        — intent="rag" → fallback verbatim
  - _NumericMathPath          — intent="numeric_math" → Python arithmetic
  - _SharedPropertyHybridPath — "BOTH A and B" / "same X as Z" via chunks
  - _MultiCandidateCypherPath — K parallel cypher candidates + table
    (also owns the negation-empty and cypher-empty-fallback branches)

Each path is a small class with a single `maybe_handle(query, ctx)`
method that returns either a final `RawSearchResult` or `None` to defer.
The strategy's `_execute` dispatches by intent and consults the relevant
paths in order. Existing routing tests (TestStrategyRouting, 7 cases)
keep covering the contract end-to-end; the helper classes are
implementation detail.
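
The dispatch pattern described here can be sketched as follows. This is a simplified, synchronous illustration: the class names drop the leading underscore, results are plain strings rather than `RawSearchResult`, and `ctx` is a plain dict, all of which are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AggregationPath(ABC):
    """One sub-path: handle the query or defer by returning None."""

    @abstractmethod
    def maybe_handle(self, query: str, ctx: dict) -> Optional[str]:
        ...


class NumericMathPath(AggregationPath):
    def maybe_handle(self, query: str, ctx: dict) -> Optional[str]:
        values = ctx.get("values")
        if not values:
            return None  # nothing numeric was extracted, so defer
        return f"average={sum(values) / len(values)}"


class RagDelegationPath(AggregationPath):
    def maybe_handle(self, query: str, ctx: dict) -> Optional[str]:
        return f"rag:{query}"  # always handles, so it works as the last resort


def dispatch(paths: list[AggregationPath], query: str, ctx: dict) -> str:
    """Consult each path in order and return the first non-None result."""
    for path in paths:
        result = path.maybe_handle(query, ctx)
        if result is not None:
            return result
    raise RuntimeError("no path handled the query")


paths = [NumericMathPath(), RagDelegationPath()]
print(dispatch(paths, "average salary?", {"values": [2, 4]}))
print(dispatch(paths, "average salary?", {}))
```

Returning `None` to defer keeps each path unaware of the others, so reordering or adding a path only touches the list passed to the dispatcher.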

Pure refactor — no behaviour change. All paths share state via a single
reference to the parent strategy, so callers' constructor signature is
unchanged.

R8 — Pluggable phrase extractor for the shared-property hybrid.

The default role/project regexes target the prose patterns the SDK's
GraphExtraction pipeline produces ("works at X as a <role>", "contributes
to <project>") and a closed set of role-suffix words. Domain-specific
use cases (medical, legal, e-commerce, non-English) need different
vocabularies.

Added a `PhraseExtractor` ABC and a `DefaultPhraseExtractor` that wraps
the existing regexes. Strategies accept a `phrase_extractor=` parameter
on construction; the shared-property hybrid consults it instead of the
module-level `_extract_phrases`. Domain users can now subclass and
inject without forking.
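
As an illustration of the extension point, here is a sketch of a domain-specific extractor. The `extract(text, kind)` signature is an assumption (the PR text only says that unknown kinds return an empty set), the ABC is redefined locally so the example is self-contained, and the clinical patterns are invented for the example.

```python
import re
from abc import ABC, abstractmethod


class PhraseExtractor(ABC):
    """Hypothetical shape of the ABC; the real method signature may differ."""

    @abstractmethod
    def extract(self, text: str, kind: str) -> set[str]:
        ...


class ClinicalPhraseExtractor(PhraseExtractor):
    """Example vocabulary for clinical prose instead of the default role/project regexes."""

    _PATTERNS = {
        "role": re.compile(r"\bserves as an? ([a-z ]+?) at\b", re.IGNORECASE),
        "project": re.compile(
            r"\benrolled in the ([A-Za-z0-9 -]+?) trial\b", re.IGNORECASE
        ),
    }

    def extract(self, text: str, kind: str) -> set[str]:
        pattern = self._PATTERNS.get(kind)
        if pattern is None:
            return set()  # unknown kinds yield an empty set rather than raising
        return {m.group(1).strip().lower() for m in pattern.finditer(text)}


text = "Dr. Lee serves as a cardiology fellow at Mercy. She is enrolled in the HEART-2 trial."
extractor = ClinicalPhraseExtractor()
print(extractor.extract(text, "role"), extractor.extract(text, "project"))
```

An instance like this would then be passed to the strategy through the `phrase_extractor=` parameter described above.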

Exports added at strategies/__init__.py and top-level graphrag_sdk
namespace.

Tests
-----
- 1 unit test for DefaultPhraseExtractor (matches the default regexes,
  unknown kinds return empty set rather than raising).
- 1 mocked end-to-end test confirming that a custom extractor passed to
  the strategy is consulted by the hybrid path (intersection computed
  over the custom vocabulary, not the default).
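
The intersection step these tests exercise follows the rule stated in the earlier commit (substring or 2-token overlap). A minimal sketch under that rule, with hypothetical function names and example phrases:

```python
def fuzzy_match(a: str, b: str) -> bool:
    """Two phrases match if one contains the other or they share at least two tokens."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a or not b:
        return False
    if a in b or b in a:
        return True
    return len(set(a.split()) & set(b.split())) >= 2


def fuzzy_intersect(left: set[str], right: set[str]) -> set[str]:
    """Keep each phrase from `left` that fuzzily matches some phrase in `right`."""
    return {x for x in left if any(fuzzy_match(x, y) for y in right)}


# Phrases extracted for two people (or two orgs), in whatever vocabulary
# the configured extractor produced.
left = {"senior data engineer", "project atlas", "recruiter"}
right = {"data engineer", "atlas migration project", "designer"}
print(sorted(fuzzy_intersect(left, right)))
```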

All 763 unit tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97f9ff97-d0b7-4612-a434-34ce0c9c39bc

📥 Commits

Reviewing files that changed from the base of the PR and between a90ad18 and d818eb1.

📒 Files selected for processing (10)
  • graphrag_sdk/src/graphrag_sdk/__init__.py
  • graphrag_sdk/src/graphrag_sdk/api/main.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/__init__.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py
  • graphrag_sdk/tests/test_cypher_first.py
  • graphrag_sdk/tests/test_cypher_generation.py
  • graphrag_sdk/tests/test_facade.py

Walkthrough

This PR introduces a Cypher-first aggregation strategy that detects query intent and routes to specialized execution paths: numeric-math (Cypher returns raw values and the arithmetic runs in Python), shared-property hybrid (fuzzy intersection over extracted phrases), and multi-candidate Cypher-table (parallel K-candidate generation). Supporting enhancements include auto-injected row limits, pure-aggregation detection, label-widening fallback on 0-row results, a result cap increase from 20 to 100, and conditional injection of the authority rule into completion prompts.
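
The row-limit and pure-aggregation behaviour mentioned in the walkthrough can be sketched as below. The regexes are assumptions and cover only simple single-RETURN queries; the default of 100 is the value named in the commit message, and the helper names echo the ones listed in the unit tests without being the PR's code.

```python
import re

DEFAULT_ROW_LIMIT = 100
_AGG_CALL = re.compile(r"^\s*(count|sum|avg)\s*\(", re.IGNORECASE)
_HAS_LIMIT = re.compile(r"\bLIMIT\s+\d+\s*$", re.IGNORECASE)
_RETURN_CLAUSE = re.compile(
    r"\bRETURN\b(.*?)(?:\bORDER\s+BY\b|\bLIMIT\b|$)", re.IGNORECASE | re.DOTALL
)


def split_top_level_commas(text: str) -> list[str]:
    """Split on commas that are not nested inside (), [] or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def is_pure_aggregation(cypher: str) -> bool:
    """True when every RETURN item is an aggregate call, i.e. there is no group-by key."""
    match = _RETURN_CLAUSE.search(cypher)
    if not match:
        return False
    items = split_top_level_commas(match.group(1))
    return all(_AGG_CALL.match(item) for item in items)


def inject_limit(cypher: str) -> str:
    """Append the default LIMIT unless one exists or the query returns a single aggregate row."""
    stripped = cypher.strip().rstrip(";").rstrip()
    if _HAS_LIMIT.search(stripped) or is_pure_aggregation(stripped):
        return stripped
    return f"{stripped} LIMIT {DEFAULT_ROW_LIMIT}"


group_by = "MATCH (p)-[:LIVES_IN]->(c:City) RETURN c.name AS city, count(p) AS n ORDER BY n DESC"
print(inject_limit(group_by))
print(inject_limit("MATCH (p:Person) RETURN count(p) AS n"))
```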

Changes

Cypher-First Aggregation Strategy

| Layer / File(s) | Summary |
| --- | --- |
| **Public API Exports and Module Wiring**<br>`graphrag_sdk/src/graphrag_sdk/__init__.py`, `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/__init__.py` | Exposes CypherFirstAggregationStrategy, DefaultPhraseExtractor, and PhraseExtractor via package imports and `__all__` lists. |
| **Intent Detection and Query Classification**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 1–140) | Detects intent as aggregation, numeric-math, or RAG; classifies query shapes (yes/no, which-list); identifies negation-existential patterns for routing. |
| **Numeric-Math Execution Path**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 432–564) | Generates Cypher for raw numeric values, coerces result cells to numbers in Python, computes average/sum/median, and falls back to RAG if extraction yields no values. |
| **Shared-Property Hybrid Path**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 565–711) | Executes batched Cypher to fetch descriptions, extracts role/project phrases via pluggable regex extractors, and computes "both A and B" intersections using fuzzy token matching. |
| **Markdown Table Rendering and Utilities**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 286–343, 1030–1061) | Converts FalkorDB result sets to markdown tables with header synthesis and truncation; coerces result cells to numeric values; tags results with strategy-path metadata. |
| **Multi-Candidate Cypher-Table Path**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 713–868) | Generates K candidate Cypher queries in parallel, executes all in parallel, ranks by row count, and formats the top result as a markdown table; handles negation-existential "No" and fallback branches. |
| **CypherFirstAggregationStrategy Class**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 870–1022) | Wires embedder, LLM, fallback strategy, and phrase extractor; routes queries in `_execute` based on detected intent, trying the shared-property hybrid before the multi-candidate path. |
| **Cypher Generation Validation and Limits**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py` (lines 44–323) | Auto-injects default row limit (100); detects pure aggregation to skip LIMIT injection; rejects dotted-namespace function calls; updates prompt examples for intersection semantics. |
| **Label-Widening Fallback and Result Parsing**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py` (lines 391–529) | Widens typed entity labels (for example `:Person` to `:__Entity__`) and re-executes once when the initial query returns 0 rows and is eligible; adds result parsing and enriched metadata (row counts, fallback flags). |
| **Multi-Path Integration and Result Assembly**<br>`graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py`, `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py` | Threads metadata from Cypher execution through result unpacking; increases result cap to 100; adds "Authoritative Graph Query Results" heading; merges cypher_metadata with prefixed keys. |
| **Cypher Authority Rule and Prompt Integration**<br>`graphrag_sdk/src/graphrag_sdk/api/main.py` | Introduces cypher-authority system prompt rule and `_has_authoritative_cypher_results` helper; conditionally appends rule to default system prompt during completion. |
| **Comprehensive Test Coverage**<br>`graphrag_sdk/tests/test_cypher_first.py`, `graphrag_sdk/tests/test_cypher_generation.py`, `graphrag_sdk/tests/test_facade.py` | Tests for intent detection, query classification, phrase extraction, fuzzy intersection, markdown table formatting, numeric coercion, path tagging, end-to-end routing, validation rules, label-widening fallback, and authority rule injection. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • Naseem77

Poem

🐰 Behold! A Cypher-first hop through intent-aware paths,
Where numeric aggregates compute via Python's math,
Shared properties dance with fuzzy token grace,
Multi-candidates race to win the query race!
From zero rows widened to authority's glow,
GraphRAG strategies flourish and grow. 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.09% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main additions in this PR: a new CypherFirstAggregationStrategy and multiple cypher-path accuracy improvements across multiple modules. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
