feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes by galshubeli · Pull Request #255 · FalkorDB/GraphRAG-SDK

galshubeli · 2026-05-14T14:16:32Z

Summary

Adds a new opt-in retrieval strategy, CypherFirstAggregationStrategy, that routes quantitative/structural questions ("how many", "which X has the most", "BOTH A and B", "are there any X without Y", "what is the average …") through a deterministic Cypher-first path. Non-aggregation questions delegate to MultiPathRetrieval unchanged.

Alongside the strategy, this PR also lands six targeted SDK-level fixes to the pre-existing cypher path (also used by MultiPathRetrieval(enable_cypher=True)): smart LIMIT injection, authoritative result framing, an APOC/GDS/db.* function-call blocklist in the validator, a 0-row label-widen fallback, two new schema-prompt examples, and metadata threading for observability.

Background

A focused failure-mode investigation on a 56-person / 10-org synthetic benchmark identified seven distinct failure modes on aggregation questions — none of them random one-offs:

1. Cypher results encoded as `" | "`-joined strings lose column names, so the answer-LLM can swap row values
2. Auto LIMIT 25 truncates group-by aggregations silently
3. apoc.text.regexGroups(...)-style function calls slipped past the CALL blocklist
4. Typed-label cypher queries fail when extraction labeled the entity differently
5. Schema prompt had no examples for "BOTH X AND Y" or top-N group-by
6. Empty cypher results were treated identically for positive and negation existentials
7. Roles/projects extracted as free text in chunks weren't queryable as typed nodes

The new strategy directly addresses each one. The 7-question failure-mode benchmark went from between 2/7 and 5/7 (depending on the pre-fix variant) to a stable 7/7 across three runs.
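To make the first failure mode concrete, here is a minimal sketch of column-named table rendering. It is not the SDK's formatter: `format_markdown_table` and the example column names are hypothetical, and the only assumption taken from the PR is that a result set exposes a header alongside its rows.

```python
from typing import Any, Sequence


def format_markdown_table(header: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Render a result set as a markdown table so each value keeps its column name."""

    def cell(value: Any) -> str:
        # Escape pipes so a value cannot be mistaken for a column boundary.
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(cell(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)


# Hypothetical comparison result: with a bare "10 | 7 | True" string the
# answer-LLM has to guess which count belongs to which organisation.
table = format_markdown_table(
    ["acme_count", "initech_count", "acme_is_larger"],
    [[10, 7, True]],
)
print(table)
```

With the header row present, each number is tied to a named column, which is the property the pipe-joined encoding lost.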

What's in this PR (4 commits)

| Commit | Title |
| --- | --- |
| `bb920c6` | improve text-to-Cypher accuracy on aggregation questions |
| `7fa7a1f` | add CypherFirstAggregationStrategy |
| `deaa64d` | pre-merge review fixes for CypherFirstAggregationStrategy |
| `d818eb1` | split CypherFirst into path classes + pluggable extractor |

Each commit has its own detailed body.

Compatibility (cypher-off callers)

The PR is additive for users who don't opt in:

- CypherFirstAggregationStrategy only runs if passed explicitly as retrieval_strategy=
- The cypher-path improvements (smart LIMIT, APOC blocklist, etc.) live inside the cypher generation/execution code and only run when enable_cypher=True
- The "Authoritative Graph Query Results" system-prompt rule is added conditionally, only when the retriever produced a cypher_results section. With cypher off, the base system prompt is byte-identical to main.

Test coverage (`tests/test_facade.py::TestCypherAuthorityRuleInjection`) pins the contract.
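The conditional rule injection can be pictured with the following sketch. It is an illustration under assumptions, not the code in api/main.py: the prompt text, the `section` metadata key, and the function names are all hypothetical. Only the idea (append the rule when a cypher results section is detected via metadata or via the heading marker) comes from this PR.

```python
BASE_SYSTEM_PROMPT = "Answer using only the provided context."  # placeholder text
CYPHER_AUTH_RULE = (
    "8. Trust the 'Authoritative Graph Query Results' section over passages "
    "on counts and aggregates."
)
HEADING_MARKER = "Authoritative Graph Query Results"


def has_cypher_results(items: list[dict]) -> bool:
    """Detect a cypher results section via metadata, or via the heading as a fallback."""
    for item in items:
        # "section" is an assumed metadata key, used here only for illustration.
        if item.get("metadata", {}).get("section") == "cypher_results":
            return True
        if HEADING_MARKER in item.get("content", ""):
            return True
    return False


def build_system_prompt(items: list[dict]) -> str:
    """Return the base prompt unchanged unless cypher results are present."""
    if has_cypher_results(items):
        return BASE_SYSTEM_PROMPT + "\n" + CYPHER_AUTH_RULE
    return BASE_SYSTEM_PROMPT


plain = build_system_prompt([{"content": "Some passage text."}])
with_rule = build_system_prompt(
    [{"content": "| city | n |", "metadata": {"section": "cypher_results"}}]
)
```

Keeping the rule in a separate constant is what lets the cypher-off prompt stay byte-identical to the base prompt.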

Usage

```python
from graphrag_sdk import (
    CypherFirstAggregationStrategy,
    FastCorefResolver,
    GraphExtraction,
    GraphRAG,
    LLMVerifiedResolution,
    SentenceTokenCapChunking,
)

# `conn`, `llm`, `embedder`, and `doc` are assumed to be created by the caller,
# and the `async with` block must run inside an async function.
chunker = SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2)
extractor = GraphExtraction(llm=llm, coref_resolver=FastCorefResolver())
resolver = LLMVerifiedResolution(llm=llm, embedder=embedder)

async with GraphRAG(
    connection=conn, llm=llm, embedder=embedder, embedding_dimension=256,
) as rag:
    await rag.ingest(text=doc, chunker=chunker,
                     extractor=extractor, resolver=resolver)
    await rag.finalize()
    # Opt in to the Cypher-first strategy once ingestion has finished.
    rag._retrieval_strategy = CypherFirstAggregationStrategy(
        graph_store=rag._graph_store,
        vector_store=rag._vector_store,
        embedder=embedder,
        llm=llm,
    )
    answer = await rag.completion("Which city has the most employees?")
```

Related PRs

feat(ingestion): default to SentenceTokenCapChunking in ingest()/upda… #254 — independent change to make SentenceTokenCapChunking the default chunker (recommended pairing; the benchmark required it to hit 7/7)
docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst #253 — currently mis-targeted, will be retargeted to stack on top of this PR so its diff shrinks to the 33-line docstring follow-up

Test plan

- Unit tests: `tests/test_cypher_first.py` (66 tests), `tests/test_cypher_generation.py` (49 tests), `tests/test_facade.py` (75 tests including R1 contract)
- End-to-end aggregation benchmark: 7/7 stable across 3 runs (`CypherFirstAggregationStrategy`) and 7/7 single run (`MultiPathRetrieval(enable_cypher=True)`)
- Full SDK suite: 748 passed, 24 skipped on the cypher branch tip
- `ruff check` clean on all touched files
- Reviewer to confirm CI green on this PR

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added Cypher-first retrieval strategy for optimized handling of aggregation and numeric queries
- Enhanced result formatting with markdown table support and improved metadata tracking for graph results
Improvements
- Strengthened Cypher validation with namespace function call detection
- Implemented intelligent fallback mechanisms when queries return empty results
- Enhanced prompt support for authoritative graph query results

…ions

Six localized fixes to the cypher_generation pipeline, identified from a failure-mode investigation on a 56-person/10-org synthetic corpus where both vector-only and cypher-enabled retrieval were silently producing wrong answers on counting, top-N, group-by, and intersection questions.

P0 fixes (ship together):

- Smart LIMIT injection. _sanitize_cypher no longer auto-injects LIMIT on pure aggregations (count/sum/avg without group-by) and uses the new _DEFAULT_ROW_LIMIT=100 constant otherwise. Paired with raising the result_assembly slice cap to 100 plus a truncation sentinel, this stops group-by lists ("orgs with >=5 employees", "top-N city") from being silently cut at 20 rows.
- Authoritative result framing. The cypher results section is renamed to "Authoritative Graph Query Results (deterministic; trust over passages on counts and aggregates)" and a matching rule 8 is added to both _RAG_SYSTEM_PROMPT variants in api/main.py. Together they stop the LLM from contradicting a correct numeric cypher answer when verbose passage text mentions a different entity.
- APOC/GDS/db function blocklist. validate_cypher now rejects dotted-namespace function calls (apoc.text.regexGroups, gds.*, db.*), for which FalkorDB silently returns 0 rows. The error feeds the existing retry-with-feedback loop so attempt 2 has a concrete fix-it.

P1 fixes:

- 0-row label-widen fallback. When a typed-label cypher returns 0 rows AND a name predicate is present (label is a routing hint, not the filter itself), execute_cypher_retrieval rewrites typed labels to __Entity__ and re-runs once. Pure-cypher rewrite, no second LLM call. Recovers cases where the extractor labelled an entity differently than the schema prompt steered the LLM toward.
- Two new schema-prompt examples replacing redundant ones: a top-N group-by ("city with most employees", explicit 2-hop) and a set-intersection ("works at BOTH A and B") with a matching rule.

P2: cypher metadata (cypher_fallback, cypher_truncated, cypher_rows) threaded through assemble_raw_result into RetrieverResult.metadata so operators can monitor fallback firing rate.

Public API: execute_cypher_retrieval now returns a 3-tuple (facts, entities, metadata) instead of (facts, entities). Internal: the only callers are multi_path.py and tests.

Verification on the 6-question matrix moved 2 questions from wrong to correct (city group-by, observability existential via label widen) and preserved the 4 that were already passing.

Unit tests added for _is_pure_aggregation, _split_top_level_commas, _widen_typed_labels, _should_widen_labels, apoc rejection, and label-widen firing/gating. 49 cypher tests + full SDK suite (629) green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
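As an illustration of the two P0 validator-side fixes, the sketch below shows one way pure-aggregation detection, LIMIT injection, and dotted-namespace rejection could be written with regular expressions. It is not the SDK's `_sanitize_cypher` or `validate_cypher`: every function name and regex here is an assumption, and a regex-based check will miss edge cases (string literals, list or map literals in RETURN) that a real parser would handle.

```python
import re

DEFAULT_ROW_LIMIT = 100  # same value as the _DEFAULT_ROW_LIMIT named in the commit

_AGG_CALL = re.compile(r"\b(count|sum|avg|min|max)\s*\(", re.IGNORECASE)
_LIMIT = re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE)
_RETURN_ITEMS = re.compile(
    r"\bRETURN\b(.*?)(?:\bORDER\s+BY\b|\bLIMIT\b|$)", re.IGNORECASE | re.DOTALL
)
_NAMESPACED_CALL = re.compile(r"\b(apoc|gds|db)\.[\w.]+\s*\(", re.IGNORECASE)


def split_top_level_commas(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def is_pure_aggregation(cypher: str) -> bool:
    """True when every RETURN item starts with an aggregate call (no group-by key)."""
    match = _RETURN_ITEMS.search(cypher)
    if not match:
        return False
    items = split_top_level_commas(match.group(1))
    return all(_AGG_CALL.match(item) for item in items)


def inject_limit(cypher: str) -> str:
    """Append a default LIMIT unless one exists or the query is a pure aggregation."""
    if _LIMIT.search(cypher) or is_pure_aggregation(cypher):
        return cypher
    return f"{cypher.rstrip().rstrip(';')} LIMIT {DEFAULT_ROW_LIMIT}"


def find_namespaced_calls(cypher: str) -> list[str]:
    """Return dotted-namespace function names (apoc.*, gds.*, db.*) found in the query."""
    return [m.group(0).rstrip("( \t") for m in _NAMESPACED_CALL.finditer(cypher)]


count_only = "MATCH (p:Person) RETURN count(p) AS n"
group_by = "MATCH (p:Person)-[:WORKS_AT]->(o) RETURN o.name, count(p) AS n"
print(inject_limit(count_only))
print(inject_limit(group_by))
print(find_namespaced_calls("RETURN apoc.text.regexGroups(p.name, 'x')"))
```

A group-by query still gets a LIMIT because one of its RETURN items (`o.name`) is a grouping key rather than an aggregate, so its row count is unbounded.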

New retrieval strategy that routes quantitative and structural questions through a deterministic Cypher-first path while delegating non-aggregation questions to a fallback strategy (default MultiPathRetrieval). Safe to use as the top-level strategy on GraphRAG.

Implements six mechanisms identified from a failure-mode investigation where aggregation answers were being silently corrupted by extraction noise, lossy result formatting, and LLM mistrust of bare cypher numbers:

- Intent classifier routes per question into numeric_math / aggregation / rag. Catches "how many", "which X", "more X than Y", "BOTH A and B", "are there any", "average / total of NUMBER".
- Multi-candidate cypher generation. K parallel samples per question, execute all, pick the highest-row-count result. Beats LLM stochasticity on structural interpretation without serial retries.
- Column-named markdown table formatting via result.header. Eliminates the "10 | 7 | True" ambiguity that was swapping comparison answers.
- Description+chunk-text fuzzy hybrid for "shared X" / "BOTH A and B" shapes when X is a free-text property (role, project) not extracted as a typed entity. Single batched cypher, sentence-restricted regex extraction, fuzzy intersect (substring or 2-token overlap). Recovers cases where graph extraction summarized away the project names.
- Numeric-math sub-path. RETURN raw values, do average / sum / median in Python. Avoids LLM-arithmetic errors deterministically.
- Negation-existential empty handling. For "are there any X without Y?" an empty cypher result is the definitive "No"; positive existentials fall back to vector retrieval since extraction labels are unreliable.

The strategy emits its result under an "Authoritative Graph Query Results" section heading that rule 8 of the existing _RAG_SYSTEM_PROMPT (added in bb920c6) is already configured to trust on quantitative questions.

Benchmark: prototype scored 7/7 stably across three runs on the seven-question failure-mode matrix that previous strategies hit 2-5/7 on. The SDK port scored 6/7 in an end-to-end check; the remaining failure was malformed extracted org names ("Glo" / "Initech System") leaking through into the answer, an extraction-quality issue orthogonal to this strategy.

Public API: CypherFirstAggregationStrategy is exported at the top level and from graphrag_sdk.retrieval.strategies. Pass it as retrieval_strategy= on GraphRAG construction to opt in.

Tests: 45 unit tests cover the pure-Python helpers (intent classifier, shape detectors, role/project regex extractors, fuzzy intersect, markdown table formatter, numeric coercion). Full SDK suite stays green at 674 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
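The multi-candidate mechanism can be sketched as follows. This is a simplified stand-in rather than the strategy's implementation: `best_candidate`, `generate`, and `execute` are hypothetical names, and the LLM and graph are replaced with canned stubs so the selection rule (run K candidates concurrently, keep the one with the most rows) is visible on its own.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Rows = list[list[object]]


async def best_candidate(
    question: str,
    generate: Callable[[str], Awaitable[str]],
    execute: Callable[[str], Awaitable[Rows]],
    k: int = 3,
) -> Optional[tuple[str, Rows]]:
    """Sample k cypher candidates concurrently and keep the one returning the most rows."""
    candidates = await asyncio.gather(*(generate(question) for _ in range(k)))
    unique = list(dict.fromkeys(candidates))  # drop duplicate queries, keep order
    results = await asyncio.gather(
        *(execute(c) for c in unique), return_exceptions=True
    )
    # Discard candidates whose execution raised instead of failing the whole batch.
    scored = [(c, r) for c, r in zip(unique, results) if not isinstance(r, BaseException)]
    if not scored:
        return None
    return max(scored, key=lambda pair: len(pair[1]))


async def _demo() -> Optional[tuple[str, Rows]]:
    # Canned stand-ins for the LLM and the graph store.
    queries = iter(["Q_EMPTY", "Q_TWO_ROWS", "Q_ONE_ROW"])
    canned: dict[str, Rows] = {
        "Q_EMPTY": [],
        "Q_TWO_ROWS": [["Berlin", 12], ["Paris", 9]],
        "Q_ONE_ROW": [["Berlin", 12]],
    }

    async def fake_generate(_question: str) -> str:
        return next(queries)

    async def fake_execute(cypher: str) -> Rows:
        return canned[cypher]

    return await best_candidate(
        "Which city has the most employees?", fake_generate, fake_execute
    )


winner = asyncio.run(_demo())
print(winner)
```

Row count is a crude score: it favours the interpretation that found something, which matches the commit's description, but it cannot tell a correct narrow answer from an over-broad one.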

…nStrategy

Addresses concerns from a pre-merge review of 7fa7a1f. Behavior preserved on the failure-mode benchmark; the changes are about surface quality, observability, and test coverage.

R1: Make the cypher-authority rule opt-in. Rule 8 ("trust the 'Authoritative Graph Query Results' section over passages on counts and aggregates") used to live inside the base _RAG_SYSTEM_PROMPT and _RAG_SYSTEM_PROMPT_DELIMITED, so it fired on every completion() call SDK-wide, including for users who never enable cypher retrieval. The rule is now extracted into a separate constant _CYPHER_AUTH_RULE and appended to the system prompt only when the retriever produced a cypher_results section (detected via item metadata or the canonical heading marker as a defensive fallback). Callers on MultiPathRetrieval without enable_cypher keep the unchanged 7-rule prompt.

R2: Add cypher_first_path metadata to every strategy result. The strategy has five sub-paths plus three RAG-fallback branches; today operators can't tell which one handled a query from the result alone. Each result now carries one of seven canonical PATH_* labels: numeric_math, shared_property_hybrid, cypher_table, negation_empty_no, rag_fallback, rag_fallback_numeric_fail, rag_fallback_cypher_empty. A shared _tag_path() helper handles the bookkeeping including the three delegated-fallback wrappings so the contract is uniform.

R3: Document prose-shape and graph-topology assumptions. The shared-property hybrid was tuned on graphs produced by the SDK's default GraphExtraction pipeline; custom extractors or domain prose may not match. The class docstring now has dedicated "Assumptions and known limits" and "Accuracy ceiling" sections naming each one. Plus a runtime warning fires when the batched (Org)<-[:RELATES]-(Person) query returns zero tuples, so operators on different schemas get a fast signal rather than silent wrong answers.

R4: Mocked end-to-end routing tests. The 45 existing unit tests cover pure-Python helpers, not the strategy's branching. Seven new tests use a mock LLM (returning canned cypher), a mock graph (returning canned result_sets), and a stub _FakeFallback to assert which sub-path fires for each intent + graph-state combination. Patterns covered:

- rag intent → fallback
- aggregation + rows → cypher_table
- numeric → python math
- numeric extraction empty → fallback (with numeric_fail tag)
- negation + empty cypher → No (no delegation)
- positive-existential + empty cypher → fallback (with cypher_empty tag)
- topology-violation warning

R10: ruff check pass. Dropped two unused imports (RetrieverError, ChatMessage) and sorted the test_cypher_first imports.

All 75 facade + 57 cypher_first + 49 cypher_generation tests pass. Full SDK suite 686 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
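The empty-result branching and the path tagging from R2 can be sketched together. The regex, the `Result` dataclass, and the function names below are illustrative assumptions; only the path labels (`negation_empty_no`, `rag_fallback_cypher_empty`) and the `cypher_first_path` metadata key are taken from the commit message.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Rough pattern for "are there any X without Y?" questions (assumed, not the SDK's).
_NEGATION_EXISTENTIAL = re.compile(
    r"^\s*(are|is)\s+there\s+(any|a|an)\b.*\b(without|with\s+no|not)\b",
    re.IGNORECASE,
)


@dataclass
class Result:
    """Minimal stand-in for a retrieval result carrying metadata."""
    context: str
    metadata: dict = field(default_factory=dict)


def tag_path(result: Result, path: str) -> Result:
    """Record which sub-path produced the result so operators can monitor routing."""
    result.metadata["cypher_first_path"] = path
    return result


def handle_empty_cypher(question: str, fallback: Callable[[str], Result]) -> Result:
    """Decide what an empty cypher result means for this question."""
    if _NEGATION_EXISTENTIAL.search(question):
        # For a negation existential, zero rows is itself the answer.
        return tag_path(Result("No matching rows were found: the answer is No."),
                        "negation_empty_no")
    # For a positive existential, zero rows may just reflect extraction labels.
    return tag_path(fallback(question), "rag_fallback_cypher_empty")


def stub_fallback(question: str) -> Result:
    return Result(f"passages retrieved for: {question}")


negated = handle_empty_cypher("Are there any orgs without employees?", stub_fallback)
positive = handle_empty_cypher("Are there any orgs in Berlin?", stub_fallback)
print(negated.metadata, positive.metadata)
```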

…extractor

R5: Split CypherFirstAggregationStrategy into composable path classes. The strategy file went from 944 lines with five sub-paths crammed into the strategy class to a small dispatcher (~80 lines) backed by four focused _AggregationPath subclasses:

- _RagDelegationPath: intent="rag" → fallback verbatim
- _NumericMathPath: intent="numeric_math" → Python arithmetic
- _SharedPropertyHybridPath: "BOTH A and B" / "same X as Z" via chunks
- _MultiCandidateCypherPath: K parallel cypher candidates + table (also owns the negation-empty and cypher-empty-fallback branches)

Each path is a small class with a single `maybe_handle(query, ctx)` method that returns either a final `RawSearchResult` or `None` to defer. The strategy's `_execute` dispatches by intent and consults the relevant paths in order. Existing routing tests (TestStrategyRouting, 7 cases) keep covering the contract end-to-end; the helper classes are implementation detail.

Pure refactor, no behaviour change. All paths share state via a single reference to the parent strategy, so callers' constructor signature is unchanged.

R8: Pluggable phrase extractor for the shared-property hybrid. The default role/project regexes target the prose patterns the SDK's GraphExtraction pipeline produces ("works at X as a <role>", "contributes to <project>") and a closed set of role-suffix words. Domain-specific use cases (medical, legal, e-commerce, non-English) need different vocabularies. Added a `PhraseExtractor` ABC and a `DefaultPhraseExtractor` that wraps the existing regexes. Strategies accept a `phrase_extractor=` parameter on construction; the shared-property hybrid consults it instead of the module-level `_extract_phrases`. Domain users can now subclass and inject without forking. Exports added at strategies/__init__.py and top-level graphrag_sdk namespace.

Tests
-----

- 1 unit test for DefaultPhraseExtractor (matches the default regexes, unknown kinds return empty set rather than raising).
- 1 mocked end-to-end test confirming that a custom extractor passed to the strategy is consulted by the hybrid path (intersection computed over the custom vocabulary, not the default).

All 763 unit tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
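A rough picture of what a domain-specific extractor might look like is given below. The commit does not show the `PhraseExtractor` interface, so this sketch defines its own stand-in ABC: the `extract(text, kind)` method name and signature, the medical regexes, and `fuzzy_intersect` are all assumptions made for illustration. The two behaviours borrowed from the commit text are that unknown kinds yield an empty set and that fuzzy matching means substring containment or an overlap of at least two tokens.

```python
import re
from abc import ABC, abstractmethod


class PhraseExtractorSketch(ABC):
    """Stand-in for the SDK's PhraseExtractor ABC; the real signature may differ."""

    @abstractmethod
    def extract(self, text: str, kind: str) -> set[str]:
        """Return the phrases of the given kind ("role", "project") found in text."""


class MedicalPhraseExtractor(PhraseExtractorSketch):
    """Hypothetical extractor for clinical prose instead of the default patterns."""

    _PATTERNS = {
        "role": re.compile(
            r"\bserves as (?:an? |the )?([a-z ]+?)(?= at\b|[.,;]|$)", re.IGNORECASE
        ),
        "project": re.compile(
            r"\benrolled in (?:the )?([A-Za-z0-9\- ]+? trial)\b", re.IGNORECASE
        ),
    }

    def extract(self, text: str, kind: str) -> set[str]:
        pattern = self._PATTERNS.get(kind)
        if pattern is None:
            return set()  # unknown kinds yield an empty set rather than raising
        return {m.group(1).strip().lower() for m in pattern.finditer(text)}


def fuzzy_intersect(left: set[str], right: set[str]) -> set[str]:
    """Keep phrases from left that match some phrase in right by substring or 2 shared tokens."""

    def similar(a: str, b: str) -> bool:
        if a in b or b in a:
            return True
        return len(set(a.split()) & set(b.split())) >= 2

    return {a for a in left if any(similar(a, b) for b in right)}


text = "Dr. Lee serves as attending cardiologist at Mercy. She is enrolled in the HEART-2 trial."
extractor = MedicalPhraseExtractor()
roles = extractor.extract(text, "role")
projects = extractor.extract(text, "project")
shared = fuzzy_intersect({"senior data engineer"}, {"data engineer lead"})
print(roles, projects, shared)
```

In the SDK itself, an instance of a real `PhraseExtractor` subclass would be passed through the `phrase_extractor=` constructor parameter described above.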

coderabbitai · 2026-05-14T14:20:36Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between a90ad18 and d818eb1.

📒 Files selected for processing (10)

graphrag_sdk/src/graphrag_sdk/__init__.py
graphrag_sdk/src/graphrag_sdk/api/main.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/__init__.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py
graphrag_sdk/tests/test_cypher_first.py
graphrag_sdk/tests/test_cypher_generation.py
graphrag_sdk/tests/test_facade.py

Walkthrough

This PR introduces a Cypher-first aggregation strategy that detects query intent and routes to specialized execution paths: numeric-math (Cypher returns raw values and Python computes the average, sum, or median), shared-property hybrid (fuzzy intersection over extracted phrases), and multi-candidate Cypher-table (parallel K-candidate generation). Supporting enhancements include default row-limit injection that skips pure aggregations, a label-widening fallback on 0-row results, a result cap increase from 20 to 100, and conditional authority-rule injection into completion prompts.
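The label-widening fallback mentioned above can be approximated with a regex rewrite. This is a sketch under assumptions, not the SDK's `_widen_typed_labels` or `_should_widen_labels`: the patterns are simplified, they only rewrite the first label of a node pattern, and they do not guard against label-like text inside string literals.

```python
import re

# A node pattern's first label, e.g. "(p:Person"; already-widened labels are skipped.
_TYPED_LABEL = re.compile(r"\((\w*)\s*:\s*(?!__Entity__\b)[A-Za-z_]\w*")
# A name filter, either as a WHERE predicate or as an inline property map.
_NAME_PREDICATE = re.compile(
    r"\.name\s*(=~|=|CONTAINS|STARTS\s+WITH)|\{\s*name\s*:", re.IGNORECASE
)


def should_widen(cypher: str, row_count: int) -> bool:
    """Widen only when the query found nothing and the name, not the label, is the filter."""
    return (
        row_count == 0
        and bool(_NAME_PREDICATE.search(cypher))
        and bool(_TYPED_LABEL.search(cypher))
    )


def widen_typed_labels(cypher: str) -> str:
    """Rewrite typed node labels to the generic __Entity__ label."""
    return _TYPED_LABEL.sub(r"(\1:__Entity__", cypher)


query = "MATCH (p:Person {name: 'Ada'})-[:RELATES]->(o:Organization) RETURN o.name"
widened = widen_typed_labels(query)
print(widened)
```

Relationship types such as `[:RELATES]` are untouched because the pattern only matches after an opening parenthesis, and the rewrite needs no second LLM call.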

Changes

Cypher-First Aggregation Strategy

| Layer / File(s) | Summary |
| --- | --- |
| Public API Exports and Module Wiring `graphrag_sdk/src/graphrag_sdk/__init__.py`, `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/__init__.py` | Exposes `CypherFirstAggregationStrategy`, `DefaultPhraseExtractor`, and `PhraseExtractor` via package imports and `__all__` lists. |
| Intent Detection and Query Classification `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 1–140) | Detects intent as aggregation, numeric-math, or RAG; classifies query shapes (yes/no, which-list); identifies negation-existential patterns for routing. |
| Numeric-Math Execution Path `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 432–564) | Generates Cypher for raw numeric values, coerces result cells to numbers in Python, computes average/sum/median, and falls back to RAG if extraction yields no values. |
| Shared-Property Hybrid Path `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 565–711) | Executes batched Cypher to fetch descriptions, extracts role/project phrases via pluggable regex extractors, and computes "both A and B" intersections using fuzzy token matching. |
| Markdown Table Rendering and Utilities `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 286–343, 1030–1061) | Converts FalkorDB result sets to markdown tables with header synthesis and truncation; coerces result cells to numeric values; tags results with strategy-path metadata. |
| Multi-Candidate Cypher-Table Path `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 713–868) | Generates K candidate Cypher queries in parallel, executes all in parallel, ranks by row count, and formats top result as markdown table; handles negation-existential "No" and fallback branches. |
| CypherFirstAggregationStrategy Class `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (lines 870–1022) | Wires embedder, LLM, fallback strategy, and phrase extractor; routes queries in `_execute` based on detected intent with prioritization of shared-property hybrid before multi-candidate paths. |
| Cypher Generation Validation and Limits `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py` (lines 44–323) | Auto-injects default row limit (100); detects pure aggregation to skip LIMIT injection; rejects dotted-namespace function calls; updates prompt examples for intersection semantics. |
| Label-Widening Fallback and Result Parsing `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py` (lines 391–529) | Widens typed entity labels (`:Person` → `:__Entity__`) and re-executes once when initial query returns 0 rows and is eligible; adds result parsing and enriched metadata (row counts, fallback flags). |
| Multi-Path Integration and Result Assembly `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py`, `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py` | Threads metadata from Cypher execution through result unpacking; increases result cap to 100; adds "Authoritative Graph Query Results" heading; merges cypher_metadata with prefixed keys. |
| Cypher Authority Rule and Prompt Integration `graphrag_sdk/src/graphrag_sdk/api/main.py` | Introduces cypher-authority system prompt rule and `_has_authoritative_cypher_results` helper; conditionally appends rule to default system prompt during completion. |
| Comprehensive Test Coverage `graphrag_sdk/tests/test_cypher_first.py`, `graphrag_sdk/tests/test_cypher_generation.py`, `graphrag_sdk/tests/test_facade.py` | Tests for intent detection, query classification, phrase extraction, fuzzy intersection, markdown table formatting, numeric coercion, path tagging, end-to-end routing, validation rules, label-widening fallback, and authority rule injection. |
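For the numeric-math row in the table above, a minimal sketch of the Python-side arithmetic might look like this. The function names, the choice of the last column as the value column, and the string-cleaning rules are assumptions for illustration; the PR only states that raw values are returned by Cypher, coerced to numbers, and aggregated in Python, with a fallback when nothing numeric is found.

```python
import math
import statistics
from typing import Any, Optional, Sequence


def coerce_number(cell: Any) -> Optional[float]:
    """Convert a result cell to float, or None when it is not numeric."""
    if isinstance(cell, bool):  # bool is a subclass of int, so exclude it first
        return None
    if isinstance(cell, (int, float)):
        return float(cell)
    if isinstance(cell, str):
        cleaned = cell.replace(",", "").replace("$", "").strip()
        try:
            return float(cleaned)
        except ValueError:
            return None
    return None


def compute(op: str, rows: Sequence[Sequence[Any]]) -> Optional[float]:
    """Aggregate the last column of each row; None signals 'fall back to RAG'."""
    values = [v for v in (coerce_number(row[-1]) for row in rows) if v is not None]
    if not values:
        return None
    operations = {
        "average": statistics.fmean,
        "sum": math.fsum,
        "median": statistics.median,
    }
    return operations[op](values)


# Hypothetical rows as a graph query might return them: (name, salary).
rows = [["Ada", 120000], ["Bob", "95,000"], ["Cy", None], ["Di", "n/a"]]
print(compute("average", rows), compute("sum", rows), compute("median", rows))
```

Doing the arithmetic outside the LLM means the reported number depends only on the rows returned, which is the determinism the numeric-math path is after.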

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

Naseem77

Poem

🐰 Behold! A Cypher-first hop through intent-aware paths,
Where numeric aggregates compute via Python's math,
Shared properties dance with fuzzy token grace,
Multi-candidates race to win the query race!
From zero rows widened to authority's glow,
GraphRAG strategies flourish and grow. 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.09% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main additions in this PR: a new CypherFirstAggregationStrategy and multiple cypher-path accuracy improvements across multiple modules. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


galshubeli and others added 4 commits May 10, 2026 17:44

galshubeli mentioned this pull request May 14, 2026

docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst #253

Open

2 tasks

galshubeli wants to merge 4 commits into main from feat/cypher-aggregation-accuracy
