
perf(ingester): lazy regex evaluation on head postings cache miss #7553

Open
alanprot wants to merge 1 commit into cortexproject:master from alanprot:lazy-posting

Lazy regex evaluation on head postings cache miss

When the expanded postings cache misses on the head block, regex matchers on high-cardinality labels (e.g., pod with 400K+ values) dominate query cost, because the regex must be evaluated against every value of the label to build the posting list.
This PR defers expensive regex matchers to a lazy per-series evaluation when a selective equality matcher (like __name__=) already narrows the result set significantly.

How it works

On cache miss, splitMatchersForHeadWithConfig splits matchers into:

  • Selective matchers (equality, low-cardinality regex) → used for postings lookup
  • Lazy matchers (high-cardinality regex) → applied per-series via LabelValueFor after the selective postings are resolved

A cost-ratio gate decides when deferral is worthwhile. Here, cardinality is the number of distinct values of the regex matcher's label, and selectivePostings is the number of series selected by the selective matchers:

  • Simple regex (single contains, prefix): deferred when cardinality > selectivePostings × 6
  • Complex regex (multi-substring, capture groups): deferred when cardinality > selectivePostings × 2

Label cardinality lookups are cached in an expirable LRU (60s TTL) to avoid repeated LabelValues calls under load.
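The split and gate described above can be sketched in Go as follows. Everything here is hypothetical. The rules in classify (a lone prefix foo.* or a lone contains .*foo.* counts as "simple") are an assumed reading of the description. gateConfig, matcher, splitMatchers and shouldDefer are illustrative names, not the PR's code, and cardinalityOf stands in for the cached LabelValues count. The sketch models only the documented behavior that a zero max-cardinality disables deferral. It also assumes deferral is skipped entirely when no equality matcher is present.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// regexCost mirrors the two buckets named in the description (assumed rules).
type regexCost int

const (
	costSimple  regexCost = iota // lone prefix "foo.*" or lone contains ".*foo.*"
	costComplex                  // everything else: alternations, captures, multi-substring...
)

// isDotStar reports whether re is ".*".
func isDotStar(re *syntax.Regexp) bool {
	return re.Op == syntax.OpStar && len(re.Sub) == 1 &&
		(re.Sub[0].Op == syntax.OpAnyChar || re.Sub[0].Op == syntax.OpAnyCharNotNL)
}

// classify parses the pattern and buckets it by the shape of its syntax tree.
func classify(pattern string) regexCost {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil || re.Op != syntax.OpConcat {
		return costComplex
	}
	s := re.Sub
	if len(s) == 2 && s[0].Op == syntax.OpLiteral && isDotStar(s[1]) {
		return costSimple // prefix
	}
	if len(s) == 3 && isDotStar(s[0]) && s[1].Op == syntax.OpLiteral && isDotStar(s[2]) {
		return costSimple // single contains
	}
	return costComplex
}

// gateConfig holds hypothetical counterparts of the three new flags.
type gateConfig struct {
	MaxCardinality   int     // only "0 disables deferral" is modeled here
	SimpleCostRatio  float64 // 6 in the description
	ComplexCostRatio float64 // 2 in the description
}

// shouldDefer applies: defer when cardinality > selectivePostings × ratio.
func shouldDefer(cfg gateConfig, pattern string, cardinality, selectivePostings int) bool {
	if cfg.MaxCardinality == 0 {
		return false
	}
	ratio := cfg.ComplexCostRatio
	if classify(pattern) == costSimple {
		ratio = cfg.SimpleCostRatio
	}
	return float64(cardinality) > float64(selectivePostings)*ratio
}

// matcher models only "=" (Regex=false) and "=~" (Regex=true).
type matcher struct {
	Name  string
	Regex bool
	Value string
}

// splitMatchers partitions matchers into selective (postings lookup) and lazy
// (per-series filter). selectivePostings is assumed to be already known.
func splitMatchers(cfg gateConfig, ms []matcher, selectivePostings int,
	cardinalityOf func(name string) int) (selective, lazy []matcher) {
	hasEquality := false
	for _, m := range ms {
		if !m.Regex {
			hasEquality = true
		}
	}
	if !hasEquality {
		return ms, nil // nothing to narrow the candidates, so nothing is deferred
	}
	for _, m := range ms {
		if m.Regex && shouldDefer(cfg, m.Value, cardinalityOf(m.Name), selectivePostings) {
			lazy = append(lazy, m)
		} else {
			selective = append(selective, m)
		}
	}
	return selective, lazy
}

func main() {
	cfg := gateConfig{MaxCardinality: 1_000_000, SimpleCostRatio: 6, ComplexCostRatio: 2}
	cardinality := map[string]int{"__name__": 2_000, "pod": 413_000}
	ms := []matcher{
		{Name: "__name__", Value: "http_requests_total"},
		{Name: "pod", Regex: true, Value: ".*api.*"},
	}
	sel, lazy := splitMatchers(cfg, ms, 9_000, func(n string) int { return cardinality[n] })
	fmt.Println(len(sel), len(lazy)) // 1 1
}
```

The two ratios presumably differ because a complex regex costs more per evaluation. Scanning every label value with it is expensive, so deferral pays off at a smaller cardinality gap than for a cheap prefix or contains match.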

Benchmark results (realistic pod names, 413K cardinality, 9K selective postings)

| Path | Time per query | Memory per query |
|---|---|---|
| Eager (before) | 62 ms | 29.8 MB |
| Lazy (this PR) | 14 ms | 12.6 MB |

4.5× faster, 58% less memory per query.
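Plugging the benchmark's numbers into the gate shows why this query takes the lazy path under either ratio. It also gives a rough count of the regex work that deferral removes. This back-of-the-envelope figure assumes one regex evaluation per label value on the eager path and one per selected series on the lazy path:

$$
\begin{aligned}
\text{simple: } & 413\,000 > 9\,000 \times 6 = 54\,000 \\
\text{complex: } & 413\,000 > 9\,000 \times 2 = 18\,000 \\
\text{regex evaluations, eager / lazy: } & 413\,000 / 9\,000 \approx 46
\end{aligned}
$$

The measured 4.5× speedup is much smaller than this roughly 46× drop in regex evaluations. That gap is consistent with regex matching not being the only remaining cost on the query path.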

Configuration

Three new flags (the feature is disabled by default, since lazy-matcher-max-cardinality defaults to 0):

-blocks-storage.expanded_postings_cache.head.lazy-matcher-max-cardinality
-blocks-storage.expanded_postings_cache.head.lazy-matcher-simple-cost-ratio
-blocks-storage.expanded_postings_cache.head.lazy-matcher-complex-cost-ratio

Testing

  • Unit tests for the gate logic and cost classification
  • Integration fuzz test (TestLazyMatchersFuzz): 300 fuzzed queries plus injected regex patterns, run against an eager and a lazy instance and compared; lazy evaluation triggered 450+ times with zero mismatches
  • Test sensitivity verified by intentionally breaking the filter and confirming the fuzz test catches it (445 failures)
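The differential approach behind these tests can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the PR's TestLazyMatchersFuzz. The pod-name generator, pattern list, eager, lazy, brokenLazy and differential are all hypothetical. The sketch runs random queries through both paths and counts mismatches. It then swaps in a deliberately broken filter to confirm the harness notices, mirroring the "break it and see it fail" check above.

```go
package main

import (
	"fmt"
	"math/rand"
	"regexp"
)

// filterFn filters selected series indexes by a regex on the pod label.
type filterFn func(pods []string, selected []int, re *regexp.Regexp) []int

// eager evaluates the regex once per distinct pod value, then keeps the
// selected series whose value matched.
func eager(pods []string, selected []int, re *regexp.Regexp) []int {
	matched := map[string]bool{}
	for _, p := range pods {
		if _, seen := matched[p]; !seen {
			matched[p] = re.MatchString(p)
		}
	}
	var out []int
	for _, i := range selected {
		if matched[pods[i]] {
			out = append(out, i)
		}
	}
	return out
}

// lazy evaluates the regex only on the selected series.
func lazy(pods []string, selected []int, re *regexp.Regexp) []int {
	var out []int
	for _, i := range selected {
		if re.MatchString(pods[i]) {
			out = append(out, i)
		}
	}
	return out
}

// brokenLazy deliberately skips the regex filter.
func brokenLazy(_ []string, selected []int, _ *regexp.Regexp) []int { return selected }

// differential runs `queries` random queries and counts results where
// candidate disagrees with eager.
func differential(queries int, seed int64, candidate filterFn) int {
	r := rand.New(rand.NewSource(seed))
	prefixes := []string{"api", "web", "db", "cache"}
	pods := make([]string, 500)
	for i := range pods {
		pods[i] = fmt.Sprintf("%s-%04x", prefixes[r.Intn(len(prefixes))], r.Intn(1<<16))
	}
	patterns := []string{"api-.*", ".*a.*", "(web|db)-.*", ".*1.*2.*", "cache-[0-9a-f]{4}"}
	mismatches := 0
	for q := 0; q < queries; q++ {
		re := regexp.MustCompile("^(?:" + patterns[r.Intn(len(patterns))] + ")$")
		var selected []int // random stand-in for the selective postings
		for i := range pods {
			if r.Intn(10) == 0 {
				selected = append(selected, i)
			}
		}
		if fmt.Sprint(eager(pods, selected, re)) != fmt.Sprint(candidate(pods, selected, re)) {
			mismatches++
		}
	}
	return mismatches
}

func main() {
	fmt.Println("lazy mismatches:", differential(300, 1, lazy))
	fmt.Println("broken lazy mismatches:", differential(300, 1, brokenLazy))
}
```

A fixed seed keeps each run reproducible. Comparing against the eager path as an oracle avoids having to hand-write expected results for every fuzzed query.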

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

alanprot marked this pull request as ready for review on May 22, 2026 17:43