
perf: reduce detection overhead from ~1.30x to ~1.03x #52

Merged
taobojlen merged 15 commits into main from perf/autoresearch-optimizations on Mar 18, 2026
Conversation

@taobojlen
Owner

Summary

  • Replaced inspect.stack() with sys._getframe() — the biggest win. The hot path (notify()) now walks raw frame objects to find the caller instead of creating FrameInfo named tuples for every frame in the stack. Both the default path and the SHOW_ALL_CALLERS path are optimized.
  • Eliminated redundant work on the alert path — cached allowlisted (model, field) pairs so repeated N+1s on the same field skip _alert() entirely after the first check. Removed redundant get_stack() call in _alert().
  • Reduced per-call allocations — removed @functools.wraps from hot-path closures (profiling showed update_wrapper was the top zeal-specific cost), switched to tuple keys, dropped unnecessary ContextVar.set() calls.
  • Cached settings lookups — ZEAL_NPLUSONE_THRESHOLD and ZEAL_SHOW_ALL_CALLERS are now read once per context instead of via hasattr(settings, ...) on every notify().
  • Updated README — overhead claim updated from "2.5x slower" to "~3-5% overhead".

Benchmark results

Overhead ratio (zeal-enabled time / baseline time, lower is better):

| Path | Before | After |
| --- | --- | --- |
| Default | 1.30x | ~1.03x |
| SHOW_ALL_CALLERS=True | 1.32x | ~1.03x |

Absolute time on the test suite workload: 95.5ms → 72.3ms (-24%).

Methodology

Developed through 6 automated experiments using an autoresearch loop (inspired by Shopify/liquid#2056): edit one thing → run tests → benchmark → keep/discard. Each change was validated against the full test suite before benchmarking. Iteration 7 confirmed we hit the noise floor (~0.23ms real overhead on a ~74ms workload).
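The keep/discard decision in such a loop can be sketched as follows; the function names are hypothetical stand-ins for the scripts in auto/, not their actual interface.

```python
# Minimal sketch of one autoresearch iteration (hypothetical interface,
# not the actual auto/ scripts): apply a change, gate it on the test
# suite, then keep it only if the overhead ratio improves.
def try_change(apply_change, revert_change, tests_pass, measure_ratio, best_ratio):
    apply_change()
    if not tests_pass():
        revert_change()          # never benchmark a broken change
        return best_ratio, False
    ratio = measure_ratio()
    if ratio < best_ratio:
        return ratio, True       # keep: new best overhead ratio
    revert_change()              # no improvement; discard
    return best_ratio, False
```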

The auto/ directory contains the benchmark infrastructure for future optimization work.

Test plan

  • All 57 existing unit tests pass (unchanged)
  • ZEAL_SHOW_ALL_CALLERS feature preserved and tested
  • Benchmarked across all iterations to verify progressive improvement
  • Profiled at noise floor to confirm no remaining significant overhead

🤖 Generated with Claude Code

taobojlen and others added 12 commits March 18, 2026 10:42
…h → overhead_ratio=1.06

On the hot path (notify), use sys._getframe() to walk raw frame objects
instead of inspect.stack(context=0) which allocates FrameInfo named tuples
for every frame. Also skip storing full stacks in context.calls when
ZEAL_SHOW_ALL_CALLERS is not enabled — just store empty lists for counting.

The full inspect.stack() path is preserved for when ZEAL_SHOW_ALL_CALLERS
is enabled, maintaining full backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
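The frame walk described above looks roughly like this (an illustrative sketch, not zeal's exact code; the `is_internal` predicate stands in for its frame filtering):

```python
import sys

def get_caller(is_internal):
    """Return (filename, lineno, funcname) for the first non-internal frame.

    Walking raw frames via f_back avoids the FrameInfo named tuple that
    inspect.stack() allocates for every frame on the stack.
    """
    frame = sys._getframe(1)  # skip get_caller's own frame
    while frame is not None:
        filename = frame.f_code.co_filename
        if not is_internal(filename):
            return (filename, frame.f_lineno, frame.f_code.co_name)
        frame = frame.f_back
    return None
```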
…() calls → overhead_ratio=1.05

After _alert() determines a (model, field) pair is allowlisted, cache that
result in the context so subsequent notify() calls for the same pair skip
the expensive _alert() path entirely (message formatting, allowlist property
allocation, fnmatch checks).

Also exclude auto/ and .venv/ from ruff and pyright checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
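The caching idea can be sketched like this (hypothetical names; the real context object and allowlist check live in zeal's internals):

```python
# Sketch of allowlist caching: once a (model, field) pair is known to be
# allowlisted, later notify() calls for it skip the expensive check
# (message formatting, fnmatch) entirely.
class Context:
    def __init__(self):
        self.allowlisted = set()

def notify(context, model, field, check_allowlist):
    key = (model, field)
    if key in context.allowlisted:
        return False                    # cached: no alert, no re-check
    if check_allowlist(model, field):
        context.allowlisted.add(key)    # pay the fnmatch cost only once
        return False
    return True                         # would raise the N+1 alert here
```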
… overhead_ratio=1.04

- Remove @functools.wraps from hot-path closures in patch_queryset_fetch_all
  and patch_queryset_function (avoids update_wrapper overhead on every queryset)
- Use tuple key (model, field, fn, lineno) instead of f-string in notify()
  fast path (avoids string allocation per call)
- Append None instead of [] to calls list (avoids empty list allocation per call)
- Cache calls[key] in local variable to avoid redundant dict lookup
- Remove redundant _nplusone_context.set(context) from notify() and ignore()
  (the context is mutated in-place; .set() with the same object just wastes
  a Token allocation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
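A rough sketch of the allocation-avoiding choices above (illustrative, not the actual notify() code):

```python
# Illustrative fast path: a tuple key avoids per-call f-string allocation,
# and appending None (instead of a fresh empty list) avoids another
# allocation per call when the entries are only used for counting.
calls = {}

def record(model, field, fn, lineno):
    key = (model, field, fn, lineno)   # not f"{model}.{field}:{fn}:{lineno}"
    bucket = calls.get(key)            # single lookup, cached in a local
    if bucket is None:
        bucket = calls[key] = []
    bucket.append(None)                # sentinel entry, just for counting
    return len(bucket)
```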
…S path → overhead_ratio=1.05, overhead_ratio_allcallers=1.07

Replace get_stack() (which calls inspect.stack(context=0) creating expensive
FrameInfo named tuples) with get_stack_fast() using sys._getframe() to build
lightweight (filename, lineno, funcname) tuples. Also eliminate redundant
get_stack() call in _alert() by using get_caller_fast() for the non-SHOW_ALL_CALLERS
path and deriving caller info from the already-captured stack for SHOW_ALL_CALLERS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…frame filtering → overhead_ratio=1.05, overhead_ratio_allcallers=1.06

Replace `any(pattern in fn for pattern in PATTERNS)` with two direct
`"site-packages" not in fn and "/zeal/" not in fn` checks in
get_caller_fast() and get_stack_fast(). Micro-benchmarks show this is
~5x faster for the per-frame pattern matching, eliminating generator
object allocation and 4-item iteration on every frame of the call stack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
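The two filtering styles, side by side (the PATTERNS list here is a hypothetical reconstruction, not zeal's original):

```python
# Hypothetical reconstruction of the old vs. new per-frame filter. The
# any() form builds a generator object for every frame checked; the
# direct form is two short-circuiting substring tests with no allocation.
PATTERNS = ("site-packages", "/zeal/")  # illustrative pattern list

def is_internal_old(filename):
    return any(pattern in filename for pattern in PATTERNS)

def is_internal_new(filename):
    return "site-packages" in filename or "/zeal/" in filename
```

Note that a later commit in this PR replaces the bare "/zeal/" substring with a check against the package's real directory.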
…asattr() overhead → overhead_ratio=1.04, overhead_ratio_allcallers=1.03

Cache ZEAL_SHOW_ALL_CALLERS and ZEAL_NPLUSONE_THRESHOLD on the
NPlusOneContext dataclass using lazy initialization (None sentinel).
On the first notify() call per context, the settings are read via
hasattr() and cached; subsequent calls (~429 per workload) use the
cached value directly, avoiding ~1.2us of hasattr(settings, ...) overhead
per call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
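A sketch of the lazy, per-context settings cache (the settings object and the default threshold of 2 here are illustrative stand-ins, not Django's settings or zeal's actual default):

```python
from dataclasses import dataclass
from typing import Any

# Sketch of lazy settings caching with a None sentinel: the getattr()
# lookup runs once per context rather than once per notify() call.
@dataclass
class NPlusOneContext:
    _threshold: "int | None" = None  # None sentinel: not yet read

    def threshold(self, settings: Any) -> int:
        if self._threshold is None:
            self._threshold = getattr(settings, "ZEAL_NPLUSONE_THRESHOLD", 2)
        return self._threshold
```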
…tion 7)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update the README warning from "2.5x slower" to "~3-5% overhead" based
on benchmarking results from 6 optimization iterations.

Add auto/ directory with benchmark and autoresearch scripts used to
drive the optimizations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codspeed-hq

codspeed-hq bot commented Mar 18, 2026

Merging this PR will improve performance by ×6.3

⚡ 1 improved benchmark

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- |
| test_performance | 1,957 ms | 310.5 ms | ×6.3 |

Comparing perf/autoresearch-optimizations (9bfad41) with main (2aabdaa)


taobojlen and others added 3 commits March 18, 2026 11:47
…ions

Remove unused PATTERNS list, old get_stack()/get_caller() that used
inspect.stack(), and the unused inspect import. Rename get_caller_fast()
→ get_caller() and get_stack_fast() → get_stack().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…al/" substring

The "/zeal/" substring check would incorrectly filter out user code if
their project path contained "/zeal/" (e.g. /home/user/zeal-app/). Now
uses os.path.dirname(__file__) to check against the actual zeal package
directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
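The fixed check can be sketched like this; `zeal_dir` is passed in for illustration, where the real code derives it from os.path.dirname(__file__):

```python
import os

# Sketch of the corrected filter: match against the package's actual
# directory prefix instead of a bare "/zeal/" substring, so user projects
# with "zeal" in their path are not misclassified as internal frames.
def is_internal(filename, zeal_dir):
    zeal_prefix = zeal_dir.rstrip(os.sep) + os.sep
    return "site-packages" in filename or filename.startswith(zeal_prefix)
```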
Ensures zeal internals and site-packages are correctly identified as
internal frames, while user code — including projects with "zeal" in
the path — is not filtered out.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@taobojlen taobojlen merged commit 6b15475 into main Mar 18, 2026
24 checks passed
