perf: reduce detection overhead from ~1.30x to ~1.03x #52
Merged
…h → overhead_ratio=1.06
On the hot path (notify), use sys._getframe() to walk raw frame objects instead of inspect.stack(context=0), which allocates FrameInfo named tuples for every frame. Also skip storing full stacks in context.calls when ZEAL_SHOW_ALL_CALLERS is not enabled; just store empty lists for counting. The full inspect.stack() path is preserved for when ZEAL_SHOW_ALL_CALLERS is enabled, maintaining full backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
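For reviewers unfamiliar with the technique, here is a minimal sketch of walking raw frames with `sys._getframe()` and collecting lightweight `(filename, lineno, funcname)` tuples. The helper name `get_stack_sketch` and the `outer`/`inner` functions are hypothetical and not taken from the zeal codebase; `sys._getframe()` is a CPython implementation detail.

```python
import sys


def get_stack_sketch():
    """Return [(filename, lineno, funcname), ...], innermost caller first."""
    frame = sys._getframe(1)  # start at the caller, skipping this helper
    stack = []
    while frame is not None:
        code = frame.f_code
        # Plain tuples only: no FrameInfo objects and no source-line lookups.
        stack.append((code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    return stack


def outer():
    return inner()


def inner():
    return get_stack_sketch()


stack = outer()
```

The first two entries describe `inner` and `outer`; anything after that depends on how the snippet is run.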
…() calls → overhead_ratio=1.05
After _alert() determines a (model, field) pair is allowlisted, cache that result in the context so subsequent notify() calls for the same pair skip the expensive _alert() path entirely (message formatting, allowlist property allocation, fnmatch checks). Also exclude auto/ and .venv/ from ruff and pyright checks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
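The caching idea can be sketched roughly as follows. Everything here is illustrative: `ContextSketch`, `ALLOWLIST`, `is_allowlisted_slow` and `notify_sketch` are hypothetical names, and the `slow_checks` counter exists only to make the effect visible.

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class ContextSketch:
    # (model, field) pairs already known to be allowlisted in this context
    allowlisted: set = field(default_factory=set)
    slow_checks: int = 0  # instrumentation for this sketch only


ALLOWLIST = [("app.User", "post*")]  # hypothetical allowlist patterns


def is_allowlisted_slow(ctx, model, field_name):
    """Stand-in for the expensive path (pattern matching on every call)."""
    ctx.slow_checks += 1
    return any(
        fnmatch.fnmatch(model, m) and fnmatch.fnmatch(field_name, f)
        for m, f in ALLOWLIST
    )


def notify_sketch(ctx, model, field_name):
    key = (model, field_name)
    if key in ctx.allowlisted:
        return "skipped"  # fast path: one set lookup, no pattern matching
    if is_allowlisted_slow(ctx, model, field_name):
        ctx.allowlisted.add(key)
        return "allowlisted"
    return "alert"
```

Calling `notify_sketch` twice for the same allowlisted pair runs the slow check only once; the second call returns from the set lookup.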
… overhead_ratio=1.04
- Remove @functools.wraps from hot-path closures in patch_queryset_fetch_all and patch_queryset_function (avoids update_wrapper overhead on every queryset)
- Use tuple key (model, field, fn, lineno) instead of f-string in notify() fast path (avoids string allocation per call)
- Append None instead of [] to calls list (avoids empty list allocation per call)
- Cache calls[key] in local variable to avoid redundant dict lookup
- Remove redundant _nplusone_context.set(context) from notify() and ignore() (the context is mutated in-place; .set() with the same object just wastes a Token allocation)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
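A small sketch of the tuple-key and placeholder changes from the list above, under the assumption that only the number of recorded calls matters on the fast path. The names `calls` and `record_call` are hypothetical, and `fn` is taken here to be the caller's filename.

```python
from collections import defaultdict

# Maps (model, field, fn, lineno) -> list of recorded calls
calls = defaultdict(list)


def record_call(model, field, fn, lineno):
    key = (model, field, fn, lineno)  # tuple key: no string formatting needed
    entries = calls[key]              # single dict lookup, reused below
    entries.append(None)              # placeholder; only the count is used
    return len(entries)
```

A second call from the same site returns 2, while a call from a different line number starts a fresh count.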
…S path → overhead_ratio=1.05, overhead_ratio_allcallers=1.07
Replace get_stack() (which calls inspect.stack(context=0), creating expensive FrameInfo named tuples) with get_stack_fast(), which uses sys._getframe() to build lightweight (filename, lineno, funcname) tuples. Also eliminate the redundant get_stack() call in _alert() by using get_caller_fast() for the non-SHOW_ALL_CALLERS path and deriving caller info from the already-captured stack for SHOW_ALL_CALLERS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…frame filtering → overhead_ratio=1.05, overhead_ratio_allcallers=1.06
Replace `any(pattern in fn for pattern in PATTERNS)` with two direct `"site-packages" not in fn and "/zeal/" not in fn` checks in get_caller_fast() and get_stack_fast(). Micro-benchmarks show this is ~5x faster for the per-frame pattern matching, eliminating generator object allocation and 4-item iteration on every frame of the call stack.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
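To illustrate the two forms side by side, here is a hedged sketch. The `PATTERNS` list below is a two-entry stand-in (the real list had four entries whose contents are not shown in this PR text), and both function names are hypothetical.

```python
# Hypothetical stand-in for the original PATTERNS list.
PATTERNS = ["site-packages", "/zeal/"]


def is_user_frame_generic(fn):
    # Builds a generator object and iterates the list for every frame.
    return not any(pattern in fn for pattern in PATTERNS)


def is_user_frame_direct(fn):
    # Two plain substring tests; no generator, no loop.
    return "site-packages" not in fn and "/zeal/" not in fn


sample_paths = [
    "/srv/app/views.py",
    "/venv/lib/python3.12/site-packages/django/db/models/query.py",
    "/src/zeal/listeners.py",  # hypothetical internal module path
]
results = [(is_user_frame_generic(p), is_user_frame_direct(p)) for p in sample_paths]
```

For the same pattern set the two functions agree on every path; `timeit` can be used to compare their per-call cost on a given machine.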
…asattr() overhead → overhead_ratio=1.04, overhead_ratio_allcallers=1.03
Cache ZEAL_SHOW_ALL_CALLERS and ZEAL_NPLUSONE_THRESHOLD on the NPlusOneContext dataclass using lazy initialization (None sentinel). On the first notify() call per context, the settings are read via hasattr() and cached; subsequent calls (~429 per workload) use the cached value directly, avoiding ~1.2us of hasattr(settings, ...) overhead per call.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
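The lazy-initialization pattern with a `None` sentinel might look roughly like this. It is a sketch under assumptions: `settings` is a `SimpleNamespace` stand-in for Django's settings object, `NPlusOneContextSketch` and the two accessor functions are hypothetical, and the fallback values (`2` and `False`) are assumed defaults for illustration only.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional

# Stand-in for django.conf.settings; ZEAL_SHOW_ALL_CALLERS is deliberately unset.
settings = SimpleNamespace(ZEAL_NPLUSONE_THRESHOLD=3)


@dataclass
class NPlusOneContextSketch:
    threshold: Optional[int] = None           # None means "not read yet"
    show_all_callers: Optional[bool] = None   # None means "not read yet"


def get_threshold(ctx):
    if ctx.threshold is None:  # first call in this context: read settings once
        if hasattr(settings, "ZEAL_NPLUSONE_THRESHOLD"):
            ctx.threshold = settings.ZEAL_NPLUSONE_THRESHOLD
        else:
            ctx.threshold = 2  # assumed default for this sketch
    return ctx.threshold


def get_show_all_callers(ctx):
    if ctx.show_all_callers is None:
        if hasattr(settings, "ZEAL_SHOW_ALL_CALLERS"):
            ctx.show_all_callers = bool(settings.ZEAL_SHOW_ALL_CALLERS)
        else:
            ctx.show_all_callers = False  # assumed default for this sketch
    return ctx.show_all_callers
```

One consequence of this design is that a settings change made after the first read is not seen by an existing context; a new context picks it up.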
…tion 7)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update the README warning from "2.5x slower" to "~3-5% overhead" based on benchmarking results from 6 optimization iterations. Add auto/ directory with benchmark and autoresearch scripts used to drive the optimizations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merging this PR will improve performance by ×6.3
…ions
Remove unused PATTERNS list, old get_stack()/get_caller() that used inspect.stack(), and the unused inspect import. Rename get_caller_fast() → get_caller() and get_stack_fast() → get_stack().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…al/" substring
The "/zeal/" substring check would incorrectly filter out user code if their project path contained "/zeal/" (e.g. /home/user/zeal/app/). Now uses os.path.dirname(__file__) to check against the actual zeal package directory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
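A sketch of why the directory-based check is safer than the substring check. The function names and the example paths are hypothetical; in the real package the directory would come from `os.path.dirname(__file__)`, whereas here it is passed in explicitly so the snippet runs anywhere. The prefix test is one plausible way to do the comparison and is not confirmed to be the exact form used in this PR.

```python
def is_internal_by_substring(filename):
    # Old behaviour: any path containing "/zeal/" is treated as internal.
    return "site-packages" in filename or "/zeal/" in filename


def is_internal_by_directory(filename, package_dir):
    # New behaviour (sketch): only files under the actual package directory count.
    return "site-packages" in filename or filename.startswith(package_dir + "/")


PACKAGE_DIR = "/opt/src/zeal"                 # hypothetical install location
user_file = "/home/user/zeal/app/views.py"    # user project that has "zeal" in its path
internal_file = "/opt/src/zeal/listeners.py"  # hypothetical module inside the package

false_positive = is_internal_by_substring(user_file)
fixed = is_internal_by_directory(user_file, PACKAGE_DIR)
still_internal = is_internal_by_directory(internal_file, PACKAGE_DIR)
```

Here `false_positive` is `True` (the user file is wrongly hidden by the old check), `fixed` is `False`, and `still_internal` is `True`.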
Ensures zeal internals and site-packages are correctly identified as internal frames, while user code (including projects with "zeal" in the path) is not filtered out.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Replace `inspect.stack()` with `sys._getframe()`: the biggest win. The hot path (`notify()`) now walks raw frame objects to find the caller instead of creating `FrameInfo` named tuples for every frame in the stack. Both the default path and the `SHOW_ALL_CALLERS` path are optimized.
- Cache allowlisted `(model, field)` pairs so repeated N+1s on the same field skip `_alert()` entirely after the first check. Removed the redundant `get_stack()` call in `_alert()`.
- Remove `@functools.wraps` from hot-path closures (profiling showed `update_wrapper` was the top zeal-specific cost), switch to tuple keys, and drop unnecessary `ContextVar.set()` calls.
- Cache settings: `ZEAL_NPLUSONE_THRESHOLD` and `ZEAL_SHOW_ALL_CALLERS` are now read once per context instead of calling `hasattr(settings, ...)` on every `notify()`.

Benchmark results
Overhead ratio (zeal-enabled time / baseline time, lower is better):
`SHOW_ALL_CALLERS=True`

Absolute time on the test suite workload: 95.5ms → 72.3ms (-24%).
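For reference, the arithmetic behind these figures can be checked with a short sketch. The helper names are hypothetical; `overhead_ratio` follows the definition given above (zeal-enabled time divided by baseline time), and the 103/100 inputs in the comment are illustrative values, not measurements from this PR.

```python
def overhead_ratio(enabled_ms, baseline_ms):
    """Zeal-enabled time divided by baseline time; lower is better."""
    return enabled_ms / baseline_ms


def percent_change(before_ms, after_ms):
    """Relative change in percent; negative means faster."""
    return (after_ms - before_ms) / before_ms * 100


# Absolute test-suite times quoted above.
change = percent_change(95.5, 72.3)

# Illustrative only: 103 ms with zeal enabled against a 100 ms baseline
# corresponds to a ratio of about 1.03.
example_ratio = overhead_ratio(103.0, 100.0)
```

`change` comes out at roughly -24.3, which rounds to the -24% quoted.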
Methodology
Developed through 6 automated experiments using an autoresearch loop (inspired by Shopify/liquid#2056): edit one thing → run tests → benchmark → keep/discard. Each change was validated against the full test suite before benchmarking. Iteration 7 confirmed we hit the noise floor (~0.23ms real overhead on a ~74ms workload).
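The keep/discard loop can be sketched as below. This is not the script in `auto/`; `run_experiments`, `make_change` and the integer timings are hypothetical stand-ins chosen so the example is deterministic.

```python
def run_experiments(candidates, run_tests, benchmark, best_score):
    """Apply each change; keep it only if tests pass and the score improves."""
    kept = []
    for name, apply_change, revert_change in candidates:
        apply_change()
        score = benchmark() if run_tests() else None
        if score is not None and score < best_score:
            best_score = score
            kept.append(name)
        else:
            revert_change()  # discard: tests failed or no improvement
    return kept, best_score


# Toy workload: "state" plays the role of the code base, measured in ms.
state = {"ms": 130}


def make_change(delta):
    def apply_change():
        state["ms"] += delta

    def revert_change():
        state["ms"] -= delta

    return apply_change, revert_change


candidates = [
    ("a", *make_change(-20)),  # improvement, kept
    ("b", *make_change(5)),    # regression, discarded
    ("c", *make_change(-7)),   # improvement, kept
]
kept, best = run_experiments(
    candidates,
    run_tests=lambda: True,
    benchmark=lambda: state["ms"],
    best_score=state["ms"],
)
```

With these toy inputs, changes "a" and "c" are kept and "b" is reverted.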
The `auto/` directory contains the benchmark infrastructure for future optimization work.

Test plan
- `ZEAL_SHOW_ALL_CALLERS` feature preserved and tested

🤖 Generated with Claude Code