Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
15 commits
Select commit Hold shift + click to select a range
e572720
perf: replace inspect.stack() with sys._getframe() in notify fast pat…
taobojlen Mar 18, 2026
58ce8a2
perf: cache allowlisted (model, field) pairs to skip redundant _alert…
taobojlen Mar 18, 2026
f4bc2c4
perf: reduce per-call allocations in notify() and queryset patching →…
taobojlen Mar 18, 2026
c08a60f
docs: update autoresearch.md progress log with iteration 3 results
taobojlen Mar 18, 2026
0ec30ac
perf: replace inspect.stack() with sys._getframe() in SHOW_ALL_CALLER…
taobojlen Mar 18, 2026
289de8a
docs: update autoresearch.md progress log with iteration 4 results
taobojlen Mar 18, 2026
29f0d47
perf: replace pattern list iteration with direct substring checks in …
taobojlen Mar 18, 2026
1fb2807
docs: update autoresearch.md progress log with iteration 5 results
taobojlen Mar 18, 2026
3e89e08
perf: lazy-cache settings lookups in notify() to eliminate per-call h…
taobojlen Mar 18, 2026
b92f423
docs: update autoresearch.md progress log with iteration 6 results
taobojlen Mar 18, 2026
3f29157
docs: record noise floor analysis in autoresearch progress log (itera…
taobojlen Mar 18, 2026
a00c05f
perf: update README overhead claim and add autoresearch infrastructure
taobojlen Mar 18, 2026
634feee
refactor: clean up util.py — remove dead code, rename fast-path funct…
taobojlen Mar 18, 2026
f2a684f
fix: use actual package directory for frame filtering instead of "/ze…
taobojlen Mar 18, 2026
9bfad41
test: add tests for _is_internal_frame to verify frame filtering
taobojlen Mar 18, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,10 @@ if DEBUG:
This will detect N+1s that happen in web requests. To catch N+1s in more places,
read on!

> [!WARNING]
> [!WARNING]
> You probably don't want to run zeal in production:
> there is significant overhead to detecting N+1s, and my benchmarks show that it
> can make your code 2.5x slower in some cases.
> there is some overhead to detecting N+1s. Benchmarks show ~3-5% overhead
> on a typical workload.

### Celery

Expand Down
88 changes: 88 additions & 0 deletions auto/autoresearch.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
# Autoresearch: django-zeal Performance Optimization

## Objective
Reduce the runtime overhead of django-zeal's N+1 query detection.
The workload exercises all relationship types (forward/reverse many-to-one,
one-to-one, many-to-many, chained) with zeal enabled and N+1s suppressed
via `zeal_ignore()`. We measure the overhead ratio: zeal-enabled time vs
baseline (no zeal).

## How to Run
Run `./auto/autoresearch.sh` — it runs the full test suite (correctness gate),
then the performance benchmark, outputting METRIC lines.

For benchmark only: `.venv/bin/python auto/bench.py`

## Metrics
- **Primary (optimization targets)**: Both must be optimized:
- `overhead_ratio` — zeal overhead with default settings (lower is better, 1.00 = zero overhead)
- `overhead_ratio_allcallers` — zeal overhead with `ZEAL_SHOW_ALL_CALLERS=True` (lower is better)
- **Secondary (for context)**:
- `zeal_ms` — median time with zeal enabled (ms)
- `zeal_allcallers_ms` — median time with zeal + SHOW_ALL_CALLERS (ms)
- `baseline_ms` — median time without zeal (ms)

## Iteration Protocol
Each autoresearch iteration must:
1. Make **exactly one change** (a single optimization idea)
2. Run `./auto/autoresearch.sh`
3. If tests fail → revert immediately (`git checkout -- .`)
4. If either overhead_ratio improved (without regressing the other) → commit with a message describing the change and the new metrics
5. If both overhead_ratios regressed or unchanged → revert (`git checkout -- .`)
6. Update the Progress Log below with the result

## Files in Scope
- `src/zeal/listeners.py` — N+1 detection logic, context management, alert path
- `src/zeal/patch.py` — Django ORM monkey-patching, queryset wrapping
- `src/zeal/util.py` — Stack inspection utilities (`get_stack`, `get_caller`)
- `src/zeal/middleware.py` — Django middleware integration

## Off Limits
- `tests/` — tests must continue to pass unchanged
- `auto/` — benchmark infrastructure, do not modify
- Do not change the public API (`zeal_context`, `zeal_ignore`, signals, settings)
- Do not add new dependencies

## Constraints
- All 57 unit tests must pass
- Semantic correctness must be preserved — N+1 detection behavior must be identical
- `ZEAL_SHOW_ALL_CALLERS` feature must still work when enabled (full stack storage)
- When `ZEAL_SHOW_ALL_CALLERS` is not enabled, optimizations can skip stack storage

## Architecture Notes
The hot path on every relationship access is:
1. Patched descriptor calls `patch_queryset_fetch_all` wrapper
2. On query execution, `n_plus_one_listener.notify(model, field, instance_key)` is called
3. `notify()` calls `get_stack()` → `inspect.stack(context=0)` + frame filtering
4. Builds a key tuple `(model, field, "filename:lineno")`
5. Appends full stack to `context.calls[key]`
6. If count >= threshold: calls `_alert()` which calls `get_stack()` again

Known overhead sources (in estimated order of impact):
- `inspect.stack(context=0)` creates FrameInfo named tuples for every frame
- Full stack traces stored in memory for every call (even when SHOW_ALL_CALLERS is off)
- `_alert()` redundantly calls `get_stack()` (already captured in `notify()`)
- `hasattr(settings, ...)` checked on every `_threshold` / `_allowlist` access
- `fnmatch()` called per-allowlist-entry on every alert
- f-string key creation on every notify

## Baseline
- **Commit**: 2aabdaa (HEAD of main, before any optimizations)
- **overhead_ratio**: 1.30
- **overhead_ratio_allcallers**: 1.30 (same path before iter1 split)
- **zeal_ms**: 101
- **baseline_ms**: 78

## Current (post iter7 — noise floor reached)
- **overhead_ratio**: 1.02–1.09 (varies by run, noise-dominated)
- **overhead_ratio_allcallers**: 1.02–1.10 (varies by run, noise-dominated)
- **baseline_ms**: 69.5–72.5

## Progress Log
1. **e572720** — Replace `inspect.stack(context=0)` with `sys._getframe()` in `notify()` fast path; skip storing full stacks when `ZEAL_SHOW_ALL_CALLERS` is off. **overhead_ratio=1.06** (was 1.30). baseline_ms=77.7, zeal_ms=82.7. All 57 tests pass. KEPT.
2. **58ce8a2** — Cache allowlisted (model, field) pairs in `NPlusOneContext._allowlisted_keys` so that once `_alert()` determines a pair is allowlisted, subsequent `notify()` calls skip `_alert()` entirely (avoiding message formatting, `_allowlist` property allocation, and `fnmatch()` checks on every call past threshold). **overhead_ratio=1.05** (was 1.06). baseline_ms=78.6, zeal_ms=82.6. All 57 tests pass. KEPT.
3. **f4bc2c4** — Reduce per-call allocations: remove `@functools.wraps` from hot-path queryset closures (profile showed `update_wrapper` as top zeal overhead at 0.013s/9250 calls), use tuple key instead of f-string in notify(), append `None` instead of `[]`, cache `calls[key]` in local var, remove redundant `_nplusone_context.set()` from `notify()` and `ignore()`. **overhead_ratio=1.04** (was 1.05). Official benchmark range: 1.02–1.05 (median 1.04). Interleaved A/B benchmark: median_pair_ratio=1.034 (was 1.038). All 57 tests pass. KEPT.
4. **0ec30ac** — Replace `inspect.stack(context=0)` with `sys._getframe()` in SHOW_ALL_CALLERS path: add `get_stack_fast()` that builds lightweight `(filename, lineno, funcname)` tuples instead of FrameInfo named tuples; eliminate redundant `get_stack()` call in `_alert()` by reusing already-captured stack data and using `get_caller_fast()`. **overhead_ratio=1.05, overhead_ratio_allcallers=1.07** (was 1.27). baseline_ms=73.3, zeal_ms=76.8, zeal_allcallers_ms=78.2. All 57 tests pass. KEPT.
5. **29f0d47** — Replace `any(pattern in fn for pattern in PATTERNS)` with two direct substring checks (`"site-packages" not in fn and "/zeal/" not in fn`) in `get_caller_fast()`, `get_stack_fast()`, and `get_stack()`. Eliminates generator object allocation and 4-item iteration on every frame; micro-benchmark shows ~5x speedup for this operation. **overhead_ratio=1.05, overhead_ratio_allcallers=1.06** (was 1.07). baseline_ms=74.3, zeal_ms=78.5, zeal_allcallers_ms=79.1. All 57 tests pass. KEPT.
6. **3e89e08** — Lazy-cache `ZEAL_SHOW_ALL_CALLERS` and `ZEAL_NPLUSONE_THRESHOLD` settings on the `NPlusOneContext` dataclass. On the first `notify()` call per context, settings are read via `hasattr()` and cached; subsequent calls (~429 per workload) use the cached value directly, eliminating ~1.2us of `hasattr(settings, ...)` overhead per call (two `hasattr` calls at ~0.6us each). Profiling showed these two `hasattr` calls accounted for ~0.5ms of the ~2.2ms total overhead. **overhead_ratio=1.04, overhead_ratio_allcallers=1.03** (was 1.05/1.06). Across 4 benchmark runs: overhead_ratio ranged 0.99-1.06, overhead_ratio_allcallers ranged 0.98-1.07. All 57 tests pass. KEPT.
7. **NOISE FLOOR REACHED** — Detailed profiling showed the total differential zeal overhead is ~0.23ms per workload (430 notify() calls at ~0.53us extra each), representing ~0.3% of the ~75ms baseline. Benchmark measurement noise (stdev ~2-5ms, or ~3-7%) is 10-20x larger than the signal. Three consecutive benchmark runs on identical code yielded overhead_ratio = 1.03, 1.09, 1.02 — confirming the remaining variance is measurement noise, not real overhead. Component breakdown: get_caller_fast() frame walking is the single largest remaining cost at 0.199us/call (29% of notify()), but it's irreducible since it must walk ~4.7 frames on average to find user code. No change made. NOISE FLOOR.
20 changes: 20 additions & 0 deletions auto/autoresearch.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
#!/usr/bin/env bash
# Autoresearch runner for django-zeal performance optimization
# Runs: unit tests (correctness gate) → performance benchmark
# Outputs METRIC lines for the agent to parse
# Exit code 0 = all good, non-zero = broken
set -euo pipefail

cd "$(dirname "$0")/.."

# ── Step 1: Unit tests (correctness gate) ───────────────────────────
echo "=== Unit Tests ==="
if ! .venv/bin/pytest tests/ -x -q --tb=short 2>&1; then
echo "FATAL: unit tests failed"
exit 1
fi

# ── Step 2: Performance benchmark ───────────────────────────────────
echo ""
echo "=== Performance Benchmark ==="
.venv/bin/python auto/bench.py 15 5 2>&1
138 changes: 138 additions & 0 deletions auto/bench.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
#!/usr/bin/env python
"""
Benchmark: measures zeal overhead ratio (zeal-enabled vs disabled).

Outputs machine-readable METRIC lines for autoresearch consumption.
Uses median of N iterations (default 15) after warmup for stability.
"""

import gc
import os
import statistics
import sys
import time
from contextlib import contextmanager

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "src"))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "tests"))

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "djangoproject.settings")

import django

django.setup()

from django.core.management import call_command

call_command("migrate", "--run-syncdb", verbosity=0)

from django.conf import settings

from djangoproject.social.models import Post, Profile, User
from factories import PostFactory, ProfileFactory, UserFactory
from zeal import zeal_context, zeal_ignore


def setup_data():
users = UserFactory.create_batch(10)
rels = []
for u in users:
for f in users:
if u != f:
rels.append(
User.following.through(from_user_id=u.id, to_user_id=f.id)
)
User.following.through.objects.bulk_create(rels)
for u in users:
ProfileFactory(user=u)
for u in users:
PostFactory.create_batch(10, author=u)


def workload():
posts = Post.objects.all()
for post in posts:
_ = post.author.username
_ = list(post.author.posts.all())

profiles = Profile.objects.all()
for profile in profiles:
_ = profile.user.username
_ = profile.user.profile.display_name

users = User.objects.all()
for user in users:
_ = list(user.following.all())
_ = list(user.followers.all())
_ = list(user.blocked.all())
for follower in user.followers.all():
_ = follower.profile.display_name
_ = list(follower.posts.all())


def bench(label, wrapper, n, warmup):
for _ in range(warmup):
gc.collect()
with wrapper():
workload()

times = []
for _ in range(n):
gc.collect()
start = time.perf_counter()
with wrapper():
workload()
elapsed = time.perf_counter() - start
times.append(elapsed * 1000)

median = statistics.median(times)
mean = statistics.mean(times)
stdev = statistics.stdev(times) if len(times) > 1 else 0
mn = min(times)
mx = max(times)
print(
f" {label}: median={median:.1f}ms mean={mean:.1f}ms "
f"stdev={stdev:.1f}ms min={mn:.1f}ms max={mx:.1f}ms (n={n})"
)
return median


@contextmanager
def noop_ctx():
yield


@contextmanager
def zeal_ctx():
with zeal_context(), zeal_ignore():
yield


@contextmanager
def zeal_all_callers_ctx():
settings.ZEAL_SHOW_ALL_CALLERS = True
try:
with zeal_context(), zeal_ignore():
yield
finally:
settings.ZEAL_SHOW_ALL_CALLERS = False


if __name__ == "__main__":
n = int(sys.argv[1]) if len(sys.argv) > 1 else 15
warmup = int(sys.argv[2]) if len(sys.argv) > 2 else 5
setup_data()
print(f"Benchmark: {n} iterations, {warmup} warmup\n")

baseline_ms = bench("baseline (no zeal)", noop_ctx, n, warmup)
zeal_ms = bench("with zeal", zeal_ctx, n, warmup)
zeal_allcallers_ms = bench("with zeal (SHOW_ALL_CALLERS)", zeal_all_callers_ctx, n, warmup)
overhead_ratio = zeal_ms / baseline_ms
overhead_ratio_allcallers = zeal_allcallers_ms / baseline_ms

print()
print(f"METRIC baseline_ms={baseline_ms:.1f}")
print(f"METRIC zeal_ms={zeal_ms:.1f}")
print(f"METRIC overhead_ratio={overhead_ratio:.2f}")
print(f"METRIC zeal_allcallers_ms={zeal_allcallers_ms:.1f}")
print(f"METRIC overhead_ratio_allcallers={overhead_ratio_allcallers:.2f}")
2 changes: 2 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ where = ["src"]

[tool.ruff]
line-length = 79
exclude = ["auto", ".venv"]

[tool.ruff.lint]
extend-select = [
Expand All @@ -29,6 +30,7 @@ extend-select = [

[tool.pyright]
include = ["src", "tests"]
exclude = ["auto", ".venv"]

[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "djangoproject.settings"
Expand Down
Loading
Loading