#21025: Optimize rendering of config context data by jeremystretch · Pull Request #22294 · netbox-community/netbox

jeremystretch · 2026-05-26T19:50:53Z

Closes: #21025

Add fields on Device and VirtualMachine models (via ConfigContextModel):
- _config_context_data JSON field to stored cached context data
- _config_context_generation integer field to serialize recalculations (for race condition avoidance)
On ConfigContextModel, split get_config_context() into a new render_config_context() method and default to returning cached config context data (if present)
Add a get_affected_objects() method to ConfigContext, which returns the devices/VMs covered by its scope
Introduce signal handlers to automatically invalidate cached config context data for the relevant objects upon applicable changes to ConfigContexts or related objects
Introduce RenderConfigContextJob, a background job which caches config context data for all devices/VMs with null _config_context_data
Introduce the rebuild_config_context_cache management command (with --force option)
Remove the conditional *WithConfigContextSerializer REST API serializers for devices/VMs
Remove calls to annotate_config_context_data() from UI & GraphQL querysets for devices/VMs
Remove ConfigContextQuerySetMixin
Drop support for suppressing config_context data by passing ?exclude=config_context with REST API requests

jeremystretch · 2026-06-04T15:40:27Z

@claude /review

github-actions · 2026-06-04T15:40:49Z

Claude finished @jeremystretch's task in 6m 17s —— View job

PR Review: #21025 Optimize rendering of config context data

Gather context
Read key changed files
Analyze code quality, correctness, and conventions
Post review findings

Overview

This PR introduces a pre-rendered caching layer for config context data on Device and VirtualMachine objects — a significant performance improvement that eliminates the expensive on-demand annotation for every API/GraphQL request. The design is sound: a JSON cache column (_config_context_data), a generation counter for compare-and-set protection against concurrent invalidations, a comprehensive signal-based invalidation graph, and a background job for background repopulation.

The implementation is thorough. The ConfigContextScopeParityTest test is particularly well done — verifying that the forward matcher (get_for_object) and the inverse matcher (get_affected_objects) agree across every scope dimension is the right way to prove correctness of the inversion. The coverage of the compare-and-set race condition test (test_job_skips_stale_write_on_concurrent_invalidation) is impressive.

Issues Found

1. Top-level cross-app imports in `extras/cache.py`

netbox/extras/cache.py, lines 13–15:

from dcim.models import Device
from extras.jobs import RenderConfigContextJob
from virtualization.models import VirtualMachine

These are module-level imports from cross-app models. This is inconsistent with the pattern used in extras/signals.py, where Device and VirtualMachine are always imported lazily inside functions (e.g., from dcim.models import Device inside invalidate_on_device_vm_tag_change). The top-level imports in cache.py work at runtime (since cache.py is only loaded at ready() time when both apps are fully initialized), but they create a static import-time dependency that could cause issues in tests, management commands, or if the import order ever changes. Consider moving them to be lazy imports inside each function that needs them, matching the existing convention. Fix this →

2. `TenantGroup` in `mptt_triggers` vs direct-equality matching

netbox/extras/signals.py (in _connect_upstream_handlers) and netbox/extras/cache.py (mptt_attrs):

TenantGroup is listed in mptt_triggers, meaning reparenting a TenantGroup triggers _make_mptt_reparent_handler, which calls get_descendants(include_self=True) and invalidates all devices in that subtree. However, get_for_object in querysets.py uses direct equality for tenant_groups — Q(tenant_groups=tenant_group) — not ancestor traversal. This means reparenting a TenantGroup has no effect on which config contexts match a device (the device's tenant.group FK doesn't change), so the reparent handler would trigger unnecessary (though harmless) invalidations.

Similarly, in invalidate_for_scope_delta, tenant_groups is in direct_attrs (correct), but TenantGroup also appears in mptt_triggers for reparent handling (inconsistent). This over-invalidates but doesn't produce stale caches.

Worth confirming this is intentional, or simplify by removing TenantGroup from mptt_triggers if its matching is strictly direct.

3. `get_affected_objects()` docstring says it returns a tuple, but returns a generator-like tuple

netbox/extras/models/configs.py, lines ~218–230:

The docstring says "Return a (device_qs, vm_qs) tuple" which is accurate. invalidate_config_context_for_configcontext iterates it with for queryset in configcontext.get_affected_objects(). This works since the tuple yields device_qs and vm_qs in order, but invalidate_config_context_for_objects derives the model_label from queryset.model._meta.label_lower — so the order matters. This is implicit and fragile. Consider returning a dict or named tuple, or having get_affected_objects() return an iterable of (model_label, queryset) pairs explicitly.

4. `?exclude=config_context` is silently dropped — API breaking change not highlighted in PR body

The PR body mentions this: "Drop support for suppressing config_context data by passing ?exclude=config_context", and the docs add a note. However, config_context is now always returned in Device and VM API responses. For deployments with many devices and slow background job queues, the first response after an upgrade could be slower (falling back to on-demand render) and larger (always including the field). The upgrade warning in the docs is good; consider whether a deprecation notice was given in any prior version.

5. No test for `Cluster.type`/`group` change → VM cache invalidation

The _connect_upstream_handlers direct_triggers registers a handler for Cluster changes (type_id, group_id, site_id). But there's no test in test_configcontext_cache.py that exercises a Cluster.type or Cluster.group change and verifies the associated VM's cache is invalidated. Given that cluster-type and cluster-group are first-class ConfigContext scope dimensions, this is a gap worth covering.

6. No test for `Tenant.group` change → Device/VM cache invalidation

Similarly, ('tenancy', 'Tenant', ('group_id',), 'tenant_id', 'tenant_id') is in direct_triggers but the test suite doesn't have a test for "change tenant.group → device/VM cache is invalidated".

7. `RENDER_CONFIG_CONTEXT_MAX_PASSES = 100` — consider adding a warning for unusual pass counts

netbox/extras/jobs.py, line 22:

The multi-pass loop is correct and well-motivated. However, in normal operation, a single pass should suffice. More than 2–3 passes would indicate either a very busy system or an unexpected churn loop. Consider logging a warning after (say) 3 passes so operators have visibility into abnormal behavior, rather than silently burning through all 100 passes.

8. Upgrade time: `rebuild_config_context_cache` in `upgrade.sh` is synchronous

upgrade.sh, lines 104–109:

The upgrade script now blocks until all devices and VMs have their config context rendered. For large deployments this could add meaningful upgrade time. The docs mention this, but the command has no progress indicator (only a final count). The iterator(chunk_size=500) in the management command is good for memory, but a per-chunk progress log (e.g., self.stdout.write(f' ... rendered {n} so far')) would improve operator experience for large datasets.

Minor Notes

ConfigContextModel.serialize_object override (configs.py): correctly excludes _config_context_data and _config_context_generation from changelog snapshots. The list(exclude or []) pattern avoids mutable-default pitfalls. ✓
CAS race condition protection in both RenderConfigContextJob and rebuild_config_context_cache is consistent — same pattern, same rationale. The duplication is acceptable given the context difference (background job vs. management command).
Warning comment in querysets.py about the dual-maintenance requirement between get_for_object and get_affected_objects is important and appropriately placed. The parity test provides a strong safety net for this.
weak=False on all dynamically connected signal handlers — correct; without this, the closures could be garbage collected.
transaction.on_commit deferral of job enqueue — correct pattern to prevent the background worker from reading uncommitted state.
annotate_config_context_data() still available on the queryset and still used by the job/management command for the on-demand render path. The query-count test confirms zero queries in warm-cache reads. ✓

Summary

This is a well-engineered optimization. The core design (nullable JSON cache + generation counter + signal-based invalidation + background sweep) is solid, the parity test provides strong correctness guarantees, and the test coverage is thorough. The main issues are:

Cross-app top-level imports in cache.py (should be lazy, minor style issue)
TenantGroup in mptt_triggers causing harmless over-invalidation (confirm it's intentional)
Missing invalidation tests for Cluster.type/group and Tenant.group upstream changes
No per-pass warning in RenderConfigContextJob for pathological churn scenarios

…le (#22364)

jeremystretch · 2026-06-04T16:05:59Z

@claude /review Please review again, considering the following responses to the above concerns:

Not an issue. (Bizarre that this was even flagged.)
Fixed
Acceptable for now. We'll likely explore improving config context "plugability" in the future.
Known; will flag this in the release notes.
Added tests
Added tests
Fixed
Acceptable for now. We might adopt a more robust approach after testing at scale.

github-actions · 2026-06-04T16:06:17Z

Claude finished @jeremystretch's task in 1m 24s —— View job

PR Review (Re-review): #21025 Optimize rendering of config context data

Gather context
Read updated files to verify fixes
Analyze updated code
Post updated review findings

Re-review Summary

I've verified each of the eight responses against the current state of the code. All items are accounted for and the PR is in good shape.

Item-by-item verification

1. Cross-app top-level imports in extras/cache.py — acknowledged

Fair enough; cache.py is only imported during AppConfig.ready() at which point all apps are fully initialized, so the import order concern doesn't apply in practice. Accepted.

2. TenantGroup in mptt_triggers — confirmed fixed ✓

mptt_triggers in signals.py (lines 307–314) now contains only Region, SiteGroup, DeviceRole, Platform, and Location. TenantGroup was moved out — it now appears only in _connect_upstream_delete_handlers (as tenant_groups) and in invalidate_for_scope_delta's direct_attrs dict (line 101 of cache.py). This is the correct placement: direct-equality matching, not ancestor traversal.

3. get_affected_objects() ordering — acknowledged

Accepted as-is for now.

4. ?exclude=config_context breaking change — acknowledged

Release notes will cover it.

5. Missing test for Cluster.type/group change → VM cache invalidation — confirmed added ✓

test_direct_upstream_cluster_type_change_invalidates_vm (line 464) and test_direct_upstream_cluster_group_change_invalidates_vm (line 478) are present and correctly exercise the path: create cluster with type/group, attach a VM, warm the cache, change the cluster attribute, assert the VM cache is NULLed.

6. Missing test for Tenant.group change → Device/VM cache invalidation — confirmed added ✓

test_direct_upstream_tenant_group_change_invalidates_device (line 493) and test_direct_upstream_tenant_group_change_invalidates_vm (line 509) cover both sides.

7. No warning for exhausted render passes — confirmed fixed ✓

RenderConfigContextJob.run() (lines 68–76 of jobs.py) now has an else clause on the for loop that fires a self.logger.warning(...) when all 100 passes complete without converging. The message is clear and actionable.

8. rebuild_config_context_cache blocking in upgrade.sh — acknowledged

Acceptable for now.

Conclusion

All previously raised concerns are either resolved, accepted, or acknowledged with a plan. The implementation is solid: the CAS generation counter, the enqueue_once coalescing strategy, the scope-parity test, and the comprehensive upstream invalidation graph are all well-executed. This is ready to merge from a code review perspective.

arthanson

Everything looks good, just one small item I'm not sure is a bug with returning the cached item that gets mutated, it is a change from previous and probably should return a copy to avoid inadvertent side-effects.

arthanson · 2026-06-10T17:35:31Z

+        """
+        from extras.models.tags import TaggedItem
+
+        def _mptt_descendants(m2m):


will need to double-check this with the Ltree stuff once merged. at least rename it

arthanson · 2026-06-10T18:06:32Z

+        """
+        cached = getattr(self, '_config_context_data', None)
+        if cached is not None:
+            return cached


callers are mutating this (extras/views.py:1288) I think me may want to return a new copy here instead of the direct cached object?

jeremystretch force-pushed the 21025-pre-render-config-context-data branch 4 times, most recently from c273bc8 to 53fff7c Compare June 1, 2026 18:49

jeremystretch force-pushed the 21025-pre-render-config-context-data branch from 53fff7c to db44a3b Compare June 3, 2026 16:18

jeremystretch changed the base branch from main to feature June 3, 2026 16:18

jeremystretch force-pushed the 21025-pre-render-config-context-data branch 2 times, most recently from 34c7ae9 to 08a1b01 Compare June 3, 2026 19:17

#21025: WIP

c940bb8

jeremystretch force-pushed the 21025-pre-render-config-context-data branch 4 times, most recently from 2b90545 to 3ea5839 Compare June 4, 2026 13:42

Fixes #22357: Remove unused local_context_data field from dcim.Modu…

d43893f

…le (#22364)

jeremystretch force-pushed the 21025-pre-render-config-context-data branch from 3ea5839 to d43893f Compare June 4, 2026 16:05

jeremystretch marked this pull request as ready for review June 4, 2026 16:10

jeremystretch requested review from a team and arthanson and removed request for a team June 4, 2026 16:10

arthanson requested changes Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#21025: Optimize rendering of config context data#22294

#21025: Optimize rendering of config context data#22294
jeremystretch wants to merge 2 commits into
featurefrom
21025-pre-render-config-context-data

jeremystretch commented May 26, 2026 •

edited

Loading

Uh oh!

jeremystretch commented Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

jeremystretch commented Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

arthanson left a comment

Uh oh!

arthanson Jun 10, 2026

Uh oh!

arthanson Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jeremystretch commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Closes: #21025

Uh oh!

jeremystretch commented Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: #21025 Optimize rendering of config context data

Overview

Issues Found

1. Top-level cross-app imports in extras/cache.py

2. TenantGroup in mptt_triggers vs direct-equality matching

3. get_affected_objects() docstring says it returns a tuple, but returns a generator-like tuple

4. ?exclude=config_context is silently dropped — API breaking change not highlighted in PR body

5. No test for Cluster.type/group change → VM cache invalidation

6. No test for Tenant.group change → Device/VM cache invalidation

7. RENDER_CONFIG_CONTEXT_MAX_PASSES = 100 — consider adding a warning for unusual pass counts

8. Upgrade time: rebuild_config_context_cache in upgrade.sh is synchronous

Minor Notes

Summary

Uh oh!

jeremystretch commented Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review (Re-review): #21025 Optimize rendering of config context data

Re-review Summary

Item-by-item verification

Conclusion

Uh oh!

arthanson left a comment

Choose a reason for hiding this comment

Uh oh!

arthanson Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

arthanson Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jeremystretch commented May 26, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading

1. Top-level cross-app imports in `extras/cache.py`

2. `TenantGroup` in `mptt_triggers` vs direct-equality matching

3. `get_affected_objects()` docstring says it returns a tuple, but returns a generator-like tuple

4. `?exclude=config_context` is silently dropped — API breaking change not highlighted in PR body

5. No test for `Cluster.type`/`group` change → VM cache invalidation

6. No test for `Tenant.group` change → Device/VM cache invalidation

7. `RENDER_CONFIG_CONTEXT_MAX_PASSES = 100` — consider adding a warning for unusual pass counts

8. Upgrade time: `rebuild_config_context_cache` in `upgrade.sh` is synchronous

github-actions Bot commented Jun 4, 2026 •

edited

Loading