#21418 - Replace MPTT with PostgreSQL Ltree #22296

Open
arthanson wants to merge 44 commits into feature from 21488-ltree

@arthanson

@arthanson arthanson commented May 27, 2026

Collaborator

Closes: #21418

Replaces django-mptt with PostgreSQL ltree for all hierarchical models in NetBox core. Eliminates the read/write/maintenance costs of MPTT's lft/rght/tree_id/level columns and the table-wide lock MPTT takes on every insert/reparent, in favor of a single GiST-indexed path column maintained entirely by database triggers.

Expected Performance Improvements

The biggest gains are on writes and concurrency, and they scale with tree size - so the speedup is data-dependent rather than a fixed factor. An MPTT insert/move is O(N) in the tree (it renumbers lft/rght across much of the tree, and shifts tree_id across trees on a root insert), whereas ltree only recomputes the affected node and its subtree (≈O(1) for a leaf insert, O(subtree) for a move) and maintains one path column instead of MPTT's four.

For small/shallow hierarchies the two are roughly on par; for large hierarchies a single write touches one row instead of thousands, so it can be an order of magnitude or more faster. Separately, MPTT's per-model lock serialized every write to the model; ltree locks per tree, so writes to different trees (e.g. components on different devices) run concurrently - bulk imports and automation that previously queued on one lock should see throughput improve roughly in proportion to how many independent trees are written in parallel.

Read performance for ancestor/descendant lookups is comparable (both index-backed), so queries are unaffected.
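To make the write-cost comparison concrete, here is a toy sketch (not NetBox or django-mptt code) that counts how many existing rows a single leaf insert rewrites under nested-set numbering versus a path column. The helper names `build_flat_tree`, `nested_set_rows_touched`, and `path_rows_touched` are hypothetical and exist only for this illustration.

```python
# Toy model only: counts rows rewritten when one leaf is appended under a node.

def build_flat_tree(n_children):
    """One root with n_children leaves, numbered the nested-set way."""
    rows = [(1, 2 * n_children + 2)]  # the root spans every child
    for i in range(n_children):
        rows.append((2 + 2 * i, 3 + 2 * i))
    return rows


def nested_set_rows_touched(rows, parent_rght):
    """Rows that must be renumbered to open a gap at the parent's rght value."""
    return sum(1 for lft, rght in rows if lft >= parent_rght or rght >= parent_rght)


def path_rows_touched():
    """A path-based leaf insert writes only the new row's own path."""
    return 1


rows = build_flat_tree(1000)
print(nested_set_rows_touched(rows, parent_rght=3))     # insert under the first child
print(nested_set_rows_touched(rows, parent_rght=2001))  # insert under the last child
print(path_rows_touched())
```

The nested-set count depends on where in the tree the insert lands, which is why the speedup is described as data-dependent rather than a fixed factor.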

New infrastructure (netbox/netbox/models/ltree.py)

  • LtreeField - Django field backed by PostgreSQL's ltree type with ancestor / descendant / descendant_or_equal lookups.
  • LtreeModel - abstract base providing the MPTT-compatible API surface: get_ancestors, get_descendants, get_children, get_root, get_family, get_siblings, get_descendant_count, level, is_root_node, is_leaf_node, is_child_node, move_to, insert_at, plus tree_id/level properties.
  • LtreeManager / LtreeQuerySet - extends RestrictedQuerySet and mirrors TreeManager.add_related_count for cumulative descendant counts (FK, M2M, and the scope_type/scope_id GenericFK pattern).
  • Triggers - installs per-table BEFORE INSERT OR UPDATE OF parent_id (computes NEW.path) and AFTER UPDATE OF parent_id, path (cascades to descendants) triggers. Optional sort_path text column carries chr(9)-separated (TAB) ancestor names, preserving MPTT's order_insertion_by=('name',) flatten ordering.
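The trigger behavior described above can be sketched in memory. This is a simplified stand-in, assuming paths are dot-joined ancestor PKs; the class `ToyTree` and its methods are hypothetical names, not the PR's API, and the real work happens in PL/pgSQL inside the database.

```python
class ToyTree:
    """In-memory stand-in for a table with a trigger-maintained path column."""

    def __init__(self):
        self.path = {}  # pk -> dotted path of ancestor PKs, e.g. "1.4.9"

    def _compute(self, pk, parent_pk):
        # What the BEFORE trigger does: parent's path plus the node's own label.
        prefix = self.path[parent_pk] + "." if parent_pk is not None else ""
        return f"{prefix}{pk}"

    def insert(self, pk, parent_pk=None):
        self.path[pk] = self._compute(pk, parent_pk)

    def move(self, pk, new_parent_pk):
        old, new = self.path[pk], self._compute(pk, new_parent_pk)
        # What the AFTER trigger does: rewrite the prefix on the node's subtree.
        for other, p in list(self.path.items()):
            if p == old or p.startswith(old + "."):
                self.path[other] = new + p[len(old):]

    def descendants(self, pk):
        base = self.path[pk] + "."
        return sorted(k for k, p in self.path.items() if p.startswith(base))

    def level(self, pk):
        # Equivalent to nlevel(path) - 1: a root has level 0.
        return self.path[pk].count(".")


tree = ToyTree()
tree.insert(1)
tree.insert(2, parent_pk=1)
tree.insert(3, parent_pk=2)
tree.insert(4)
tree.move(2, new_parent_pk=4)  # node 3 follows its parent into tree 4
print(tree.path[3], tree.descendants(4), tree.level(3))
```

A move only rewrites the moved node and its subtree; siblings and unrelated trees are never touched.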

NestedLtreeGroupModel is the ltree-backed base for hierarchical group models (Region, Location, Platform, etc.), carrying two trigger-maintained columns: path, an ltree of ancestor PKs for fast ancestor/descendant queries, and sort_path, a chr(9)-joined chain of ancestor names. sort_path exists because ordering by path sorts siblings by PK (insertion order), whereas ORDER BY sort_path reproduces MPTT's order_insertion_by=('name',) - depth-first tree-flatten with siblings in name order (the TAB separator keeps each subtree grouped under its parent), and its collation tracks name's automatically.
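A small sketch of why the TAB separator matters, using plain Python string comparison as a stand-in for the database collation (an assumption: real ordering depends on the column's collation, and natural_sort models order numbered names differently). The names `NODES`, `sort_path`, and `flatten` are illustrative only.

```python
TAB = chr(9)

# name -> parent name (None for a root); a subset of the demo script's nodes.
NODES = {
    "Demo 11": None,
    "Demo 2": None,
    "Demo 1": None,
    "Demo 10": None,
    "Demo 1 Child": "Demo 1",
}


def sort_path(name, sep=TAB):
    """Join the chain of ancestor names, root first, with the separator."""
    chain = []
    while name is not None:
        chain.append(name)
        name = NODES[name]
    return sep.join(reversed(chain))


def flatten(sep=TAB):
    """Names in the order an ORDER BY on the joined path would list them."""
    return sorted(NODES, key=lambda n: sort_path(n, sep))


print(flatten())      # TAB sorts below printable characters
print(flatten("~"))   # a high-sorting separator lets the child drift
```

With TAB, 'Demo 1 Child' lands directly after 'Demo 1'; with '~' it sorts after 'Demo 11', which is the interleaving the separator choice is meant to prevent.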

Base model changes (netbox/netbox/models/__init__.py)

  • NestedGroupModelMixin - shared fields (name, slug, description, comments) and self-as-descendant clean() check.
  • NestedLtreeGroupModel - new ltree-backed base used by all NetBox core hierarchical models.
  • NestedGroupModel - kept as deprecated MPTT-backed base for plugin backwards compatibility (will be removed in a future release).

Models converted

  • dcim: Region, SiteGroup, Location, DeviceRole, Platform, ModuleBay, InventoryItem, InventoryItemTemplate
  • tenancy: ContactGroup, TenantGroup
  • wireless: WirelessLANGroup

ModuleBay, InventoryItem, and InventoryItemTemplate use LtreeModel directly (not the nested-group base) since they aren't NestedGroupModel subclasses.

Behavioral changes vs MPTT

  • Cycle validation - assigning a node itself or one of its descendants as its parent now raises ValidationError both in clean() (forms/serializers) and in save() (scripts/bulk), mirroring MPTT's save-time InvalidMove. ModuleBay opts out (its parent is derived from its installed module).
  • Renames now update sort_path immediately and cascade to descendants. MPTT's order_insertion_by left descendants stale until a manual rebuild. LtreeModel.rebuild_sort_paths() remains for repairing raw writes that bypass the triggers.
  • Concurrency - MPTT's per-model advisory lock (which serialized every write to the entire table) is replaced by a per-tree lock: each insert/move takes a transaction advisory lock keyed on the tree's root (both roots for a cross-tree move, in sorted order). Different trees proceed in parallel; within a tree, inserts/moves serialize against a concurrent ancestor reparent so no stale path is persisted. Crossing/ancestor-descendant concurrent moves may deadlock (retryable); plain inserts never do.
  • bulk_create - within a single batch, a child row must appear after its parent (the BEFORE trigger resolves the parent's path by querying the table).
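As an illustration of the cycle check, the sketch below tests whether a proposed parent is the node itself or sits inside its subtree by comparing dotted paths. It is a simplified assumption about how such a check could work, not the PR's code; `would_create_cycle` and `check_parent` are hypothetical names, and `ValueError` stands in for Django's ValidationError.

```python
def would_create_cycle(node_path, new_parent_path):
    """True if the new parent is the node itself or one of its descendants."""
    return new_parent_path == node_path or new_parent_path.startswith(node_path + ".")


def check_parent(node_path, new_parent_path):
    # Stand-in for the clean()/save() validation; raises instead of saving.
    if would_create_cycle(node_path, new_parent_path):
        raise ValueError("Cannot assign self or a descendant as parent.")


check_parent("1.2", "1")        # moving up to an ancestor is fine
try:
    check_parent("1.2", "1.2.5")  # 1.2.5 is inside 1.2's subtree
except ValueError as exc:
    print(exc)
```

Appending the "." before the prefix test matters: without it, path "1.20" would be mistaken for a descendant of "1.2".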

GraphQL

The MPTT-internal lft/rght/tree_id columns are no longer part of the schema. level is preserved as a resolved field (nlevel(path) - 1).

Removed

MPTTLockedMixin and the per-model MPTT tree locks - i.e. the MPTT-specific entries (region, sitegroup, location, tenantgroup, contactgroup, wirelesslangroup, inventoryitem, inventoryitemtemplate, platform) in ADVISORY_LOCK_KEYS; the delay_mptt_updates() / rebuild() wrappers in the bulk import/edit/rename views; and utilities/templatetags/mptt.py.

django-pglocks and ADVISORY_LOCK_KEYS themselves are retained — they're still used for available-prefix/IP/VLAN/ASN allocation and job scheduling. django-mptt is retained only for historical migrations and the deprecated NestedGroupModel.

Migrations

Per-app migrations enable the ltree extension, add the path (and sort_path where applicable) columns, backfill existing rows via a recursive CTE, add GiST indexes, and install the triggers. No operator action is required beyond running migrations.

Test Script

Below is a script that adds nested items to ModuleBays, InventoryItems, and the other tree-ordered models so the list views can be compared between MPTT and ltree. Run it once on the MPTT version and open all the relevant tabs, then run it on the ltree version and check the same tabs in a new browser window. Example command:

python netbox/manage.py shell < ./pr_reviews/demo_tree_ordering.py > /tmp/ltree.out 2>&1

demo_tree_ordering.py
"""
Create demo data for every tree-ordered model so list ordering can be compared
between the MPTT branch (main) and the ltree branch.

Run identically on either branch, then diff the output:

  python manage.py shell < /Users/ahanson/dev/work/netbox/pr_reviews/demo_tree_ordering.py

Idempotent (removes prior demo rows first); data is committed so it shows in the UI.

All nodes/bays/items are created in a deliberately scrambled order (e.g. Slot 2, 11,
1, 10 — not pre-sorted), so a correctly sorted listing proves the ordering comes from
the tree (sort_path / MPTT), not from insertion order.

WHAT EACH SECTION EXERCISES
---------------------------
Group models — single hierarchy each, run for ALL the nested-group models the PR
converts: Region, SiteGroup, DeviceRole, Platform (dcim) and TenantGroup,
ContactGroup, WirelessLANGroup (tenancy/wireless). Expected to be PARITY: main's
MPTT TreeManager emits tree-flatten (tree_id, lft) with siblings name-sorted via
order_insertion_by, and ltree reproduces it via sort_path. The alpha nodes plus a
numbered block (Demo 1/2/10/11, with 'Demo 1' having a child) check two edge cases:

* natural-sort collation — TenantGroup, WirelessLANGroup and ModuleBay use the
  `natural_sort` collation, so their numbered roots must order 1, 2, 10, 11. The
  plain-collation models (Region/SiteGroup/DeviceRole/Platform/ContactGroup/
  Location) order them lexically 1, 10, 11, 2. Each model should look the SAME on
  both branches; the contrast is between models, not between branches.
* prefix / separator edge — 'Demo 1' has a child, and 'Demo 11' is a sibling root.
  sort_path joins ancestor names with chr(9) (TAB); because TAB sorts below any
  printable char, 'Demo 1's subtree must appear right after 'Demo 1' and BEFORE
  'Demo 11' on both branches. If the separator were wrong the child would
  interleave past 'Demo 11'.

Location — TWO sites; root locations are named so location-name order is the
REVERSE of site-creation order. PARITY: the global list is alphabetical on both.
  main  -> ORDER BY tree_id, lft  : root locations by NAME across sites
           -> Site B's "Demo Alpha" tree before Site A's "Demo Zulu".
  ltree -> ORDER BY sort_path, pk : sort_path begins with the root name, so also
           alphabetical across sites -> "Demo Alpha" before "Demo Zulu".

ModuleBay — TWO devices, nested bays. PARITY (natural sort + nesting + prefix edge,
and the cross-device global ordering):
  * Device A (created first) gets numbered top-level bays Slot 1/2/10/11, and a
    module installed in 'Slot 1' brings nested bays Sub 1/2/10 (children of Slot 1).
  * Device B (created second) gets one bay 'Slot 0', whose name sorts BEFORE every
    bay on device A.
  main  -> ModuleBayManager ORDER BY root-name, lft.
  ltree -> ORDER BY sort_path, pk. sort_path begins with the tree's root-bay name
           (natural_sort collation), so this reproduces main: the global list groups
           by root-bay name ACROSS devices ('Slot 0' from every device first, then
           'Slot 1' + its Sub children, ...), not device-by-device.

InventoryItem — nested items on one device. InventoryItem has NO sort_path and is
intentionally NOT tree-ordered under ltree:
  main  -> TreeManager forces tree-flatten (tree_id, lft) globally.
  ltree -> Meta.ordering ('device', 'parent', 'name') — a flat sort (roots, whose
           parent is NULL, sort LAST). The indented device Inventory tab is kept
           correct on ltree via get_children().order_by('path').
"""
from django.db import transaction
from django.utils.text import slugify

from dcim.models import (
  Device,
  DeviceRole,
  DeviceType,
  InventoryItem,
  Location,
  Manufacturer,
  Module,
  ModuleBay,
  ModuleBayTemplate,
  ModuleType,
  Platform,
  Region,
  Site,
  SiteGroup,
)
from tenancy.models import ContactGroup, TenantGroup
from wireless.models import WirelessLANGroup

# ---------------------------------------------------------------------------
# Group-model hierarchy (parent precedes children). Alpha roots + a numbered
# block to exercise natural-sort collation and the prefix/separator edge.
# ---------------------------------------------------------------------------
# Nodes are created deliberately OUT OF sorted order so the listing proves the
# ordering comes from name/sort_path, not insertion order. Expected listed order:
#   natural-sort models: Demo 1, [Demo 1 Child], Demo 2, Demo 10, Demo 11, Demo Mike, Demo Zulu
#   plain-collation:     Demo 1, [Demo 1 Child], Demo 10, Demo 11, Demo 2, Demo Mike, Demo Zulu
NODES = [
  # Roots, created Zulu→Mike and 10→2→11→1 (neither in sorted order).
  ('Demo Zulu', None),
  ('Demo Mike', None),
  ('Demo 10', None),
  ('Demo 2', None),
  ('Demo 11', None),
  ('Demo 1', None),
  # Children, also created out of order within a parent (Yankee before Alpha).
  ('Demo Yankee', 'Demo Zulu'),
  ('Demo Alpha', 'Demo Zulu'),
  ('Demo Sierra', 'Demo Alpha'),
  ('Demo Bravo', 'Demo Mike'),
  ('Demo 1 Child', 'Demo 1'),   # must sort right after 'Demo 1' and before 'Demo 11'
]
DEMO_SLUGS = [slugify(name) for name, _ in NODES]

# Group models that own a self-nesting hierarchy with name/slug/parent.
GROUP_MODELS = [
  ('Region', Region),
  ('SiteGroup', SiteGroup),
  ('DeviceRole', DeviceRole),
  ('Platform', Platform),
  ('TenantGroup', TenantGroup),       # natural_sort name
  ('ContactGroup', ContactGroup),
  ('WirelessLANGroup', WirelessLANGroup),  # natural_sort name
]

# ---------------------------------------------------------------------------
# Location: two sites (Site A created first -> lower site_id).
# ---------------------------------------------------------------------------
DEMO_SITE_SLUGS = ['demo-site-a', 'demo-site-b']
LOCATION_SITES = [
  ('Demo Site A', 'demo-site-a', [
      ('Demo Zulu', None),         # sorts LAST by name, but this site is created FIRST
      ('Demo Yankee', 'Demo Zulu'),
  ]),
  ('Demo Site B', 'demo-site-b', [
      ('Demo Alpha', None),        # sorts FIRST by name, but this site is created SECOND
      ('Demo Bravo', 'Demo Alpha'),
  ]),
]

# ---------------------------------------------------------------------------
# Devices for ModuleBay + InventoryItem demos.
# ---------------------------------------------------------------------------
DEV_A = 'Demo ModBay A'           # created first  -> lower device_id
DEV_B = 'Demo ModBay B'           # created second -> higher device_id
DEV_INV = 'Demo Inventory Device'
DEVICE_NAMES = [DEV_A, DEV_B, DEV_INV]

# Created out of order; both must LIST in natural order (Slot 1, 2, 10, 11 / Sub 1, 2, 10).
DEV_A_BAYS = ['Slot 2', 'Slot 11', 'Slot 1', 'Slot 10']  # nested under 'Slot 1' below
MODULE_SUBBAYS = ['Sub 10', 'Sub 1', 'Sub 2']            # nested under 'Slot 1' via a module
DEV_B_BAYS = ['Slot 0']                                  # sorts before every Device A bay

INV_NODES = [
  ('Item Zulu', None),
  ('Item Mike', None),
  ('Item Yankee', 'Item Zulu'),   # created before 'Item Alpha' (tests sibling order)
  ('Item Alpha', 'Item Zulu'),
  ('Item Bravo', 'Item Mike'),
]


def make_tree(model, **extra):
  created = {}
  for name, parent_name in NODES:
      obj = model(
          name=name,
          slug=slugify(name),
          parent=created.get(parent_name) if parent_name else None,
          **extra,
      )
      obj.save()
      created[name] = obj


def show_indented(label, queryset, suffix=lambda obj: ''):
  print(f"\n=== {label} ===")
  for obj in queryset:
      print(f"  {'    ' * obj.level}{obj.name}{suffix(obj)}")


with transaction.atomic():
  # --- Clean up prior demo rows (devices first; cascades bays/modules/items) ---
  Device.objects.filter(name__in=DEVICE_NAMES).delete()
  DeviceType.objects.filter(slug='demo-device-type').delete()
  ModuleType.objects.filter(model='Demo Module Type').delete()
  Manufacturer.objects.filter(slug='demo-mfr').delete()
  DeviceRole.objects.filter(slug='demo-equipment-role').delete()
  Location.objects.filter(site__slug__in=DEMO_SITE_SLUGS + ['demo-device-site']).delete()
  Site.objects.filter(slug__in=DEMO_SITE_SLUGS + ['demo-device-site']).delete()
  for _, model in GROUP_MODELS:
      model.objects.filter(slug__in=DEMO_SLUGS).delete()

  # --- Group models: single hierarchy each ---
  for _, model in GROUP_MODELS:
      make_tree(model)

  # --- Location: two sites ---
  for site_name, site_slug, nodes in LOCATION_SITES:
      site = Site.objects.create(name=site_name, slug=site_slug)
      created = {}
      for name, parent_name in nodes:
          obj = Location(
              name=name,
              slug=slugify(f'{site_slug}-{name}'),
              site=site,
              parent=created.get(parent_name) if parent_name else None,
          )
          obj.save()
          created[name] = obj

  # --- Shared scaffolding for the device demos ---
  mfr = Manufacturer.objects.create(name='Demo Mfr', slug='demo-mfr')
  dtype = DeviceType.objects.create(manufacturer=mfr, model='Demo Device Type', slug='demo-device-type')
  erole = DeviceRole.objects.create(name='Demo Equipment Role', slug='demo-equipment-role')
  dsite = Site.objects.create(name='Demo Device Site', slug='demo-device-site')

  # ModuleType carrying nested module-bay templates (instantiated on install).
  mtype = ModuleType.objects.create(manufacturer=mfr, model='Demo Module Type')
  for n in MODULE_SUBBAYS:
      ModuleBayTemplate.objects.create(module_type=mtype, name=n)

  # Device A (first): numbered top-level bays; install a module in 'Slot 1' to
  # create the nested Sub bays beneath it.
  dev_a = Device.objects.create(name=DEV_A, device_type=dtype, role=erole, site=dsite, status='active')
  a_bays = {}
  for n in DEV_A_BAYS:
      bay = ModuleBay(device=dev_a, name=n)
      bay.save()
      a_bays[n] = bay
  Module.objects.create(device=dev_a, module_type=mtype, module_bay=a_bays['Slot 1'], status='active')

  # Device B (second): one bay that name-sorts before every Device A bay.
  dev_b = Device.objects.create(name=DEV_B, device_type=dtype, role=erole, site=dsite, status='active')
  for n in DEV_B_BAYS:
      ModuleBay(device=dev_b, name=n).save()

  # InventoryItem device.
  dev_inv = Device.objects.create(name=DEV_INV, device_type=dtype, role=erole, site=dsite, status='active')
  inv = {}
  for name, parent_name in INV_NODES:
      item = InventoryItem(device=dev_inv, name=name, parent=inv.get(parent_name) if parent_name else None)
      item.save()
      inv[name] = item


# --- Verification (mirrors the UI's default ordering) ---------------------
NATURAL = {'TenantGroup', 'WirelessLANGroup'}
for label, model in GROUP_MODELS:
  collation = 'natural_sort name' if label in NATURAL else 'plain name'
  show_indented(f'{label}: default list ordering [{collation}]', model.objects.filter(slug__in=DEMO_SLUGS))

show_indented(
  'Location: default list ordering (parity: alphabetical across sites on both)',
  Location.objects.filter(site__slug__in=DEMO_SITE_SLUGS),
  suffix=lambda obj: f'  [{obj.site.name}]',
)

show_indented(
  'ModuleBay: GLOBAL list ordering across devices '
  "(parity: grouped by root-bay name across devices — 'Slot 0' first, then 'Slot 1' ...)",
  ModuleBay.objects.filter(device__name__in=[DEV_A, DEV_B]),
  suffix=lambda obj: f'  [{obj.device.name}]',
)
for dev in (DEV_A, DEV_B):
  show_indented(f'ModuleBay: device tab for {dev}', ModuleBay.objects.filter(device__name=dev))

# InventoryItem: default order differs (tree-flatten on main, flat parent/name on
# ltree); the device tab is path-ordered on ltree, tree-ordered by default on main.
show_indented(
  'InventoryItem: default list ordering (tree-flatten on main; parent/name on ltree)',
  InventoryItem.objects.filter(device__name=DEV_INV),
)
inv_qs = InventoryItem.objects.filter(device__name=DEV_INV)
if any(f.name == 'path' for f in InventoryItem._meta.get_fields()):
  inv_qs = inv_qs.order_by('path')   # ltree device-tab ordering
show_indented('InventoryItem: device tab ordering (indented hierarchy)', inv_qs)

print(f"\nDevices created: {DEV_A!r}, {DEV_B!r}, {DEV_INV!r}")
print('Done.')

@arthanson arthanson changed the title #21488 - Replace MPTT wtih PostgreSQL Ltree #21418 - Replace MPTT wtih PostgreSQL Ltree Jun 2, 2026
@arthanson

Collaborator Author

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration.

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Claude finished @arthanson's task in 12m 0s


PR Review: #21418 — Replace MPTT with PostgreSQL Ltree

  • Read core ltree infrastructure (netbox/netbox/models/ltree.py)
  • Read model changes (dcim, tenancy, wireless)
  • Read migrations
  • Read test coverage
  • Analyze concerns and post review

The overall implementation is well-engineered — the trigger design, advisory-lock concurrency model, and MPTT-compatible API surface are all sound. The test suite is thorough. Below are issues that warrant attention.


1. diff_exclude_fields missing None guard — crash risk

core/models/change_logging.py:159–175

changed_object_type.model_class() returns None when the model's app is no longer installed (e.g., a plugin that was later removed). The two new issubclass calls added by this PR will raise TypeError in that case:

model = self.changed_object_type.model_class()   # can be None
attrs = set()
if issubclass(model, ChangeLoggingMixin):         # pre-existing — crashes on None
    ...
if issubclass(model, LtreeModel):                 # NEW — also crashes on None
    ...
if issubclass(model, MPTTModel):                  # NEW — also crashes on None
    ...

The pre-existing ChangeLoggingMixin check was already vulnerable; the PR adds two more without adding a guard. A simple early exit (if model is None: return attrs) after line 160 would fix all three.
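A minimal, self-contained sketch of the suggested guard. The two classes below are empty stand-ins for the real mixins, and the excluded attribute names are placeholders chosen for illustration, not the values used in core/models/change_logging.py.

```python
class ChangeLoggingMixin:
    """Empty stand-in for the real mixin."""


class LtreeModel:
    """Empty stand-in for the real base class."""


def excluded_attrs(model):
    """Collect attribute names to skip when diffing; tolerate a missing model."""
    attrs = set()
    if model is None:
        # model_class() returns None when the owning app is not installed.
        return attrs
    if issubclass(model, ChangeLoggingMixin):
        attrs.add("last_updated")            # placeholder attribute name
    if issubclass(model, LtreeModel):
        attrs.update({"path", "sort_path"})  # placeholder attribute names
    return attrs


class Region(ChangeLoggingMixin, LtreeModel):
    pass


print(excluded_attrs(None))
print(sorted(excluded_attrs(Region)))
```

Without the early return, `issubclass(None, ChangeLoggingMixin)` raises TypeError because the first argument must be a class.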


2. reapply_model_ordering exemption is too broad

utilities/query.py:72

The exemption from reapply_model_ordering is keyed on LtreeManager presence, but InventoryItem (ordering = ('name', 'pk')) and InventoryItemTemplate (ordering = ('device_type__id', 'parent__id', 'name')) use LtreeManager strictly for path maintenance — not for sort_path/path-based tree ordering. The comment says the exemption protects sort_path ordering from being clobbered by .annotate(); it doesn't apply to these models.

Consequence: when a filterset or serializer adds an annotation and reapply_model_ordering is called (e.g., in NetBoxModelViewSet.get_queryset), the conventional ordering for these two models won't be re-applied after Django's #32811 strips it. The API response ordering could become non-deterministic.

The guard should check whether the model's Meta.ordering actually references sort_path or path, rather than testing for LtreeManager alone.
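One way such a check could look, as a sketch under the assumption that Meta.ordering holds plain field-name strings (expression objects are ignored here). `TREE_ORDERING_FIELDS` and `uses_tree_ordering` are hypothetical names, not part of utilities/query.py.

```python
TREE_ORDERING_FIELDS = {"path", "sort_path"}


def uses_tree_ordering(ordering):
    """True if any ordering entry is path or sort_path, ascending or descending."""
    return any(
        isinstance(field, str) and field.lstrip("-") in TREE_ORDERING_FIELDS
        for field in ordering
    )


print(uses_tree_ordering(("sort_path", "pk")))                          # ModuleBay-style
print(uses_tree_ordering(("name", "pk")))                               # flat ordering
print(uses_tree_ordering(("device_type__id", "parent__id", "name")))    # flat ordering
```

Keying the exemption on the ordering itself means models that use the ltree manager only for path maintenance keep having their conventional ordering re-applied.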


3. bulk_create comment overstates the ordering constraint

netbox/models/ltree.py:123–138

The docstring and the LtreeModel class docstring both say:

"within a single batch, a child row must appear after its parent (the BEFORE trigger resolves the parent's path by querying the table)"

This is only true for unsaved parents: the Python check rejects rows where parent.pk is None. For a batch where all parents are already saved in the database, the BEFORE trigger correctly resolves the path via a DB lookup regardless of batch order. The constraint is narrower than stated and may lead callers to unnecessarily sort batches. Consider tightening the wording.
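For callers that do need to order a batch, the sketch below arranges rows so that any parent contained in the same batch precedes its children, while rows whose parent lies outside the batch (already saved) are left unconstrained. It works on plain (key, parent_key) tuples; `parents_first` is a hypothetical helper, not something the PR provides.

```python
def parents_first(rows):
    """Order (key, parent_key) rows so in-batch parents come before children."""
    keys = {key for key, _ in rows}
    placed, ordered, pending = set(), [], list(rows)
    while pending:
        # A row is ready if its parent is outside the batch or already placed.
        ready = [r for r in pending if r[1] not in keys or r[1] in placed]
        if not ready:
            raise ValueError("cycle among rows in the batch")
        placed.update(key for key, _ in ready)
        ordered.extend(ready)
        pending = [r for r in pending if r[0] not in placed]
    return ordered


batch = [("c", "b"), ("b", "a"), ("a", None), ("x", "saved-1")]
print(parents_first(batch))
```

Rows with a parent of None or an already-saved parent are emitted in the first pass, which matches the narrower constraint described above.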


4. PR description's "Removed: django_pglocks / ADVISORY_LOCK_KEYS" is misleading

django-pglocks and ADVISORY_LOCK_KEYS are still used in ipam/api/views.py and netbox/jobs.py (for available-IP locking and job-scheduling). The PR removed only the MPTT-specific tree-lock entries from ADVISORY_LOCK_KEYS, not the dict or the library itself. The description should clarify this to avoid confusing operators and plugin authors who read the change notes.


Minor notes

  • ModuleBay.Meta.ordering = ('sort_path', 'pk') (no device column): The global module bay list interleaves bays from different devices sorted by root-bay name, not grouped by device. The code comment notes this reproduces MPTT's ('_root_name', 'lft') behaviour. Worth a UX pass to confirm this is the intended list-view experience.

  • django-mptt retained as a hard dependency: Only needed for historical migrations and the deprecated NestedGroupModel plugin-compat base. Consider noting a target version for removal in the deprecation comment so plugin authors have a concrete timeline.

@arthanson

Collaborator Author

Claude review comments addressed.

@arthanson arthanson marked this pull request as ready for review June 4, 2026 20:11
@arthanson arthanson requested review from a team, bctiemann and jeremystretch and removed request for a team June 4, 2026 20:11

@jeremystretch jeremystretch left a comment

Member

Amazing work! I haven't dug into the triggers or reviewed any of the new tests yet, but wanted to push through some general feedback/questions. A couple general thoughts:

  • I don't know how closely we care to mimic the MPTT API. IMO we should provide only methods/properties that have a solid use case. (We can always extend these later, but will need to maintain indefinitely whatever we initially expose.)
  • We'll need to do a ton of testing at scale to catch performance issues, especially concerning the initial migrations.
  • We should probably break up the netbox.models.ltree module as it's quite large.
  • This is the most optimistic I've ever felt about finally ditching MPTT. 🙂

@arthanson

Collaborator Author

@jeremystretch removed unused MPTT methods, cleaned up comments, and refactored. Please note two comments on your suggestions above.

@arthanson arthanson requested a review from jeremystretch June 8, 2026 18:02
@arthanson

Collaborator Author

@jeremystretch other review comments addressed.

@jeremystretch jeremystretch left a comment

Member

This all looks clean AFAICT. However, Claude did raise a performance concern during review, which I'll share below. The proposed modification to _LOCK_TREE_ROOTS_SQL seems reasonable, but I wanted to get your take @arthanson.


Advisory-lock pool exhaustion on large single-transaction bulk inserts (worth confirming)

The BEFORE trigger takes a pg_advisory_xact_lock keyed per-tree, held until commit. For new root nodes, dest_root is the row's own label, so each new root is a distinct key. A single atomic transaction that bulk-creates many roots (e.g. a large CSV import of Regions/TenantGroups, or a script inside transaction.atomic()) accumulates one advisory lock per root, all held simultaneously, drawing from the shared lock table (max_locks_per_transaction × (max_connections + max_prepared_transactions)).

MPTT took a single per-model lock; this is strictly more locks for the wide-root case, and at sufficient volume can fail with out of shared memory / You might need to increase max_locks_per_transaction. The per-tree concurrency win is real and worth it — but has this been load-tested with a large single-transaction import of many roots? If it's a known limit, a note in the docs/release notes would help operators.
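For a rough sense of scale, the sketch below evaluates the shared lock table sizing formula from the PostgreSQL documentation, max_locks_per_transaction × (max_connections + max_prepared_transactions), using the stock default settings as an assumption. `shared_lock_slots` is a hypothetical helper name.

```python
def shared_lock_slots(
    max_locks_per_transaction=64,   # PostgreSQL default
    max_connections=100,            # PostgreSQL default
    max_prepared_transactions=0,    # PostgreSQL default
):
    """Approximate number of lockable objects the shared lock table can track."""
    return max_locks_per_transaction * (max_connections + max_prepared_transactions)


print(shared_lock_slots())                                # stock configuration
print(shared_lock_slots(max_locks_per_transaction=256))   # after raising the knob
```

The table is shared by every session and every lock type, so the headroom actually available to one bulk import is smaller than this figure.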

Primary fix: don't lock on root inserts

The exhaustion comes from one fact: a brand-new root node takes an advisory lock keyed on its own label, held until commit, but that lock can never contend with anything. A root being INSERTed with parent_id IS NULL starts its own singleton tree that no concurrent transaction can see or reference yet (MVCC: the uncommitted row isn't visible, and nobody else has its PK). So:

  • It can't be reparented concurrently (doesn't exist for other txns).
  • Its subtree is empty.
  • No other transaction's AFTER cascade can target it (it's in nobody's tree).

The lock the trigger's own comment justifies ("serialize an insert against a concurrent reparent of one of its ancestors") is meaningless for a node that has no ancestor. So the lock on a new-root key is pure overhead and the entire pool-consumption vector, since each root is a distinct key.

The change is a guard at the top of _LOCK_TREE_ROOTS_SQL:

-- A brand-new root (INSERT, no parent) starts its own singleton tree that no
-- concurrent transaction can yet see or reference, so there is nothing to
-- serialize against. Skipping the lock here is what keeps a bulk import of many
-- top-level objects from accumulating one xact-lock per root and exhausting the
-- shared lock table (max_locks_per_transaction).
IF TG_OP = 'INSERT' AND NEW.parent_id IS NULL THEN
  -- no lock required
ELSE
  -- ... existing dest_root / old_root / pg_advisory_xact_lock logic ...
END IF;

This is safe and narrow:

  • A reparent-to-root (UPDATE … parent_id = NULL) still locks — the node exists, has descendants, and the cascade rewrites a real subtree.
  • A child insert into an existing tree still locks.
  • Only the provably race-free case is exempted.

Critically, this kills the exact scenario I flagged: importing N top-level objects (Regions, TenantGroups, or one inventory item per device; InventoryItem trees are rooted at top-level items, so each is its own root) now takes zero advisory locks instead of N. The dominant bulk pattern becomes lock-free.
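The effect of the guard on lock accumulation can be sketched by counting distinct lock keys for a batch of writes. This is a deliberately simplified model: it keys each lock on the destination tree's root only and ignores the old-root lock a cross-tree move would also take. `lock_keys` and the tuple layout are hypothetical.

```python
def lock_keys(ops, skip_new_roots):
    """ops: iterable of (op, pk, dest_root); dest_root is None when the row is a root.

    Returns the set of distinct advisory-lock keys one transaction would hold.
    """
    keys = set()
    for op, pk, dest_root in ops:
        if skip_new_roots and op == "INSERT" and dest_root is None:
            continue  # brand-new root: nothing to serialize against
        # A root is keyed on its own label; a child on its tree's root.
        keys.add(dest_root if dest_root is not None else pk)
    return keys


ops = [("INSERT", pk, None) for pk in range(1000)]   # 1000 new roots
ops.append(("INSERT", 2000, 5))                      # one child under root 5
ops.append(("UPDATE", 3000, None))                   # reparent-to-root still locks

print(len(lock_keys(ops, skip_new_roots=False)))
print(len(lock_keys(ops, skip_new_roots=True)))
```

Only the root inserts drop out; the child insert and the reparent-to-root each still contribute a key, matching the three cases listed above.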

Residual case + operator fallback

The primary fix doesn't cover bulk-inserting children into many distinct existing trees in one transaction (e.g. one child under each of 50k existing trees); that still accumulates one lock per distinct tree. This is a much rarer pattern, and for it the right answer is operator guidance:

  • Document that very large hierarchical bulk writes scale with the number of distinct existing trees touched per transaction, and that max_locks_per_transaction is the tuning knob if out of shared memory appears. This is the standard Postgres remedy and a one-line config bump.

I'd put that in the migration/upgrade notes alongside the existing operator warning.

What I'd avoid

  • Releasing the lock mid-transaction (session-level pg_advisory_lock + early unlock) is not safe: the correctness argument requires holding until commit, or a concurrent reparent could interleave between the path computation and commit and leave a stale path this txn can no longer fix. Don't go there.
  • Per-batch commits in bulk import would release locks between batches but breaks NetBox's all-or-nothing atomic import semantics — not worth it.
  • Coarse whole-table lock for bulk defeats the per-tree concurrency that's the whole point of the PR.

Verification

Add a regression test that pins the behavior: inside a transaction, create a root and assert no advisory locks are held —

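# Inside a TestCase method; needs: from django.db import connection, transaction
with transaction.atomic():
  Region.objects.create(name='Root', slug='root')
  with connection.cursor() as cur:
      # Count only this session's advisory locks so other backends cannot skew the result.
      cur.execute("SELECT count(*) FROM pg_locks WHERE locktype = 'advisory' AND pid = pg_backend_pid()")
      self.assertEqual(cur.fetchone()[0], 0)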

and a companion asserting a child insert (or reparent-to-root) does hold exactly one, so the optimization can't silently over-broaden later.

@arthanson

Copy link
Copy Markdown
Collaborator Author

Updated the lock code.

@arthanson arthanson requested a review from jeremystretch June 12, 2026 21:10