fix(record): stop leaking same-origin iframe state on removal#3570
fix(record): stop leaking same-origin iframe state on removal#3570gruessi wants to merge 4 commits into
Conversation
Same-origin iframes mounted and unmounted while session recording is active leaked the iframe Document, every node serialized into the mirror, and one MutationObserver per mount. Per-iframe retained cost ≈300 KB; for hosts that mount/unmount many iframes (e.g. flow editors that render previews via blob URLs) memory can grow into the hundreds of MB over an extended session. Nine retainer chains had to be closed: - onceIframeLoaded (snapshot.ts) never removed its load listener; the closure pinned the iframe via captured `mirror` / `iframeManager`. It now returns a disposer the caller can invoke on detach. The branch that re-registered a load listener after the iframe was already loaded is collapsed into the disposer-tracked path. The blank-frame-during-navigation guard from the original code is preserved: when win.location.href is about:blank but iframeEl.src is a non-empty/non-about:blank URL we register only the load listener (no immediate serialization), so the recording does not capture the transient about:blank document Chrome assigns before the real navigation completes. - IframeManager.attachIframe registered an anonymous `pagehide` handler with no way to remove it. Convert it to a named function and track per iframe in `pageHideHandlers`. attachIframe runs once per iframe load, so a single iframe can collect handlers across several Windows; the map stores a Set so every handler is detached on removeIframeById, not just the latest. Same shape applied to load-listener disposers via `loadListenerDisposers`. - The cleanup branch in `record/index.ts`'s wrappedMutationEmit was gated on `recordCrossOriginIframes`; same-origin iframes never had their iframeObserverCleanups entry called or iframeManager.removeIframeById invoked. Drop the gate. Also add IframeManager.cleanupDetachedIframes so iframes removed inside a removed subtree (where only the ancestor's id appears in m.removes) get cleaned up too. - Hosts can swap `iframe.src` to `about:blank` before removing the iframe — typical React unmount pattern. By detach time `iframe.contentDocument` and `iframe.contentWindow` no longer match what we attached. Capture every (Document, Window) we ever saw per iframe in `attachedDocuments` / `attachedWindows` (WeakMaps keyed by iframe element, holding a Set of values), and walk the captured sets on detach to release `mirror.idNodeMap` entries and to unwind `nestedIframeListeners` (a strong Map keyed by Window). - Iframes can be removed before their first load fires. attachIframe has not run yet so attachedIframes has no entry, and by the time wrappedMutationEmit calls removeIframeById the mirror entry has typically already been cleared by MutationBuffer.emit, so the fallback mirror.getNode lookup also returns null. New `iframeElementsById` Map (populated in registerLoadListenerDisposer and cleared in removeIframeById) lets the cleanup path find the iframe and run its disposer in this pre-load case. - Iframe navigations overwrite per-iframe observer cleanups. loadListener fires once per load, but iframeObserverCleanups stored only one cleanup per iframe id — repeated navigations pushed every cleanup into `handlers[]` while leaving only the most recent one reachable for runAndDetachIframeCleanup. Earlier MutationObservers and MutationBuffers stayed alive until stopRecording. iframeObserverCleanups is now `Map<number, Set<listenerHandler>>`; detach iterates the Set, splicing each entry out of `handlers[]`. - Reparenting an iframe must not be confused with removal. MutationBuffer.emit clears the mirror entry before re-serializing the add, so a moved iframe can come back with a fresh id — the previous same-id check in wrappedMutationEmit missed that case and disposed the (still-needed) listener and observers, breaking recording for the moved iframe until the next full snapshot. Cross-check by element identity: walk m.adds, look up each new id in the mirror, and if the removed-id's element matches one of those, treat it as a move and only drop the stale id mapping (new IframeManager.forgetIframeId helper) without disposing anything. - IframeManager.destroy() reassigned `loadListenerDisposers` and `pageHideHandlers` to fresh WeakMaps without iterating them, so the registered DOM listeners and pending onceIframeLoaded timers outlived stopRecording(). A late iframe load could then call back into iframeManager.attachIframe and emit mutations after the user thought recording had stopped. destroy() now enumerates tracked iframes via the iterable id-keyed Maps and runs disposers/removers before dropping the WeakMaps. - The global `mutationBuffers[]` array (observer.ts) and the local `handlers[]` array (record/index.ts) accumulated per-iframe entries forever — `findAndRemoveIframeBuffer` was only invoked on pagehide, and `handlers[]` was never cleaned up at all. Splice both on iframe detach via a new `runAndDetachIframeCleanup` helper. `findAndRemoveIframeBuffer` now accepts an optional captured-docs set so it can match a buffer by its original document even after iframe.contentDocument has been swapped. `MutationBuffer.reset()` clears `this.doc` as defence in depth. Validated end-to-end with a host page that mounts and unmounts five blob-URL iframes every 2s for 110s. In a clean Chrome profile (no extensions, no PostHog identity): - Before: post-GC heap +118 MB, +390 leaked HTMLDocuments, +391 MutationObservers. - After: post-GC heap ~0 MB, +1 leaked HTMLDocument, 0 MutationObservers. 30/30 unit tests in test/record/memory-leaks.test.ts pass; new tests cover the captured-doc walk, same-origin cleanup without recordCrossOriginIframes, nested-removal sweep, pre-load removal, iframe reparenting, and post-stop disposer execution.
|
@gruessi is attempting to deploy a commit to the PostHog Team on Vercel. A member of the Team first needs to authorize it. |
# Conflicts: # packages/rrweb/rrweb/src/record/observer.ts
|
- mutation.ts: replace receiver-cast on private `doc` with value-side cast using `observerParam['doc']`, so renames stay type-safe. - memory-leaks.test.ts: extract a `makeIframeManager` factory in the `IframeManager.attachedDocuments` describe block; both tests use it. Pure refactor — no runtime behavior change. 31/31 tests still pass.
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/rrweb/rrweb-snapshot/src/snapshot.ts:404-423
`onSubsequentLoad` is a one-liner wrapper that simply forwards to `listener` with no argument transforms. It's a superfluous wrapper per the "no superfluous parts" rule — using `listener` directly as the event-listener reference is equivalent and removes the extra allocation.
```suggestion
if (
winLocationHref !== blankUrl ||
iframeEl.src === blankUrl ||
iframeEl.src === ''
) {
// Genuinely loaded — fire on the next tick so the host mutation that
// surfaced this iframe has time to commit; also re-fire on later loads.
const initialTimer = setTimeout(listener, 0);
iframeEl.addEventListener('load', listener);
return () => {
clearTimeout(initialTimer);
iframeEl.removeEventListener('load', listener);
};
}
// Transient blank during navigation — wait for the real load.
iframeEl.addEventListener('load', listener);
return () => {
iframeEl.removeEventListener('load', listener);
};
```
### Issue 2 of 2
packages/rrweb/rrweb/src/record/iframe-manager.ts:496-503
**`findAndRemoveIframeBuffer` not called when `capturedDocs` is absent**
When an iframe is removed before `attachIframe` was ever called (so `capturedDocs` is `undefined`), `findAndRemoveIframeBuffer` is never invoked from `removeIframeById`. In that scenario no `MutationBuffer` exists either (the buffer is created inside `loadListener` which is only called from `attachIframe`), so the omission is currently safe. However, this silent invariant is easy to break: a future path that creates a buffer without going through `attachIframe` would leak silently. A brief comment noting the invariant — or an unconditional `findAndRemoveIframeBuffer(iframe)` call outside the `if (capturedDocs)` block — would make the intent explicit.
Reviews (2): Last reviewed commit: "chore(record): tidy up per Greptile revi..." | Re-trigger Greptile |
pauldambra
left a comment
There was a problem hiding this comment.
looks safe to me
i feel like we're accruing a very similar "disposables" pattern across the code
but that's not the fault of this PR, or blocking here
pauldambra
left a comment
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
QA Swarm review complete. See inline comments and the summary comment for the full verdict.
| } | ||
| // Forward-declared; assigned inside the try{} block where `handlers` is in scope. | ||
| let runAndDetachIframeCleanup: (iframeId: number) => void = () => {}; | ||
| let cleanupDetachedIframeObservers: () => void = () => {}; |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[convergent: paul + xp + qa-team/frontend] 🟡 MEDIUM
Forward-declaring runAndDetachIframeCleanup / cleanupDetachedIframeObservers as () => {} and reassigning inside the try{} block makes construction order load-bearing in a quiet way. Anything that ends up invoking wrappedMutationEmit before the try-block executes gets a silent no-op cleanup — there's no warning, the leak just comes back.
Three refactors that would tighten this:
- Extract
handlers+iframeObserverCleanupsinto a smallIframeObserverRegistryclass withregister(id, handler)/detach(id)/cleanupDetached()methods. The class owns its own state, no forward-declared no-ops needed. - Or:
let runAndDetachIframeCleanup: ((id: number) => void) | undefinedand callrunAndDetachIframeCleanup?.(id)— silent misuse becomes a no-op only when intentionally unset, and a future caller that fires too early visibly does nothing rather than swallowing a leak. - Or: lift
handlersto outer scope so the closures don't need to be reassigned at all.
| const addedIframeElements = new Set<HTMLIFrameElement>(); | ||
| if (m.adds.length > 0) { | ||
| for (const add of m.adds) { | ||
| const node = mirror.getNode(add.node.id); |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[paul] 🟠 HIGH
the reparent detection uses mirror.getNode(add.node.id) to map the new id back to the element. that only works because MutationBuffer.emit drains mapRemoves before pushAdd serialises the new id. that's a deep cross-file coupling — a change to emit's ordering would silently corrupt reparent detection and we'd start tearing down observers on every iframe move.
could we add a comment over in MutationBuffer.emit pointing at this expectation, or a test that fails when emit's drain order changes? a silent regression here would look like "iframes occasionally stop recording after a move" — exactly the sort of thing that's painful to track down later. ship as you see fit, but i'd want a tripwire.
| }); | ||
|
|
||
| // Safety net: cleanup any iframes that have become detached or inaccessible | ||
| // Catch iframes removed inside a removed subtree (only the ancestor's |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[qa-team/reliability] 🟡 MEDIUM
The two cleanup orderings disagree. On the explicit m.removes path (around lines 282-284) runAndDetachIframeCleanup(id) runs before iframeManager.removeIframeById(id), so the MutationObserver is disconnected before findAndRemoveIframeBuffer resets the buffer and nulls this.doc. But the safety-net path here runs iframeManager.cleanupDetachedIframes() (which transitively calls buf.reset() -> this.doc = null) before cleanupDetachedIframeObservers() disconnects the observer.
Combined with the new reset() nulling this.doc, a mutation queued in the microtask window between those two calls would land on a buffer whose doc is null and throw inside pushAdd's serializeNodeWithId({ doc: this.doc, ... }). The race window is narrow (browsers deliver MutationObserver callbacks via microtasks) but real.
Suggested fix: swap the order to cleanupDetachedIframeObservers() first, then iframeManager.cleanupDetachedIframes(). That matches the explicit-remove path and closes the race.
| this.shadowDomManager.reset(); | ||
| this.canvasManager.reset(); | ||
| // Defence in depth: clear doc in case a stale closure still pins the buffer. | ||
| this.doc = null as unknown as observerParam['doc']; |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[convergent: paul + xp + qa-team/reliability] 🟡 MEDIUM
this.doc = null as unknown as observerParam['doc'] — three reviewers independently snagged on this. Either:
docreally can be null after reset, in which case make the field's type honest (Document | null) and fix the few call sites that read it (pushAdd'sserializeNodeWithId({ doc: this.doc, ... }),addShadowRoot(..., this.doc)), or- The buffer simply isn't used post-reset and the null-assignment isn't needed at all — let the GC collect the buffer naturally now that it's spliced from
mutationBuffers[].
The "defence in depth" comment + the double-cast suggests neither path was fully committed to. Pick a story. Bonus: public bufferDoc(): Document | null on the next line claims nullability but the field is typed non-null — that inconsistency is the same smell.
If you keep the null-assignment, please also harden processMutation/emit with a if (!this.doc) return; guard so a late MutationObserver callback during the cleanup race (see the related comment on wrappedMutationEmit) is harmless instead of an exception.
| private attachedWindows: WeakMap<HTMLIFrameElement, Set<Window>> = | ||
| new WeakMap(); | ||
| private attachedDocuments: WeakMap<HTMLIFrameElement, Set<Document>> = | ||
| new WeakMap(); |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[convergent: xp + paul + qa-team/performance + qa-team/compatibility] 🟡 MEDIUM
Four-plus WeakMaps/Maps keyed by HTMLIFrameElement (attachedWindows, attachedDocuments, loadListenerDisposers, pageHideHandlers, plus the strong-keyed iframeElementsById) are a textbook Data Clump. They're populated together in attachIframe and cleared together in removeIframeById / destroy. Every retainer chain ends up with its own let bucket = …; if (!bucket) … boilerplate.
One WeakMap<HTMLIFrameElement, IframeAttachmentState> where IframeAttachmentState = { windows: Set<Window>; documents: Set<Document>; loadDisposers: Set<() => void>; pageHide: Set<{ win: Window; handler: () => void }> } would name the concept once. Cleanup becomes one delete; the "one bucket per element" pattern lives in a single helper.
Secondary observation: iframeElementsById is a strong Map, while the others are WeakMap — that's deliberate (you need to look up by id) but worth a comment that this map MUST be cleared on every detach pathway or it pins iframe elements forever.
Observability nudge while you're in here — five separate retainer chains is a lot to keep right forever. A debug-only counter that logs attachedDocuments.size / loadDisposers.size etc. crossing a threshold would let us see the next regression before users do.
| fired = true; | ||
| let fired = false; | ||
| let timer: ReturnType<typeof setTimeout> | null = null; | ||
| const onLoad = () => { |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[xp] 🟡 MEDIUM
The "still loading" branch's onLoad calls iframeEl.removeEventListener('load', onLoad) after firing, so subsequent loads no longer re-fire the listener. Compare to the "complete" branch right below this, which carefully wires onSubsequentLoad precisely to keep firing on later loads.
This asymmetry is either:
- Intentional — then please leave a comment explaining why a still-loading iframe doesn't deserve subsequent-load handling, or
- An oversight — the PR description says "Accumulate every contentDocument across loads", which strongly suggests this branch should also keep the listener attached after the first load (just behind the
firedguard) so aniframe.srcswap while the iframe is still loading the initial document gets its second contentDocument tracked too.
The latter feels more consistent with the rest of the diff's intent.
|
|
||
| callSafely(() => | ||
| iframeEl.contentWindow?.addEventListener('pagehide', () => { | ||
| callSafely(() => { |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[qa-team/generalist-a] 🟢 LOW
attachIframe adds a pagehide listener on every call without checking for an existing entry. A reparented iframe (which calls attachIframe again from MutationBuffer.emit's onIframeLoad after the move) will accumulate N pagehide listeners on the same Window. The handlers are idempotent (mirror.removeNodeFromMap is a no-op for an unknown id, etc.), so this isn't a correctness bug — but the pageHideHandlers Set grows on every reparent, and every listener fires once per event.
Dedupe before adding:
if (Array.from(bucket).some(({ win }) => win === pageHideWindow)) return;|
|
||
| this.iframeElementsById.delete(iframeId); | ||
|
|
||
| if (iframe) { |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[xp] 🟢 LOW
removeIframeById is now ~60 lines doing seven distinct things: resolve element, clear nested-Window listeners, clear legacy-Window path, clear crossOrigin map, drop iframe, dispose load listeners, remove pagehide, walk captured docs. Each block has a clarifying comment — those comments are extract-method opportunities (detachNestedWindowListeners(iframe), releaseCapturedDocuments(iframe), disposeLoadListeners(iframe), etc.).
Kent Beck's ComposedMethod: keep all operations in a method at the same level of abstraction. Right now the method mixes the "detach observers" abstraction with the "release captured DOM refs" abstraction with the "forget the iframe" abstraction. The next maintainer has to read the whole thing to follow one retainer chain. Not blocking — works today — but the shape is worth pressing on before this file grows another chain.
| await new Promise((r) => setTimeout(r, 50)); | ||
|
|
||
| // Move must not have spliced any per-iframe MutationBuffer entries. | ||
| expect(mutationBuffers.length).toBeGreaterThanOrEqual(buffersBeforeMove); |
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
[paul] 🟢 LOW
the reparent test asserts mutationBuffers.length >= buffersBeforeMove. that's deliberately permissive and fine for catching the bug here — but a future regression that adds buffers per move would still pass.
worth tightening to === if the count is stable across the reparent, or splitting into a separate assertion on the iframe-specific buffer count (e.g. count buffers whose doc matches the moved iframe's contentDocument). otherwise we've shipped a test that's load-bearing for one direction of regression and silent on the other.
|
Note 🤖 Automated comment by QA Swarm — not written by a human Multi-perspective review: qa-team (specialists + generalists), paul-reviewer, xp-reviewer Verdict:
|
| Convergent finding | Reviewers | Location |
|---|---|---|
| Forward-declared no-op cleanup pattern | paul + xp + qa-team/frontend | index.ts:186-188 |
this.doc = null cast / bufferDoc(): | null inconsistency |
paul + xp + qa-team/reliability | mutation.ts:254-261 |
| Data Clump across iframe state WeakMaps | xp + paul + qa-team/performance + qa-team/compatibility | iframe-manager.ts:36-49 |
Reviewer summaries
| Reviewer | Assessment |
|---|---|
| 🔍 qa-team | Substantial, well-motivated fix. Main concern is the cleanup-ordering inconsistency in wrappedMutationEmit between the explicit-remove path and the safety-net path now that MutationBuffer.reset() nulls this.doc. Other findings are LOW/NIT. New tests cover the right regressions. No security, data-integrity, copy, or compatibility concerns. |
| 👤 paul | Solid work, +118MB -> ~0 is a real receipt. Biggest concern is the new cross-file coupling between wrappedMutationEmit's reparent detection and MutationBuffer.emit's drain ordering — wants a comment or test. The this.doc = null as unknown as ... cast is a small smell. Everything else is preference. Ship after a brief chat about the reparent invariant and the cast. |
| 📐 xp | Closes a real, validated leak with thorough tests. Passes the four rules of simple design overall, but the main smell is the Data Clump in IframeManager — four parallel WeakMaps keyed by the same element want to be one state object. Secondary smells: subtle behaviour change in onceIframeLoaded's "still loading" branch, forward-declared mutable function bindings papering over a scoping issue, and a small encapsulation leak via MutationBuffer.bufferDoc(). |
Automated by QA Swarm — not a human review
Problem
Recording a page that mounts and unmounts same-origin iframes leaks each iframe's
Document, every node serialized into the mirror, oneMutationObserverper mount, and itsWindow. End-to-end heap test against a host that cycles 5 blob-URL iframes every 2s for 110s: +118 MB / +390 leakedHTMLDocuments, ~300 KB per mount. Long-running tabs eventually hit the browser's memory limit.Cross-origin and
about:blankiframes go through a different code path and weren't affected.Changes
Eight independent retention chains in
packages/rrweb/. Each was masking the next — fixing any subset still showed leak in heap snapshots.onceIframeLoadednow returns a disposer (load listener + timer);IframeManager'spagehidehandler is named and tracked. Both are removed inremoveIframeById.recordCrossOriginIframescleanup gate dropped so the same-origin path actually runs cleanup.cleanupDetachedIframes()sweeps orphans when an iframe is removed via its parent.iframe.src = 'about:blank'before unmount (typical React cleanup) swapscontentDocument/contentWindowso the mirror walks an empty doc andnestedIframeListeners: Map<Window>misses its entry. NewattachedDocuments/attachedWindowsWeakMaps capture every doc/window observed across the iframe's lifetime; detach walks all of them.mutationBuffers[]andhandlers[]had no iframe-removal path.findAndRemoveIframeBuffernow matches by original doc; newrunAndDetachIframeCleanupsplices closures out ofhandlers[].Per-chain detail (file-by-file)
rrweb-snapshot/src/snapshot.ts—onceIframeLoaded. Returns a disposer that removes the listener and clears the pending timer. ThereadyState === 'complete'branch previously registered a second listener next tosetTimeout(listener, 0)— that one is now named (onSubsequentLoad) and removed by the same disposer.serializeNodeWithId/snapshotgain an optionalonIframeListenerRegistered?(iframe, disposer)callback so the disposer travels back toIframeManager.IframeManager.attachIframe. Anonymouspagehidehandler became a named function tracked inpageHideHandlers: WeakMap<HTMLIFrameElement, Set<{win, handler}>>. Set-of-handlers because an iframe can run through several Windows during its lifetime.record/index.tswrappedMutationEmit. Cleanup gate onrecordCrossOriginIframesremoved — same-origin leaks too.IframeManager.cleanupDetachedIframes(). SweepsattachedIframesfor ids no longer in the mirror to catch nested iframe removals (mutation buffer only reports the directly-removed ancestor).attachedDocuments: WeakMap<HTMLIFrameElement, Set<Document>>. Captures every contentDocument observed across an iframe's lifetime (initial blank → loaded src → cleanup blank). Detach walks the Set and callsmirror.removeNodeFromMap(doc)on each.mutationBuffers[]cleanup.MutationBuffer.bufferDoc()exposes the doc;findAndRemoveIframeBuffertakes optionalknownDocs: Set<Document>to match by original contentDocument.reset()nullsthis.docas defence in depth.handlers[]cleanup.runAndDetachIframeCleanup(iframeId)runs the cleanup and splices the closure from bothiframeObserverCleanupsandhandlers.iframeObserverCleanupsisMap<number, Set<cleanup>>so navigations don't overwrite earlier cleanups.attachedWindows: WeakMap<HTMLIFrameElement, Set<Window>>. MirrorsattachedDocuments.removeIframeByIdwalks the captured Set and removes each entry fromnestedIframeListenersandcrossOriginIframeMap. Fixes a stale code comment that incorrectly claimedMap<Window>keys don't prevent GC.Validation
End-to-end heap snapshot, same browser, same recorder config, 61 cycles per run (≈305 mount/unmount events).
native:HTMLDocumentΔnative:MutationObserverΔnative:MutationObserverRegistrationΔclosure:onIframeLoadΔThe negative post-GC delta is because the heap-snapshot operation forces a thorough GC that releases other accumulated state too — the real read is the constructor-level deltas. Top growth in the post-cycle snapshot is V8/JIT internals and transient
PerformanceLongTaskTimingentries, not leaked iframe state.Tests
packages/rrweb/rrweb/test/record/memory-leaks.test.ts— 30/30 pass (4 new regressions for this fix: same-origin cleanup, nested parent removal, captured-doc walk, multi-doc lifecycle).check-typesclean for both@posthog/rrweband@posthog/rrweb-snapshot.Release info Sub-libraries affected
Libraries affected
The changeset bumps
@posthog/rrweband@posthog/rrweb-snapshot(patch).updateInternalDependencies: "patch"cascades the bump throughrrweb-record→posthog-js.Checklist
No public API changes. The new
onIframeListenerRegistered?option is optional and additive; everything else is internal toIframeManager/MutationBuffer/record/index.ts. Bundle-size impact is small (~handful of WeakMap declarations + cleanup helpers) — worth confirming via bundle-size CI on this PR.If releasing new changes
pnpm changesetto generate a changeset file