perf(autoreload): skip stdlib/site-packages on per-cell check #9629

Merged: mscolnick merged 5 commits into main from fix/autoreload-per-cell-overhead on May 21, 2026

@mscolnick
Contributor

This pull request was authored by a coding agent.

Fixes #9628.

With auto_reload set to lazy or autorun, every cell run called ModuleReloader.check(sys.modules, reload=True), which iterates over all of sys.modules and calls os.stat on each entry. With roughly 1,000 modules loaded (a typical count), that adds 16–80 ms per cell; compounded across the dozen or so cells that re-run on a UI interaction, it becomes a lag of more than 1 s.

This change adds an opt-in skip_non_user_modules=True flag on ModuleReloader.check. When set, stdlib and site-packages module names are recorded in a persistent skip set (classified by sysconfig prefixes) and short-circuited on subsequent calls.
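
For readers who want a feel for the classification step, here is a minimal sketch of sorting modules into user and non-user code by sysconfig prefixes. The helper names `non_user_roots` and `is_user_module` are hypothetical and are not taken from marimo's code. Treating modules without a `__file__` as non-user is an assumption of this sketch, not confirmed behavior of the PR.

```python
from __future__ import annotations

import os
import sysconfig
from types import ModuleType


def non_user_roots() -> tuple[str, ...]:
    """Directories treated as stdlib or site-packages (illustrative only)."""
    paths = sysconfig.get_paths()
    keys = ("stdlib", "platstdlib", "purelib", "platlib")
    candidates = [paths.get(key) for key in keys]
    # A trailing separator keeps "/x/lib" from matching "/x/lib-extra".
    roots = {os.path.realpath(p) + os.sep for p in candidates if p}
    return tuple(sorted(roots))


def is_user_module(module: ModuleType, roots: tuple[str, ...]) -> bool:
    """Return True if the module's file lives outside every non-user root."""
    file = getattr(module, "__file__", None)
    if not file:
        # Assumption: modules without a file (e.g. built-ins) cannot be
        # edited on disk, so this sketch treats them as non-user.
        return False
    return not os.path.realpath(file).startswith(roots)
```

Under this scheme, a module loaded from a project directory or an editable install resolves outside the roots and is therefore classified as user code.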

AutoreloadManager.cell_scope (the hot per-cell path) opts in. The background ModuleWatcher keeps the default behavior and continues to scan every module on its 1s loop, so edits inside an installed package are still detected — just at watcher latency rather than cell-entry latency. Editable installs (pip install -e ., uv add --editable) have __file__ outside site-packages, so they are correctly classified as user code and reload with no latency change.
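
The split between the hot path and the watcher can be illustrated with a toy reloader. This is a simplified sketch, not marimo's ModuleReloader: the class name `SketchReloader`, the injected `is_user_module` predicate, and the `stat_calls` counter are all hypothetical and exist only to show how an opt-in flag and a persistent skip set might interact.

```python
from __future__ import annotations

import os
from types import ModuleType
from typing import Callable, Mapping


class SketchReloader:
    """Toy stand-in for a module reloader; not marimo's implementation."""

    def __init__(self, is_user_module: Callable[[ModuleType], bool]) -> None:
        self._is_user_module = is_user_module
        self._mtimes: dict[str, float] = {}
        self._skip: set[str] = set()  # names already classified as non-user
        self.stat_calls = 0  # instrumentation for this example only

    def check(
        self,
        modules: Mapping[str, ModuleType],
        skip_non_user_modules: bool = False,
    ) -> set[str]:
        """Return names of modules whose file mtime grew since last seen."""
        stale: set[str] = set()
        for name, module in list(modules.items()):
            # The skip logic only applies when the caller opts in.
            if skip_non_user_modules:
                if name in self._skip:
                    continue  # short-circuit: no classification, no stat
                if not self._is_user_module(module):
                    self._skip.add(name)
                    continue
            file = getattr(module, "__file__", None)
            if not file:
                continue
            try:
                self.stat_calls += 1
                mtime = os.stat(file).st_mtime
            except OSError:
                continue
            previous = self._mtimes.setdefault(name, mtime)
            if mtime > previous:
                self._mtimes[name] = mtime
                stale.add(name)
        return stale
```

A per-cell caller would pass `skip_non_user_modules=True`, while a background watcher would call `check` with the default and so still stat every module.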

Benchmark

Driving ModuleReloader.check() directly, 200 iterations post-warmup. Issue-shaped workload: ~2.5k modules (heavy stdlib + numpy/pandas/etc.) + 5 user files in a tmp dir.

| path | median | p95 |
| --- | --- | --- |
| before | 4.88 ms | 6.15 ms |
| after | 0.91 ms | 1.01 ms |

~4 ms saved per cell run, 5.4× median speedup.
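
A measurement of this shape (warmup, then 200 timed iterations, reporting median and p95) could be reproduced with a small harness like the one below. This is a sketch and not the script used for the numbers above. The `bench` and `stat_all_modules` names are hypothetical, and `stat_all_modules` only approximates the "before" behavior by stat-ing every loaded module's file.

```python
from __future__ import annotations

import os
import statistics
import sys
import time
from typing import Callable


def bench(
    fn: Callable[[], object], iterations: int = 200, warmup: int = 20
) -> tuple[float, float]:
    """Return (median_ms, p95_ms) for fn, measured after a warmup phase."""
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    median = statistics.median(samples_ms)
    # With n=20 the last cut point is the 95th percentile.
    p95 = statistics.quantiles(samples_ms, n=20)[-1]
    return median, p95


def stat_all_modules() -> None:
    """Approximate the unfiltered scan: one os.stat per loaded module file."""
    for module in list(sys.modules.values()):
        file = getattr(module, "__file__", None)
        if file:
            try:
                os.stat(file)
            except OSError:
                pass


if __name__ == "__main__":
    median_ms, p95_ms = bench(stat_all_modules)
    print(f"median {median_ms:.2f} ms, p95 {p95_ms:.2f} ms")
```

Absolute numbers will depend on the filesystem and on how many modules are loaded, so results from this harness are not directly comparable to the table.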

Scale curve (median µs, varying user-module count):

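| user mods | sys.modules | before (µs) | after (µs) | speedup |
| --- | --- | --- | --- | --- |
| 0 | 2514 | 5037 | 873 | 5.8× |
| 5 | 2519 | 5245 | 802 | 6.5× |
| 25 | 2539 | 6082 | 1693 | 3.6× |
| 100 | 2614 | 8342 | 4421 | 1.9× |
| 500 | 3014 | 12489 | 8398 | 1.5× |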

The win narrows as the amount of user code grows, by design: the optimization only filters out non-user code, so every user module is still stat-ed on each check.
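
As a quick sanity check, the speedup column can be recomputed from the before/after medians reported in the scale curve. The snippet uses only the figures from the table; the variable and function names are illustrative.

```python
# (user modules, before median in µs, after median in µs) from the scale curve
rows = [
    (0, 5037, 873),
    (5, 5245, 802),
    (25, 6082, 1693),
    (100, 8342, 4421),
    (500, 12489, 8398),
]


def speedup(before_us: float, after_us: float) -> float:
    """Ratio of before to after, rounded to one decimal like the table."""
    return round(before_us / after_us, 1)


speedups = [speedup(before, after) for _, before, after in rows]
print(speedups)  # [5.8, 6.5, 3.6, 1.9, 1.5]

# Headline figures from the benchmark table (medians in ms).
print(speedup(4.88, 0.91))  # 5.4
```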

Every cell run with auto_reload enabled was stat-ing every entry in
sys.modules (often 1000+), adding 16-80ms of overhead per cell.

Add an opt-in skip_non_user_modules flag on ModuleReloader.check that
caches stdlib/site-packages module names in a persistent skip set.
AutoreloadManager.cell_scope opts in; the background ModuleWatcher
keeps the default full scan so edits inside installed packages remain
detectable at watcher latency.

Fixes #9628

@mscolnick mscolnick added the enhancement New feature or request label May 20, 2026
@mscolnick mscolnick requested review from akshayka and dmadisetti May 20, 2026 18:40
@mscolnick mscolnick marked this pull request as ready for review May 20, 2026 18:43
Copilot AI review requested due to automatic review settings May 20, 2026 18:43
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found across 3 files

Architecture diagram
sequenceDiagram
    participant UI as User/Client
    participant AM as AutoreloadManager
    participant MR as ModuleReloader
    participant MW as ModuleWatcher (Background)
    participant sysmod as sys.modules dict
    participant FS as Filesystem (os.stat)

    Note over AM,FS: Per-cell execution path (hot path)

    UI->>AM: Execute cell (lazy/auto reload)
    AM->>AM: snapshot = set(sys.modules)
    AM->>MR: check(modules=sys.modules, reload=True, skip_non_user_modules=True)
    
    Note over MR: Skip cache populated lazily
    MR->>MR: _non_user_roots from sysconfig (stdlib, purelib, platlib, base_prefix)
    
    loop For each module in sys.modules
        alt Module name in _skip set
            MR->>MR: continue (skip entirely)
        else Module not classified yet
            MR->>MR: _is_user_module(module)
            alt __file__ starts with non_user_root
                MR->>MR: skip.add(modname), continue
            else User module (editable install / source tree)
                MR->>FS: os.stat(module.__file__)
                FS-->>MR: mtime
                MR->>MR: Compare with cached mtime
            end
        end
    end
    
    alt Stale modules found
        MR->>MR: Reload stale modules
        MR-->>AM: Set of modified modules
    else No stale modules
        MR-->>AM: Empty set (fast path)
    end
    
    AM->>AM: Execute cell yield
    AM->>AM: new_modules = sys.modules - snapshot
    AM->>MR: check(new_modules, reload=False, skip_non_user_modules=True)
    
    Note over AM: Cell execution complete

    Note over MW,FS: Background watcher path (1s loop)

    loop Every ~1 second
        MW->>MR: check(modules=sys.modules, reload=False)
        Note over MR: Default behavior - scans ALL modules
        
        loop For each module
            alt User module (not in site-packages)
                MR->>FS: os.stat(n), compare
            else Stdlib / site-packages
                MR->>FS: os.stat(n), compare
            end
        end
        
        alt Modified modules detected
            MR-->>MW: Set of updated module names
            MW->>MW: Trigger reload callback (if auto_reload=autorun)
        end
    end

    Note over AM,FS: New: skip_non_user_modules flag
    Note over AM: User code changes detected immediately
    Note over MW: Site-package changes detected at watcher latency

Comment thread marimo/_runtime/reload/manager.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR improves autoreload performance by avoiding per-cell os.stat scans over the full sys.modules set when runtime.auto_reload is enabled, addressing the cell execution latency regression reported in #9628.

Changes:

  • Added skip_non_user_modules option to ModuleReloader.check() and a persistent skip cache for stdlib/site-packages modules.
  • Updated AutoreloadManager.cell_scope() to use the skip behavior on the hot per-cell path.
  • Added targeted tests for user vs non-user module classification and skip-cache behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| marimo/_runtime/reload/autoreload.py | Introduces non-user root detection, user-module classification, and a persistent skip cache used by ModuleReloader.check(). |
| marimo/_runtime/reload/manager.py | Opts the per-cell autoreload path into skipping non-user modules to reduce per-cell overhead. |
| tests/_runtime/reload/test_autoreload.py | Adds regression tests for skip-cache population and behavior differences between watcher vs hot path. |

Comment thread marimo/_runtime/reload/manager.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/manager.py Outdated
Contributor

@akshayka akshayka left a comment

Overall LGTM, code style comments

Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 3 files (changes from recent commits).

Comment thread marimo/_runtime/reload/autoreload.py Outdated
akshayka previously approved these changes May 20, 2026
Collaborator

@dmadisetti dmadisetti left a comment

I think the continue logic should be tied to skip_non_user_modules

Comment thread marimo/_runtime/reload/autoreload.py
Comment thread marimo/_runtime/reload/autoreload.py Outdated
source tree, so they are correctly classified as user code.
"""
f = safe_getattr(module, "__file__", None)
if not f:
Collaborator

false positive on c libraries? Unsure, but I think so. Maybe that's fine

Comment thread marimo/_runtime/reload/autoreload.py Outdated
@mscolnick
Contributor Author

thanks @dmadisetti, i had that but removed from the comments. will add back skip_non_user_modules to just the hot path

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread marimo/_runtime/reload/autoreload.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/manager.py
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread marimo/_runtime/reload/autoreload.py
@mscolnick mscolnick merged commit 5700859 into main May 21, 2026
56 of 62 checks passed
@mscolnick mscolnick deleted the fix/autoreload-per-cell-overhead branch May 21, 2026 14:19
@github-actions

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.7-dev71


Linked issue that this pull request may close: Module autoreload leads to a significant performance drop

4 participants