fix(export): inline public/ images in static HTML export #9627

Merged: mscolnick merged 2 commits into main from ms/fix/issue-9625 on May 21, 2026

@mscolnick (Contributor)

Markdown like `mo.md("![alt](public/image.png)")` previously produced a
broken `<img src="public/image.png">` reference in the exported standalone
HTML — opening the file outside the notebook directory would 404 on every
image. The static HTML export only inlined `./@file/` virtual files, not
relative `public/` paths.

This change adds `replace_public_files_with_data_uris` and runs it during
`Exporter.export_as_html` (alongside the existing virtual-file pass) so
`<img>`, `<audio>`, `<video>`, and `<source>` `src` attributes pointing at
the notebook's `public/` folder are embedded as base64 data URIs. The same
10 MB inline limit used for virtual files applies; oversized files are
left as-is. Resolved paths are validated to stay inside the public
directory, rejecting `../` traversal and symlink escapes — matching the
runtime `serve_public_file` containment check.
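
The pass described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not marimo's actual code: the helper name `inline_public_srcs`, the regex-based attribute matching (double-quoted `src` only), and the `MAX_INLINE_BYTES` constant are hypothetical stand-ins for what lives in `marimo/_convert/common/dom_traversal.py`.

```python
import base64
import mimetypes
import re
import tempfile
from pathlib import Path

MAX_INLINE_BYTES = 10 * 1024 * 1024  # assumed to mirror the 10 MB limit

# Hypothetical, simplified matcher: double-quoted src on the four media tags.
_SRC_RE = re.compile(
    r'(<(?:img|audio|video|source)\b[^>]*?\bsrc=")((?:\./)?public/[^"]+)(")'
)


def inline_public_srcs(html: str, public_dir: Path) -> str:
    """Replace public/ src values with base64 data URIs (illustrative only)."""
    root = public_dir.resolve()

    def _replace(match: re.Match) -> str:
        relpath = match.group(2).split("public/", 1)[1]
        try:
            target = (root / relpath).resolve(strict=True)
        except (OSError, RuntimeError):
            return match.group(0)  # missing file or symlink loop: leave as-is
        # Containment check: reject ../ traversal and symlink escapes.
        if not target.is_relative_to(root) or not target.is_file():
            return match.group(0)
        if target.stat().st_size > MAX_INLINE_BYTES:
            return match.group(0)  # oversized: leave the reference as-is
        mime = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
        payload = base64.b64encode(target.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return _SRC_RE.sub(_replace, html)


# Demo in a throwaway directory standing in for a notebook folder.
with tempfile.TemporaryDirectory() as tmp:
    public = Path(tmp) / "public"
    public.mkdir()
    (public / "image.png").write_bytes(b"abc")
    (Path(tmp) / "notebook.py").write_text("secret")
    inlined = inline_public_srcs('<img src="public/image.png">', public)
    escaped = inline_public_srcs('<img src="public/../notebook.py">', public)
```

In the demo, `inlined` carries a `data:image/png;base64,...` source, while `escaped` is returned unchanged because the resolved path falls outside `public/`.
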
Copilot AI review requested due to automatic review settings May 20, 2026 14:38
Copilot AI left a comment

Pull request overview

This PR fixes static HTML export so that media references like public/image.png (commonly produced by markdown such as ![alt](public/image.png)) are inlined into the exported standalone HTML as base64 data URIs, making exported files self-contained outside the notebook directory.

Changes:

  • Added replace_public_files_with_data_uris to inline public/ (and ./public/) src references for common media tags while enforcing containment checks against traversal/symlink escapes.
  • Updated Exporter.export_as_html to run the new public-file inlining pass in addition to the existing virtual-file inlining.
  • Added unit and integration-style regression tests covering inlining behavior and security constraints (traversal/symlink escape).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/_server/export/test_exporter.py` | Adds export-level regression tests for inlining `public/` assets and blocking traversal. |
| `tests/_convert/common/test_dom_traversal.py` | Adds unit tests for `public/` HTML attribute inlining, including security-focused cases. |
| `marimo/_server/export/exporter.py` | Runs public-folder inlining during HTML export and refactors traversal of string outputs. |
| `marimo/_convert/common/dom_traversal.py` | Implements `public/` path resolution and inlining into data URIs with containment checks. |

Comment thread marimo/_convert/common/dom_traversal.py Outdated
Comment thread marimo/_server/export/exporter.py Outdated
@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found across 4 files

Architecture diagram
sequenceDiagram
    participant User as User (Browser)
    participant Exporter as Exporter (Server)
    participant DomTraversal as dom_traversal
    participant FileSystem as Notebook Public Dir
    participant Output as HTML Snapshot

    Note over User,Output: Static HTML Export Flow (public/ image inlining)

    User->>Exporter: Request: export_as_html(filename, app, session_view)
    Exporter->>Exporter: Resolve public_dir = Path(filename).parent / "public"
    Exporter->>Exporter: _inline_virtual_files(session_snapshot)
    Exporter->>Exporter: _iter_html_data_strings() yields (data_dict, mime, html_str)
    Exporter->>DomTraversal: replace_virtual_files_with_data_uris(html_str)
    DomTraversal-->>Exporter: processed_html, replaced_files
    Exporter->>Exporter: Update data_dict[mime] = processed_html

    Exporter->>Exporter: _inline_public_files(session_snapshot, public_dir)
    alt public_dir exists
        Exporter->>Exporter: Iterate HTML strings via _iter_html_data_strings()
        Exporter->>Exporter: Check if "public/" in data
        Exporter->>DomTraversal: replace_public_files_with_data_uris(html_str, public_dir)
        DomTraversal->>DomTraversal: Parse HTML attributes (img, audio, video, source)
        DomTraversal->>DomTraversal: Match src values against ^(?:\./)?public/(.+)$
        alt Match found
            DomTraversal->>FileSystem: _resolve_public_file(public_dir, relpath)
            alt File exists & is inside public_dir
                FileSystem-->>DomTraversal: Resolved Path
                DomTraversal->>FileSystem: Read file bytes
                alt File size <= max_inline_bytes (10MB)
                    FileSystem-->>DomTraversal: bytes
                    DomTraversal->>DomTraversal: base64 encode + build_data_url
                    DomTraversal-->>Exporter: Replaced src with data URI
                    Exporter->>Exporter: Add to replaced set
                else File exceeds size limit
                    DomTraversal-->>Exporter: Original src preserved (skip)
                end
            else File missing or outside public_dir
                FileSystem-->>DomTraversal: None (reject)
                DomTraversal-->>Exporter: Original src preserved
            end
        else No match (external URL, other paths)
            DomTraversal-->>Exporter: No change
        end
        Exporter->>Exporter: Update data_dict[mime] = processed_html
    else public_dir does not exist
        Exporter->>Exporter: Skip (no-op)
    end

    Exporter->>Exporter: Generate full HTML document with inlined assets
    Exporter-->>User: Standalone HTML string (self-contained)
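
The `src` matching step in the diagram can be tried in isolation. The pattern below is the one quoted in the diagram; the helper name `public_relpath` is hypothetical and only illustrates which values would be candidates for inlining.

```python
import re
from typing import Optional

# Pattern quoted in the diagram: an optional "./" prefix, then "public/".
PUBLIC_SRC = re.compile(r"^(?:\./)?public/(.+)$")


def public_relpath(src: str) -> Optional[str]:
    """Return the path relative to public/, or None if src is not a candidate."""
    match = PUBLIC_SRC.match(src)
    return match.group(1) if match else None


plain = public_relpath("public/image.png")
dotted = public_relpath("./public/clips/a.mp4")
external = public_relpath("https://example.com/public/x.png")
absolute = public_relpath("/public/x.png")
traversal = public_relpath("public/../secret.txt")
```

Note that `traversal` still matches the pattern and yields `../secret.txt`, which is why the separate containment check on the resolved path is needed before any file is read.
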

Comment thread marimo/_convert/common/dom_traversal.py Outdated
Comment thread marimo/_convert/common/dom_traversal.py
Comment thread marimo/_server/export/exporter.py
@mscolnick mscolnick added the bug Something isn't working label May 20, 2026
- Check file size via `stat()` before `read_bytes()` so an oversized
  `public/` asset never gets loaded into memory.
- Catch `RuntimeError` from `Path.resolve(strict=True)` to handle
  symlink loops gracefully on Python 3.10–3.12.
- Restrict `_iter_html_data_strings` to `text/html` outputs so non-HTML
  mime entries (`text/plain`, `application/json`, etc.) are never
  HTML-parsed and can never be mangled by attribute replacement.

Each behaviour is pinned by a new test.
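
The three follow-up changes can be sketched as below. This is an illustration under assumptions rather than the merged code: `read_public_bytes` and `iter_html_strings` are hypothetical names, and the shape of the output dictionary (mime type mapped to value) is assumed from the commit message.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, Optional, Tuple

MAX_INLINE_BYTES = 10 * 1024 * 1024  # assumed to mirror the 10 MB limit


def read_public_bytes(
    public_dir: Path, relpath: str, limit: int = MAX_INLINE_BYTES
) -> Optional[bytes]:
    """Return file bytes only if the file is contained and small enough."""
    root = public_dir.resolve()
    try:
        target = (root / relpath).resolve(strict=True)
    except (OSError, RuntimeError):
        # RuntimeError covers symlink loops on the Python versions named above.
        return None
    if not target.is_relative_to(root) or not target.is_file():
        return None
    # stat() first, so an oversized file is never loaded into memory.
    if target.stat().st_size > limit:
        return None
    return target.read_bytes()


def iter_html_strings(output_data: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield only text/html string entries; other mime types are skipped."""
    for mime, value in output_data.items():
        if mime == "text/html" and isinstance(value, str):
            yield mime, value


with tempfile.TemporaryDirectory() as tmp:
    public = Path(tmp) / "public"
    public.mkdir()
    (public / "clip.bin").write_bytes(b"abcd")
    small = read_public_bytes(public, "clip.bin")
    too_big = read_public_bytes(public, "clip.bin", limit=3)
    missing = read_public_bytes(public, "nope.bin")

outputs = {
    "text/html": '<img src="public/a.png">',
    "text/plain": "public/a.png",
    "application/json": {"src": "public/a.png"},
}
html_only = list(iter_html_strings(outputs))
```

Here `too_big` and `missing` are both `None`, and `html_only` contains only the `text/html` entry, so the plain-text and JSON values are never passed to an HTML rewrite.
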

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 4 files (changes from recent commits).

Comment thread marimo/_convert/common/dom_traversal.py
@mscolnick mscolnick requested a review from kirangadhave May 20, 2026 20:29

@kirangadhave kirangadhave left a comment

🚀 great fix!

@mscolnick mscolnick merged commit 92a02ee into main May 21, 2026
44 checks passed
@mscolnick mscolnick deleted the ms/fix/issue-9625 branch May 21, 2026 00:25
@github-actions

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.7-dev67
