fix(export): inline public/ images in static HTML export #9627
Markdown like `mo.md("")` previously produced a
broken `<img src="public/image.png">` reference in the exported standalone
HTML — opening the file outside the notebook directory would 404 on every
image. The static HTML export only inlined `./@file/` virtual files, not
relative `public/` paths.
This change adds `replace_public_files_with_data_uris` and runs it during
`Exporter.export_as_html` (alongside the existing virtual-file pass) so
`<img>`, `<audio>`, `<video>`, and `<source>` `src` attributes pointing at
the notebook's `public/` folder are embedded as base64 data URIs. The same
10 MB inline limit used for virtual files applies; oversized files are
left as-is. Resolved paths are validated to stay inside the public
directory, rejecting `../` traversal and symlink escapes — matching the
runtime `serve_public_file` containment check.
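
As a rough illustration of the containment and size rules described above, here is a minimal standalone sketch. It is not the code from this PR: the names `resolve_public_file`, `src_to_data_uri`, and `MAX_INLINE_BYTES` are hypothetical, and the MIME type is guessed with the standard library rather than with whatever helper marimo uses.

```python
import base64
import mimetypes
import re
from pathlib import Path
from typing import Optional

# Assumed constant mirroring the 10 MB inline limit mentioned above.
MAX_INLINE_BYTES = 10 * 1024 * 1024
# Matches "public/..." and "./public/..." and captures the relative path.
PUBLIC_SRC = re.compile(r"^(?:\./)?public/(.+)$")


def resolve_public_file(public_dir: Path, relpath: str) -> Optional[Path]:
    """Return the resolved file if it exists inside public_dir, else None."""
    try:
        root = public_dir.resolve(strict=True)
        candidate = (root / relpath).resolve(strict=True)
    except (OSError, RuntimeError):
        # Missing file, or a symlink loop (RuntimeError before Python 3.13).
        return None
    # Resolving first means "../" segments and symlinks are already expanded,
    # so a plain prefix check is enough to reject escapes.
    if not candidate.is_file() or not candidate.is_relative_to(root):
        return None
    return candidate


def src_to_data_uri(src: str, public_dir: Path) -> str:
    """Return a base64 data URI for a public/ src, or src unchanged."""
    match = PUBLIC_SRC.match(src)
    if match is None:
        return src
    path = resolve_public_file(public_dir, match.group(1))
    # stat() before read_bytes() so oversized files are never loaded.
    if path is None or path.stat().st_size > MAX_INLINE_BYTES:
        return src
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

In this sketch, any `src` that fails a check is returned untouched, which corresponds to the "left as-is" behaviour in the description.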
Pull request overview
This PR fixes static HTML export so that media references like `public/image.png` (commonly produced by Markdown image syntax) are inlined into the exported standalone HTML as base64 data URIs, making exported files self-contained outside the notebook directory.
Changes:
- Added `replace_public_files_with_data_uris` to inline `public/` (and `./public/`) `src` references for common media tags while enforcing containment checks against traversal/symlink escapes.
- Updated `Exporter.export_as_html` to run the new public-file inlining pass in addition to the existing virtual-file inlining.
- Added unit and integration-style regression tests covering inlining behavior and security constraints (traversal/symlink escape).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/_server/export/test_exporter.py | Adds export-level regression tests for inlining public/ assets and blocking traversal. |
| tests/_convert/common/test_dom_traversal.py | Adds unit tests for public/ HTML attribute inlining, including security-focused cases. |
| marimo/_server/export/exporter.py | Runs public-folder inlining during HTML export and refactors traversal of string outputs. |
| marimo/_convert/common/dom_traversal.py | Implements public/ path resolution + inlining into data URIs with containment checks. |
3 issues found across 4 files
Architecture diagram
sequenceDiagram
participant User as User (Browser)
participant Exporter as Exporter (Server)
participant DomTraversal as dom_traversal
participant FileSystem as Notebook Public Dir
participant Output as HTML Snapshot
Note over User,Output: Static HTML Export Flow (public/ image inlining)
User->>Exporter: Request: export_as_html(filename, app, session_view)
Exporter->>Exporter: Resolve public_dir = Path(filename).parent / "public"
Exporter->>Exporter: _inline_virtual_files(session_snapshot)
Exporter->>Exporter: _iter_html_data_strings() yields (data_dict, mime, html_str)
Exporter->>DomTraversal: replace_virtual_files_with_data_uris(html_str)
DomTraversal-->>Exporter: processed_html, replaced_files
Exporter->>Exporter: Update data_dict[mime] = processed_html
Exporter->>Exporter: _inline_public_files(session_snapshot, public_dir)
alt public_dir exists
Exporter->>Exporter: Iterate HTML strings via _iter_html_data_strings()
Exporter->>Exporter: Check if "public/" in data
Exporter->>DomTraversal: replace_public_files_with_data_uris(html_str, public_dir)
DomTraversal->>DomTraversal: Parse HTML attributes (img, audio, video, source)
DomTraversal->>DomTraversal: Match src values against ^(?:\./)?public/(.+)$
alt Match found
DomTraversal->>FileSystem: _resolve_public_file(public_dir, relpath)
alt File exists & is inside public_dir
FileSystem-->>DomTraversal: Resolved Path
DomTraversal->>FileSystem: Read file bytes
alt File size <= max_inline_bytes (10MB)
FileSystem-->>DomTraversal: bytes
DomTraversal->>DomTraversal: base64 encode + build_data_url
DomTraversal-->>Exporter: Replaced src with data URI
Exporter->>Exporter: Add to replaced set
else File exceeds size limit
DomTraversal-->>Exporter: Original src preserved (skip)
end
else File missing or outside public_dir
FileSystem-->>DomTraversal: None (reject)
DomTraversal-->>Exporter: Original src preserved
end
else No match (external URL, other paths)
DomTraversal-->>Exporter: No change
end
Exporter->>Exporter: Update data_dict[mime] = processed_html
else public_dir does not exist
Exporter->>Exporter: Skip (no-op)
end
Exporter->>Exporter: Generate full HTML document with inlined assets
Exporter-->>User: Standalone HTML string (self-contained)
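
The attribute-matching step in the diagram can be sketched with the standard library alone. This is only an illustration under assumptions: `PublicSrcCollector` and `find_public_relpaths` are hypothetical names, and the sketch only collects candidate paths rather than rewriting the HTML as the real pass does. The tag set and the pattern are taken from the diagram.

```python
import re
from html.parser import HTMLParser
from typing import List

# Tags whose src attribute is considered, per the diagram.
MEDIA_TAGS = {"img", "audio", "video", "source"}
# Pattern quoted in the diagram: optional "./" prefix, then "public/".
PUBLIC_SRC = re.compile(r"^(?:\./)?public/(.+)$")


class PublicSrcCollector(HTMLParser):
    """Collect paths relative to public/ from media-tag src attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.relpaths: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in MEDIA_TAGS:
            return
        for name, value in attrs:
            if name == "src" and value:
                match = PUBLIC_SRC.match(value)
                if match:
                    self.relpaths.append(match.group(1))


def find_public_relpaths(html: str) -> List[str]:
    """Return the public/-relative paths referenced by media tags in html."""
    collector = PublicSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.relpaths
```

Because the pattern is anchored at the start of the value, external URLs and absolute paths such as `/public/x.png` fall into the "No match" branch and are left alone.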
- Check file size via `stat()` before `read_bytes()` so an oversized `public/` asset never gets loaded into memory.
- Catch `RuntimeError` from `Path.resolve(strict=True)` to handle symlink loops gracefully on Python 3.10–3.12.
- Restrict `_iter_html_data_strings` to `text/html` outputs so non-HTML mime entries (`text/plain`, `application/json`, etc.) are never HTML-parsed and can never be mangled by attribute replacement.

Each behaviour is pinned by a new test.
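
The `text/html` restriction can be pictured with a small sketch. The data shape here is an assumption made for illustration (each output is a plain dict keyed by mime type), and `iter_html_data_strings` and `rewrite_html_outputs` are hypothetical stand-ins, not the private helpers in `exporter.py`.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical shape: each output maps a mime type to its payload.
MimeBundle = Dict[str, Any]


def iter_html_data_strings(
    outputs: List[MimeBundle],
) -> Iterator[Tuple[MimeBundle, str, str]]:
    """Yield (bundle, mime, html) only for string-valued text/html entries."""
    for bundle in outputs:
        html = bundle.get("text/html")
        if isinstance(html, str):
            yield bundle, "text/html", html


def rewrite_html_outputs(
    outputs: List[MimeBundle], rewrite: Callable[[str], str]
) -> None:
    """Apply rewrite in place to text/html payloads and nothing else."""
    for bundle, mime, html in iter_html_data_strings(outputs):
        bundle[mime] = rewrite(html)
```

With this filter, a `text/plain` or `application/json` payload that happens to contain `<img src="public/...">` is never passed to the rewriting function.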
1 issue found across 4 files (changes from recent commits).
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.7-dev67