Skip to content

Fix #233: prevent YouTube/media thumbnails from being stored as base6…#237

Open
pieer wants to merge 4 commits into
masterfrom
fix/media-preview-diff-suppression
Open

Fix #233: prevent YouTube/media thumbnails from being stored as base6…#237
pieer wants to merge 4 commits into
masterfrom
fix/media-preview-diff-suppression

Conversation

@pieer

@pieer pieer commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Fix #233: Ensure media previews do not trigger MAX_GIT_DIFF_LINE_CHARACTERS

Issue 1 (YouTube → base64): The mechanism is rich-text paste. When a user copies content from YouTube.com, the browser clipboard carries HTML that includes the channel thumbnail as an element. In WYSIWYG mode, Toast UI processes that HTML, extracts the image, and the addImageBlobHook in toast-editor.ts:87 converts it to base64 before storing it in markdown. Intentional design for dropped/local images — unintended for externally-fetched thumbnails in clipboard HTML.

Issue 2 (diff suppression): parseHunks in gitdiff.go:977 truncates any line exceeding MaxGitDiffLineCharacters (5000) and sets IsIncompleteLineTooLong = true, which causes the diff template (box.tmpl:175) to suppress the entire file diff with no load button

Fix 1 — web_src/js/features/toast-editor.ts (prevents the base64 from being created)

Added a paste intercept that runs before Toast UI processes clipboard HTML. When HTML content is pasted (e.g. copied from YouTube with a thumbnail in the rich HTML), it strips any image whose src is an external URL, replacing it with its alt text or removing it. Images with data: URIs (locally dropped/deliberately embedded) are untouched. The cleaned HTML is re-dispatched so Toast UI still processes the rest of the pasted content normally.

Fix 2 — services/gitdiff/gitdiff.go (prevents the diff from being suppressed even if base64 is present)

Added isBase64ImageDiffLine() which detects Markdown image lines containing an inline data:image/...;base64, payload. When such a line exceeds maxLineCharacters, instead of marking the entire file as IsIncompleteLineTooLong (which hides the whole diff with no load option), the base64 payload is replaced with a short placeholder like +[embedded base64 image, ~45 KB]. All other content in the file renders normally.

Co-Author: Opus 4.8

…4 and suppress diff

Two-part fix for issue #233 where adding a YouTube link to an article
triggered 'File diff suppressed because one or more lines are too long':

1. toast-editor.ts: intercept clipboard HTML paste in WYSIWYG mode and
   strip external <img> elements before Toast UI processes them. Images
   copied from media sites (YouTube thumbnails, og:image, etc.) were
   flowing through addImageBlobHook and being stored as very long base64
   strings in the markdown file. Only data: URI images (deliberately
   embedded by the user) are kept.

2. gitdiff.go: detect diff lines that are Markdown images with an inline
   base64 data URI. Instead of marking the entire file as
   IsIncompleteLineTooLong (which hides the whole diff with no load
   option), substitute a short placeholder like
   '[embedded base64 image, ~45 KB]' so the rest of the diff renders
   normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pieer

pieer commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator Author

Summary

Two fixes for issue #233: (1) a paste interceptor in toast-editor.ts that strips external <img> elements from clipboard HTML before Toast UI converts them to base64, and (2) a gitdiff.go change that replaces an over-long base64 image diff line with a short placeholder instead of suppressing the whole file diff. After tracing both code paths, neither fix actually works under the default configuration: the backend branch is unreachable for real base64 images, and the frontend re-dispatches the cleaned paste at an element the editor never listens on. The remaining findings concern correctness of the helpers even once those are fixed.

Findings

1. 🔴 Backend base64 placeholder is unreachable for real images (default config)

File: services/gitdiff/gitdiff.go:982-997
Problem: readerSize = max(maxLineCharacters, 4096) = 5000 with the default MaxGitDiffLineCharacters=5000, so any line longer than 5000 bytes makes input.ReadLine() return isFragment=true. The pre-existing if isFragment {…} block (just above the new code) sets curFile.IsIncompleteLineTooLong = true and drains the rest of the line before the new base64 check runs; and because ReadLine only returned the first ~5000 bytes, len(line) > maxLineCharacters (5000 > 5000) is false, so the base64 branch never executes. A real embedded thumbnail is always far larger than 5000 bytes, so the file diff is still fully suppressed — the fix is effectively dead code for the case it targets (it can only run when maxLineCharacters < 4096, a non-default setting).

// Handle the base64 case inside the isFragment path, where over-long lines actually land.
if isFragment {
    if len(line) > 1 && isBase64ImageDiffLine(line[1:]) {
        drained := len(line)
        for isFragment {
            lineBytes, isFragment, err = input.ReadLine()
            if err != nil { return lineBytes, isFragment, fmt.Errorf("unable to ReadLine: %w", err) }
            drained += len(lineBytes)
        }
        decodedKB := drained * 3 / 4 / 1024 // approximate
        line = string(line[0]) + fmt.Sprintf("[embedded base64 image, ~%d KB]", decodedKB)
    } else {
        curFile.IsIncomplete = true
        curFile.IsIncompleteLineTooLong = true
        for isFragment { /* drain as today */ }
    }
}

Status: open

  • Addressed
  • Dismissed

2. 🔴 Frontend re-dispatches the cleaned paste at an element the editor never listens on

File: web_src/js/features/toast-editor.ts:163-169
Problem: The handler preventDefaults the original paste and dispatches a synthetic paste on container (#toast-editor-container). Toast UI's WYSIWYG editor is ProseMirror, which registers paste via handleDOMEvents on view.dom — a descendant of container. A dispatchEvent on an ancestor never reaches a descendant's listener (events do not descend), so the synthetic event is dropped: when any external image is present, the entire paste (including all the text the user wanted) is silently lost. This is a worse regression than the original bug.

// Dispatch on the original target (inside view.dom) so the editor actually receives it.
// The capture listener re-runs but now finds no external <img>, so it won't re-dispatch.
const target = (e.target as HTMLElement) ?? container;
target.dispatchEvent(new ClipboardEvent('paste', {bubbles: true, cancelable: true, clipboardData: dt}));

Status: open

  • Addressed
  • Dismissed

3. 🟡 Synthetic ClipboardEvent with a constructed clipboardData is not reliable cross-browser

File: web_src/js/features/toast-editor.ts:160-168
Problem: Even dispatched at the right target, new ClipboardEvent('paste', {clipboardData: dt}) does not reliably expose dt as event.clipboardData in all engines (Firefox has historically constructed ClipboardEvent with a null clipboardData; Safari is also inconsistent), so the editor may receive an event with no data and insert nothing. Consider stripping at a layer the editor controls instead — e.g. Toast UI's HTML sanitizer / paste hook, or rejecting non-data: images inside addImageBlobHook — rather than synthesizing a paste event.

// More robust: drop external images in the editor's own pipeline, e.g. reject them in
// addImageBlobHook, or register a ProseMirror transformPastedHTML / Toast UI sanitizer
// so the editor still performs the actual insert.

Status: open

  • Addressed
  • Dismissed

4. 🟡 isBase64ImageDiffLine matches any line containing the substrings, not just image markdown

File: services/gitdiff/gitdiff.go:799-802
Problem: The check is strings.Contains(s, "data:image/") && strings.Contains(s, ";base64,") anywhere in the line, so a long line of prose, JSON, or code that merely mentions a data URI would be rewritten to [embedded base64 image, ~N KB] and silently skip the normal truncation/too-long handling — misrepresenting real content. Anchor the match to the Markdown image form.

var base64ImageLineRe = regexp.MustCompile(`!\[[^\]]*\]\(\s*data:image/[a-zA-Z.+-]+;base64,`)
func isBase64ImageDiffLine(s string) bool { return base64ImageLineRe.MatchString(s) }

Status: open

  • Addressed
  • Dismissed

5. ⚪️ Placeholder size estimate is inaccurate and shows "~0 KB" for small images

File: services/gitdiff/gitdiff.go:991-993
Problem: decodedKB := (len(line) - b64Offset - 8) * 3 / 4 / 1024 measures from the first ;base64, to the end of the (truncated) line — it includes the trailing ) and any following content, uses only the first match when several images share a line, and integer-divides to 0 KB for anything under ~1.3 KB decoded. Minor, but the displayed number can be misleading; consider rounding up / showing bytes for small payloads and measuring the actual base64 run length.

Status: open

  • Addressed
  • Dismissed

6. ⚪️ Only paste is intercepted; drag-drop of external images is not, and the alt text is discarded in the diff

File: web_src/js/features/toast-editor.ts:122-129, services/gitdiff/gitdiff.go:994
Problem: Dragging an image from a web page into the editor still flows through addImageBlobHook and becomes base64 (the drop handler only checks file-size limits), so the same issue recurs via drag. Separately, the backend placeholder replaces the whole line, discarding the image's alt text/filename that could make the diff more informative (![logo](…) → keep logo). Both are scope/UX refinements, not correctness blockers.

Status: open

  • Addressed
  • Dismissed

Counts

Severity Count
🔴 High 2
🟡 Medium 2
⚪️ Low 2

pieer and others added 3 commits June 30, 2026 15:48
The base64 placeholder only ran in the `len(line) > maxLineCharacters` branch,
but a real embedded image far exceeds the read buffer (max(maxLineCharacters,
4096), 5000 by default), so ReadLine returns it as a fragment. The pre-existing
`if isFragment` block set IsIncompleteLineTooLong and drained the line before the
placeholder check, and the truncated fragment was never > maxLineCharacters — so
the whole file diff was still suppressed under the default config.

- Detect and substitute the placeholder inside the isFragment path (counting the
  drained bytes for the size estimate); keep the non-fragment branch for the rare
  case where maxLineCharacters is configured below the buffer size.
- Anchor isBase64ImageDiffLine to the Markdown image syntax via regex so prose or
  code merely mentioning a data URI is not misclassified.
- Estimate the decoded size more accurately and format it with base.FileSize, and
  keep the image's alt text in the placeholder.
- Add TestParsePatch_base64Image covering the fragment path and the prose guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ontainer

The paste interceptor rebuilt a DataTransfer without external images but
dispatched the synthetic paste on `container` — an ancestor of the element the
editor listens on (the pseudo-clipboard textarea in markdown mode, the
ProseMirror contenteditable in WYSIWYG). Events don't descend into children, so
the editor never received it and the entire paste was dropped whenever an
external image was present. Dispatch on `e.target` instead; rebuilding the
clipboard without the image item is what preserves the text while dropping the
incidental thumbnail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gracefully

new ClipboardEvent('paste', {clipboardData}) is not honored everywhere (Safari
ignores it), so re-dispatching would carry no data and silently drop the paste.
Detect support once; when unavailable, skip the strip/re-dispatch and let the
paste proceed normally — the gitdiff base64 placeholder remains the backend
safety net, so the only downside on those engines is the (now diff-safe) base64
still being stored, rather than losing the user's pasted content.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ensure media previews do not trigger MAX_GIT_DIFF_LINE_CHARACTERS

1 participant