
Loaders, Editor: Improve handling of assets with unicode characters.#33301

Draft
Mugen87 wants to merge 2 commits into mrdoob:dev from Mugen87:dev2


Mugen87 (Collaborator) commented Mar 31, 2026

Fixed #29963.

Description

This PR improves the loading of glTF assets whose file names contain URL-encoded UTF-8 characters when they are imported into the editor via drag'n'drop or via "Import". Both the zipped and the unzipped versions of the glTF assets from #29963 now import as expected. In addition, LoadingManager was updated to fix network requests made outside of the editor.

To explain further: Unicode defines several Normalization Forms, and loading only works as expected when normalization is handled consistently. The PR makes sure that the editor (zipped and drag'n'drop imports) as well as plain network requests use NFC normalization.
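The idea can be sketched in plain JavaScript (runs in Node.js or a browser console). This is a minimal illustration assuming the editor matches glTF URIs against available file names through a lookup table; `fileKey`, `uriKey` and `buildFileMap` are hypothetical names, not the editor's actual code, and `buffer안.bin` is a made-up file name:

```javascript
// Hypothetical helpers for illustration; not the editor's actual code.

// Zip entry names and dropped File names are plain strings (not
// percent-encoded), so they only need Unicode normalization.
function fileKey(name) {
  return name.normalize('NFC');
}

// URIs inside glTF JSON may be percent-encoded: decode first, then normalize.
function uriKey(uri) {
  let decoded = uri;
  try {
    decoded = decodeURIComponent(uri);
  } catch (e) {
    // Malformed escape sequence (URIError): fall back to the raw URI.
  }
  return decoded.normalize('NFC');
}

// Index the available files by their normalized name.
function buildFileMap(names) {
  const map = new Map();
  for (const name of names) map.set(fileKey(name), name);
  return map;
}

// A zip entry stored in NFC and a glTF URI percent-encoded from NFD.
const files = buildFileMap(['scene.gltf', 'buffer\uC548.bin']);
const match = files.get(uriKey('buffer%E1%84%8B%E1%85%A1%E1%86%AB.bin'));
console.log(match); // buffer안.bin
```

Normalizing both sides to the same form is what makes the lookup work; normalizing only one side would still fail whenever the other side happens to use the other form.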

Mugen87 added this to the r184 milestone Mar 31, 2026
github-actions commented

📦 Bundle size

Full ESM build, minified and gzipped.

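| | Before (min / gzip) | After (min / gzip) | Diff (min / gzip) |
|---|---|---|---|
| WebGL | 360.06 / 85.47 | 360.06 / 85.47 | +0 B / +0 B |
| WebGPU | 633.13 / 175.68 | 633.13 / 175.68 | +0 B / +0 B |
| WebGPU Nodes | 631.25 / 175.39 | 631.25 / 175.39 | +0 B / +0 B |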

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

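| | Before (min / gzip) | After (min / gzip) | Diff (min / gzip) |
|---|---|---|---|
| WebGL | 492.3 / 120.06 | 492.3 / 120.06 | +0 B / +0 B |
| WebGPU | 705.19 / 190.56 | 705.19 / 190.56 | +0 B / +0 B |
| WebGPU Nodes | 654.41 / 177.83 | 654.41 / 177.83 | +0 B / +0 B |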

Mugen87 marked this pull request as draft March 31, 2026 19:14
Mugen87 changed the title from "Editor: Improve handling of assets with unicode characters." to "Loaders, Editor: Improve handling of assets with unicode characters." Mar 31, 2026
Mugen87 moved this from the r184 to the r185 milestone Apr 1, 2026
Mugen87 (Collaborator, Author) commented Apr 1, 2026

TIL about Unicode Normalization Forms. Because I was still unsure, I asked an AI why decodeURIComponent() alone isn't sufficient. Here is the answer:


decodeURIComponent only undoes percent-encoding — it converts %EC%95%88 back to the bytes ec 95 88, which become the character 안. But it has no opinion about which byte sequence represents that character.

The Korean syllable 안 can be encoded in UTF-8 two ways:

- NFC (composed): ec 95 88, a single code point U+C548 안
- NFD (decomposed): e1 84 8b e1 85 a1 e1 86 ab, three code points: ᄋ + ᅡ + ᆫ (the individual jamo)
Both are valid UTF-8. Both render identically. But they're different byte sequences, so === returns false.

decodeURIComponent faithfully reproduces whichever form was percent-encoded. If the glTF JSON had NFD bytes, decoding gives you the NFD string. If the zip entry used NFC, decoding gives NFC. They won't match.

This is exactly what happens with this test asset:

- glTF content (written on macOS, which favors NFD): buffer + NFD jamo → 27 bytes for the Korean part
- Zip entry name (stored in NFC): buffer + NFC syllables → 18 bytes for the Korean part
.normalize('NFC') collapses both representations to the same composed form, making them comparable. That's why decodeURIComponent alone isn't enough — it solves the %XX encoding layer but not the Unicode normalization layer underneath.
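The byte sequences and the comparison described in the answer can be reproduced with a short sketch (plain JavaScript; `TextEncoder` is available in browsers and Node.js). The file name `buffer안.bin` is a hypothetical stand-in for the test asset, not its real name:

```javascript
// 안 (U+C548) in its two Unicode normalization forms.
const nfc = '\uC548';             // NFC: one precomposed syllable
const nfd = '\u110B\u1161\u11AB'; // NFD: ᄋ + ᅡ + ᆫ (conjoining jamo)

// Hex dump of a string's UTF-8 bytes.
const utf8Hex = (s) =>
  Array.from(new TextEncoder().encode(s), (b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(utf8Hex(nfc)); // ec 95 88
console.log(utf8Hex(nfd)); // e1 84 8b e1 85 a1 e1 86 ab

// decodeURIComponent reproduces whichever form was percent-encoded.
const fromGltf = decodeURIComponent('buffer%E1%84%8B%E1%85%A1%E1%86%AB.bin'); // NFD
const fromZip = decodeURIComponent('buffer%EC%95%88.bin');                   // NFC

console.log(fromGltf === fromZip); // false
console.log(fromGltf.normalize('NFC') === fromZip.normalize('NFC')); // true
```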


The same applies to plain network requests: if we fetch a URL referenced in the glTF JSON, we must normalize it to NFC, since network requests are expected to use NFC-encoded URLs.
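A minimal sketch of what such normalization could look like for a request URL, assuming the name may appear in the URI either literally or percent-encoded. `toNFCRequestURL` is a hypothetical helper, not three.js API, and not necessarily what the new line in `LoadingManager.resolveURL()` does; outside of this PR, a function like it could be registered via `LoadingManager.setURLModifier()`:

```javascript
// Hypothetical helper (not three.js API): make sure the non-ASCII parts of a
// request URL are in NFC, whether they appear literally or percent-encoded.
function toNFCRequestURL(url) {
  // data: URIs carry payloads rather than file names; leave them untouched.
  if (url.startsWith('data:')) return url;

  // Literal (unescaped) characters can be normalized directly.
  const literalNFC = url.normalize('NFC');

  // Each run of %XX escapes is decoded, normalized and re-encoded.
  return literalNFC.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    try {
      return encodeURIComponent(decodeURIComponent(run).normalize('NFC'));
    } catch (e) {
      // Not valid UTF-8 (URIError): keep the original escapes.
      return run;
    }
  });
}

console.log(toNFCRequestURL('models/buffer%E1%84%8B%E1%85%A1%E1%86%AB.bin'));
// models/buffer%EC%95%88.bin
```

Re-encoding only the escaped runs, instead of applying decodeURI()/encodeURI() to the whole string, avoids double-encoding escapes such as %2F: decodeURI() leaves them in place and encodeURI() would then turn them into %252F.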

Mugen87 (Collaborator, Author) commented Apr 2, 2026

@donmccurdy I wasn't aware of Unicode Normalization Forms until I studied the topic over the last two days. Have you ever encountered an issue similar to #29963 in the context of gltf-viewer? Does the fix (especially the new line in LoadingManager.resolveURL()) look good to you?



Development

Successfully merging this pull request may close these issues.

GLTFLoader: Assets with url-encoded UTF8 characters in filenames don't load correctly
