fix(gnovm): respect Unicode range in 64-bit integer-to-string conversions #5807
Open
omarsy wants to merge 3 commits into
…ions
Converting int/int64/uint/uint64 values to string narrowed the value to
int32 via rune(...) before Go's own range check could run, so out-of-range
values aliased onto valid code points instead of yielding the replacement
character per the Go spec:
string(uint64(0x10001F600)) // was "😀", Go yields "�"
string(int(-4294967231)) // was "A", Go yields "�"
Check that the value fits in an int32 first and return utf8.RuneError
otherwise; in-int32 invalid values (negatives, surrogates, > 0x10FFFF) are
already mapped to "�" by Go's native string(rune) conversion. The
remaining integer kinds are unaffected (see ADR for why uint32 is provably
safe). Covers both the runtime and constant evaluation paths, which share
ConvertTo.
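The aliasing and the guard described above can be reproduced in plain Go. The following is a minimal sketch, not GnoVM's actual code: naiveString and fixedString are hypothetical helper names, and only the int64 case is shown.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// naiveString mimics the old behaviour: narrowing to rune (int32) first
// discards the high 32 bits, so out-of-range values can alias valid code points.
func naiveString(v int64) string {
	return string(rune(v))
}

// fixedString checks that v fits in an int32 before narrowing. Values that fit
// but are still invalid (negatives, surrogates, > 0x10FFFF) are mapped to
// U+FFFD by Go's own string(rune) conversion.
func fixedString(v int64) string {
	if v < math.MinInt32 || v > math.MaxInt32 {
		return string(utf8.RuneError)
	}
	return string(rune(v))
}

func main() {
	// Sample values: one valid code point, the two examples from the commit
	// message, a surrogate half, and the first value past the Unicode range.
	for _, v := range []int64{0x41, 0x10001F600, -4294967231, 0xD800, 0x110000} {
		fmt.Printf("%d: naive=%q fixed=%q\n", v, naiveString(v), fixedString(v))
	}
}
```

The unsigned kinds would need only the upper-bound half of the check, since they cannot be negative.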
🛠 PR Checks Summary: All Automated Checks passed. ✅
davd-gzl
approved these changes
Jun 12, 2026
Description
Per the Go spec (Conversions to and from a string type), converting an integer to a string yields the UTF-8 encoding of the code point, and "values outside the range of valid Unicode code points are converted to �".

GnoVM implemented the conversion for the 64-bit-capable kinds (int, int64, uint, uint64) as string(rune(tv.GetInt64())) etc. The rune(...) wrapper truncates to int32 before Go's own range check runs, so an out-of-range value can alias onto a valid code point instead of yielding �:

string(uint64(0x10001F600)) // was "😀", Go yields "�"
string(int(-4294967231)) // was "A", Go yields "�"
string(uint64(0x100000000)) // was "\x00", Go yields "�"

The fix checks that the value fits in an int32 first and returns utf8.RuneError otherwise; once a value fits in int32, Go's native string(rune) already maps every invalid case (negatives, surrogate halves, > 0x10FFFF) to �. Both the runtime and constant-evaluation paths flow through the same ConvertTo, so one fix covers both (including untyped rune constants like 'A' + 0x100000000).

The remaining integer kinds are intentionally unchanged: int8/int16/uint8/uint16 cannot exceed the int32 range, int32 converts without narrowing, and uint32 is provably equivalent (values > MaxInt32 reinterpret as negative runes → �; values in (0x10FFFF, MaxInt32] are out of range → �). Details in the included ADR.
Verification
Found via differential testing against the Go toolchain. Verified byte-for-byte against go run across all integer kinds, named types, and untyped constants at every range boundary (0xD7FF/0xD800/0xDFFF/0xE000, 0x10FFFF/0x110000, MaxInt32±1, MinInt32, ±2^32±k, MinInt64/MaxInt64/MaxUint64), in multiple syntactic positions (var init, const decl, concatenation, map keys, switch, struct literals). The new filetest fails on master and passes with the fix; the full TestFiles suite passes.
Note
This PR is AI-assisted (found and developed with Claude); it includes an ADR per AGENTS.md (gnovm/adr/prxxxx_string_conversion_unicode_range.md, to be renamed with the PR number).
Related spec-compliance fixes from the same differential-testing effort: #5784, #5785.
🤖 Generated with Claude Code