Skip to content

[JSC] Reject dangling hyphen in class set under /v flag#180

Open
robobun wants to merge 1 commit intooven-sh:mainfrom
robobun:claude/fix-v-mode-dangling-hyphen
Open

[JSC] Reject dangling hyphen in class set under /v flag#180
robobun wants to merge 1 commit intooven-sh:mainfrom
robobun:claude/fix-v-mode-dangling-hyphen

Conversation

@robobun
Copy link
Copy Markdown

@robobun robobun commented Apr 8, 2026

In UnicodeSets mode (/v), - is a ClassSetSyntaxCharacter per ECMA-262 and is only legal between two ClassSetCharacters as part of a ClassSetRange. A bare or trailing - with no right-hand side must be rejected.

Patterns that currently parse but should not:

  • /[a-]/v
  • /[\d-]/v
  • /[\w-]/v
  • /[a-z\d-]/v

None of V8, SpiderMonkey, or the spec accept them.

Root cause

ClassSetParserDelegate::flushCachedCharacterIfNeeded() only handled CachedCharacter. end() silently accepted CachedCharacterHyphen by emitting both the cached character and a literal -. AfterCharacterClassHyphen fell through entirely. Both states represent an incomplete ClassSetRange — a - with nothing on the right — and must be errors in /v.

Fix

Raise ErrorCode::InvalidClassSetCharacter in both flushCachedCharacterIfNeeded() and end() when either incomplete-range state is reached. The valid-range path (a-z) goes straight from CachedCharacterHyphen into the range-completion branch of atomPatternCharacter() without touching either of those helpers, so it is unaffected.

Verification

Covered by the bun regression test in the companion bun PR. Patterns that currently should error still error, the four newly-detected dangling-hyphen forms now error, and /[a-z]/v, /[a\-]/v, /[\-a]/v, /[a--b]/v, /[a&&b]/v, /[\w--\d]/v still parse and match as before.

Fixes oven-sh/bun#29003.

In UnicodeSets mode (/v), - is a ClassSetSyntaxCharacter per
ECMA-262 and is only legal between two ClassSetCharacters as part
of a ClassSetRange. A bare or trailing - with no right-hand side
(e.g. /[a-]/v, /[\d-]/v, /[\w-]/v, /[a-z\d-]/v) must be rejected.

ClassSetParserDelegate previously silently accepted the
CachedCharacterHyphen and AfterCharacterClassHyphen states in
flushCachedCharacterIfNeeded() and end(), so these patterns parsed
without error and matched both operands and - literally. None of
V8, SpiderMonkey, or the spec agree.

Add an InvalidClassSetCharacter error when either incomplete-range
state is hit at a class-set transition point (nested class boundary,
set operator, or closing ]). The valid-range path
(CachedCharacter -> CachedCharacterHyphen -> completed range) is
unaffected because it does not go through flushCachedCharacterIfNeeded
or end() while the hyphen is pending.

Fixes oven-sh/bun#29003.
robobun added a commit to oven-sh/bun that referenced this pull request Apr 8, 2026
See oven-sh/WebKit#180 for the JSC-side parser fix.

In UnicodeSets mode, - is a ClassSetSyntaxCharacter that is only
legal as part of a full ClassSetRange. A dangling - (e.g. /[a-]/v,
/[\d-]/v, /[\w-]/v, /[a-z\d-]/v) must throw a SyntaxError, matching
V8/SpiderMonkey and the spec.

The fix lives in vendor/WebKit (not tracked in this repo) in
yarr/YarrParser.h. The companion WebKit PR raises
ErrorCode::InvalidClassSetCharacter in flushCachedCharacterIfNeeded()
and end() when the parser is still in a CachedCharacterHyphen or
AfterCharacterClassHyphen state at a class-set transition point.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dc9cb786-2c3c-4fcc-9c4d-61171a00ffd7

📥 Commits

Reviewing files that changed from the base of the PR and between c2010c4 and 3249453.

📒 Files selected for processing (1)
  • Source/JavaScriptCore/yarr/YarrParser.h

Walkthrough

This change modifies YarrParser to properly reject invalid unescaped hyphens in character classes when using the /v (unicodeSets) flag. Previously, certain hyphen patterns were incorrectly allowed; now they trigger syntax errors as per Unicode regex specifications.

Changes

Cohort / File(s) Summary
Character class hyphen validation
Source/JavaScriptCore/yarr/YarrParser.h
Modified ClassSetParserDelegate::flushCachedCharacterIfNeeded() to immediately reject any pending hyphen state as an invalid class set character. Modified ClassSetParserDelegate::end() to reject trailing hyphens in /v mode instead of emitting them as atoms, now handling both CachedCharacterHyphen and AfterCharacterClassHyphen states consistently.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: rejecting dangling hyphens in class sets under the /v flag in JavaScript regular expressions.
Description check ✅ Passed The description comprehensively covers the issue, root cause, fix, and verification, directly following the WebKit template with bug context and detailed explanation.
Linked Issues check ✅ Passed The code changes directly address the linked issue #29003 by rejecting invalid dangling hyphen patterns in /v mode that were previously incorrectly accepted.
Out of Scope Changes check ✅ Passed All changes are focused on fixing the dangling hyphen rejection in YarrParser.h's class set parsing for /v mode, directly addressing the linked issue.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

robobun added a commit to oven-sh/bun that referenced this pull request Apr 8, 2026
- Drop the multi-line prose comments in favor of the single-line
  issue URL (coderabbit nit).
- Split the already-passing regression guards ([-a], [-]) into
  their own test() block so they keep running even if the pending
  /v fix lands late.
- Wrap the not-yet-landed assertions (/[a-]/v, /[\d-]/v, /[\w-]/v,
  /[a-z\d-]/v) in test.todo(). These need oven-sh/WebKit#180 to
  merge and a WEBKIT_VERSION bump; the todos get promoted to test()
  in the same commit as the bump.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent escape character requirement for - in vnicode regular expressions

1 participant