refactor(escape): use regex replace, remove Node < 4 polyfill #2197
Pull request overview
Refactors escaping/encoding internals by simplifying escape routines to use String.prototype.replace with a shared replacer callback and by removing legacy Node < 4 codePointAt compatibility code, aligning with the package’s current Node engine requirements.
Changes:
- Replace the `getEscaper` factory + manual `regex.exec` loop with `replace(..., escapeReplacer)` for `escapeUTF8`, `escapeAttribute`, and `escapeText`.
- Remove the exported `getCodePoint` polyfill and switch callers to native `String.prototype.codePointAt`.
- Update `encodeXML` and `encodeHTMLTrieRe` to use `codePointAt` directly.
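A minimal sketch of the shared-replacer pattern described in the list. The body of `escapeReplacer` is an assumption reconstructed from the regexes and entity maps visible in the diff, not the PR's exact code; the `'` → `&apos;` mapping in particular is assumed. Because each regex only matches its own character subset, one switch over the union of characters can safely serve all three functions.

```typescript
// Hypothetical reconstruction: one replacer shared by all three escapers.
// String.prototype.replace calls it once per regex match with the matched text.
function escapeReplacer(char: string): string {
    switch (char) {
        case '"':
            return "&quot;";
        case "&":
            return "&amp;";
        case "'":
            return "&apos;"; // assumed mapping for the XML escaper
        case "<":
            return "&lt;";
        case ">":
            return "&gt;";
        case "\u00A0":
            return "&nbsp;";
        default:
            // Not reachable with the regexes below; leave the text unchanged.
            return char;
    }
}

// Each regex selects only the characters its context must escape.
const xmlEscapeRegex = /["&'<>]/g;
const attributeEscapeRegex = /["&\u00A0]/g;
const textEscapeRegex = /[&<>\u00A0]/g;

function escapeUTF8(data: string): string {
    return data.replace(xmlEscapeRegex, escapeReplacer);
}

function escapeAttribute(data: string): string {
    return data.replace(attributeEscapeRegex, escapeReplacer);
}

function escapeText(data: string): string {
    return data.replace(textEscapeRegex, escapeReplacer);
}
```

With this layout, `escapeAttribute('say "hi" & <bye>')` escapes the quotes and ampersand but leaves `<` and `>` alone, since the attribute regex never matches them.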
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/escape.ts | Removes getCodePoint polyfill/export; refactors escape helpers to regex replace with shared escapeReplacer. |
| src/encode.ts | Drops getCodePoint import and uses codePointAt for numeric entity emission. |
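The `encode.ts` change swaps the polyfill for native `codePointAt`. A minimal sketch of numeric-entity emission with `codePointAt`, assuming hexadecimal references for every non-ASCII code point; `toNumericEntities` is a hypothetical helper, not the library's `encodeXML`. The detail that matters is that `codePointAt` returns the full code point at the start of a surrogate pair, so the loop has to skip the trailing low surrogate.

```typescript
// Hypothetical helper, not the library's encodeXML: replaces every
// non-ASCII code point with a hexadecimal numeric character reference.
function toNumericEntities(input: string): string {
    let out = "";
    for (let i = 0; i < input.length; i++) {
        // codePointAt joins a surrogate pair into one code point, the job
        // the removed getCodePoint polyfill covered for Node < 4.
        const cp = input.codePointAt(i)!;
        if (cp < 0x80) {
            out += input[i];
        } else {
            out += `&#x${cp.toString(16)};`;
            // An astral code point occupies two UTF-16 units; skip the second.
            if (cp > 0xffff) i++;
        }
    }
    return out;
}

const encoded = toNumericEntities("caf\u00e9 \u{1F600}");
```

Here `encoded` is `caf&#xe9; &#x1f600;`: the emoji becomes a single reference rather than two references for its surrogate halves.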
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
@coderabbitai review

✅ Actions performed

Review triggered.
📝 Walkthrough

The pull request refactors character encoding functions throughout the module. It removes the …

Changes

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/escape.ts`:
- Around line 110-134: Add a fast no-match guard to avoid the replace(callback)
overhead for short/unchanged strings: in escapeUTF8, escapeAttribute, and
escapeText, check quickly for the presence of any characters that would require
escaping using simple string.indexOf/lookups (not regex.test on a global regex)
and return the original input early if none found, then fall back to the
existing return data.replace(..., escapeReplacer). For escapeAttribute check for
'"' '&' and '\u00A0'; for escapeText check for '&' '<' '>' and '\u00A0'; for
escapeUTF8 use the minimal set of characters that xmlEscapeRegex would match.
Ensure you do not call regex.test on a /g regex to avoid stateful behavior.
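The last instruction matters because a RegExp with the `g` flag carries `lastIndex` state between `test()` calls. A short demo (variable names are illustrative) of the alternating results this produces, why a separate non-global regex avoids it, and why `String.prototype.replace` is unaffected: it resets `lastIndex` to 0 before matching with a global regex.

```typescript
// A global (/g) regex keeps state in lastIndex between test() calls.
const globalRe = /[&<>]/g;
const globalResults: boolean[] = [];
for (let i = 0; i < 3; i++) {
    // Call 1 matches "&" at index 1 and sets lastIndex to 2.
    // Call 2 resumes at index 2, finds nothing, returns false, resets to 0.
    // Call 3 matches again and leaves lastIndex at 2.
    globalResults.push(globalRe.test("a&b"));
}
const lastIndexAfterLoop = globalRe.lastIndex;

// A non-global regex always searches from the start, so results are stable.
const checkRe = /[&<>]/;
const checkResults = [checkRe.test("a&b"), checkRe.test("a&b"), checkRe.test("a&b")];

// replace() resets lastIndex to 0 for a /g regex before it starts, so the
// leftover lastIndex of 2 does not cause the "&" at index 1 to be skipped.
const replaced = "a&b".replace(globalRe, "&amp;");
```

`globalResults` is `[true, false, true]`, `checkResults` is `[true, true, true]`, and `replaced` is `a&amp;b`.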
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2547eb3b-fb6c-4cf8-9e71-0815353d400e
📒 Files selected for processing (3)
- `src/decode.spec.ts`
- `src/encode.ts`
- `src/escape.ts`
```diff
+export function escapeUTF8(data: string): string {
+    return data.replace(xmlEscapeRegex, escapeReplacer);
+}
+
+const attributeEscapeRegex = /["&\u00A0]/g;
+
 /**
  * Encodes all characters that have to be escaped in HTML attributes,
  * following {@link https://html.spec.whatwg.org/multipage/parsing.html#escapingString}.
  * @param data String to escape.
  */
-export const escapeAttribute: (data: string) => string =
-    /* #__PURE__ */ getEscaper(
-        /["&\u00A0]/g,
-        new Map([
-            [34, "&quot;"],
-            [38, "&amp;"],
-            [160, "&nbsp;"],
-        ]),
-    );
+export function escapeAttribute(data: string): string {
+    return data.replace(attributeEscapeRegex, escapeReplacer);
+}
+
+const textEscapeRegex = /[&<>\u00A0]/g;
+
 /**
  * Encodes all characters that have to be escaped in HTML text,
  * following {@link https://html.spec.whatwg.org/multipage/parsing.html#escapingString}.
  * @param data String to escape.
  */
-export const escapeText: (data: string) => string = /* #__PURE__ */ getEscaper(
-    /[&<>\u00A0]/g,
-    new Map([
-        [38, "&amp;"],
-        [60, "&lt;"],
-        [62, "&gt;"],
-        [160, "&nbsp;"],
-    ]),
-);
+export function escapeText(data: string): string {
+    return data.replace(textEscapeRegex, escapeReplacer);
+}
```
Short-input performance regression in new replace-based hot paths
The PR benchmark shows a measurable throughput drop (~19% geometric mean). The new callback-based `replace` path is the likely cause on short strings in `escapeUTF8`, `escapeAttribute`, and `escapeText`.
⚡ Proposed mitigation (fast no-match guard)
```diff
 const xmlEscapeRegex = /["&'<>]/g;
+const xmlEscapeCheckRegex = /["&'<>]/;
@@
 export function escapeUTF8(data: string): string {
-    return data.replace(xmlEscapeRegex, escapeReplacer);
+    return xmlEscapeCheckRegex.test(data)
+        ? data.replace(xmlEscapeRegex, escapeReplacer)
+        : data;
 }
 const attributeEscapeRegex = /["&\u00A0]/g;
+const attributeEscapeCheckRegex = /["&\u00A0]/;
@@
 export function escapeAttribute(data: string): string {
-    return data.replace(attributeEscapeRegex, escapeReplacer);
+    return attributeEscapeCheckRegex.test(data)
+        ? data.replace(attributeEscapeRegex, escapeReplacer)
+        : data;
 }
 const textEscapeRegex = /[&<>\u00A0]/g;
+const textEscapeCheckRegex = /[&<>\u00A0]/;
@@
 export function escapeText(data: string): string {
-    return data.replace(textEscapeRegex, escapeReplacer);
+    return textEscapeCheckRegex.test(data)
+        ? data.replace(textEscapeRegex, escapeReplacer)
+        : data;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/escape.ts` around lines 110 - 134, Add a fast no-match guard to avoid the
replace(callback) overhead for short/unchanged strings: in escapeUTF8,
escapeAttribute, and escapeText, check quickly for the presence of any
characters that would require escaping using simple string.indexOf/lookups (not
regex.test on a global regex) and return the original input early if none found,
then fall back to the existing return data.replace(..., escapeReplacer). For
escapeAttribute check for '"' '&' and '\u00A0'; for escapeText check for '&' '<'
'>' and '\u00A0'; for escapeUTF8 use the minimal set of characters that
xmlEscapeRegex would match. Ensure you do not call regex.test on a /g regex to
avoid stateful behavior.
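The prompt asks for an `indexOf`-based guard rather than the regex `test` used in the proposed diff. A minimal sketch of that variant for `escapeText` only; `needsTextEscape` and `textEntities` are hypothetical names, and the entity strings are assumed to match what the library emits.

```typescript
// Hypothetical sketch of the suggested fast path for escapeText.
const textEscapeRegex = /[&<>\u00A0]/g;

// Assumed entity mapping for the characters escapeText handles.
const textEntities: Record<string, string> = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\u00A0": "&nbsp;",
};

// Plain substring probes: no RegExp involved, so no lastIndex state.
function needsTextEscape(data: string): boolean {
    return (
        data.indexOf("&") !== -1 ||
        data.indexOf("<") !== -1 ||
        data.indexOf(">") !== -1 ||
        data.indexOf("\u00A0") !== -1
    );
}

function escapeText(data: string): string {
    // Fast path: return the original string untouched when nothing matches.
    if (!needsTextEscape(data)) return data;
    return data.replace(textEscapeRegex, (char) => textEntities[char] ?? char);
}
```

On clean input this performs up to four full scans, so whether it actually beats a single `replace` call on short strings is something to confirm with the same benchmark used in the PR.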
Summary
- Replace the `getEscaper` factory pattern (manual `regex.exec` loop) with direct `String.prototype.replace` calls using a shared `escapeReplacer` switch function for `escapeUTF8`, `escapeAttribute`, and `escapeText`.
- Remove the `getCodePoint` Node < 4 polyfill; use native `String.prototype.codePointAt` directly.
- Update `encodeXML` and `encodeHTMLTrieRe` (in `encode.ts`) to call `codePointAt` instead of the removed `getCodePoint` wrapper.

Breaking changes
- `getCodePoint` is no longer exported from `escape.ts`. Any downstream code importing this helper will break. This was an internal, undocumented export.
- `escapeUTF8`, `escapeAttribute`, and `escapeText` change from `const` bindings produced by the `getEscaper` factory to named `function` declarations. This should not affect normal usage, but code relying on `.name` or identity checks may notice.
- The `codePointAt` polyfill is removed: Node < 4 is no longer supported.

Benchmark results
Geometric mean across 12 text variants (short/medium/long × low/high entity density × named/numeric), via html-entity-benchmarks compare:

`escapeUTF8` regresses on short inputs, where the old `getEscaper` bitset loop was faster than `String.prototype.replace`. Longer inputs are closer to parity. The change is primarily a code simplification (−45 lines, −51% escape bundle size).

Bundle size (esbuild --minify --tree-shaking, esm)
- `escapeUTF8`
- `encodeHTML`

🤖 Generated with Claude Opus (Claude Code)
Summary by CodeRabbit
Refactor
Style