
refactor(escape): use regex replace, remove Node < 4 polyfill#2197

Open
fb55 wants to merge 2 commits into main from escape-perf

Conversation

@fb55
Owner

@fb55 fb55 commented Mar 29, 2026

Summary

  • Replaces the getEscaper factory pattern (manual regex.exec loop) with direct String.prototype.replace calls using a shared escapeReplacer switch function for escapeUTF8, escapeAttribute, and escapeText
  • Removes the getCodePoint Node < 4 polyfill — uses native String.prototype.codePointAt directly
  • Updates encodeXML and encodeHTMLTrieRe (in encode.ts) to call codePointAt instead of the removed getCodePoint wrapper
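
For readers without the diff open, the replace-based pattern in the first bullet could look roughly like the sketch below. The regex for `escapeUTF8` is the one quoted later in this thread; the switch body, the `&apos;` mapping, and the `Sketch` suffixes are assumptions made for illustration, not the PR's actual code.

```typescript
// Hypothetical sketch of a shared replacer used by several escape functions.
const xmlEscapeRegex = /["&'<>]/g;
const textEscapeRegex = /[&<>\u00A0]/g;

/** Shared replacer: maps one matched character to its entity. */
function escapeReplacer(match: string): string {
    switch (match) {
        case '"':
            return "&quot;";
        case "&":
            return "&amp;";
        case "'":
            return "&apos;"; // assumed mapping for the XML variant
        case "<":
            return "&lt;";
        case ">":
            return "&gt;";
        case "\u00A0":
            return "&nbsp;";
        default:
            // Not reachable with the regexes above; kept as a safeguard.
            return match;
    }
}

function escapeUTF8Sketch(data: string): string {
    return data.replace(xmlEscapeRegex, escapeReplacer);
}

function escapeTextSketch(data: string): string {
    return data.replace(textEscapeRegex, escapeReplacer);
}
```

Each function only decides which characters match; the replacer is shared, so a character outside a function's regex (for example `"` in `escapeTextSketch`) passes through untouched.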

Breaking changes

  • getCodePoint is no longer exported from escape.ts. Any downstream code importing this helper will break. This was an internal/undocumented export.
  • The escapeUTF8, escapeAttribute, and escapeText exports change from const arrow functions to named function declarations. This should not affect normal usage, but code relying on .name or identity checks may notice.
  • Drops the Node < 4 codePointAt polyfill — Node < 4 is no longer supported.
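
Downstream code that imported `getCodePoint` can most likely call `String.prototype.codePointAt` directly. The helper below is a hypothetical consumer, not part of the library; it only illustrates the migration and how `codePointAt` differs from `charCodeAt` on surrogate pairs.

```typescript
// Hypothetical downstream helper that previously relied on getCodePoint.
function toNumericEntity(input: string, index: number): string {
    // codePointAt returns undefined only when index is out of range.
    const codePoint = input.codePointAt(index);
    if (codePoint === undefined) {
        throw new RangeError(`index ${index} is out of range`);
    }
    return `&#x${codePoint.toString(16)};`;
}

// U+1F600 written as its UTF-16 surrogate pair.
const astralChar = "\uD83D\uDE00";

// charCodeAt sees only the high surrogate,
// while codePointAt combines the pair into one code point.
const highSurrogateOnly = astralChar.charCodeAt(0); // 0xD83D
const fullCodePoint = astralChar.codePointAt(0); // 0x1F600
```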

Benchmark results

Geometric mean across 12 text variants (short/medium/long × low/high entity density × named/numeric), via html-entity-benchmarks compare:

| | Geo mean (ops/s) |
|---|---|
| entities v8 (published) | 2.23M |
| escape-perf | 1.81M |
| Δ | −19% |

escapeUTF8 regresses on short inputs where the old getEscaper bitset loop was faster than String.prototype.replace. Longer inputs are closer to parity. The change is primarily a code simplification (−45 lines, −51% escape bundle size).
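
For comparison, a manual `regex.exec` loop of the kind the old factory is described as using might look like the sketch below. This is a reconstruction from the description and the map literals in the diff, with a hypothetical function name; the published implementation may differ in detail.

```typescript
// Hypothetical reconstruction of an exec-loop escaper factory.
function makeExecLoopEscaper(
    regex: RegExp,
    map: Map<number, string>,
): (data: string) => string {
    return (data: string): string => {
        // A shared /g regex keeps lastIndex between calls; reset before scanning.
        regex.lastIndex = 0;
        let lastIndex = 0;
        let result = "";
        let match: RegExpExecArray | null;

        while ((match = regex.exec(data)) !== null) {
            // Copy the unescaped run that precedes this match.
            result += data.substring(lastIndex, match.index);
            // Fall back to the matched text if the map has no entry.
            result += map.get(match[0].charCodeAt(0)) ?? match[0];
            // Every match is a single UTF-16 code unit.
            lastIndex = match.index + 1;
        }

        // With no match, this yields a string equal to the input.
        return result + data.substring(lastIndex);
    };
}

const execLoopEscapeText = makeExecLoopEscaper(
    /[&<>\u00A0]/g,
    new Map([
        [38, "&amp;"],
        [60, "&lt;"],
        [62, "&gt;"],
        [160, "&nbsp;"],
    ]),
);
```

The loop avoids invoking a callback per match, which is one plausible reason it fares better on short inputs; the sketch itself has not been benchmarked.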

Bundle size (esbuild --minify --tree-shaking, esm)

| Export | Before | After | Δ |
|---|---|---|---|
| escapeUTF8 | 483 B (319 gz) | 238 B (162 gz) | −51% (−49% gz) |
| encodeHTML | 14,455 B (6,242 gz) | 14,273 B (6,153 gz) | −1% |
| Full bundle | 51,718 B (28,091 gz) | 51,483 B (27,969 gz) | −0.5% |

🤖 Generated with Claude Opus (Claude Code)

Summary by CodeRabbit

  • Refactor

    • Simplified internal implementation of encoding and escaping utilities by removing intermediate helper functions and consolidating escape logic into dedicated function implementations.
  • Style

    • Adjusted formatting in test type annotations.

Copilot AI review requested due to automatic review settings March 29, 2026 11:28

Copilot AI left a comment


Pull request overview

Refactors escaping/encoding internals by simplifying escape routines to use String.prototype.replace with a shared replacer callback and by removing legacy Node < 4 codePointAt compatibility code, aligning with the package’s current Node engine requirements.

Changes:

  • Replace the getEscaper factory + manual regex.exec loop with replace(..., escapeReplacer) for escapeUTF8, escapeAttribute, and escapeText.
  • Remove the exported getCodePoint polyfill and switch callers to native String.prototype.codePointAt.
  • Update encodeXML and encodeHTMLTrieRe to use codePointAt directly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/escape.ts | Removes getCodePoint polyfill/export; refactors escape helpers to regex replace with shared escapeReplacer. |
| src/encode.ts | Drops getCodePoint import and uses codePointAt for numeric entity emission. |

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


@fb55
Owner Author

fb55 commented Mar 30, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai bot commented Mar 30, 2026

Walkthrough

The pull request refactors the character-escaping functions in escape.ts and their callers in encode.ts. It removes the getCodePoint helper and replaces the generic escaper factory with a shared callback-based approach. Public functions are converted from const assignments to function declarations, and the native codePointAt() API is used directly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Formatting: src/decode.spec.ts | Type annotation reformatted across multiple lines for improved readability; no runtime logic changes. |
| Character Encoding Refactoring: src/encode.ts, src/escape.ts | Removed getCodePoint helper; replaced generic getEscaper factory with dedicated regex patterns and shared escapeReplacer callback. Public functions converted from const to function declarations. Native input.codePointAt(index) used directly instead of helper function. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Hopping through the escape routes,
Code points now hop on their own,
No helper needed, just native speed,
Functions declared, not assigned—
A cleaner warren for all to see!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: refactoring escape functions to use regex replace instead of a factory pattern, and removing the Node < 4 polyfill. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33%, which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2547eb3b-fb6c-4cf8-9e71-0815353d400e

📥 Commits

Reviewing files that changed from the base of the PR and between 1dad19d and 4831be9.

📒 Files selected for processing (3)
  • src/decode.spec.ts
  • src/encode.ts
  • src/escape.ts

Comment on lines +110 to +134
+export function escapeUTF8(data: string): string {
+    return data.replace(xmlEscapeRegex, escapeReplacer);
+}

+const attributeEscapeRegex = /["&\u00A0]/g;

 /**
  * Encodes all characters that have to be escaped in HTML attributes,
  * following {@link https://html.spec.whatwg.org/multipage/parsing.html#escapingString}.
  * @param data String to escape.
  */
-export const escapeAttribute: (data: string) => string =
-    /* #__PURE__ */ getEscaper(
-        /["&\u00A0]/g,
-        new Map([
-            [34, "&quot;"],
-            [38, "&amp;"],
-            [160, "&nbsp;"],
-        ]),
-    );
+export function escapeAttribute(data: string): string {
+    return data.replace(attributeEscapeRegex, escapeReplacer);
+}

+const textEscapeRegex = /[&<>\u00A0]/g;

 /**
  * Encodes all characters that have to be escaped in HTML text,
  * following {@link https://html.spec.whatwg.org/multipage/parsing.html#escapingString}.
  * @param data String to escape.
  */
-export const escapeText: (data: string) => string = /* #__PURE__ */ getEscaper(
-    /[&<>\u00A0]/g,
-    new Map([
-        [38, "&amp;"],
-        [60, "&lt;"],
-        [62, "&gt;"],
-        [160, "&nbsp;"],
-    ]),
-);
+export function escapeText(data: string): string {
+    return data.replace(textEscapeRegex, escapeReplacer);
+}
⚠️ Potential issue | 🟠 Major

Short-input performance regression in new replace-based hot paths

The PR benchmark shows a measurable throughput drop (about 19% by geometric mean). The new callback-based replace path is the likely cause on short strings in escapeUTF8, escapeAttribute, and escapeText.

⚡ Proposed mitigation (fast no-match guard)
 const xmlEscapeRegex = /["&'<>]/g;
+const xmlEscapeCheckRegex = /["&'<>]/;
@@
 export function escapeUTF8(data: string): string {
-    return data.replace(xmlEscapeRegex, escapeReplacer);
+    return xmlEscapeCheckRegex.test(data)
+        ? data.replace(xmlEscapeRegex, escapeReplacer)
+        : data;
 }
 
 const attributeEscapeRegex = /["&\u00A0]/g;
+const attributeEscapeCheckRegex = /["&\u00A0]/;
@@
 export function escapeAttribute(data: string): string {
-    return data.replace(attributeEscapeRegex, escapeReplacer);
+    return attributeEscapeCheckRegex.test(data)
+        ? data.replace(attributeEscapeRegex, escapeReplacer)
+        : data;
 }
 
 const textEscapeRegex = /[&<>\u00A0]/g;
+const textEscapeCheckRegex = /[&<>\u00A0]/;
@@
 export function escapeText(data: string): string {
-    return data.replace(textEscapeRegex, escapeReplacer);
+    return textEscapeCheckRegex.test(data)
+        ? data.replace(textEscapeRegex, escapeReplacer)
+        : data;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/escape.ts` around lines 110 - 134, Add a fast no-match guard to avoid the
replace(callback) overhead for short/unchanged strings: in escapeUTF8,
escapeAttribute, and escapeText, check quickly for the presence of any
characters that would require escaping using simple string.indexOf/lookups (not
regex.test on a global regex) and return the original input early if none found,
then fall back to the existing return data.replace(..., escapeReplacer). For
escapeAttribute check for '"' '&' and '\u00A0'; for escapeText check for '&' '<'
'>' and '\u00A0'; for escapeUTF8 use the minimal set of characters that
xmlEscapeRegex would match. Ensure you do not call regex.test on a /g regex to
avoid stateful behavior.
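
The warning about `regex.test` on a `/g` regex refers to `lastIndex` carrying over between calls. The sketch below demonstrates that behavior and shows one possible `indexOf`-based guard for `escapeText`. The names `globalTextRegex`, `textReplacer`, and `guardedEscapeText` are hypothetical, and whether this guard actually recovers the lost throughput has not been measured here.

```typescript
const globalTextRegex = /[&<>\u00A0]/g;

// test() on a /g regex advances lastIndex, so a second call can miss a match.
const firstTest = globalTextRegex.test("a&b"); // matches at index 1, lastIndex becomes 2
const secondTest = globalTextRegex.test("a&b"); // search starts at index 2 and fails
globalTextRegex.lastIndex = 0; // explicit reset for clarity

function textReplacer(match: string): string {
    switch (match) {
        case "&":
            return "&amp;";
        case "<":
            return "&lt;";
        case ">":
            return "&gt;";
        default:
            return "&nbsp;"; // only \u00A0 remains for this regex
    }
}

function guardedEscapeText(data: string): string {
    // Stateless guard: indexOf never touches regex state.
    if (
        data.indexOf("&") === -1 &&
        data.indexOf("<") === -1 &&
        data.indexOf(">") === -1 &&
        data.indexOf("\u00A0") === -1
    ) {
        return data;
    }
    // String.prototype.replace resets lastIndex for a global regex itself.
    return data.replace(globalTextRegex, textReplacer);
}
```

A separate non-global check regex, as in the proposed diff, sidesteps the same problem because `test` leaves `lastIndex` alone when the `g` flag is absent.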
