From a50d2742544241ed49d8d95cafe86f0361516ccf Mon Sep 17 00:00:00 2001
From: Michael Dyck <jmdyck@ibiblio.org>
Date: Mon, 27 Sep 2021 18:34:54 -0700
Subject: [PATCH 1/3] Editorial: Introduce `IdentifierStartChar` +
 `IdentifierPartChar` (#2392)

Extract `IdentifierStartChar` from `IdentifierStart` and `RegExpIdentifierStart`.
Extract `IdentifierPartChar` from `IdentifierPart` and `RegExpIdentifierPart`.

This has 3 benefits:

- We eliminate some repetition between the productions for
  Identifiers and RegExpIdentifiers.

- We can simplify four Early Error rules involving escape sequences,
  because each constraint can now be expressed in terms of a single nonterminal,
  rather than a nonterminal plus some terminals.

- We can eliminate the Early Error rule for `RegularExpressionFlags`
  by instead expressing its constraint in the grammar:
  in the production for `RegularExpressionFlags`,
  replace `IdentifierPart` with `IdentifierPartChar`.

(As a consequence of the last point, this commit undefines the following id:
sec-literals-regular-expression-literals-static-semantics-early-errors
There didn't seem to be a sensible place to relocate it as an oldid.)
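
As a rough illustration (not part of the spec change), the observable syntax
should be the same before and after this commit. The sketch below assumes a
host where `new Function` throws a SyntaxError for unparsable source;
`parses` is a hypothetical helper, not anything defined by the spec.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
function parses(source) {
  try {
    new Function(source); // parse only; the function is never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// An escape may contribute any code point that IdentifierStartChar or
// IdentifierPartChar would match directly.
console.log(parses("var \\u0061bc = 1;")); // true  (U+0061 is UnicodeIDStart)
console.log(parses("var \\u0024x = 1;"));  // true  (U+0024 is `$`)
console.log(parses("var x\\u0031 = 1;"));  // true  (U+0031 is UnicodeIDContinue)
console.log(parses("var \\u0031x = 1;"));  // false (U+0031 is not IdentifierStartChar)

// Flags are built from IdentifierPartChar, so an escape cannot appear in them.
console.log(parses("/a/g;"));        // true
console.log(parses("/a/\\u0067;"));  // false
```

The last case was previously rejected by the Early Error rule; with this
commit the flags simply end before the `\`, and what follows fails to parse.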
---
 spec.html | 43 ++++++++++++++++++-------------------------
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/spec.html b/spec.html
index 43f39fe00a..19a030601d 100644
--- a/spec.html
+++ b/spec.html
@@ -16175,15 +16175,21 @@ <h2>Syntax</h2>
         IdentifierName IdentifierPart
 
       IdentifierStart ::
+        IdentifierStartChar
+        `\` UnicodeEscapeSequence
+
+      IdentifierPart ::
+        IdentifierPartChar
+        `\` UnicodeEscapeSequence
+
+      IdentifierStartChar ::
         UnicodeIDStart
         `$`
         `_`
-        `\` UnicodeEscapeSequence
 
-      IdentifierPart ::
+      IdentifierPartChar ::
         UnicodeIDContinue
         `$`
-        `\` UnicodeEscapeSequence
         &lt;ZWNJ&gt;
         &lt;ZWJ&gt;
 
@@ -16209,13 +16215,13 @@ <h1>Static Semantics: Early Errors</h1>
         <emu-grammar>IdentifierStart :: `\` UnicodeEscapeSequence</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the SV of |UnicodeEscapeSequence| is none of *"$"*, or *"_"*, or ! UTF16EncodeCodePoint(_cp_) for some Unicode code point _cp_ matched by the |UnicodeIDStart| lexical grammar production.
+            It is a Syntax Error if the SV of |UnicodeEscapeSequence| is not ! UTF16EncodeCodePoint(_cp_) for some Unicode code point _cp_ matched by the |IdentifierStartChar| lexical grammar production.
           </li>
         </ul>
         <emu-grammar>IdentifierPart :: `\` UnicodeEscapeSequence</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the SV of |UnicodeEscapeSequence| is none of *"$"*, *"_"*, ! UTF16EncodeCodePoint(&lt;ZWNJ&gt;), ! UTF16EncodeCodePoint(&lt;ZWJ&gt;), or ! UTF16EncodeCodePoint(_cp_) for some Unicode code point _cp_ that would be matched by the |UnicodeIDContinue| lexical grammar production.
+            It is a Syntax Error if the SV of |UnicodeEscapeSequence| is not ! UTF16EncodeCodePoint(_cp_) for some Unicode code point _cp_ matched by the |IdentifierPartChar| lexical grammar production.
           </li>
         </ul>
       </emu-clause>
@@ -17057,22 +17063,12 @@ <h2>Syntax</h2>
 
         RegularExpressionFlags ::
           [empty]
-          RegularExpressionFlags IdentifierPart
+          RegularExpressionFlags IdentifierPartChar
       </emu-grammar>
       <emu-note>
         <p>Regular expression literals may not be empty; instead of representing an empty regular expression literal, the code unit sequence `//` starts a single-line comment. To specify an empty regular expression, use: `/(?:)/`.</p>
       </emu-note>
 
-      <emu-clause id="sec-literals-regular-expression-literals-static-semantics-early-errors">
-        <h1>Static Semantics: Early Errors</h1>
-        <emu-grammar>RegularExpressionFlags :: RegularExpressionFlags IdentifierPart</emu-grammar>
-        <ul>
-          <li>
-            It is a Syntax Error if |IdentifierPart| contains a Unicode escape sequence.
-          </li>
-        </ul>
-      </emu-clause>
-
       <emu-clause id="sec-static-semantics-bodytext" type="sdo">
         <h1>Static Semantics: BodyText</h1>
         <dl class="header">
@@ -34244,19 +34240,14 @@ <h2>Syntax</h2>
           RegExpIdentifierName[?UnicodeMode] RegExpIdentifierPart[?UnicodeMode]
 
         RegExpIdentifierStart[UnicodeMode] ::
-          UnicodeIDStart
-          `$`
-          `_`
+          IdentifierStartChar
           `\` RegExpUnicodeEscapeSequence[+UnicodeMode]
           [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
 
         RegExpIdentifierPart[UnicodeMode] ::
-          UnicodeIDContinue
-          `$`
+          IdentifierPartChar
           `\` RegExpUnicodeEscapeSequence[+UnicodeMode]
           [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
-          &lt;ZWNJ&gt;
-          &lt;ZWJ&gt;
 
         RegExpUnicodeEscapeSequence[UnicodeMode] ::
           [+UnicodeMode] `u` HexLeadSurrogate `\u` HexTrailSurrogate
@@ -34418,7 +34409,7 @@ <h1>Static Semantics: Early Errors</h1>
         <emu-grammar>RegExpIdentifierStart :: `\` RegExpUnicodeEscapeSequence</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of *"$"*, *"_"*, or some code point matched by the |UnicodeIDStart| lexical grammar production.
+            It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of some code point matched by the |IdentifierStartChar| lexical grammar production.
           </li>
         </ul>
         <emu-grammar>RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate</emu-grammar>
@@ -34430,7 +34421,7 @@ <h1>Static Semantics: Early Errors</h1>
         <emu-grammar>RegExpIdentifierPart :: `\` RegExpUnicodeEscapeSequence</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of *"$"*, *"_"*, &lt;ZWNJ&gt;, &lt;ZWJ&gt;, or some code point matched by the |UnicodeIDContinue| lexical grammar production.
+            It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of some code point matched by the |IdentifierPartChar| lexical grammar production.
           </li>
         </ul>
         <emu-grammar>RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate</emu-grammar>
@@ -46020,6 +46011,8 @@ <h1>Lexical Grammar</h1>
     <emu-prodref name="IdentifierName"></emu-prodref>
     <emu-prodref name="IdentifierStart"></emu-prodref>
     <emu-prodref name="IdentifierPart"></emu-prodref>
+    <emu-prodref name="IdentifierStartChar"></emu-prodref>
+    <emu-prodref name="IdentifierPartChar"></emu-prodref>
     <emu-prodref name="UnicodeIDStart"></emu-prodref>
     <emu-prodref name="UnicodeIDContinue"></emu-prodref>
     <emu-prodref name="ReservedWord"></emu-prodref>

From 6d2ba3cdf5b28686fa68d9404df351eed8f8566b Mon Sep 17 00:00:00 2001
From: Michael Dyck <jmdyck@ibiblio.org>
Date: Mon, 27 Sep 2021 18:35:01 -0700
Subject: [PATCH 2/3] Editorial: Introduce [RegExp]IdentifierCodePoint[s] SDOs
 (#2392)

This commit introduces SDOs `IdentifierCodePoints` and `IdentifierCodePoint`.

- This allows `StringValue` of _IdentifierName_ to be specified more precisely.

- It also simplifies two Early Error rules (involving _UnicodeEscapeSequence_),
  since they can now be expressed as constraints on a code point, rather than
  having to be translated into the space of String values.
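
A minimal JavaScript sketch of the idea, for illustration only:
`identifierCodePoints` and `stringValue` are hypothetical names that loosely
mirror the SDOs, operating on raw source text rather than a parse tree. The
sketch assumes the text is already a syntactically valid _IdentifierName_
and does not apply the Early Error rules.

```javascript
// Hypothetical analogue of IdentifierCodePoints: one code point per
// IdentifierStart / IdentifierPart, with escapes already decoded.
function identifierCodePoints(sourceText) {
  const cps = [];
  let i = 0;
  while (i < sourceText.length) {
    if (sourceText[i] === "\\") {
      // `\` UnicodeEscapeSequence: either uXXXX or u{X...}
      const m = /^\\u(?:([0-9a-fA-F]{4})|\{([0-9a-fA-F]+)\})/.exec(
        sourceText.slice(i)
      );
      if (m === null) throw new SyntaxError("malformed escape at index " + i);
      const hex = m[1] !== undefined ? m[1] : m[2];
      cps.push(parseInt(hex, 16)); // the MV of Hex4Digits or CodePoint
      i += m[0].length;
    } else {
      // IdentifierStartChar / IdentifierPartChar: the matched code point
      const cp = sourceText.codePointAt(i);
      cps.push(cp);
      i += cp > 0xffff ? 2 : 1;
    }
  }
  return cps;
}

// Analogue of StringValue: CodePointsToString of the list above.
// (Assumes every code point is at most 0x10FFFF, as the grammar requires.)
function stringValue(sourceText) {
  return String.fromCodePoint(...identifierCodePoints(sourceText));
}

console.log(identifierCodePoints("a\\u0062c")); // [ 97, 98, 99 ]
console.log(stringValue("a\\u{62}c") === stringValue("abc")); // true
```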

----

Similarly, this commit introduces SDOs `RegExpIdentifierCodePoints` and
`RegExpIdentifierCodePoint`.

- This allows `CapturingGroupName` of _RegExpIdentifierName_ to be specified
  more precisely.

- It also simplifies two Early Error rules (involving surrogate pairs).

(Note that the current algorithm for `CapturingGroupName` only 'normalizes'
escape sequences, whereas this commit's algorithm also normalizes surrogate pairs.
However, since the normalized text is immediately passed to `CodePointsToString`,
which re-encodes each code point as UTF-16, the result should be the same.
Given that the Early Error rules for surrogate pairs already treat each pair
as a single code point, normalizing them made sense to me.)
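
To make the "same result" claim concrete, here is a small JavaScript sketch.
The two functions are hypothetical stand-ins for the spec's
UTF16SurrogatePairToCodePoint and CodePointsToString abstract operations,
written only for illustration.

```javascript
// Stand-in for UTF16SurrogatePairToCodePoint(lead, trail).
function utf16SurrogatePairToCodePoint(lead, trail) {
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

// Stand-in for CodePointsToString: UTF-16-encode each code point in turn.
// String.fromCodePoint emits a lone surrogate as a single code unit.
function codePointsToString(cps) {
  return cps.map((cp) => String.fromCodePoint(cp)).join("");
}

const lead = 0xd835;
const trail = 0xdc9c;

// Old algorithm: the two surrogates pass through as separate code points.
const unnormalized = codePointsToString([lead, trail]);

// New algorithm: the pair is combined into one code point first.
const combined = utf16SurrogatePairToCodePoint(lead, trail);
const normalized = codePointsToString([combined]);

console.log(combined.toString(16));       // 1d49c
console.log(unnormalized === normalized); // true
```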
---
 spec.html | 102 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 94 insertions(+), 8 deletions(-)

diff --git a/spec.html b/spec.html
index 19a030601d..ce117bf605 100644
--- a/spec.html
+++ b/spec.html
@@ -16215,16 +16215,55 @@ <h1>Static Semantics: Early Errors</h1>
         <emu-grammar>IdentifierStart :: `\` UnicodeEscapeSequence</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the SV of |UnicodeEscapeSequence| is not ! UTF16EncodeCodePoint(_cp_) for some Unicode code point _cp_ matched by the |IdentifierStartChar| lexical grammar production.
+            It is a Syntax Error if IdentifierCodePoint of |UnicodeEscapeSequence| is not some Unicode code point matched by the |IdentifierStartChar| lexical grammar production.
           </li>
         </ul>
         <emu-grammar>IdentifierPart :: `\` UnicodeEscapeSequence</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the SV of |UnicodeEscapeSequence| is not ! UTF16EncodeCodePoint(_cp_) for some Unicode code point _cp_ matched by the |IdentifierPartChar| lexical grammar production.
+            It is a Syntax Error if IdentifierCodePoint of |UnicodeEscapeSequence| is not some Unicode code point matched by the |IdentifierPartChar| lexical grammar production.
           </li>
         </ul>
       </emu-clause>
+
+      <emu-clause id="sec-identifiercodepoints" type="sdo">
+        <h1>Static Semantics: IdentifierCodePoints</h1>
+        <dl class="header">
+        </dl>
+        <emu-grammar>IdentifierName :: IdentifierStart</emu-grammar>
+        <emu-alg>
+          1. Let _cp_ be IdentifierCodePoint of |IdentifierStart|.
+          1. Return &laquo; _cp_ &raquo;.
+        </emu-alg>
+        <emu-grammar>IdentifierName :: IdentifierName IdentifierPart</emu-grammar>
+        <emu-alg>
+          1. Let _cps_ be IdentifierCodePoints of the derived |IdentifierName|.
+          1. Let _cp_ be IdentifierCodePoint of |IdentifierPart|.
+          1. Return the list-concatenation of _cps_ and &laquo; _cp_ &raquo;.
+        </emu-alg>
+      </emu-clause>
+
+      <emu-clause id="sec-identifiercodepoint" type="sdo">
+        <h1>Static Semantics: IdentifierCodePoint</h1>
+        <dl class="header">
+        </dl>
+        <emu-grammar>IdentifierStart :: IdentifierStartChar</emu-grammar>
+        <emu-alg>
+          1. Return the code point matched by |IdentifierStartChar|.
+        </emu-alg>
+        <emu-grammar>IdentifierPart :: IdentifierPartChar</emu-grammar>
+        <emu-alg>
+          1. Return the code point matched by |IdentifierPartChar|.
+        </emu-alg>
+        <emu-grammar>UnicodeEscapeSequence :: `u` Hex4Digits</emu-grammar>
+        <emu-alg>
+          1. Return the code point whose numeric value is the MV of |Hex4Digits|.
+        </emu-alg>
+        <emu-grammar>UnicodeEscapeSequence :: `u{` CodePoint `}`</emu-grammar>
+        <emu-alg>
+          1. Return the code point whose numeric value is the MV of |CodePoint|.
+        </emu-alg>
+      </emu-clause>
     </emu-clause>
 
     <emu-clause id="sec-keywords-and-reserved-words" oldids="sec-reserved-words,sec-keywords,sec-future-reserved-words">
@@ -17672,8 +17711,7 @@ <h1>Static Semantics: StringValue</h1>
           IdentifierName IdentifierPart
       </emu-grammar>
       <emu-alg>
-        1. Let _idText_ be the source text matched by |IdentifierName|.
-        1. Let _idTextUnescaped_ be the result of replacing any occurrences of `\\` |UnicodeEscapeSequence| in _idText_ with the code point represented by the |UnicodeEscapeSequence|.
+        1. Let _idTextUnescaped_ be IdentifierCodePoints of |IdentifierName|.
         1. Return ! CodePointsToString(_idTextUnescaped_).
       </emu-alg>
       <emu-grammar>
@@ -34415,7 +34453,7 @@ <h1>Static Semantics: Early Errors</h1>
         <emu-grammar>RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the result of performing UTF16SurrogatePairToCodePoint on the two code points matched by |UnicodeLeadSurrogate| and |UnicodeTrailSurrogate| respectively is not matched by the |UnicodeIDStart| lexical grammar production.
+            It is a Syntax Error if RegExpIdentifierCodePoint of |RegExpIdentifierStart| is not matched by the |UnicodeIDStart| lexical grammar production.
           </li>
         </ul>
         <emu-grammar>RegExpIdentifierPart :: `\` RegExpUnicodeEscapeSequence</emu-grammar>
@@ -34427,7 +34465,7 @@ <h1>Static Semantics: Early Errors</h1>
         <emu-grammar>RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate</emu-grammar>
         <ul>
           <li>
-            It is a Syntax Error if the result of performing UTF16SurrogatePairToCodePoint on the two code points matched by |UnicodeLeadSurrogate| and |UnicodeTrailSurrogate| respectively is not matched by the |UnicodeIDContinue| lexical grammar production.
+            It is a Syntax Error if RegExpIdentifierCodePoint of |RegExpIdentifierPart| is not matched by the |UnicodeIDContinue| lexical grammar production.
           </li>
         </ul>
         <emu-grammar>UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue</emu-grammar>
@@ -34710,11 +34748,59 @@ <h1>Static Semantics: CapturingGroupName</h1>
             RegExpIdentifierName RegExpIdentifierPart
         </emu-grammar>
         <emu-alg>
-          1. Let _idText_ be the source text matched by |RegExpIdentifierName|.
-          1. Let _idTextUnescaped_ be the result of replacing any occurrences of `\\` |RegExpUnicodeEscapeSequence| in _idText_ with the code point represented by the |RegExpUnicodeEscapeSequence|.
+          1. Let _idTextUnescaped_ be RegExpIdentifierCodePoints of |RegExpIdentifierName|.
           1. Return ! CodePointsToString(_idTextUnescaped_).
         </emu-alg>
       </emu-clause>
+
+      <emu-clause id="sec-regexpidentifiercodepoints" type="sdo">
+        <h1>Static Semantics: RegExpIdentifierCodePoints</h1>
+        <dl class="header">
+        </dl>
+        <emu-grammar>RegExpIdentifierName :: RegExpIdentifierStart</emu-grammar>
+        <emu-alg>
+          1. Let _cp_ be RegExpIdentifierCodePoint of |RegExpIdentifierStart|.
+          1. Return &laquo; _cp_ &raquo;.
+        </emu-alg>
+        <emu-grammar>RegExpIdentifierName :: RegExpIdentifierName RegExpIdentifierPart</emu-grammar>
+        <emu-alg>
+          1. Let _cps_ be RegExpIdentifierCodePoints of the derived |RegExpIdentifierName|.
+          1. Let _cp_ be RegExpIdentifierCodePoint of |RegExpIdentifierPart|.
+          1. Return the list-concatenation of _cps_ and &laquo; _cp_ &raquo;.
+        </emu-alg>
+      </emu-clause>
+
+      <emu-clause id="sec-regexpidentifiercodepoint" type="sdo">
+        <h1>Static Semantics: RegExpIdentifierCodePoint</h1>
+        <dl class="header">
+        </dl>
+        <emu-grammar>RegExpIdentifierStart :: IdentifierStartChar</emu-grammar>
+        <emu-alg>
+          1. Return the code point matched by |IdentifierStartChar|.
+        </emu-alg>
+        <emu-grammar>RegExpIdentifierPart :: IdentifierPartChar</emu-grammar>
+        <emu-alg>
+          1. Return the code point matched by |IdentifierPartChar|.
+        </emu-alg>
+        <emu-grammar>
+          RegExpIdentifierStart :: `\` RegExpUnicodeEscapeSequence
+
+          RegExpIdentifierPart :: `\` RegExpUnicodeEscapeSequence
+        </emu-grammar>
+        <emu-alg>
+          1. Return the code point whose numeric value is the CharacterValue of |RegExpUnicodeEscapeSequence|.
+        </emu-alg>
+        <emu-grammar>
+          RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
+
+          RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
+        </emu-grammar>
+        <emu-alg>
+          1. Let _lead_ be the code unit whose numeric value is that of the code point matched by |UnicodeLeadSurrogate|.
+          1. Let _trail_ be the code unit whose numeric value is that of the code point matched by |UnicodeTrailSurrogate|.
+          1. Return UTF16SurrogatePairToCodePoint(_lead_, _trail_).
+        </emu-alg>
+      </emu-clause>
     </emu-clause>
 
     <emu-clause id="sec-pattern-semantics">

From 1901514ff9aaf2041fc89f9007848910db5bac9e Mon Sep 17 00:00:00 2001
From: Michael Dyck <jmdyck@ibiblio.org>
Date: Mon, 27 Sep 2021 18:35:05 -0700
Subject: [PATCH 3/3] Editorial: Move 2 paragraphs down one level (#2392)

... from 12.6 Names and Keywords
down to 12.6.1 Identifier Names.

I think this makes it clearer that the prose
mostly restates the associated
Early Error rules and SDOs.
---
 spec.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/spec.html b/spec.html
index ce117bf605..c02bcd844f 100644
--- a/spec.html
+++ b/spec.html
@@ -16163,8 +16163,6 @@ <h1>Names and Keywords</h1>
     <emu-note>
       <p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|, and the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are permitted anywhere after the first code point of an |IdentifierName|.</p>
     </emu-note>
-    <p>Unicode escape sequences are permitted in an |IdentifierName|, where they contribute a single Unicode code point to the |IdentifierName|. The code point is expressed by the |CodePoint| of the |UnicodeEscapeSequence| (see <emu-xref href="#sec-literals-string-literals"></emu-xref>). The `\\` preceding the |UnicodeEscapeSequence| and the `u` and `{ }` code units, if they appear, do not contribute code points to the |IdentifierName|. A |UnicodeEscapeSequence| cannot be used to put a code point into an |IdentifierName| that would otherwise be illegal. In other words, if a `\\` |UnicodeEscapeSequence| sequence were replaced by the |SourceCharacter| it contributes, the result must still be a valid |IdentifierName| that has the exact same sequence of |SourceCharacter| elements as the original |IdentifierName|. All interpretations of |IdentifierName| within this specification are based upon their actual code points regardless of whether or not an escape sequence was used to contribute any particular code point.</p>
-    <p>Two |IdentifierName|s that are canonically equivalent according to the Unicode standard are <em>not</em> equal unless, after replacement of each |UnicodeEscapeSequence|, they are represented by the exact same sequence of code points.</p>
     <h2>Syntax</h2>
     <emu-grammar type="definition">
       PrivateIdentifier ::
@@ -16209,6 +16207,8 @@ <h2>Syntax</h2>
 
     <emu-clause id="sec-identifier-names">
       <h1>Identifier Names</h1>
+      <p>Unicode escape sequences are permitted in an |IdentifierName|, where they contribute a single Unicode code point to the |IdentifierName|. The code point is expressed by the |CodePoint| of the |UnicodeEscapeSequence| (see <emu-xref href="#sec-literals-string-literals"></emu-xref>). The `\\` preceding the |UnicodeEscapeSequence| and the `u` and `{ }` code units, if they appear, do not contribute code points to the |IdentifierName|. A |UnicodeEscapeSequence| cannot be used to put a code point into an |IdentifierName| that would otherwise be illegal. In other words, if a `\\` |UnicodeEscapeSequence| sequence were replaced by the |SourceCharacter| it contributes, the result must still be a valid |IdentifierName| that has the exact same sequence of |SourceCharacter| elements as the original |IdentifierName|. All interpretations of |IdentifierName| within this specification are based upon their actual code points regardless of whether or not an escape sequence was used to contribute any particular code point.</p>
+      <p>Two |IdentifierName|s that are canonically equivalent according to the Unicode standard are <em>not</em> equal unless, after replacement of each |UnicodeEscapeSequence|, they are represented by the exact same sequence of code points.</p>
 
       <emu-clause id="sec-identifier-names-static-semantics-early-errors">
         <h1>Static Semantics: Early Errors</h1>