ECMAScript® 2024 Language Specification

Draft ECMA-262 / February 15, 2024

12.1 Unicode Format-Control Characters

The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).

It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.

U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages. In ECMAScript source text these code points may also be used in an IdentifierName after the first character.

U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see 12.2).

The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarized in Table 35.

Table 35: Format-Control Code Point Usage
Code Point Name Abbreviation Usage
U+200C ZERO WIDTH NON-JOINER <ZWNJ> IdentifierPart
U+200D ZERO WIDTH JOINER <ZWJ> IdentifierPart
U+FEFF ZERO WIDTH NO-BREAK SPACE <ZWNBSP> WhiteSpace