12.1 Unicode Format-Control Characters
The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.
U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages. In
U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In
The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarized in
Code Point | Name | Abbreviation | Usage |
---|---|---|---|
U+200C
|
ZERO WIDTH NON-JOINER | <ZWNJ> |
|
U+200D
|
ZERO WIDTH JOINER | <ZWJ> |
|
U+FEFF
|
ZERO WIDTH NO-BREAK SPACE | <ZWNBSP> |
|