6.1. The HTML Document Character Set
Connected: An Internet Encyclopedia
RFC 1866
The document character set specified in 9.5, "SGML Declaration for
HTML", must be supported by HTML user agents. It includes the graphic
characters of Latin Alphabet No. 1, or simply Latin-1. Latin-1
comprises 191 graphic characters, including the alphabets of most
Western European languages.
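The figure of 191 can be checked with a little arithmetic. The sketch below assumes the usual ISO 8859-1 layout, in which graphic characters occupy code positions 32-126 and 160-255; the function and constant names are illustrative only.

```python
# Sketch: count the graphic characters of Latin-1.
# Assumption: graphic characters sit at code positions 32-126 and 160-255;
# the remaining positions are control or unused positions.
ASCII_GRAPHIC = range(0x20, 0x7F)    # space through tilde
UPPER_GRAPHIC = range(0xA0, 0x100)   # no-break space through y with diaeresis


def latin1_graphic_count() -> int:
    """Return the number of graphic code positions in both ranges."""
    return len(ASCII_GRAPHIC) + len(UPPER_GRAPHIC)


print(latin1_graphic_count())  # 95 + 96 = 191
```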
NOTE - Use of the non-breaking space and soft hyphen indicator
characters is discouraged because support for them is not widely
deployed.
NOTE - To support non-Western writing systems, a larger character
repertoire will be specified in a future version of HTML. The
document character set will be [ISO-10646], or some subset that
agrees with [ISO-10646]; in particular, all numeric character
references must use code positions assigned by [ISO-10646].
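As a rough illustration of the numeric character reference rule, the sketch below builds a reference from a character's code position and decodes one again. It assumes that Python's ord() values (Unicode code points) coincide with ISO 10646 code positions, and it uses html.unescape from the standard library, which follows HTML5 decoding rules rather than the SGML rules of this specification. The helper name to_numeric_reference is hypothetical.

```python
import html


def to_numeric_reference(ch: str) -> str:
    """Build a decimal numeric character reference for one character.

    Assumption: ord() yields the ISO 10646 code position.
    """
    return f"&#{ord(ch)};"


# U+00F6 is the letter o with diaeresis, code position 246.
print(to_numeric_reference("\u00f6"))   # &#246;

# Decoding with the standard library (HTML5 rules, not RFC 1866 SGML rules).
print(html.unescape("G&#246;del") == "G\u00f6del")  # True
```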
In SGML applications, the use of control characters is limited in
order to maximize the chance of successful interchange over
heterogeneous networks and operating systems. In the HTML document
character set only three control characters are allowed: Horizontal
Tab, Carriage Return, and Line Feed (code positions 9, 13, and 10).
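The control character restriction above could be checked along these lines. This is a minimal sketch, not part of the specification: it assumes the document has already been decoded to a Python string, and it treats code positions 0-31 and 127-159 as control positions, which is an assumption about the character set layout. The function name disallowed_controls is hypothetical.

```python
# Code positions of Horizontal Tab, Line Feed, and Carriage Return.
ALLOWED_CONTROLS = {9, 10, 13}


def disallowed_controls(text: str) -> list[tuple[int, int]]:
    """Return (offset, code position) for each control character
    that is not one of the three allowed ones."""
    found = []
    for offset, ch in enumerate(text):
        code = ord(ch)
        # Assumption: positions 0-31 and 127-159 are control positions.
        is_control = code < 32 or 127 <= code <= 159
        if is_control and code not in ALLOWED_CONTROLS:
            found.append((offset, code))
    return found


print(disallowed_controls("a\tb\r\nc"))    # []
print(disallowed_controls("a\x00b\x0c"))   # [(1, 0), (3, 12)]
```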
The HTML DTD references the Added Latin 1 entity set, to allow
mnemonic representation of selected Latin 1 characters using only the
widely supported ASCII character repertoire. For example:
    Kurt G&ouml;del was a famous logician and mathematician.
See 9.7.2, "ISO Latin 1 Character Entity Set" for a table of the
"Added Latin 1" entities, and 13, "The HTML Coded Character Set" for
a table of the code positions of [ISO 8859-1] and the control
characters in the HTML document character set.
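The mnemonic representation described above can be sketched as a small encoder that rewrites non-ASCII characters as entity references. It relies on codepoint2name from Python's html.entities module, which holds the HTML 4 entity names and is therefore a superset of the "Added Latin 1" set; treating it as a stand-in for that set is an assumption. The function name to_ascii_with_entities is hypothetical, and the sketch deliberately leaves markup characters such as "&" and "<" untouched.

```python
from html.entities import codepoint2name


def to_ascii_with_entities(text: str) -> str:
    """Rewrite non-ASCII characters as entity references.

    Named entities are used where the standard library knows one;
    anything else falls back to a decimal numeric character reference.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                          # plain ASCII passes through
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # e.g. &ouml;
        else:
            out.append(f"&#{code};")                 # numeric fallback
    return "".join(out)


print(to_ascii_with_entities("Kurt G\u00f6del"))  # Kurt G&ouml;del
```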