3.2.1. Data Characters

Any sequence of characters that does not constitute markup (see 9.6 "Delimiter Recognition" of [SGML]) is mapped directly to a string of data characters. Some markup also maps to data character strings: numeric character references map to single-character strings via the document character set, and each reference to one of the general entities defined in the HTML DTD maps to a single-character string.

For example,

    abc&lt;def    => "abc","<","def"
    abc&#60;def   => "abc","<","def"

The terminating semicolon on entity references and numeric character references is necessary only when the character following the reference would otherwise be recognized as part of the name (see 9.4.5 "Reference End" in [SGML]):

    abc &lt def     => "abc ","<"," def"
    abc &#60 def    => "abc ","<"," def"

An ampersand is recognized as markup only when it is followed by a letter, or by a `#' and a digit:

    abc & lt def    => "abc & lt def"
    abc &# 60 def   => "abc &# 60 def"
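
The rules above can be illustrated with a short Python sketch. It is a simplified, non-normative model, not part of this specification: the names split_data, ENTITIES, and REFERENCE are hypothetical; only a small assumed subset of the general entities in the HTML DTD is included; names are assumed to be a letter followed by letters, digits, `.' or `-'; a record end is not treated as a reference end; unknown entity names are left unchanged as data (an arbitrary choice for the sketch); and the document character set is assumed to coincide with Unicode code points, which holds for ISO 8859-1.

```python
import re

# Assumed subset of the general entities defined in the HTML DTD; the
# real DTD also defines the ISO Latin-1 entities (for example "eacute").
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}

# "&" starts a reference only before a letter (entity reference) or before
# "#" followed by a digit (numeric character reference). The name or number
# is matched greedily, and the terminating ";" is consumed if present.
REFERENCE = re.compile(r"&(?:([A-Za-z][A-Za-z0-9.-]*)|#([0-9]+));?")


def split_data(text):
    """Split text into data-character strings; each recognized reference
    becomes its own single-character string."""
    pieces = []
    data_start = 0  # start of the pending run of plain data characters
    for match in REFERENCE.finditer(text):
        name, number = match.groups()
        if name is not None:
            char = ENTITIES.get(name)
            if char is None:
                continue  # unknown entity: leave its text as data (assumption)
        else:
            # Document character set assumed to match Unicode code points.
            char = chr(int(number))
        if match.start() > data_start:
            pieces.append(text[data_start:match.start()])
        pieces.append(char)
        data_start = match.end()
    if data_start < len(text):
        pieces.append(text[data_start:])
    return pieces
```

For example, split_data("abc &lt def") returns ['abc ', '<', ' def'], while split_data("abc & lt def") returns the whole input as a single data string, because the ampersand is followed by a space.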

A useful technique for translating plain text to HTML is to replace each `<', `&', and `>' with an entity reference or a numeric character reference, as follows:

                     ENTITY      NUMERIC
           CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
           --------- ----------  -----------  ---------------------
             &       &amp;       &#38;        Ampersand
             <       &lt;        &#60;        Less than
             >       &gt;        &#62;        Greater than
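
The replacement in the table above is mechanical. The following Python sketch (the names text_to_html and REPLACEMENTS are hypothetical) applies it, using either the entity references or the numeric character references from the table. The ampersand must be replaced first; otherwise the ampersands introduced by `&lt;' and `&gt;' would themselves be escaped a second time.

```python
# Characters to escape, in the order they must be replaced: "&" comes first
# so that ampersands produced by later replacements are not re-escaped.
REPLACEMENTS = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;")]


def text_to_html(text, numeric=False):
    """Escape '&', '<' and '>' in plain text, using entity references by
    default or numeric character references when numeric is True."""
    for char, entity in REPLACEMENTS:
        replacement = f"&#{ord(char)};" if numeric else entity
        text = text.replace(char, replacement)
    return text
```

For instance, text_to_html("x<y & y>z") yields "x&lt;y &amp; y&gt;z". Python's standard library function html.escape(text, quote=False) performs the same entity-reference substitution.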

    NOTE - There are SGML mechanisms, CDATA and RCDATA declared content, that allow most `<', `>', and `&' characters to be entered without the use of entity references. Because these mechanisms tend to be used and implemented inconsistently, and because they conflict with techniques for reducing HTML to 7-bit ASCII for transport, they are deprecated in this version of HTML. See "Example and Listing: XMP, LISTING".

