3.6.1 Canonicalization and Text Defaults
Connected: An Internet Encyclopedia
RFC 1945
Internet media types are registered with a canonical form. In
general, an Entity-Body transferred via HTTP must be represented in
the appropriate canonical form prior to its transmission. If the body
has been encoded with a Content-Encoding, the underlying data should
be in canonical form prior to being encoded.
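
The ordering described above (canonical form first, content coding second) might be sketched in Python as follows. The function name `encode_entity_body`, the choice of ISO-8859-1 for serialization, and the use of gzip as the coding are assumptions made for this illustration, not requirements of the specification.

```python
import gzip
import re


def encode_entity_body(text: str) -> bytes:
    """Canonicalize a text body, then apply a gzip coding (illustrative only)."""
    # Step 1: bring the text into canonical form, which uses CRLF line breaks.
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", text)
    # Step 2: serialize in the character set assumed for this example.
    raw = canonical.encode("iso-8859-1")
    # Step 3: only now apply the content coding to the canonical octets.
    return gzip.compress(raw)
```

Decoding the result with `gzip.decompress` should yield the CRLF-delimited octets, whatever mix of line breaks the input used.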

Media subtypes of the "text" type use CRLF as the text line break
when in canonical form. However, HTTP allows the transport of text
media with plain CR or LF alone representing a line break when used
consistently within the Entity-Body. HTTP applications must accept
CRLF, bare CR, and bare LF as being representative of a line break in
text media received via HTTP.
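
A recipient that follows this rule could split a received text body on any of the three line-break forms. The sketch below assumes a body in a character set where CR and LF are octets 13 and 10 (such as ISO-8859-1); `split_text_lines` is a hypothetical helper name.

```python
import re
from typing import List

# CRLF must be tried first so that it is not counted as two separate breaks.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def split_text_lines(body: bytes) -> List[bytes]:
    """Split a text Entity-Body on CRLF, bare CR, or bare LF."""
    return _LINE_BREAK.split(body)
```

For example, `split_text_lines(b"a\r\nb\rc\nd")` treats all three separators as line breaks and returns four lines.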

In addition, if the text media is represented in a character set that
does not use octets 13 and 10 for CR and LF respectively, as is the
case for some multi-byte character sets, HTTP allows the use of
whatever octet sequences are defined by that character set to
represent the equivalent of CR and LF for line breaks. This
flexibility regarding line breaks applies only to text media in the
Entity-Body; a bare CR or LF should not be substituted for CRLF
within any of the HTTP control structures (such as header fields and
multipart boundaries).
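
For character sets in which a line break is not the plain octet pair 13, 10, one possible approach is to decode the body first and look for line breaks in the decoded text rather than in the raw octets. The sketch below uses UTF-16 (big-endian) purely as an assumed example of a multi-byte encoding; `lines_from_body` is a hypothetical helper name.

```python
import re
from typing import List


def lines_from_body(body: bytes, charset: str) -> List[str]:
    """Decode a text body, then split on the character-level CR/LF forms."""
    text = body.decode(charset)
    # Split on characters, not octets: the encoding decides how CR and LF
    # are represented on the wire.
    return re.split(r"\r\n|\r|\n", text)


# In UTF-16BE, CR LF is encoded as 00 0D 00 0A, so the two-octet
# sequence 0D 0A never appears in the encoded body.
sample = "one\r\ntwo\nthree".encode("utf-16-be")
```

Searching `sample` for `b"\r\n"` finds nothing, which is why an octet-level split would miss the line break that the decoded text clearly contains.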

The "charset" parameter is used with some media types to define the
character set (Section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets must be labelled with an appropriate charset value in
order to be consistently interpreted by the recipient.
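
The default rule above can be illustrated with a small Python sketch that derives the effective charset from a Content-Type value. The helper name `effective_charset` is hypothetical, and the parameter parsing is deliberately simplified (it does not handle a ";" inside a quoted value).

```python
from typing import Optional

HTTP_TEXT_DEFAULT_CHARSET = "ISO-8859-1"


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset to assume for a Content-Type value, if any."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()
    for param in parts[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # An explicit label always wins over the default.
            return value.strip().strip('"')
    if media_type.startswith("text/"):
        # Unlabelled text media received via HTTP default to ISO-8859-1.
        return HTTP_TEXT_DEFAULT_CHARSET
    # No default is defined here for non-text media.
    return None
```

With this sketch, `effective_charset("text/html")` yields "ISO-8859-1", while `effective_charset("text/plain; charset=EUC-JP")` yields the explicit label.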

Note: Many current HTTP servers provide data using charsets other
than "ISO-8859-1" without proper labelling. This situation reduces
interoperability and is not recommended. To compensate for this,
some HTTP user agents provide a configuration option to allow the
user to change the default interpretation of the media type
character set when no charset parameter is given.
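
The configuration option mentioned in the note might look like the following sketch, in which a user-chosen default replaces ISO-8859-1 only when the sender supplied no charset. The function name `decode_text_body` and the parameter `user_default` are hypothetical; the declared charset is assumed to have been extracted from the Content-Type header already.

```python
from typing import Optional


def decode_text_body(
    body: bytes,
    declared_charset: Optional[str],
    user_default: str = "ISO-8859-1",
) -> str:
    """Decode a text body, falling back to a configurable default charset."""
    # An explicit label from the sender takes precedence; the user's
    # setting only applies to unlabelled text.
    charset = declared_charset or user_default
    return body.decode(charset)
```

A user who regularly receives unlabelled Shift_JIS pages could, under this sketch, pass `user_default="shift_jis"` so that such bodies decode as intended instead of as ISO-8859-1.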