4.1.3. Encoding reserved characters
Connected: An Internet Encyclopedia
RFC 1630
When a system uses a local addressing scheme, it is useful to provide
a mapping from local addresses into URIs so that objects within the
addressing scheme may be referred to globally, and possibly accessed
through gateway servers.
For a new naming scheme, any mapping scheme may be defined provided
it is unambiguous, reversible, and produces valid URIs. It is
recommended that where the local naming scheme has hierarchical
aspects, they be mapped onto the hierarchical URL path syntax in
order to allow the partial form to be used.
It is also recommended that the conventional scheme below be used in
all cases except for any scheme which encodes binary data as opposed
to text, in which case a more compact encoding such as pure
hexadecimal or base 64 might be more appropriate. For example, the
conventional URI encoding method is used for mapping WAIS, FTP,
Prospero and Gopher addresses in the URI specification.
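The size trade-off behind that recommendation can be illustrated with a
small sketch. It is not part of the recommendation itself: the helper
name encoded_sizes is hypothetical, and Python's urllib.parse.quote is
used only as a stand-in for the conventional encoding (its safe
character set follows a later specification, so exact counts may
differ).

```python
import base64
from urllib.parse import quote


def encoded_sizes(data: bytes) -> dict:
    """Return the encoded length of `data` under three encodings.

    Hypothetical helper for illustration only.
    """
    return {
        # Conventional scheme: each disallowed byte becomes "%XX" (3 chars).
        "percent": len(quote(data, safe="")),
        # Pure hexadecimal: always 2 characters per byte.
        "hex": len(data.hex()),
        # Base 64: 4 characters per 3 bytes (URL-safe alphabet assumed here).
        "base64": len(base64.urlsafe_b64encode(data)),
    }


# Every possible byte value once, as a rough proxy for binary data.
sizes = encoded_sizes(bytes(range(256)))
print(sizes)
```

For data of this kind, base 64 is the shortest and the percent form the
longest, which is why a scheme carrying binary data might prefer a
compact encoding.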
- CONVENTIONAL URI ENCODING SCHEME
Where the local naming scheme uses ASCII characters which are not
allowed in the URI, these may be represented in the URL by a
percent sign "%" immediately followed by two hexadecimal digits
(0-9, A-F) giving the ISO Latin 1 code for that character.
Character codes other than those allowed by the syntax shall not
be used unencoded in a URI.
- REDUCED OR INCREASED SAFE CHARACTER SETS
The same encoding method may be used for encoding characters whose
use, although technically allowed in a URI, would be unwise due to
problems of corruption by imperfect gateways or misrepresentation
due to the use of variant character sets, or which would simply be
awkward in a given environment. Because a % sign always indicates
an encoded character, a URI may be made "safer" simply by encoding
any characters considered unsafe, while leaving already encoded
characters still encoded. Similarly, in cases where a larger set
of characters is acceptable, % signs can be selectively and
reversibly expanded.
Before two URIs can be compared, it is therefore necessary to
bring them to the same encoding level.
However, the reserved characters mentioned above have a quite
different significance when encoded, and so may NEVER be encoded
and unencoded in this way.
A percent sign intended as a literal character must always be
encoded (as %25), as its presence otherwise always indicates an
encoding. Sequences which start with a percent sign but are not
followed by two hexadecimal digits are reserved for future
extension. (See Example 3.)
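The conventional encoding and the "safer" re-encoding described above
can be sketched as follows. This is a minimal illustration, not a
definitive implementation: the function names encode and make_safer are
hypothetical, and DEFAULT_SAFE is an assumed, deliberately small set of
characters. The characters actually allowed are defined by the URI
syntax, not by this sketch.

```python
import string

# Assumption: an illustrative safe set, not the set defined by the URI syntax.
DEFAULT_SAFE = frozenset(string.ascii_letters + string.digits + "-_.")


def _escape(ch: str) -> str:
    """Return '%' plus two uppercase hex digits for the ISO Latin 1 code of ch."""
    code = ch.encode("latin-1")[0]  # raises UnicodeEncodeError outside Latin 1
    return "%{:02X}".format(code)


def encode(text: str, safe=DEFAULT_SAFE) -> str:
    """Encode a local name: every character outside `safe` becomes %XX.

    A literal '%' is never in the safe set, so it is always encoded as %25.
    """
    return "".join(ch if ch in safe else _escape(ch) for ch in text)


def make_safer(uri: str, unsafe) -> str:
    """Encode the characters in `unsafe`, leaving existing escapes untouched."""
    out = []
    i = 0
    while i < len(uri):
        ch = uri[i]
        if ch == "%":
            # Already an escape: copy it verbatim rather than encoding it again.
            out.append(uri[i:i + 3])
            i += 3
            continue
        out.append(_escape(ch) if ch in unsafe else ch)
        i += 1
    return "".join(out)


print(encode("a b%c"))                          # a%20b%25c
print(make_safer("marie-claude%2Fx", {"-"}))    # marie%2Dclaude%2Fx
```

Note that make_safer assumes its input is already a URI, so a "%" is
taken to start an escape, whereas encode assumes a raw local name in
which "%" is an ordinary character.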
Example 1
The URIs
http://info.cern.ch/albert/bertram/marie-claude
and
http://info.cern.ch/albert/bertram/marie%2Dclaude
are identical, as the %2D encodes a hyphen character.
Example 2
The URIs
http://info.cern.ch/albert/bertram/marie-claude
and
http://info.cern.ch/albert/bertram%2Fmarie-claude
are NOT identical, as in the second case the encoded slash does not
have hierarchical significance.
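Examples 1 and 2 can be checked mechanically by bringing both URIs to
the same encoding level before comparing them. The sketch below is one
possible approach under stated assumptions: the names normalize and
same_uri are hypothetical, and INTERCHANGEABLE is an assumed set of
characters whose encoded and unencoded forms are treated as equivalent.
Escapes for any other character, including reserved characters such as
the slash, are left encoded.

```python
import re
import string

# Assumption: characters that may be freely encoded or unencoded.
# Reserved characters such as "/" are deliberately excluded.
INTERCHANGEABLE = frozenset(string.ascii_letters + string.digits + "-_.")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize(uri: str) -> str:
    """Decode escapes of interchangeable characters; keep all others encoded."""
    def replace(match):
        ch = chr(int(match.group(1), 16))
        if ch in INTERCHANGEABLE:
            return ch
        # Keep the escape, with uppercase hex digits for a stable comparison.
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(replace, uri)


def same_uri(a: str, b: str) -> bool:
    """Compare two URIs after bringing them to the same encoding level."""
    return normalize(a) == normalize(b)


base = "http://info.cern.ch/albert/bertram/marie-claude"
print(same_uri(base, "http://info.cern.ch/albert/bertram/marie%2Dclaude"))  # True
print(same_uri(base, "http://info.cern.ch/albert/bertram%2Fmarie-claude"))  # False
```

The sketch only decodes; it does not encode unencoded characters
outside the assumed set, so it is a partial normalization at best.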
Example 3
The URIs
fxqn:/us/va/reston/cnri/ietf/24/asdf%*.fred
and
news:12345667123%asdghfh@info.cern.ch
are illegal, as all % characters imply encodings, and there is no
decoding defined for "%*" or "%as" in this recommendation.
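A check for the condition in Example 3 might look like the following
sketch. The function names malformed_escapes and has_valid_escapes are
hypothetical; the check covers only the percent-sign rule and says
nothing about whether the rest of the URI is valid.

```python
import re

# A "%" that is not followed by exactly two hexadecimal digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def malformed_escapes(uri: str) -> list:
    """Return each offending '%' together with up to two following characters."""
    return [uri[m.start():m.start() + 3] for m in _BAD_PERCENT.finditer(uri)]


def has_valid_escapes(uri: str) -> bool:
    """True if every '%' in the URI starts a two-digit hexadecimal escape."""
    return _BAD_PERCENT.search(uri) is None


print(malformed_escapes("fxqn:/us/va/reston/cnri/ietf/24/asdf%*.fred"))  # ['%*.']
print(malformed_escapes("news:12345667123%asdghfh@info.cern.ch"))        # ['%as']
print(has_valid_escapes("http://info.cern.ch/albert/bertram/marie%2Dclaude"))  # True
```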