Conclusions
Connected: An Internet Encyclopedia
- Absolutely minimize computational requirements on Web servers
A client-side macro processor, particularly one that
could include arbitrary Web documents, would almost certainly
be a useful Web enhancement.
- Skill in operating and programming GNU Emacs is invaluable when converting
between document formats
Although cumbersome at times, Emacs seems to be the hands-down winner
for automating document conversion. What other program
lets you convert a document, examine the result,
undo the operation with a single command, modify the LISP code driving
the conversion, then try it again with nary a single wasted keystroke?
For document conversion purposes, the most useful change to Emacs would
be an enhancement or replacement for regular expressions. In particular,
it would be nice if regular expressions could count,
to more easily interpret varying indentation levels.
I developed a prototype for such a system by taking Frolic (a Prolog-like
language written in Common Lisp), and porting it to Emacs Lisp. After
making a few enhancements, I was able to write grammar-like constructions
to describe blocks of text.
See prolog.el and match.el.
Performance is absolutely horrible.
This example describes one way to
satisfy the match-contents-lines rule:
    (*- (match-contents-lines _Start _End)
        (match-contents-line _Start _X)
        (match "\\s-*<BR>\n" _X _Y)
        (match-contents-lines _Y _End))
I plan to try this again sometime, only with an external Prolog engine
(like SB-Prolog), modified to perform RPC interactions over a controlling
TCP connection, which will be driven by Emacs Lisp networking code.
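The wish for pattern matching that can "count" indentation can be illustrated outside Emacs. What follows is a minimal Python sketch, not the prolog.el or match.el code: the function name indent_levels is hypothetical, and it assumes indentation consists of spaces only. It assigns each line a nesting depth by comparing its leading-space count with a stack of enclosing indents, which is the bookkeeping an ordinary regular expression cannot do.

```python
def indent_levels(text):
    """Return (depth, stripped_line) pairs for each non-blank line.

    Assumption: indentation uses spaces only. Depth is relative: a line
    indented further than the enclosing level is one level deeper,
    however many spaces were added.
    """
    stack = [0]          # leading-space counts of the enclosing levels
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue     # blank lines carry no indentation information
        width = len(line) - len(line.lstrip(" "))
        # Pop the levels this line has dedented out of.
        while len(stack) > 1 and width < stack[-1]:
            stack.pop()
        # A deeper indent opens a new level.
        if width > stack[-1]:
            stack.append(width)
        result.append((len(stack) - 1, line.strip()))
    return result
```

Because depth is derived from comparisons rather than fixed column widths, a source document that mixes three-space and eight-space indents still yields consistent levels.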
- Establish scripts and procedures for enforcing conformance with
a standard page format
Simply verifying that every URL points somewhere would be a big
help to many Web sites. For larger Web projects, with hundreds
or thousands of Web pages, conformance with a standard format is
essential if the full potential of tools like sed
is to be realized.
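As one illustration of such a script, the following Python sketch checks that relative links between pages in a local directory tree point at files that exist. It is not the procedure used for the Encyclopedia; the names LinkCollector and broken_local_links are hypothetical, and it assumes pages end in .html or .htm, ignores remote and root-relative links, and does no checking of fragment targets.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_local_links(root):
    """Return sorted (page, href) pairs whose target file is missing."""
    broken = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            if not filename.lower().endswith((".html", ".htm")):
                continue
            page = os.path.join(dirpath, filename)
            with open(page, encoding="latin-1") as handle:
                collector = LinkCollector()
                collector.feed(handle.read())
            for href in collector.links:
                target, _fragment = urldefrag(href)
                parsed = urlparse(target)
                # Skip remote links, same-page fragments, and
                # root-relative paths (assumption: not checkable locally).
                if parsed.scheme or parsed.netloc or not parsed.path:
                    continue
                if parsed.path.startswith("/"):
                    continue
                path = os.path.normpath(
                    os.path.join(dirpath, unquote(parsed.path)))
                if not os.path.exists(path):
                    broken.append((os.path.relpath(page, root), href))
    return sorted(broken)
```

Run over the document root before each installation, a check like this reports every page that links to a file that is not there.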
- Build search tools as you go
Until powerful, standard search tools are easily available, build
your own search engine along the way. Don't wait until the end
to add the search facilities, since good search tools will
aid construction just as much as browsing.
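A home-grown search tool need not be elaborate. The Python sketch below builds a simple inverted index (word to page names) and answers queries that require every word to be present. It is an illustration only, not the Encyclopedia's search engine; the function names build_index and search are hypothetical, and it assumes the page text has already had its markup stripped.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page names containing it.

    `pages` is a dict of page name -> plain text (markup already stripped).
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of pages containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Copy the first hit set so intersecting does not alter the index.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

Rebuilding the index whenever pages are added keeps the tool useful during construction, for example to find every page that still mentions a term being renamed.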
- HTML is a terrible idea
- Its design is not consistent with its use
HTML's stated design objective is to serve as a language for describing
content, but it is used to describe presentation.
- It does not meet the objectives of its design
HTML is not a good content description language, for the following reasons:
- It attempts to define a single content language for every document
How can one content language describe a court ruling full of legal
references, a repair procedure (with diagrams) for a car engine,
a mathematical paper, and a Spanish story with an English translation
running parallel to it? No one content language can do this,
unless it is extremely feature-loaded, which HTML is not.
- It lacks an extension facility
It is particularly difficult to imagine how a language lacking some
sort of extension facility, such as macro definition, can claim
to represent content. Content is much more in the mind of the
author than in some boilerplate template.
What if the author is documenting a new computer language, and
wants to define a format for sample code in which the new language's
keywords are hyperlinked to their definitions?
Unless the language
is malleable enough to be adapted to each author's needs, it
can hardly represent content well.
- Many simple content features have no HTML representation
How is a bibliographic reference represented?
How about a numbered list that spans several Web pages?
Or for that matter, a numbered header that updates automatically
as new sections are added or old ones removed?
How is a hyperlink to a footnote represented differently from a link to
a glossary item?
For that matter, isn't the whole concept of a
hyperlink inherently presentation-dependent?
- It does not meet the needs of its use
HTML is not a good presentation language, for the following reasons:
- It is incomplete
HTML 2 is being standardized. HTML 3 is in the works. There is talk
of an HTML 3+. There will undoubtedly be an HTML 4, an HTML 5, and so on.
This is because the language is incomplete. It cannot describe
a purely arbitrary page layout. Therefore, it will have to be continually
revised as people find more and more things they want to do with it,
but cannot.
- It is not standardized
There is no definitive description for how particular HTML code is
to be rendered. Thus, I know that to create an indented paragraph
using the Netscape browser, I enclose the text in an otherwise
empty list. There is no assurance that this or any other technique
will work with a different browser.
- It does not scale well
HTML is poorly suited for constructing large hypertext systems,
mainly because it lacks any features to relate
multiple Web pages into a coherent entity.
Unless you want to deal with a tangled mess of 10,000 Web pages
(all different), you must create a standardized structure of your
own - and then enforce it.
- It is not adaptable to change
A better solution might start by acknowledging the functional separation
between content and presentation. The Internet community should
standardize on a presentation language, and leave the choice of content
representation to the author. The standard presentation language
should be complete, be described by a standards-track protocol
document that clearly defines its exact interpretation, and allow
for easy growth and extensibility.
One such language already exists: Postscript. Defining a hypertext
link could be achieved by adding a single primitive. A workable
free software implementation of the language exists in the form
of Ghostscript.
Almost every modern word processing system
can generate Postscript. The amount of programming effort required
to extend such systems to generate "Web Postscript" would
presumably be negligible.
Netscape, the Encyclopedia's standard browser, can generate
Postscript, as can most other Web browsers. Therefore, it would
seem that HTML-to-Postscript conversion should also require a
trivial effort. Backwards compatibility would be achieved,
and those who wish to continue using HTML could do so,
requiring only the extra step of converting their HTML to Postscript
before installing it on the Web.
- See also my
hypertext page in the topical core