Douglas Comer defines a protocol as "a formal description
of message formats and the rules two or more machines must follow
to exchange those messages."
Protocols usually exist in two forms. First, they exist in a textual
form for humans to understand.
The majority of Internet protocols are distributed as RFCs,
which can (and should) be read to understand the protocols'
design and operation.
Second, they exist as programming code
for computers to understand. Both forms should ultimately specify the
precise interpretation of every bit of every message exchanged across
a network.
Protocols should exist at every point
where logical program flow crosses between hosts or programs.
In other words,
we need protocols every time
two different computers or programs need to agree on how
they will communicate information between them.
Every time we want to print something on a network printer
we need protocols; otherwise there will be no agreement on how
to pause the sending computer's output if the printer falls behind.
Every time we want to download a file we need protocols;
otherwise the computers will be unable to agree on which file
should be downloaded.
When we save our work to a local disk, however, we don't
need protocols, unless the disk is on a network file server.
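The text does not say how the sending computer and the printer agree to pause. One classic answer, used on serial lines, is XON/XOFF flow control: the receiver sends the DC3 control character (XOFF, 0x13) to ask the sender to stop and DC1 (XON, 0x11) to ask it to resume. The Python sketch below simulates that agreement in a single thread. The printer class, its buffer size, and its low-water mark are all hypothetical, and network printers in practice may rely on other mechanisms, such as TCP's window, instead.

```python
XON = 0x11   # DC1: "resume sending"
XOFF = 0x13  # DC3: "stop sending"


class Printer:
    """Hypothetical printer with a small input buffer (not a real printer protocol)."""

    def __init__(self, capacity=4, low_water=1):
        self.capacity = capacity    # buffer size that triggers XOFF
        self.low_water = low_water  # buffer size that triggers XON
        self.buffer = []
        self.printed = []
        self.peak = 0               # largest buffer size ever reached

    def receive(self, byte):
        """Accept one byte; return XOFF once the buffer is full."""
        self.buffer.append(byte)
        self.peak = max(self.peak, len(self.buffer))
        return XOFF if len(self.buffer) >= self.capacity else None

    def print_one(self):
        """Print one buffered byte; return XON once the buffer has drained."""
        if self.buffer:
            self.printed.append(self.buffer.pop(0))
        return XON if len(self.buffer) <= self.low_water else None


def send(data, printer):
    """Feed data to the printer, honoring XOFF/XON (single-threaded simulation)."""
    paused = False
    i = 0
    while i < len(data) or printer.buffer:
        if not paused and i < len(data):
            signal = printer.receive(data[i])  # sender transmits a byte
            i += 1
        else:
            signal = printer.print_one()       # printer catches up
        if signal == XOFF:
            paused = True
        elif signal == XON:
            paused = False
    return bytes(printer.printed)
```

Because the sender stops after XOFF and waits for XON, the printer's buffer never holds more than its capacity, however long the document is.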
Usually multiple protocols will be in use simultaneously. For one thing,
computers usually do several things at once, and often for several people
at once. Therefore, most protocols support multitasking: several
independent exchanges can be in progress at the same time. Also, one
operation can involve several protocols. For example,
consider the NFS (Network File System) protocol. A write to a file is
done with an NFS operation, which uses another protocol (RPC) to
perform a function call on a remote host, which uses another protocol
(UDP) to deliver a datagram to a port on a remote host, which uses
another protocol to deliver a datagram on an Ethernet, and so on.
Along the way we may need to look up host names (using the DNS protocol),
convert data to a network standard form (using the XDR protocol),
and find a routing path to the host (using one or more of numerous protocols).
I think you get the idea.
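To make the XDR step concrete, here is a minimal Python sketch of two XDR encodings as defined in RFC 4506: a signed integer is sent as four big-endian bytes in two's complement, and a string is sent as a four-byte length followed by the bytes themselves, zero-padded to a multiple of four. The function names are hypothetical, and the request layout in the usage lines (a file name followed by an offset) is an assumption for illustration, not the real NFS WRITE request.

```python
import struct


def xdr_int(n):
    """Encode a 32-bit signed integer: big-endian two's complement (RFC 4506)."""
    return struct.pack(">i", n)


def xdr_string(s):
    """Encode a byte string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    pad = (4 - len(s) % 4) % 4
    return struct.pack(">I", len(s)) + s + b"\x00" * pad


def xdr_decode_int(buf, offset=0):
    """Return (value, offset just past it)."""
    (value,) = struct.unpack_from(">i", buf, offset)
    return value, offset + 4


def xdr_decode_string(buf, offset=0):
    """Return (bytes, offset just past the padding)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    pad = (4 - length % 4) % 4
    return buf[start:start + length], start + length + pad


# Hypothetical request layout: a file name followed by a byte offset.
request = xdr_string(b"notes.txt") + xdr_int(8192)
name, pos = xdr_decode_string(request)
offset, _ = xdr_decode_int(request, pos)
```

Because both ends agree on byte order and padding, a big-endian server and a little-endian client decode the same bytes to the same values.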
Initially, protocols were specified using an explicit description
of how every bit in a binary message should be interpreted.
For example,
RFC 791 Section 3.1,
part of the IP specification, specifies the exact
interpretation of every bit in the IP packet header.
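A short Python sketch shows what a bit-level specification such as RFC 791 Section 3.1 asks of an implementation: every field sits at a fixed bit position, so a parser can unpack the 20-byte fixed header directly and then mask and shift the fields that share a byte (version and IHL) or a 16-bit word (flags and fragment offset). The function names and dictionary keys are hypothetical; the field layout and the header checksum rule (the one's complement of the one's complement sum of the header's 16-bit words) follow RFC 791.

```python
import ipaddress
import struct


def parse_ipv4_header(data):
    """Split an IPv4 header into the fields laid out in RFC 791 Section 3.1."""
    if len(data) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    ihl = ver_ihl & 0x0F                 # header length in 32-bit words
    return {
        "version": ver_ihl >> 4,         # high 4 bits of the first byte
        "ihl": ihl,
        "type_of_service": tos,
        "total_length": total_length,    # header plus data, in octets
        "identification": ident,
        "flags": flags_frag >> 13,       # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits, in 8-octet units
        "time_to_live": ttl,
        "protocol": protocol,            # e.g. 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
        "options": data[20:ihl * 4],     # present only when ihl > 5
    }


def header_checksum_ok(data):
    """True if the one's complement sum of the header's 16-bit words is all ones."""
    header = data[:(data[0] & 0x0F) * 4]
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                   # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
fields = parse_ipv4_header(sample)
```

The sample header parses as a version 4, 20-byte header with the Don't Fragment flag set, carrying UDP (protocol 17) from 192.168.0.1 to 192.168.0.199, and its checksum verifies.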
In more recent years, it has become popular to specify
protocols using a higher-layer description that avoids
such tedious details without introducing ambiguity.
Two popular means of doing this are
ASCII Request/Reply and ASN.1.
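To contrast the two styles, the Python sketch below formats an ASCII request line and parses a reply line in the style of SMTP and FTP, where each reply starts with a three-digit code and a hyphen after the code marks a multi-line reply that continues. For ASN.1, it encodes an INTEGER with the Basic Encoding Rules (BER), one of the encoding rules that turn an ASN.1 description into bytes: a tag octet (2 for INTEGER), a length octet, and the value in the fewest two's complement octets. The function names are hypothetical, and the BER sketch handles only short-form lengths.

```python
def format_request(command, *args):
    """Build one request line: an upper-case command word, arguments, CRLF."""
    return (" ".join((command.upper(),) + args) + "\r\n").encode("ascii")


def parse_reply(line):
    """Split a reply line into (code, continued, text), SMTP/FTP style."""
    text = line.decode("ascii").rstrip("\r\n")
    code, sep, message = text[:3], text[3:4], text[4:]
    if len(code) != 3 or not code.isdigit() or sep not in ("", " ", "-"):
        raise ValueError(f"malformed reply: {text!r}")
    return int(code), sep == "-", message


def ber_integer(n):
    """BER-encode an ASN.1 INTEGER: tag 0x02, length, minimal two's complement."""
    # Smallest octet count whose two's complement range holds n:
    # n's significant bits plus one sign bit.
    k = n if n >= 0 else ~n
    content = n.to_bytes(k.bit_length() // 8 + 1, "big", signed=True)
    if len(content) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x02, len(content)]) + content
```

The ASCII form can be typed and read by a person at a terminal, while the BER form is compact but needs a decoder; both leave no doubt about how each byte is interpreted.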
In addition to specifying message formats, a protocol
may also specify when certain messages are allowed to occur.
For example, a file transfer protocol
may not allow a READ message until after an OPEN
message has been successfully transferred.
State diagrams are the most popular way to do this (see
RFC 793 Section 3.2 for an example),
though ITU-T standards use a formal graphical syntax called SDL.
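A state diagram translates naturally into a transition table. The Python sketch below encodes the hypothetical file transfer rules from the example above: READ is accepted only after OPEN, and any message without an entry in the table for the current state is rejected as a protocol error. The state names, the CLOSE message, and the class names are assumptions for illustration.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    OPENED = auto()


# (current state, message) -> next state; anything missing is a protocol error.
TRANSITIONS = {
    (State.CLOSED, "OPEN"): State.OPENED,
    (State.OPENED, "READ"): State.OPENED,
    (State.OPENED, "CLOSE"): State.CLOSED,
}


class ProtocolError(Exception):
    pass


class Session:
    """One side of a hypothetical file transfer conversation."""

    def __init__(self):
        self.state = State.CLOSED

    def handle(self, message):
        """Apply a message, or reject it if the state diagram does not allow it."""
        try:
            self.state = TRANSITIONS[(self.state, message)]
        except KeyError:
            raise ProtocolError(
                f"{message} not allowed in state {self.state.name}") from None
        return self.state
```

Keeping the rules in a table rather than scattered through the code makes it easy to check the implementation against the diagram, edge by edge.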
\begin{soapbox}
One of the challenges facing network designers is to construct protocols
that are as specific as possible to one function. For example,
I consider NFS a good protocol design because one protocol does
file transport (NFS), one protocol does procedure calls (RPC), etc.
If you need to make a remote procedure call to print a file, you
already have the RPC protocol, which does almost everything
you need. Add one piece to the puzzle, a printing protocol defined
in terms of the RPC protocol, and your job is done.
On the other hand, I do not consider TCP a very good protocol, because
it mixes two functions: reliable data delivery and connection-oriented
streams. Consequently, the Internet lacks a good, reliable datagram
delivery mechanism, because TCP's reliable delivery techniques,
while effective, are specific to stream connections.