RFC 1072
1. INTRODUCTION
Recent work on TCP performance has shown that TCP can work well over
a variety of Internet paths, ranging from 800 Mbit/sec I/O channels
to 300 bit/sec dial-up modems [Jacobson88]. However, there is still
a fundamental TCP performance bottleneck for one transmission regime:
paths with high bandwidth and long round-trip delays. The
significant parameter is the product of bandwidth (bits per second)
and round-trip delay (RTT in seconds); this product is the number of
bits it takes to "fill the pipe", i.e., the amount of unacknowledged
data that TCP must handle in order to keep the pipeline full. TCP
performance problems arise when this product is large, e.g.,
significantly exceeds 10**5 bits. We will refer to an Internet path
operating in this region as a "long, fat pipe", and a network
containing this path as an "LFN" (pronounced "elephan(t)").
High-capacity packet satellite channels (e.g., DARPA's Wideband Net)
are LFN's. For example, a T1-speed satellite channel has a
bandwidth*delay product of 10**6 bits or more; this corresponds to
100 outstanding TCP segments of 1200 bytes each! Proposed future
terrestrial fiber-optical paths will also fall into the LFN class;
for example, a cross-country delay of 30 ms at a DS3 bandwidth
(45 Mbit/sec) also exceeds 10**6 bits.
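The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not part of the proposal: the T1 line rate (1.544 Mbit/sec) and the satellite round-trip time (0.65 sec) are assumed values chosen for the example, while the DS3 figures and the 1200-byte segment size come from the text above.

```python
def bandwidth_delay_product_bits(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bits needed to "fill the pipe": bandwidth (bit/sec) times RTT (sec)."""
    return bandwidth_bps * rtt_seconds


def segments_in_flight(bdp_bits: float, segment_bytes: int) -> float:
    """Number of segments of the given size that the product corresponds to."""
    return bdp_bits / 8 / segment_bytes


# Assumed figures (not stated in the text): T1 line rate and satellite RTT.
T1_BPS = 1.544e6
SATELLITE_RTT_S = 0.65

# Figures from the text: DS3 at 45 Mbit/sec with a 30 ms cross-country delay.
DS3_BPS = 45e6
CROSS_COUNTRY_RTT_S = 0.030

t1_bits = bandwidth_delay_product_bits(T1_BPS, SATELLITE_RTT_S)
ds3_bits = bandwidth_delay_product_bits(DS3_BPS, CROSS_COUNTRY_RTT_S)

print(f"T1 satellite:      {t1_bits:.3g} bits, "
      f"about {segments_in_flight(t1_bits, 1200):.0f} segments of 1200 bytes")
print(f"DS3 cross-country: {ds3_bits:.3g} bits")
```

Under these assumptions both paths come out above 10**6 bits, and the T1 satellite case corresponds to roughly 100 outstanding segments.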
Clever algorithms alone will not give us good TCP performance over
LFN's; it will be necessary to actually extend the protocol. This
RFC proposes a set of TCP extensions for this purpose.
There are three fundamental problems with the current TCP over LFN
paths:
- Window Size Limitation
The TCP header uses a 16 bit field to report the receive window
size to the sender. Therefore, the largest window that can be
used is 2**16 - 1 = 65,535 bytes (about 64K bytes). (In practice,
some TCP implementations will "break" for windows exceeding 2**15,
because of their failure to do unsigned arithmetic.)
To circumvent this problem, we propose a new TCP option to allow
windows larger than 2**16. This option will define an implicit
scale factor, to be used to multiply the window size value found
in a TCP header to obtain the true window size.
- Cumulative Acknowledgments
Any packet losses in an LFN can have a catastrophic effect on
throughput. This effect is exacerbated by the simple cumulative
acknowledgment of TCP. Whenever a segment is lost, the
transmitting TCP will (eventually) time out and retransmit the
missing segment. However, the sending TCP has no information
about segments that may have reached the receiver and been
queued because they were not at the left window edge, so it may
be forced to retransmit these segments unnecessarily.
We propose a TCP extension to implement selective
acknowledgments. By sending selective acknowledgments, the
receiver of data can inform the sender about all segments that
have arrived successfully, so the sender need retransmit only
the segments that have actually been lost.
Selective acknowledgments have been included in a number of
experimental Internet protocols -- VMTP [Cheriton88], NETBLT
[Clark87], and RDP [Velten84]. There is some empirical evidence
in favor of selective acknowledgments -- simple experiments with
RDP have shown that disabling the selective acknowledgment
facility greatly increases the number of retransmitted segments
over a lossy, high-delay Internet path [Partridge87]. A
simulation study of a simple form of selective acknowledgments
added to the ISO transport protocol TP4 also showed promise of
performance improvement [NBS85].
- Round Trip Timing
TCP implements reliable data delivery by measuring the RTT,
i.e., the time interval between sending a segment and receiving
an acknowledgment for it, and retransmitting any segments that
are not acknowledged within some small multiple of the average
RTT. Experience has shown that accurate, current RTT estimates
are necessary to adapt to changing traffic conditions and,
without them, a busy network is subject to an instability known
as "congestion collapse" [Nagle84].
In part because TCP segments may be repacketized upon
retransmission, and in part because of complications due to the
cumulative TCP acknowledgment, measuring a segment's RTT may
involve a non-trivial amount of computation in some
implementations. To minimize this computation, some
implementations time only one segment per window. While this
yields an adequate approximation to the RTT for small windows
(e.g., a 4 to 8 segment Arpanet window), for an LFN (e.g., 100
segment Wideband Network windows) it results in an unacceptably
poor RTT estimate.
In the presence of errors, the problem becomes worse. Zhang
[Zhang86], Jain [Jain86] and Karn [Karn87] have shown that it is
not possible to accumulate reliable RTT estimates if
retransmitted segments are included in the estimate. Since a
full window of data will have been transmitted prior to a
retransmission, all of the segments in that window will have to
be ACKed before the next RTT sample can be taken. This means at
least an additional window's worth of time between RTT
measurements and, as the error rate approaches one per window of
data (e.g., 10**-6 errors per bit for the Wideband Net), it
becomes effectively impossible to obtain an RTT measurement.
We propose a TCP "echo" option that allows each segment to carry
its own timestamp. This will allow every segment, including
retransmissions, to be timed at negligible computational cost.
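The three extensions outlined above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the option formats themselves: the scale factor is assumed to be a power of two given as a shift count, segments are identified by small integers rather than byte sequence numbers, and all function names are hypothetical.

```python
from typing import List, Optional, Set


def true_window(header_window: int, shift: int) -> int:
    """Scale the 16-bit header window by an implicit factor of 2**shift.

    The power-of-two form of the scale factor is an assumption of this sketch.
    """
    if not 0 <= header_window <= 0xFFFF:
        raise ValueError("header window must fit in 16 bits")
    return header_window << shift


def possibly_outstanding(sent: List[int], cumulative_ack: int,
                         selectively_acked: Optional[Set[int]] = None) -> List[int]:
    """Segments the sender cannot rule out as lost.

    cumulative_ack is the first segment not yet received in order. With
    cumulative acknowledgments alone, every later segment is unknown to the
    sender. With selective acknowledgments, segments the receiver reported
    as queued are excluded.
    """
    unknown = [s for s in sent if s >= cumulative_ack]
    if selectively_acked is None:
        return unknown
    return [s for s in unknown if s not in selectively_acked]


def rtt_from_echo(echoed_send_time: float, arrival_time: float) -> float:
    """RTT sample from a send timestamp that the peer echoed back."""
    return arrival_time - echoed_send_time


# Window scaling: a full 16-bit window scaled by 2**4.
print(true_window(0xFFFF, 4))  # 1048560 bytes

# Segment 3 of 10 is lost; segments 4..10 are queued at the receiver.
sent = list(range(1, 11))
print(possibly_outstanding(sent, 3))                      # segments 3..10
print(possibly_outstanding(sent, 3, set(range(4, 11))))   # [3]

# Each acknowledgment carrying an echoed timestamp yields one RTT sample,
# whether or not the segment was a retransmission.
sample = rtt_from_echo(echoed_send_time=10.0, arrival_time=10.65)
```

In this model the selective acknowledgment reduces the candidate retransmissions from eight segments to one, and the RTT sample requires only a subtraction per acknowledgment.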
In designing new TCP options, we must pay careful attention to
interoperability with existing implementations. The only TCP option
defined to date is an "initial option", i.e., it may appear only on a
SYN segment. It is likely that most implementations will properly
ignore any options in the SYN segment that they do not understand, so
new initial options should not cause a problem. On the other hand,
we fear that receiving unexpected non-initial options may cause some
TCP's to crash.
Therefore, in each of the extensions we propose, non-initial options
may be sent only if an exchange of initial options has indicated that
both sides understand the extension. This approach will also allow a
TCP to determine, when the connection opens, how big a TCP header it
will be sending.
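The negotiation rule above can be modeled as a set intersection. The sketch below is illustrative only: the option names and functions are hypothetical, and it assumes each extension is announced by one initial option in the SYN segment, which may not match the per-extension rules defined in later sections.

```python
from typing import Set

# Hypothetical option names, used only for illustration.
WINDOW_SCALE = "window-scale"
SACK = "selective-ack"
ECHO = "echo"


def usable_extensions(offered_in_our_syn: Set[str],
                      seen_in_peer_syn: Set[str]) -> Set[str]:
    """Extensions that both sides announced with initial (SYN-only) options."""
    return offered_in_our_syn & seen_in_peer_syn


def may_send_non_initial(option: str, negotiated: Set[str]) -> bool:
    """A non-initial option may be sent only if its extension was negotiated."""
    return option in negotiated


ours = {WINDOW_SCALE, SACK, ECHO}

# An existing implementation ignores the unknown options and offers none.
legacy = usable_extensions(ours, set())

# A peer that understands two of the three extensions.
negotiated = usable_extensions(ours, {WINDOW_SCALE, ECHO})

print(sorted(legacy))      # []
print(sorted(negotiated))  # ['echo', 'window-scale']
```

Because the negotiated set is fixed once the SYN exchange completes, a TCP in this model knows from the start of the connection which options, and therefore how much header space, later segments may carry.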