Congestion collapse
RFC 896
Before we proceed with a discussion of the two specific problems
and their solutions, a description of what happens when these
problems are not addressed is in order. In heavily loaded pure
datagram networks with end to end retransmission, as switching
nodes become congested, the round trip time through the net
increases and the count of datagrams in transit within the net
also increases. This is normal behavior under load. As long as
there is only one copy of each datagram in transit, congestion is
under control. Once retransmission of datagrams not yet
delivered begins, there is potential for serious trouble.
Host TCP implementations are expected to retransmit packets
several times at increasing time intervals until some upper limit
on the retransmit interval is reached. Normally, this mechanism
is enough to prevent serious congestion problems. Even with the
better adaptive host retransmission algorithms, though, a sudden
load on the net can cause the round-trip time to rise faster than
the sending hosts' measurements of round-trip time can be updated.
Such a load occurs when a new bulk transfer, such as a file
transfer, begins and starts filling a large window. Should the
round-trip time exceed the maximum retransmission interval for
any host, that host will begin to introduce more and more copies
of the same datagrams into the net. The network is now in serious trouble. Eventually all available buffers in the switching
nodes will be full and packets must be dropped. The round-trip
time for packets that are delivered is now at its maximum. Hosts
are sending each packet several times, and eventually some copy
of each packet arrives at its destination. This is congestion
collapse.
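The retransmission behavior described above can be illustrated with a short sketch. It is not taken from any particular TCP implementation: the doubling backoff, the function names, and the parameter values are all assumptions chosen for illustration. It counts how many copies of one datagram a host sends before the first acknowledgment could possibly return, given a retransmit interval that grows up to a fixed upper limit.

```python
def retransmit_times(initial_rto, max_rto, horizon):
    """Send times (seconds after the first send) for one datagram.

    Assumes the retransmit interval doubles after each attempt until it
    reaches max_rto, then stays there. Only sends before `horizon` count.
    """
    times = [0.0]          # the original transmission
    rto = initial_rto
    t = rto                # time of the first retransmission
    while t < horizon:
        times.append(t)
        rto = min(rto * 2, max_rto)   # back off, capped at the upper limit
        t += rto
    return times


def copies_before_ack(initial_rto, max_rto, rtt):
    """Copies of one datagram sent before an ack could arrive (at time rtt)."""
    return len(retransmit_times(initial_rto, max_rto, rtt))


if __name__ == "__main__":
    # Hypothetical numbers: 1 s initial interval, 4 s upper limit.
    for rtt in (0.5, 3.5, 20.0):
        print(rtt, copies_before_ack(1.0, 4.0, rtt))
```

With these assumed values, a round-trip time below the initial interval yields a single copy, while a round-trip time of 20 seconds, well past the 4-second limit, yields seven copies of the same datagram in the net.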
This condition is stable. Once the saturation point has been
reached, if the algorithm for selecting packets to be dropped is
fair, the network will continue to operate in a degraded condition. In this condition every packet is being transmitted
several times and throughput is reduced to a small fraction of
normal. We have pushed our network into this condition experimentally and observed its stability. It is possible for round-trip time to become so large that connections are broken because
the hosts involved time out.
Congestion collapse and pathological congestion are not normally
seen in the ARPANET / MILNET system because these networks have
substantial excess capacity. Where connections do not pass
through IP gateways, the IMP-to-host flow control mechanisms usually prevent congestion collapse, especially since TCP implementations tend to be well adjusted for the time constants associated with the pure ARPANET case. However, other than ICMP Source
Quench messages, nothing fundamentally prevents congestion collapse when TCP is run over the ARPANET / MILNET and packets are
being dropped at gateways. It is worth noting that a few badly-behaved hosts can by themselves congest the gateways and prevent
other hosts from passing traffic. We have observed this problem
repeatedly with certain hosts (with whose administrators we have
communicated privately) on the ARPANET.
Adding memory to the gateways will not solve the problem. The more memory added, the longer round-trip times must
become before packets are dropped. Thus, the onset of congestion
collapse will be delayed but when collapse occurs an even larger
fraction of the packets in the net will be duplicates and
throughput will be even worse.
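A rough calculation makes the memory argument concrete. The sketch below rests on simplifying assumptions that are not in the text: a single gateway with one outgoing link, a queue that is completely full, and a host that retransmits at a steady maximum interval. The function names and the figures used are hypothetical.

```python
def full_queue_delay(buffer_bytes, link_bps):
    """Seconds a packet waits behind a full gateway buffer on one link."""
    return buffer_bytes * 8 / link_bps


def duplicate_fraction(queue_delay, max_rto):
    """Fraction of a datagram's copies that are duplicates.

    Assumes one original send plus one retransmission every max_rto
    seconds for as long as the first copy is still queued.
    """
    copies = 1 + int(queue_delay // max_rto)
    return (copies - 1) / copies


if __name__ == "__main__":
    # Hypothetical 56 kb/s link and a 4 s maximum retransmit interval.
    for buffer_bytes in (70_000, 280_000):
        delay = full_queue_delay(buffer_bytes, 56_000)
        print(buffer_bytes, delay, duplicate_fraction(delay, 4.0))
```

Under these assumptions, quadrupling the buffer from 70,000 to 280,000 bytes raises the full-queue delay from 10 to 40 seconds, and the share of duplicates among the copies sent rises from 2/3 to 10/11.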