5.3 Interaction with data compression
RFC 1144
Since the early 1980s, fast, effective data compression algorithms
such as Lempel-Ziv[7], and programs that embody them, such as the
compress program shipped with Berkeley Unix, have become widely
available. On low-speed or long-haul lines, it has become common
practice to compress data before sending it. For dialup connections,
this compression is often done in the modems, independent of the
communicating hosts. This raises three questions:
(1) Given a good data compressor, is there any need for header
compression? (2) Does header compression interact with data
compression? (3) Should data be compressed before or after header
compression?/39/
To investigate (1), Lempel-Ziv compression was done on a trace of 446
TCP/IP packets taken from the user's side of a typical telnet
conversation. Since the packets resulted from typing, almost all
contained only one data byte plus 40 bytes of header, so the test
essentially measured L-Z compression of TCP/IP headers. The compression
ratio (the ratio of uncompressed to compressed data) was 2.6. In other
words, the average header was reduced from 40 to 16 bytes. While this
is good compression, it is far from the 5 bytes of header needed for
good interactive response and far from the 3 bytes of header (a
compression ratio of 13.3) that header compression yielded on the same
packet trace.
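The arithmetic behind these figures can be checked with a short sketch.
The helper names below are illustrative and do not come from the
original text; the only inputs are the numbers quoted above (packets of
40 header bytes plus one data byte, an L-Z ratio of 2.6, and a 3-byte
compressed header).

```python
def compressed_size(original_bytes: float, ratio: float) -> float:
    """Size after compression, where ratio = uncompressed / compressed."""
    return original_bytes / ratio


def compression_ratio(original_bytes: float, compressed_bytes: float) -> float:
    """Ratio of uncompressed to compressed size."""
    return original_bytes / compressed_bytes


# Almost every packet in the telnet trace is 40 header bytes + 1 data byte.
lz_packet = compressed_size(41, 2.6)
hdr_ratio = compression_ratio(40, 3)

print(f"L-Z: about {lz_packet:.1f} bytes per packet")      # about 15.8
print(f"Header compression ratio: about {hdr_ratio:.1f}")  # about 13.3
```

Dividing the 41-byte packet by 2.6 gives roughly 16 bytes, which matches
the figure quoted for L-Z, and 40/3 gives the 13.3 quoted for header
compression.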

Figure 10: Data compression alternatives
The second and third questions are more complex. To investigate them,
several packet traces from FTP file transfers were analyzed/40/ with and
without header compression and with and without L-Z compression. The
L-Z compression was tried at two places in the outgoing data stream
(fig. 10): (1) just before the data was handed to TCP for
encapsulation (simulating compression done at the `application' level)
and (2) after the data was encapsulated (simulating compression done in
the modem). Table 1 summarizes the results for a 78,776 byte ASCII text
file (the Unix csh.1 manual entry)/41/ transferred using the guidelines
of the previous section (256 byte MTU or 216 byte MSS; 368 packets
total). Compression ratios for the following ten tests are shown
(reading left to right and top to bottom):
- data file (no compression or encapsulation)
- data -> L-Z compressor
- data -> TCP/IP encapsulation
- data -> L-Z -> TCP/IP
- data -> TCP/IP -> L-Z
- data -> L-Z -> TCP/IP -> L-Z
- data -> TCP/IP -> Hdr. Compress.
- data -> L-Z -> TCP/IP -> Hdr. Compress.
- data -> TCP/IP -> Hdr. Compress. -> L-Z
- data -> L-Z -> TCP/IP -> Hdr. Compress. -> L-Z
| | No data compress. | L-Z on data | L-Z on wire | L-Z on both |
|---|---|---|---|---|
| Raw Data | 1.00 | 2.44 | - | - |
| + TCP Encap. | 0.83 | 2.03 | 1.97 | 1.58 |
| w/Hdr Comp. | 0.98 | 2.39 | 2.26 | 1.66 |
Table 1: ASCII Text File Compression Ratios
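The structure of the experiment can be sketched in a few lines of
Python. This is only an illustration under loud assumptions: zlib
(DEFLATE) stands in for the L-Z compressor, fake_header() produces a
hypothetical placeholder rather than a real TCP/IP or RFC 1144 header,
and the sample text is synthetic. The numbers it produces for the
compressed cases are therefore not expected to match the table.

```python
import zlib

MSS = 216       # payload bytes per packet (from the text)
FULL_HDR = 40   # uncompressed TCP/IP header length
COMP_HDR = 3    # typical compressed header length quoted in the text


def fake_header(seq: int, payload: bytes, hdr_len: int) -> bytes:
    """Hypothetical placeholder header (not a real TCP/IP header)."""
    checksum = (zlib.crc32(payload) & 0xFFFF).to_bytes(2, "big")
    if hdr_len == COMP_HDR:
        return b"\x00" + checksum          # flag byte + 2-byte checksum
    seq_bytes = (seq & 0xFFFFFFFF).to_bytes(4, "big")
    return seq_bytes + checksum + bytes(hdr_len - 6)  # zero padding


def compression_ratio(data: bytes, hdr_len: int,
                      lz_on_data: bool = False,
                      lz_on_wire: bool = False) -> float:
    """Original data size divided by the bytes that reach the wire."""
    # Optional compression at the 'application' level, before encapsulation.
    payload = zlib.compress(data) if lz_on_data else data
    wire = bytearray()
    for offset in range(0, len(payload), MSS):
        chunk = payload[offset:offset + MSS]
        wire += fake_header(offset, chunk, hdr_len) + chunk
    # Optional compression of the whole framed stream, as a modem would do.
    sent = zlib.compress(bytes(wire)) if lz_on_wire else bytes(wire)
    return len(data) / len(sent)


# Synthetic stand-in for the 78,776 byte text file.
sample = b"".join(b"line %d of a hypothetical manual page\n" % i
                  for i in range(3000))[:78776]

for label, hdr in (("+ TCP Encap.", FULL_HDR), ("w/Hdr Comp.", COMP_HDR)):
    row = [compression_ratio(sample, hdr, d, w)
           for d, w in ((False, False), (True, False),
                        (False, True), (True, True))]
    print(label, " ".join(f"{r:.2f}" for r in row))
```

For the uncompressed column the sketch gives about 0.84 and 0.99,
close to the 0.83 and 0.98 in the table. It sends 365 data packets
rather than the 368 in the original trace and ignores any framing
overhead, which may account for part of the difference.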
The first column of table 1 says the data expands by 19% (`compresses'
by .83) when encapsulated in TCP/IP and by 2% when encapsulated in
header compressed TCP/IP./42/ The first row says L-Z compression is
quite effective on this data, shrinking it to less than half its
original size. Column four illustrates the well-known fact that it is a
mistake to L-Z compress already compressed data. The interesting
information is in rows two and three of columns two and three. These
columns say that the benefit of data compression overwhelms the cost of
encapsulation, even for straight TCP/IP. They also say that it is
slightly better to compress the data before encapsulating it rather than
compressing at the framing/modem level. The differences, however, are
small: 3% and 6%, respectively, for the TCP/IP and header compressed
encapsulations./43/
Table 2 shows the same experiment for a 122,880 byte binary file (the
Sun-3 ps executable). Although the raw data doesn't compress nearly as
well, the results are qualitatively the same as for the ASCII data. The
one significant change is in row two: it is about 3% better to compress
the data in the modem rather than at the source if doing TCP/IP
encapsulation (apparently, Sun binaries and TCP/IP headers have similar
statistics). However, with header compression (row three) the results
were similar to the ASCII data: it is about 3% worse to compress at
the modem rather than the source./44/
| | No data compress. | L-Z on data | L-Z on wire | L-Z on both |
|---|---|---|---|---|
| Raw Data | 1.00 | 1.72 | - | - |
| + TCP Encap. | 0.83 | 1.43 | 1.48 | 1.21 |
| w/Hdr Comp. | 0.98 | 1.69 | 1.64 | 1.28 |
Table 2: Binary File Compression Ratios
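The percentage differences quoted in the discussion of both tables
follow from the "L-Z on data" and "L-Z on wire" columns. The sketch
below recomputes them; the function and variable names are
illustrative, and the ratios are copied from the tables.

```python
def percent_better(a: float, b: float) -> float:
    """How much larger compression ratio a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# (L-Z on data, L-Z on wire) ratios for rows two and three of each table.
TABLE_1 = {"TCP/IP": (2.03, 1.97), "header compressed": (2.39, 2.26)}
TABLE_2 = {"TCP/IP": (1.43, 1.48), "header compressed": (1.69, 1.64)}

for name, table in (("ASCII", TABLE_1), ("binary", TABLE_2)):
    for encap, (on_data, on_wire) in table.items():
        best, worst = max(on_data, on_wire), min(on_data, on_wire)
        where = "at the source" if on_data >= on_wire else "in the modem"
        print(f"{name}, {encap}: "
              f"{percent_better(best, worst):.1f}% better {where}")
```

For the ASCII file this gives about 3% and 6% in favor of compressing
at the source. For the binary file it gives about 3.5% in favor of the
modem with straight TCP/IP and about 3% in favor of the source with
header compression.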