Internet DRAFT - draft-doupnik-nagle-mode


  TCP Implementation Working Group                      Joe R. Doupnik
  Internet Draft                                 Utah State University
  Expiration Date: December 1999                             June 1999
  draft-doupnik-nagle-mode-00.txt

            A new TCP transmission policy replacing Nagle mode

  Status of this Memo

  This document is an Internet-Draft and is in full conformance with
  all provisions of Section 10 of RFC2026.

  Internet-Drafts are working documents of the Internet Engineering
  Task Force (IETF), its areas, and its working groups.  Note that
  other groups may also distribute working documents as Internet-
  Drafts.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as "work in progress."

  The list of current Internet-Drafts can be accessed at
  http://www.ietf.org/ietf/1id-abstracts.txt

  The list of Internet-Draft Shadow Directories can be accessed at
  http://www.ietf.org/shadow.html.

  Abstract

  Both Nagle mode and delayed ACKs attempt to conserve network and host
  machine resources by delaying transmissions in the expectation that
  the current material can be piggybacked onto a future transmission.
  Unfortunately when both mechanisms are active at the same time on
  either end of a connection a deadlock can exist, which is broken by
  arrival of new data for transmission or firing of the delayed ACK
  timer. This produces classical timer based ACKing, which for the
  common 200ms ACK delay yields five exchanges per second.

  A new TCP transmission policy is discussed in this memo which uses
  information known only to the transmitter about when to send
  segments. It groups octets based on filling segments and sending a
  small segment when the application indicates no more data are
  immediately available, not on arrival of ACKs. It works well with and
  avoids deadlocks with delayed ACKs. It is automatic and does not need
  to be turned off. It is a suitable replacement for Nagle mode.

  A new TCP transmission policy replacing Nagle mode          [Page 2]

  Table of Contents

  1.0 Introduction
  1.1 Maximum Segment Size, MSS
  1.2 Nagle mode
  1.3 Strict Nagle form
  1.4 Strict Nagle example
  1.5 Liberal Nagle mode
  1.6 Delayed ACKs
  2.0 New transmission policy
  2.1 Formal statement of new policy
  2.2 Discussion
  2.3 Operation between like and unlike TCP stacks
  3.0 Experimental results
  4.0 Conclusions
  5.0 Security Considerations
  6.0 Acknowledgments
  7.0 References
  8.0 Author's address

  1.0 Introduction

  Nagle mode [TCP:1] and delayed ACKs are TCP heuristics designed to
  reduce network traffic, and the consequent load on both originating
  and receiving hosts. They perform this by slightly different means,
  but the common factor is to delay a transmission in the expectation
  that another will be required quickly and hence the present and next
  transmissions may be combined into one (piggybacking).

  When both modes are active, as they should be to conserve resources,
  then they may interact to hold data at the transmitter while the
  receiver holds/delays the ACKs until a very slow (200ms) timer forces
  out the ACKs. The delay is of major importance when the conversation
  is alternating between hosts, where one side makes requests, the
  other responds, and the pattern repeats. The response is delayed
  until the entire request has arrived at the receiver. Yet the next to
  last packet of the request can result in a delayed ACK which in turn
  delays release of the last packet being held by the Nagle condition.

  A delay in sending all octets from one side or the other can slow the
  conversation to about 1/delayed_ack_time exchanges per second
  (typically 5 exchanges per second). Such patterns are common for web
  serving, SMTP mail queues, and other modern applications.
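
  The ceiling quoted above follows directly from the delayed ACK timer.
  The sketch below is only a worked illustration; the function name is
  hypothetical, and it assumes every exchange stalls for one full
  delayed ACK interval while all other delays are ignored.

```python
def max_exchanges_per_second(ack_delay_ms: float) -> float:
    """Upper bound on alternating exchanges per second when each exchange
    waits out one full delayed ACK interval (other delays ignored)."""
    if ack_delay_ms <= 0:
        raise ValueError("ack_delay_ms must be positive")
    return 1000.0 / ack_delay_ms


# The common 200ms delayed ACK timer gives about five exchanges per second.
print(max_exchanges_per_second(200))
```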

  Today many application programmers turn off Nagle mode to overcome
  the interaction. They cannot control delayed ACKs which are often
  turned on or off on a system-wide basis. Unfortunately, turning off
  Nagle mode increases network traffic, host machine workload, and
  router workload. If applications cannot turn off Nagle mode to avoid
  the delayed ACK effect then UDP is the next candidate, and that means
  no regard for the network and little regard (or lots of work in the
  application) for lost packets. Today's growing request/reply work
  would be better served by responsive TCP based communications.

  Doupnik                                                       Page 2

  1.1 Maximum Segment Size, MSS

  In the following discussion we will use MSS, Maximum Segment Size, as
  a test criterion for full segments. What is meant is the full
  capacity for TCP data after allowing for IP and TCP headers and
  options, which RFC1122 [TCP:2] represents as Eff.snd.MSS. Also, some
  hosts use a power-of-two buffer size as a full segment although the
  MSS is larger. Nevertheless, we will employ the term MSS, Maximum
  Segment Size, to be the host's concept of its largest segment size at
  one moment.

  1.2 Nagle mode

  The current definition of Nagle mode is found in RFC1122, [TCP:2],
  section 4.2.3.4 When to Send Data:
     (start quote)
      The Nagle algorithm is generally as follows:
          If there is unacknowledged data (i.e., SND.NXT > SND.UNA),
          then the sending TCP buffers all user data (regardless of the
          PSH bit), until the outstanding data has been acknowledged or
          until the TCP can send a full-sized segment (Eff.snd.MSS
          bytes; see Section 4.2.2.6).
     (end quote)

  Nagle mode has been implemented in at least two different forms,
  leading to different behaviors. Each is discussed below. The
  different forms result from answering the question: if more than one
  Eff.snd.MSS of data has accumulated, how much beyond full segments
  may be sent at once?

  The strict approach answers the question above by sending only full
  segments. A last short segment will be retained for later release. A
  liberal approach answers it by sending all available data including a
  possible (very likely) short ending component. The labels strict
  Nagle and liberal Nagle are used in this paper for purposes of
  discussion. As a matter of interest, TCP/IP stacks derived from BSD
  sources often use the strict Nagle mechanism.

  1.3 Strict Nagle form

  The strict Nagle form transmits only full sized segments while
  awaiting ACKs for previously sent data. A partial segment of unsent
  data remaining afterward is retained in the transmit buffer as unsent
  data until all preceding data have been ACKed, or until more
  application data arrives to compose full length segments. Window size
  and congestion avoidance criteria of Van Jacobson [TCP:3] may cause
  even these to remain unsent for some time.

  Holding back the last partial segment leads to grouping with later
  new application data and hence sending full segments when possible.
  Delayed ACKs assist grouping in the transmitter by allowing time for
  the application to add more octets, assuming there is more data and
  the receiver's window is large enough. But they also introduce the
  problem of delaying release of the held tail octets. Prior to the
  tail segment, strict Nagle mode is doing a fine job of forming full-
  length segments for transmission. Timely release of held tail octets
  is the essence of the interaction problem discussed in this document.

  1.4 Strict Nagle example

  As an example, suppose the TCP buffer is empty and the application
  writes 3.5 MSS worth of data to it. Remote host window size and
  congestion avoidance criteria are applied to determine the size of
  the candidate transmission. We may consider two cases, one where all
  data are allowed and a second where less is allowed.

  In the first case all octets are allowed. A full MSS of data is
  fetched from the buffer and the Nagle test is applied. It passes
  because the size is a full MSS. The data is sent. The transmitter
  loops back for a second fetch. The Nagle test finds a full segment
  and transmits it although unACKed data exist from the first
  transmission. This repeats until it fetches the last piece, 0.5 MSS.

  The Nagle test fails for it because it is smaller than a full segment
  and there is unACKed data in transit. The test will fail again until
  there is no unACKed data (or enough application data arrives). The
  small tail piece is held until all preceding octets have been ACKed,
  not just the first or second segments. Thus up to three ACKs may be
  required to release the tail. This is a "held tail" effect.

  In the second case windowing and congestion avoidance allow only part
  of the data to be transmitted, say two MSS worth. The first two
  segments are full length and are sent promptly. Nothing more can be
  sent until either a fresh write from the application or arrival of a
  packet creates another transmission opportunity. 1.5 MSS of data
  remain blocked and invisible to Nagle tests. Suppose the application
  does not write more data. The transmitter awaits a packet from the
  receiver that results in calling the transmission code again. At that
  time as many full segments as windowing and congestion avoidance
  permit are sent. A partial segment remainder blocks by strict
  Nagle rules because it is smaller than a full segment and unACKed
  data are in transit. Up to three ACKs may be required to release the
  trailer. This is a "held tail" effect.
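
  The two cases above can be traced with a small model. This is a
  sketch under stated assumptions, not code from any TCP stack: the
  function name, its parameters, and the 1000 octet MSS are all
  hypothetical, and silly window avoidance is ignored.

```python
MSS = 1000  # illustrative segment size in octets (an assumption)


def strict_nagle_send(available, allowed, unacked, mss=MSS):
    """Model one transmit opportunity under the strict Nagle form.

    available: unsent octets visible to TCP
    allowed:   octets permitted by window and congestion avoidance
    unacked:   octets already in flight before this opportunity
    Returns (list of segment sizes sent, octets still unsent).
    """
    sendable = min(available, allowed)
    sent = []
    while sendable >= mss:        # a full segment always passes the test
        sent.append(mss)
        sendable -= mss
        unacked += mss
    if sendable > 0 and unacked == 0:
        sent.append(sendable)     # small segment goes only when nothing is in flight
    return sent, available - sum(sent)


# Case 1: 3.5 MSS written, everything allowed: three full segments, tail held.
print(strict_nagle_send(3500, 10_000, 0))
# Case 2: only two MSS allowed: 1.5 MSS stays unsent for now.
print(strict_nagle_send(3500, 2000, 0))
# Later opportunity with one segment still unACKed: the tail is held again.
print(strict_nagle_send(1500, 2000, 1000))
# Once all preceding data are ACKed the tail is finally released.
print(strict_nagle_send(500, 2000, 0))
```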

  Unfortunately, the last ACK may be delayed and thus the last piece
  may not go onto the wire for the duration of the receiver's delayed
  ACK timer. The receiver does not know that the transmitter has data
  blocked waiting for the final ACK (rather than say data being forced
  out by new writes from the application). Waiting for the last ACK can
  involve the full delayed ACK interval, often 200ms; and that results
  in timer based ACKing.

  1.5 Liberal Nagle mode

  The second form of Nagle mode applies the full segment rule from
  RFC1122 but interprets it as saying a trailing partial segment may be
  transmitted with full segments during the blocked condition. In
  essence, the size determination is made on all allowed unsent data
  rather than testing each candidate segment individually as in the
  strict Nagle case. The test should be on all unsent data after being
  reduced by remote host window capacity and congestion avoidance
  limits. The test is really on the minimum of "allowed" (by window
  size and congestion avoidance) and "available" (the number of unsent
  octets visible to the TCP transmitter at that moment). Strict Nagle
  mode of course experiences the same size filtering before data reach
  it.

  The liberal Nagle form reduces but does not eliminate the incidence
  of held tails, as the following example illustrates, whereas strict
  Nagle mode creates a held tail at almost every application write
  event. Liberal Nagle blocks with a partial segment when the window
  size and congestion avoidance combine to hold back data during the
  next to last transmission opportunity and only a fraction of an MSS
  of data remain for the last transmission opportunity. The initial
  hold back is invisible to Nagle mode at that time so the small piece
  is not available to be included with the full segments. UnACKed data
  may exist from the previous send and the small segment remains
  blocked until preceding octets have been ACKed. Large transmitter and
  small receiver TCP window sizes and slow comms contribute markedly to
  this held tail effect with liberal Nagle mode.
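
  A minimal model of the liberal form shows both behaviors: the
  trailing piece rides out with preceding full segments, yet a held
  tail still appears when the window cuts the send short. The function
  name, parameters, and the 1000 octet MSS are hypothetical choices
  for this sketch.

```python
MSS = 1000  # illustrative segment size in octets (an assumption)


def liberal_nagle_send(available, allowed, unacked, mss=MSS):
    """Model one transmit opportunity under the liberal Nagle form.

    The Nagle test is applied once to min(available, allowed) rather than
    to each candidate segment. Returns (segment sizes sent, octets unsent).
    """
    sendable = min(available, allowed)
    if sendable < mss and unacked > 0:
        return [], available              # small candidate, data in flight: hold
    sent = [mss] * (sendable // mss)
    if sendable % mss:
        sent.append(sendable % mss)       # trailing piece goes with the rest
    return sent, available - sendable


# Nothing limits the send: the 0.5 MSS tail leaves with the full segments.
print(liberal_nagle_send(3500, 10_000, 0))
# The window admits only three MSS: 0.5 MSS is left behind, unseen by the test.
print(liberal_nagle_send(3500, 3000, 0))
# Next opportunity, earlier data still unACKed: the small piece is a held tail.
print(liberal_nagle_send(500, 2000, 1000))
```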

  One may infer that liberal Nagle mode was created in part to reduce
  incidence of the held tail problem. Alas, it reduces but does not
  eliminate the problem, and in the process it may send small segments
  within application data.

  1.6 Delayed ACKs

  Delayed ACKs are a popular mechanism of TCP to avoid sending an ACK
  for each received segment. Typically, every other arrival generates
  an ACK. The mechanism is to create a delayed ACK queue which will be
  flushed to the wire as a single ACK when either a delayed ACK timer
  expires, or the queue length reaches a certain value (such as two
  entries), or the local machine sends data. Although ACKs are tinygrams
  they do take time and CPU resources to create and to receive,
  and the routing load is the same as full-length segments. Even on a
  local wire without routers sending an ACK for each arriving segment
  creates noticeable additional load on both machines and on network
  capacity. Thus delaying to coalesce two or more ACKs is a good
  concept and is the same philosophy as grouping octets into full
  packets rather than many smaller ones.
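
  The three flush conditions described above can be sketched as a small
  state machine. The class and method names are hypothetical, and the
  defaults (two pending segments, 200ms) are simply the typical values
  mentioned in the text; real stacks differ in detail.

```python
class DelayedAckQueue:
    """Toy model of a receiver's delayed ACK queue."""

    def __init__(self, max_pending=2, delay_ms=200):
        self.max_pending = max_pending
        self.delay_ms = delay_ms
        self.pending = 0          # segments received but not yet ACKed
        self.deadline_ms = None   # when the delayed ACK timer fires

    def _flush(self):
        self.pending = 0
        self.deadline_ms = None
        return True               # True means "one ACK goes onto the wire"

    def on_segment(self, now_ms):
        """A data segment arrived; ACK at once only if the queue is full."""
        self.pending += 1
        if self.deadline_ms is None:
            self.deadline_ms = now_ms + self.delay_ms
        if self.pending >= self.max_pending:
            return self._flush()
        return False

    def on_timer(self, now_ms):
        """Timer tick; flush if an ACK has waited the full delay."""
        if self.pending and now_ms >= self.deadline_ms:
            return self._flush()
        return False

    def on_local_send(self):
        """Outgoing data carries any pending ACK (piggybacking)."""
        return self._flush() if self.pending else False


q = DelayedAckQueue()
print(q.on_segment(0), q.on_segment(1))   # second arrival triggers the ACK
print(q.on_segment(10), q.on_timer(100), q.on_timer(210))  # lone segment waits for the timer
```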

  Delaying ACKs is guessing, to paraphrase private communications by
  John Nagle, that there will be either more data arriving immediately,
  or there will be a transmission by the receiver in a very short time,
  or that the receiver doesn't care about immediacy, and thus
  delaying will be a good tactic. Unfortunately, the receiver has
  little basis for making the guess: the sending machine provides no
  hints, the local receiving application provides no notice of data
  about to be delivered. The delay time is fixed, which will be a
  mismatch for either local or long distance communications. And the
  PUSH bit isn't available to act as a hint because the last held
  segment gets the PUSH bit. At best, a receiver may infer tiny
  arrivals might be from human typing where the operating system will
  provide an immediate echo.

  Delayed ACKs would be more effective if the receiver were to adjust
  the delay time to match the session, say in a manner similar to
  making round trip timing estimates. One or two round trip times seems
  appropriate, where that information is available. One way transfers
  such as the FTP data channel make this approach impractical. In
  addition, fine scale timers for crisp responses are a burden for the
  operating system and may not be available for the short intervals of
  local area networking. For example, the 200ms delay of the fast timer
  in many BSD systems is very long on even many of today's long
  distance links. Thus the concept of dynamic delay time is difficult
  at this time and becomes more so at increasingly higher network
  speeds.

  2.0 New transmission policy

  This document proposes a new TCP transmission policy that allows
  delayed ACKs to work as present, thus retaining their advantages. It
  groups octets similar to Nagle algorithms and yet avoids deadlocks.

  Two terms need to be defined to simplify discussion. These are
  "available" data and "allowed" capacity. "Available" data are all the
  data from the application which are not yet sent. It is what a single
  write or output statement would provide. The TCP stack may see only a
  portion of this data on each invocation, or it may see it all. This
  implies the TCP stack knows such a length either explicitly or
  through an indicator from its caller. Current TCP stacks already
  perform this test to properly set the PUSH (PSH) bit.

  "Allowed" capacity is the number of octets permitted to be sent based
  on calculated receiver window size and congestion avoidance limits.
  It is the minimum of these two constraints. Calculated receiver
  window size is the usual value of the last announced window size
  minus the sent but unACKed data. It does not necessarily yield even
  MSS values. Heuristics in the transmitter may modify the calculation.
  Congestion avoidance is the normal Van Jacobson congestion window
  [TCP:3] and this normally yields full MSS values.
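
  As a worked illustration of the definition of "allowed" capacity, the
  calculation might look like the following. The function and parameter
  names are hypothetical; in particular, congestion_limit stands for
  whatever number of octets the congestion avoidance logic currently
  permits, and transmitter heuristics are left out.

```python
def allowed_capacity(announced_window, unacked, congestion_limit):
    """Octets that may be sent now: the smaller of the calculated
    receiver window and the congestion avoidance limit."""
    # Calculated receiver window: last announced window minus data in flight.
    calculated_window = max(announced_window - unacked, 0)
    return min(calculated_window, congestion_limit)


# A nearly full 32KB window leaves an amount that is not a multiple of MSS.
print(allowed_capacity(32768, 30000, 4380))
# With an empty pipe the congestion limit is the tighter constraint.
print(allowed_capacity(8192, 0, 2920))
```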

  The new policy acts after the window size and congestion avoidance
  size restrictions are applied.

  The transmitting side has a transmission policy designed to group
  data into full segments and to not hold the very last segment. This
  may be stated ambiguously as transmit now if a full segment is
  available (after limitations of receiver window size and congestion
  avoidance are applied). A small segment candidate should be sent
  immediately only if it exhausts all data from the application;
  otherwise it should be held for joining by more application data.

  Two parts of the above paragraph are unclear. First, "transmit now"
  does not state how much can be transmitted at one time, a problem
  seen with the Nagle algorithm. The policy can be strict: transmit
  whole segments only and withhold a final small segment until an
  indicator of "no more data will follow" has been obtained. It can be
  liberal: transmit a partially full segment if one or more full
  segments immediately precede it, even though this leads to smaller
  segments on the wire than the strict case. These two policies mimic
  strict and liberal Nagle modes used today, but minus ACKs and
  consideration for unACKed data.

  What the policy should not be: hold back a small segment because
  unACKed data is present. That creates the held tail deadlock seen
  with Nagle mode combined with delayed ACKs.

  The second ambiguous part is the size of the transmission buffer.
  Some systems expose the entire application buffer to the protocol
  stack. In such systems TCP may easily decide when the current
  candidate for transmission will empty the buffer. Other systems may
  divide the application buffer into many smaller intermediate buffers
  and expose only an intermediate buffer to TCP, one for each call upon
  the transmitter. The latter requires the operating system to provide
  an indicator of end of application data, a flag or variable or
  equivalent, marking the current buffer as the last in a series and
  thus no more data will follow it. In either case, the TCP stack knows
  how much data is "available" and thus it knows when to properly set
  the PUSH (PSH) bit.

  2.1 Formal statement of new policy

  Stated formally the new transmission policy is as follows:

     Rule 1. Transmit all full segments in min(available, allowed).

     Rule 2. If a partial segment occurs in min(available, allowed)
     then transmit it now if it includes the end of application data;
     otherwise retain it.

     And optionally

     Rule 3. If a partial segment occurs in min(available, allowed)
     then transmit it now if min(available, allowed) is larger than a
     full segment. This modifies the phrase "otherwise retain it" above.

  min(a, b) represents the smaller value of a or b.
  Available is the total amount of unsent application data at the time
  of transmission.
  Allowed is the smaller of receiver apparent window size and
  congestion avoidance constraints.
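
  The rules above can be sketched in a few lines. This is an
  illustrative model, not code from any of the modified stacks: the
  function name, the end_of_app_data flag, the liberal switch for
  optional Rule 3, and the 1000 octet MSS are all assumptions. Note
  that unACKed data does not appear anywhere in the decision.

```python
MSS = 1000  # illustrative segment size in octets (an assumption)


def new_policy_send(available, allowed, end_of_app_data, liberal=False, mss=MSS):
    """Model one transmit opportunity under the new policy.

    end_of_app_data: True when `available` ends the application's data.
    liberal:         True to apply optional Rule 3.
    Returns (segment sizes sent, octets still unsent).
    """
    candidate = min(available, allowed)
    full, partial = divmod(candidate, mss)
    sent = [mss] * full                                   # Rule 1
    if partial:
        includes_end = end_of_app_data and candidate == available
        if includes_end:                                  # Rule 2
            sent.append(partial)
        elif liberal and candidate > mss:                 # Rule 3 (optional)
            sent.append(partial)
    return sent, available - sum(sent)


# End of application data reached: the tail goes out with the full segments.
print(new_policy_send(3500, 10_000, end_of_app_data=True))
# More data expected: the partial segment is retained for grouping.
print(new_policy_send(3500, 10_000, end_of_app_data=False))
# Window-limited: the partial piece does not include the end, so it waits.
print(new_policy_send(3500, 2500, end_of_app_data=True))
```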

  2.2 Discussion

  We see that Rule 2 represents a policy of strict grouping until the
  end of application data. Rules 1 and 2 are necessary and sufficient
  for good network behavior and good application response.

  Key points of the new policy are that the release conditions are
  generated by the transmitter rather than the receiver, and that the
  conditions are a full segment or an indication of end of application
  data. For Nagle modes, the release is generated by transmitter and
  receiver, and the conditions are a full segment or all previous data
  having been ACKed.

  Optional Rule 3 is a liberal policy to permit sending small segments
  from data immediately available but not at the end of application
  data. Rule 3 is presented only because some existing TCP/IP stacks
  are designed for the liberal Nagle approach.

  In practice, the above rules can be overlaid upon current Nagle mode
  code. The full segment test is performed, and the case where a small
  segment is to be delayed is modified to be: transmit a small segment
  if end of application data is reached, else delay it as before.
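
  One possible reading of this overlay, as a per-segment test, is
  sketched below. The function and parameter names are hypothetical.
  The sketch changes only the branch where existing Nagle code would
  delay a small segment; it keeps the existing behavior of sending a
  small segment when nothing is in flight, which is an assumption about
  the overlay and slightly more permissive than Rule 2 taken alone.

```python
def overlay_may_send(seg_len, mss, unacked, end_of_app_data):
    """Decide whether a candidate segment may be transmitted now."""
    if seg_len >= mss:
        return True            # full segment test, unchanged
    if unacked == 0:
        return True            # existing Nagle path: nothing in flight
    # Former "delay the small segment" case, modified by the new policy:
    # release it if it ends the application data, else delay as before.
    return end_of_app_data


print(overlay_may_send(500, 1000, unacked=1000, end_of_app_data=False))  # delayed
print(overlay_may_send(500, 1000, unacked=1000, end_of_app_data=True))   # released
```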

  At this point, we must discuss a useful and important side effect of
  using the new policy: the network will do what the application asks!

  When an application does small immediate mode writes, then it largely
  controls the size of segments sent onto the wire. This is because
  each output statement implies its own end of application data (give
  or take whatever the operating system may do between it and the
  protocol stack). In an extreme case the application may perform
  single octet writes in massive succession before reading a response.
  If the network can drain data faster than the application can create
  data (a classical queueing problem) then massive quantities of tiny
  segments will appear on the network. That imposes a very heavy load
  on both hosts and network communications. Slower draining yields
  larger segments, naturally, but erratically from erratic delays.

  By way of contrast, Nagle mode will send small segments if ACKs
  arrive promptly. When they don't then Nagle mode strongly groups
  data. A difference between Nagle mode and the new policy is that
  timing affects Nagle mode and end of application data affects the new
  policy. The new policy strongly groups bytes that are within the
  application data set, independent of ACKs. One method uses network
  delays to group data and the other uses the application and local
  operating system.

  Non-Nagle mode waits for neither ACKs nor indication from the
  application. Liberal Nagle mode behaves like strict Nagle mode when
  all unsent data are smaller than a full segment, and like non-Nagle
  mode otherwise.

  In the above case of one octet writing by the application, new policy
  and non-Nagle modes behave alike: send tinygrams. Nagle modes group
  data to the extent that ACKs are delayed.

  To remove the uncertain element of ACK time of arrival, and its
  consequences for held tails and timer based ACKing, as well as to
  bring the small segment problem under control, the best strategy is
  for the application to write large components. This is readily
  accomplished by the application programmer. For example, rather than
  using immediate mode writing operations, such as Unix function
  write(), one may use equivalents which are buffered automatically in
  the application, such as Unix functions fwrite() or printf(). Unix
  functions are only illustrative here, as are BSD sockets. With
  buffered functions the protocol stack sees large buffer amounts even
  if data are generated in small increments by the application. Then
  the issue becomes one of using ACK time or application indication.

  Buffering is often accompanied by a buffer flush function, such as
  fflush() in Unix, to ensure all data are released at that time rather
  than waiting for the data pathway to be formally closed. A buffer flush
  function also serves as an indirect signal to the protocol stack that
  application data writing is complete, without there being a need to
  invent a special programmer's equivalent to flush TCP transmit data.
  The new policy is closely analogous to this file system buffering.
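
  The effect of application level buffering can be demonstrated without
  a network. In the sketch below a hypothetical RecordingSink class
  stands in for the socket and records the size of each write that
  would reach the protocol stack; Python's io.BufferedWriter plays the
  role that fwrite() and fflush() play in the Unix illustration.

```python
import io


class RecordingSink(io.RawIOBase):
    """Stand-in for a socket: records the size of every write it receives."""

    def __init__(self):
        super().__init__()
        self.write_sizes = []

    def writable(self):
        return True

    def write(self, data):
        self.write_sizes.append(len(data))
        return len(data)


# Immediate mode: each one-octet write reaches the sink separately.
direct = RecordingSink()
for _ in range(100):
    direct.write(b"x")

# Buffered mode: octets accumulate in the application and are released
# together by flush(), which marks the end of this application data.
sink = RecordingSink()
buffered = io.BufferedWriter(sink, buffer_size=4096)
for _ in range(100):
    buffered.write(b"x")
buffered.flush()

print(len(direct.write_sizes), "direct writes;",
      len(sink.write_sizes), "buffered write of", sum(sink.write_sizes), "octets")
```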

  It seems to the author that data aggregation at the application
  level makes best sense because the natural end of writing is known
  only at that level. Trying to predict the end of writing at the
  protocol stack level by either transmitter or receiver, in
  expectation of avoiding held tails from delayed ACKs and yet
  delaying transmission to form full length segments, is a very
  difficult task. It probably has no solution in the general case
  because a stack does not know when the application is truly finished
  writing. At best the stack is told when a portion of the output has
  been prepared. The new policy uses that information, as does the
  stack to set the PUSH bit.

  The new policy provides immediate response by the network when the
  application so indicates, which as noted is a double edged sword;
  otherwise it groups independently of network timing.

  The alternatives seem to be that we endure the delayed ACK effect of
  Nagle modes, or risk many small segments being sent by poorly
  designed applications, or see application writers turn to UDP and
  bypass network protection mechanisms.

  2.3 Operation between like and unlike TCP stacks

  The new transmission policy proposed here resides entirely on the
  transmitting host. Receivers remain unchanged. Clearly, with
  bilateral exchanges both sides should implement the policy for best
  speed. The new policy sends the trailing segment of a series without
  waiting for ACKs to previous data, the same as non-Nagle mode. The
  new policy groups data into full segments (strict Rule 2), or does so
  most of the time (Rule 2 plus optional Rule 3), whereas non-Nagle
  mode and liberal Nagle mode may send short segments as each portion
  of application data is delivered to the TCP stack. The PUSH bit
  should be set at end of application data by all policies.

  The receiver and network are ready to deal with the data, because
  window size and congestion avoidance criteria are still effective and
  are applied before either Nagle or new policy mechanisms. New policy
  transmitters send the trailing segment when the network and remote
  host are ready, whereas Nagle mode transmitters may wait for one or
  more ACKs to arrive.

  The new policy works well with the classical case of write(small),
  write(small), read(). Each write() creates a new application data set
  and each is sent immediately. Both strict and liberal Nagle
  transmitters hold the second write's data; that is the held tail
  effect. The new policy transmitter does not hold the second write's
  data, nor does non-Nagle mode.
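
  A rough timing model of the write(small), write(small), read() case
  makes the difference concrete. Everything here is a simplifying
  assumption: the function name and policy labels are hypothetical,
  path delay is symmetric, processing time is zero, the peer replies
  only after both writes have arrived, and the delayed ACK timer runs
  its full interval.

```python
def request_latency_ms(policy, rtt_ms, ack_delay_ms):
    """Time from the first write() until the response arrives."""
    one_way = rtt_ms / 2
    if policy == "nagle":
        # Write 1 arrives after one_way; its ACK waits out the delayed ACK
        # timer and returns after another one_way; only then is write 2 sent.
        second_write_sent_at = one_way + ack_delay_ms + one_way
    elif policy in ("new", "non-nagle"):
        # Write 2 is sent immediately, without waiting for any ACK.
        second_write_sent_at = 0.0
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Write 2 travels to the peer and the response travels back.
    return second_write_sent_at + one_way + one_way


for policy in ("nagle", "new"):
    latency = request_latency_ms(policy, rtt_ms=1.0, ack_delay_ms=200.0)
    print(policy, latency, "ms,", round(1000 / latency, 1), "exchanges/sec")
```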

  The new policy results in more tinygrams when a user is typing by
  hand, because each keystroke constitutes an entire application
  buffer. In practice this is a non-problem because people don't type
  that fast compared to even 200ms delayed ACKs. Thus in practice for
  human typing all three approaches and non-Nagle are about the same on
  the wire. Please see above on data aggregation by applications.

  Let us compare the three approaches for longer data transmissions.
  Strict Nagle induces a held-tail for each application buffer longer
  than one segment. Liberal Nagle can also, but only when windowing or
  congestion avoidance hold back octets. New policy and non-Nagle
  transmitters do not hold tails. During sending of the application
  buffer liberal Nagle, liberal new policy, and non-Nagle transmitters
  may send short segments if the data are delivered to the transmitter
  in small pieces. Strict Nagle and strict new policy transmitters join
  interior small pieces into full segments. However, small segments may
  arise naturally if the application buffer is short and/or its filling
  is slower than its draining by the network.

  In summary, new policy transmitters should work well with existing
  TCP/IP stacks and should produce no known side effects.

  3.0 Experimental results

  Four machines were used in a test configuration to examine serving
  web page activity with and without Nagle mode, and with the new
  transmission policy.

  Operating System Descriptions:
  UnixWare 7.0.1
          400MHz AMD cpu, 200ms delayed ACK, strict Nagle mode.
          32KB receive window. Source code was not available.
  FreeBSD v3.2
          233MHz AMD cpu, 200ms delayed ACK, strict Nagle mode.
          Source code was modified for new policy. Note indication
          of TCP receive window size, rwnd, in tests.
  Solaris 7/Intel
          350MHz AMD cpu, 50ms delayed ACK, liberal Nagle mode.
          8KB receive window. Source code was modified for new policy.
  Linux 2.2.5-15
          350MHz AMD cpu, 10ms dynamically adjusted delayed ACK,
          liberal Nagle mode. 16KB receive window. Source code was
          modified for new policy.

  Interconnections were via a 100Mbps Ethernet hub. This has
  implications for the tests. The fast network is able to drain TCP
  data faster than the application can supply it. Thus protocol
  behavior is exposed that otherwise would be hidden by forced holding
  back from congestion avoidance and window size constraints.

  The test procedure employs a web request client to request a web
  page, receive and discard it without reading the content, request it
  again, and so on, and provide timing results. The client sends a
  short one packet GET request, reads the server's HTTP headers, and
  then counts the octets of the data file that follows. Once all data
  file octets have been read the original request is repeated.

  Each Unix machine runs a simplified web server that replies to the
  request with two short packets, HTTP web server identification and
  the HTTP document description, followed by the document itself. Thus
  there are two short write()'s followed by a succession of 4KB
  write()'s for the file body. The client counts file octets and when
  done initiates the next request. Keep-alive connections were used to
  create a succession of request and replies on the same TCP
  connection. The serial nature of the request and reply means the
  longer the file the fewer requests occur per second.

  The web client produces delayed ACKs to all servers. Its use or not
  of Nagle mode has no influence because each request is only one
  segment and occurs after each long response from the server. Thus the
  server's protocol behavior is being examined in the presence of
  delayed ACKs.

  The web server is run as a single process without threads, to
  simplify the experiment and to emphasize serialized request and
  response interaction. Requests were repeated as fast as the systems
  could perform, up to 60 seconds or 50000 requests.

  The short file is smaller than the window size and congestion
  avoidance limits, and it fits into a single Unix write() call. The
  longer file may encounter the window size limit, and it is sent as
  a sequence of Unix write() calls. Both files have tails to be held
  (should we name this the monkey effect?).

  The interaction between Nagle mode and 200ms delayed ACKs is evident
  in the results of Table 1. Also present is a case where liberal
  Nagle mode is caught by delayed ACKs, when window size constraints
  leave a small segment with no preceding large segments to drag it
  out.

  The results show that the new policy works. It works better than
  strict Nagle. It works as well as both liberal Nagle (but without the
  held tail effect) and non-Nagle (but without sending small segments
  gratuitously). It does not require control at the application layer.
  However, as discussed previously, applications can abuse the swift
  responsiveness of the network by performing many small writes in
  succession without buffering at the application layer.

  Table 1. Web page test results, in requests and KB per second.

  Client        Server        2.2KB file      33KB file
  ------       --------       ----------      ----------

  UW7          FreeBSD         5 req/sec        5 req/sec
               Nagle on       12 KB/sec       165 KB/sec

  UW7          FreeBSD      1249 req/sec      222 req/sec
               Nagle off    2914 KB/sec      7142 KB/sec

  UW7          FreeBSD      1247 req/sec      228 req/sec
               new policy   2909 KB/sec      7324 KB/sec

  UW7          Solaris7      991 req/sec      221 req/sec
               Nagle on     2311 KB/sec      7112 KB/sec

  UW7          Solaris7      935 req/sec      219 req/sec
               Nagle off    2181 KB/sec      7041 KB/sec

  UW7          Solaris7      993 req/sec      219 req/sec
               new policy   2317 KB/sec      7041 KB/sec

  FreeBSD      UW7             5 req/sec        5 req/sec
  16KB rwnd    Nagle on       12 KB/sec       177 KB/sec

  FreeBSD      UW7          1508 req/sec      264 req/sec
  16KB rwnd    Nagle off    3519 KB/sec      8478 KB/sec

  FreeBSD      UW7             5 req/sec        5 req/sec
  4KB rwnd     Nagle on       12 KB/sec       166 KB/sec

  FreeBSD      UW7          1421 req/sec      235 req/sec
  4KB rwnd     Nagle off    3315 KB/sec      7553 KB/sec

  FreeBSD      Linux        1665 req/sec      277 req/sec
  16KB rwnd    Nagle on     3912 KB/sec      8876 KB/sec

  FreeBSD      Linux        1709 req/sec      279 req/sec
  16KB rwnd    Nagle off    3987 KB/sec      8970 KB/sec

  FreeBSD      Linux        1665 req/sec      277 req/sec
  16KB rwnd    new policy   3883 KB/sec      8894 KB/sec

  FreeBSD      Linux        1685 req/sec       55 req/sec
  4KB rwnd     Nagle on     3930 KB/sec      1776 KB/sec

  FreeBSD      Linux        1692 req/sec      238 req/sec
  4KB rwnd     Nagle off    3946 KB/sec      7634 KB/sec

  FreeBSD      Linux        1699 req/sec      241 req/sec
  4KB rwnd     new policy   3964 KB/sec      7740 KB/sec

  FreeBSD      Solaris7     1104 req/sec      180 req/sec
  4KB rwnd     Nagle on     2575 KB/sec      5795 KB/sec

  FreeBSD      Solaris7     1090 req/sec      180 req/sec
  4KB rwnd     Nagle off    2544 KB/sec      5772 KB/sec

  FreeBSD      Solaris7     1090 req/sec      165 req/sec
  4KB rwnd     new policy   2543 KB/sec      5290 KB/sec

  FreeBSD      Solaris7     1151 req/sec      233 req/sec
  16KB rwnd    Nagle on     2685 KB/sec      7474 KB/sec

  FreeBSD      Solaris7     1186 req/sec      239 req/sec
  16KB rwnd    Nagle off    2768 KB/sec      7669 KB/sec

  FreeBSD      Solaris7     1206 req/sec      237 req/sec
  16KB rwnd    new policy   2813 KB/sec      7634 KB/sec
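
  The rows showing 5 req/sec are consistent with each request being
  released only when a 200ms delayed ACK timer fires. The following
  Python fragment works through that arithmetic. The function names
  are illustrative, and the per-request sizes are assumed to be the
  nominal 2.2KB and 33KB file sizes, ignoring HTTP headers.

```python
def stalled_request_rate(ack_delay_ms: int) -> float:
    """Requests per second if every request waits out one delayed ACK timer."""
    return 1000 / ack_delay_ms


def throughput_kb_per_sec(requests_per_sec: float, file_kb: float) -> float:
    """Payload rate for serialized requests of a fixed file size."""
    return requests_per_sec * file_kb


rate = stalled_request_rate(200)            # 200ms delayed ACK timer
short = throughput_kb_per_sec(rate, 2.2)    # nominal 2.2KB file
long_ = throughput_kb_per_sec(rate, 33)     # nominal 33KB file
print(rate, short, long_)
```

  This gives 5 requests per second, about 11 KB/sec for the short
  file and 165 KB/sec for the long file, close to the 12 KB/sec and
  165 to 177 KB/sec entries in the table.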

  Solaris7 as a client produced erratic results caused by long,
  variable delays preceding each request. This occurred with both
  stock and modified Solaris7. Its performance as a server is
  suspected to be affected as well.

  4.0 Conclusions

  The new TCP transmission policy solves the problem of Nagle mode
  deadlocking with delayed ACKs. It retains data grouping but
  operates with only transmitter information. It accommodates those
  systems which wish to implement a liberal sending policy regarding
  partial segments not at the end of application data, and those which
  prefer the stronger grouping of a strict sending policy.

  The new policy works well with delayed ACKs and when sending into
  small receiver windows. Its performance is essentially the same as non-Nagle
  mode, yet it retains grouping which non-Nagle mode does not. It does
  not need an on/off control visible to applications. The new
  transmission policy is a suitable replacement for Nagle mode.

  The warning is the same as for non-Nagle mode: what is sent by the
  application to the protocol stack is also what the network tries to
  send. Thus grouping of data in applications and/or operating systems
  remains a good idea.
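
  One way an application can group its own data before handing it to
  the protocol stack is sketched below in Python. The class name
  CoalescingWriter and its 4096 octet threshold are assumptions made
  for illustration; no particular buffering scheme is prescribed
  here.

```python
class CoalescingWriter:
    """Collect small application writes and pass them on in larger units."""

    def __init__(self, send, threshold: int = 4096):
        self._send = send            # callable taking bytes, e.g. sock.sendall
        self._threshold = threshold  # flush once this many octets are pending
        self._pending = bytearray()

    def write(self, data: bytes) -> None:
        """Queue data; hand it to the stack only when enough has gathered."""
        self._pending += data
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self) -> None:
        """Hand everything pending to the stack as a single write."""
        if self._pending:
            self._send(bytes(self._pending))
            self._pending.clear()


if __name__ == "__main__":
    sent = []                         # stands in for the protocol stack
    writer = CoalescingWriter(sent.append)
    for _ in range(100):
        writer.write(b"0123456789")   # one hundred 10 octet writes
    writer.flush()                    # end of the application's message
    print(len(sent), len(sent[0]))
```

  Here one hundred 10 octet writes reach the stack as a single 1000
  octet write, so the stack is not asked to send a succession of
  small segments.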

  5.0 Security Considerations

  There are no security considerations in this memo.

  6.0 Acknowledgements

  Special thanks to John Nagle for candid discussions on the problem
  and for reviewing the draft document. Thanks to Gehri Grimaud at
  Utah State University for introducing the author to FreeBSD and
  helping to run experiments. Thanks also to Miles Johnson at USU,
  Richard J. Letts at Salford University in the UK and Diana Osborn at
  San Diego State University for reading the rough draft of this
  document.

  7.0 References

  [TCP:1] "Congestion Control in IP/TCP Internetworks", J. Nagle,
  RFC-896, January 1984.

  [TCP:2] "Requirements for Internet Hosts -- Communication Layers",
  R. Braden, RFC-1122, October 1989.

  [TCP:3] "Congestion Avoidance and Control", V. Jacobson, ACM
  SIGCOMM-88, August 1988.

  8.0 Author's address

  Joe R. Doupnik
  Dept of Electrical and Computer Engineering
  Utah State University
  Logan, Utah 84322
  Phone: (801) 797-2982
  Email: jrd@cc.usu.edu

  Full Copyright Statement

  "Copyright (C) The Internet Society (1999). All Rights Reserved. This
  document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any
  kind, provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works. However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be
  followed, or as required to translate it into languages other than
  English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on an
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."
