INTERNET-DRAFT Lan Wang Expires: December 1999 Andreas Terzis Lixia Zhang UCLA June 1999 RSVP Refresh Overhead Reduction by State Compression Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract As a soft-state protocol, RSVP specifies that each RSVP node sends peri- odic RSVP messages for each existing RSVP session. The overhead due to such periodic refreshes goes up linearly with the number of active RSVP sessions. One can reduce the overhead by using a longer refresh period, which unfortunately leads to longer delays in re-synchronizing RSVP state. To overcome the dilemma, this draft describes the use of a "state-compression" approach to soft-state protocol refreshes. In the absence of state changes this compression mechanism has a constant over- head while retaining the soft-state nature of RSVP. draft-wang-rsvp-state-compression-02.txt [Page 1] INTERNET-DRAFT June 1999 Changes Below is the list of changes from the previous version of this draft . - The timestamp in the MESSAGE_ID object is replaced by an ordered ID. - An example is added in section 3 to describe the procedure to com- pute a digest using a hash table and a tree structure. - Section 5.4 (state requirements) is deleted since it describes optimization rather than requirements. - Section 9 (compatibility) is merged with section 4.6 and removed from this draft. 1. Introduction As a soft-state setup protocol, RSVP specifies that each RSVP node sends control messages for each active RSVP session to each of its neighbors serving the same session. These control messages are sent immediately whenever a change of state occurs, for example when a new session starts, when the TSPEC or RSPEC parameters of an existing session change, or when a route change occurs that affects the paths of existing RSVP sessions. In addition, an RSVP node also sends periodic refresh messages to its neighbors about all existing sessions. Such periodic refreshes serve the purpose of keeping consistent RSVP reservation state along data flow paths. There are potentially a number of causes that can make the reservation state between neighbor RSVP nodes out-of-syn- chronization. For example: - An RSVP message that carries state-change information gets lost. - A rare local event inadvertently changed RSVP state values. - Other locally induced state changes occur, for example when the current Kerberos key expires, a new key is obtained which needs to be carried to the next hop by the POLICY_DATA object in the next refresh. Instead of attempting to identify all the potential causes of changes and handle them individually, RSVP simply uses periodic refreshes to persistently re-synchronize the reservation state along the path. This approach assures consistent reservation state along the data flow path, even when the routers do not have an exhaustive list of all the possible causes of state inconsistency or how to correct them individually. draft-wang-rsvp-state-compression-02.txt [Page 2] INTERNET-DRAFT June 1999 The drawback of this simple approach is that its overhead grows linearly with the number of active RSVP sessions. A common mechanism to reduce the periodic refresh overhead is by using a longer refresh period (exam- ple: RTCP, SDR announcements). A longer refresh period, however, leads to longer delays in re-synchronizing state. In this draft we propose a "state compression" mechanism for refresh overhead reduction. Our goal is to improve the efficiency of soft-state protocols in their communication behavior, not to further simplify RSVP. Instead, we apply *additional* processing (compression) to the protocol state information so that the number of refresh messages is reduced from being proportional to the number of sessions to being proportional to the number of neighbor nodes, while keeping the soft-state nature of RSVP. Section 7 presents a detailed overhead analysis of our compres- sion mechanism. 2. Refresh Overhead Reduction Even in the absence of new control information generated by sources or destinations, an RSVP node sends one message per refresh period to its neighboring nodes for each of all the RSVP sessions to keep the state consistent. Instead of sending one message per refresh period per ses- sion, one way to reduce the overhead is to compress the state of all RSVP sessions to a single digest, which can then be sent to neighbor nodes in place of all the "raw" refresh messages. All we need is an algorithm that can produce a mapping from the given set of RSVP state to a digest with a low probability of collision. Two compression algorithms have been suggested so far, CRC-32 and MD5 algorithm [MD5]. Section 8 presents a brief comparison of the two algo- rithms. In the following discussion we assume either one can be used for the purpose and simply refer it as "the compression algorithm". Whenever state inconsistency is detected (the digests of two neighbors disagree), raw RSVP messages are transmitted to re-synchronize the state. As a further optimization, we also add an acknowledgment message (ACK) to the RSVP protocol. The ACK is used to minimize the re-synchroniza- tion delay after an explicit state change request. A node can request an ACK for each RSVP message that carries state-change information, and promptly retransmit the message until an acknowledgment is received. It is important to note that the use of ACKs is to assure quick delivery of time-sensitive messages; RSVP still relies on periodic refreshes to cor- rect any potential state inconsistency that may occur even when critical messages are acknowledged, for example state inconsistency due to unde- tected bit errors, or due to undetected state changes. draft-wang-rsvp-state-compression-02.txt [Page 3] INTERNET-DRAFT June 1999 2.1. Definitions - Input Message The input to the compression function. In this document an input message contains some portion of the RSVP state stored at the local node. - Signature The result of compression function on an input message. - Digest A set of Signatures that represents a compressed version of the RSVP state shared between two neighboring RSVP nodes. The following definitions are taken from RFC 2209. - PSB Path State Block. Each PSB holds path state for a particular (ses- sion, sender) pair, defined by SESSION and SENDER_TEMPLATE objects, respectively. -RSB Reservation State Block. Each RSB holds a reservation request that arrived in a particular RESV message, corresponding to the triple: (session, next hop, Filter_spec). 2.2. Benefits and Constraints of the New Scheme The design of the new scheme aims at the following features - efficient state re-synchronization when two RSVP neighbors discover state inconsistency. - choice of individual nodes to use either original RSVP refreshes or the compressed refresh with overhead reduction, or switch in between over time to achieve the best tradeoff. - backward compatibility with the current RSVP specification. - incremental digest computation when only part of the session(s) changes state. We also recognize the necessary cost and constraints of the new scheme. draft-wang-rsvp-state-compression-02.txt [Page 4] INTERNET-DRAFT June 1999 First, to assure RSVP state consistency over reserved paths, the digest must be computed over the RSVP session state rather than over the mes- sages that are exchanged. Those messages represent partial, but not all causes of state changes, two neighbor nodes may end up in inconsistent state even when RSVP messages are all acknowledged. Furthermore, the protocol requires that the latest messages carrying state-change infor- mation get delivered as soon as possible, rather than all the messages get reliably delivered. Secondly, to reduce the refresh overhead to one refresh message per refresh period per neighbor, a digest must represent the compressed state of aggregate sessions. Thus an RSVP node must first group all RSVP sessions by each neighbor, compute the digest, and then send to the corresponding neighbor node. On the other hand, the paths of individual RSVP sessions may change over time. In the current design, RSVP PATH messages carry the destination address of the session, thus all refresh messages automatically follow the latest unicast or multicast routing changes (including multicast group membership changes). Although RSVP has an interface to the local routing module for prompt notification of route changes to minimize the adaptation delay, all RSVP reservations eventually converge on new paths even in the absence of such notifica- tions. When routers stop sending individual PATH refresh messages, however, RSVP loses its ability to automatically adjust reservations according to route changes. The proposed refresh compression schemes relies on explicit notifications to any routing changes. We consider this the major constraint of the new scheme. When two neighbor nodes of a unicast RSVP session are directly con- nected, RSVP can expect to receive notification of route changes through the interface to the local routing module. When the two nodes are interconnected through a non-RSVP cloud, however, there seems no known way to reliably detect all routing changes without the per session peri- odic refreshes. Multicast sessions present a similar problem, namely that the list of neighbors for a given session may not be known and may change dynamically as receivers leave and join the session. Further complications associated with multicast sessions also arise when a router is attached to a broadcast LAN. For example there can be both compression-capable and compression-incapable downstream neighbors on the LAN, in which case RSVP must fall back to per session periodic refreshes. Furthermore, it remains an open issue whether a router can reliably detect all changes in the downstream neighbors when a down- stream router on the broadcast LAN joins or leaves a group without affecting the list of outgoing interfaces of the associated RSVP states. Therefore this first draft on the refresh compression mechanism limits the problem space to handling unicast sessions whose paths do not cross draft-wang-rsvp-state-compression-02.txt [Page 5] INTERNET-DRAFT June 1999 non-RSVP clouds. An RSVP node resorts to the current refresh scheme for all sessions that have the "non_RSVP_cloud" bit set, or that have a mul- ticast destination address. 3. Digest Mechanism Description 3.1. Session Signatures RSVP maintains path state and reservation state for each active session [RFC2205]. A session is uniquely identified by a session object which contains the IP destination address, protocol ID and optionally a desti- nation port number of the associated data packets. A Path State Block (PSB) is comprised of a sender template (i.e. IP address and port number of the sender), a Tspec that describes the sender's traffic characteris- tics, and possibly objects for policy control and advertisements. A Reservation State Block (RSB) contains filterspecs (i.e. sender tem- plates) of the senders for which the reservation is intended, the reser- vation style and a flowspec that quantifies the reservation. It may also contain objects for policy control and confirmation. As the first step in computing a digest, an RSVP node computes a signa- ture for each session. Table 1 shows the objects that should be included in the digest computation. These objects must be fed into the computation procedure in the order as presented in the table. Although PSBs and RSBs also contain a few other fields such as incoming interface and outgoing interfaces, those fields have local meaning and may have different values at different nodes. Therefore they must be excluded from the digest computation. RSVP Structure || Objects to Include _________________||_______________________________________ || Session || session object _________________||_______________________________________ || PSB || sender template, sender tspec, || adspec, policy _________________||_______________________________________ || RSB || filterspec, flowspec, style, policy _________________||_______________________________________ Table 1. Objects to Include in Digest Computation draft-wang-rsvp-state-compression-02.txt [Page 6] INTERNET-DRAFT June 1999 3.2. State Organization for Digest Computation As a second step to efficient refresh exchanges, an RSVP node organizes all the sessions shared with each neighbor node in a hash table. Assum- ing the hash table has M slots, each RSVP session is hashed into one of the M slots. The hashing is done over some fixed session fields (e.g the session ID). If multiple sessions hash to the same slot, they cre- ate an ordered linked list, following the increasing order of IP address, protocol ID and port number in session objects. Figure 1 shows the hash table created for each neighbor. In this Figure, slot i contains a pointer to the head of the linked list of all RSVP sessions that hash to i. Assuming the total number of RSVP sessions is T, the average length of this linked list will be T/M. Slot i also con- tains a Signature that is computed by applying the compression algorithm over the concatenation of the signatures of all the sessions in the linked list. If the slot is empty, the signature is zero. Whenever any of the sessions in the list changes state, the signature of the slot must also be recomputed. <--------T/M--------> +---+ +---+ +---+ +---+ | 1 |--->| |-->| |-->| | +---+ +---+ +---+ +---+ | . | | . | | . | +---+ +---+ +---+ | i |--->| |-->|D_1| +---+ +---+ +---+ | . | | . | | . | +---+ | M | +---+ Figure 1. Session Hash Table As a third step towards efficient refresh exchanges, we propose to com- pute the digest in a structured way. To illustrate this need, suppose there are 100,000 sesions and we can transmit 80 signatures in each digest message. With a hash table size (M) of 4,000 (25 sessions in each slot on average), if we periodically exchange all the 4,000 signa- tures(one for each slot), the overhead will be 50 digest messages per refresh period. When session state changes relatively infrequent, it draft-wang-rsvp-state-compression-02.txt [Page 7] INTERNET-DRAFT June 1999 would be desirable if we can minimize this refresh overhead. Therefore, we build a N-ary tree on top of the hash table to compress the M signa- tures to no more than N signatures, where N is the number of signatures that can fit inside a single IP packet. This set of signatures is called a digest. Whenever two neighbor nodes disagree on the digest value, they can walk down specific branches of the N-ary tree to locate the portion of inconsistent state in an efficient way. The digest com- putation tree is shown in Figure 2. ^ o o o digest | /|\ /|\ /|\ h y_1 o o o o o o o o o level-1 signature | /|\ ......... /|\ v o o o ..........o o o leaf-level signatures x_1...x_N Figure 2. A Digest Tree with N=3 A node constructs the digest computation tree in the following way. The leaves of the tree are the signatures stored in the slots of the hash table. The signatures of N slots are concatenated and the compression function is applied on the compound message. The result is stored at the parent node on the tree. Looking at Figure 2, Signatures x_1, ..., x_N are concatenated and the result of the compression function is stored in node y_1. This grouping results in M/N level-1 signatures. If the num- ber of level-1 signatures is still larger than N, the node continues on to group each of N level-1 signatures to compute a level-2 signature to get M/N^2 level-2 signatures. If C_i is the number of level-i signa- tures, we repeat the grouping until C_i is smaller or equal to N. The top level signatures represent the digest of the total RSVP state. In our previous example N is 80 and M is 4,000, so we have 50 level-1 sig- natures and the final digest is comprised of these signatures. We choose the degree of the tree to be the same as the maximum number of Signatures in a digest object to simplify the data structure and partial rollback process when two nodes detect inconsistent state. Note that all RSVP session insertions and deletions are done in the hash table; the purpose of the digest computation tree is simply to compress signa- tures to fit into one packet. It is important that neighboring RSVP nodes use the same hashing table size M and digest size N (the number of signatures); inconsistency in either of these two values will lead to the failure of compression scheme. We can achieve this by standardizing the default N and M and requiring all implementations to support them for interoperability. An optional feature is to allow N and M to be configurable based on the draft-wang-rsvp-state-compression-02.txt [Page 8] INTERNET-DRAFT June 1999 link MTU and the expected number of active sessions. However, either human or automatic negotiation is needed for both ends to agree on the configured value. For simplicity, this draft assumes that neighbors have reached consensus on the value of N and M. A digest needs to be recomputed when the following events occur: o a new session is added; o an existing session is changed, i.e. a state is added to a session, or a state is modified or deleted from a session; o an existing session is deleted. The cost of digest computation is summarized in section 7. 3.3. An Example of Digest Computation In this section, we use a simple example to illustrate how we compute and update a digest tree. T_20 T_21 / \ / \ digest tree (N = 2) T_10 T_11 T_12 T_13 / \ / \ / \ / \ +----+----+----+----+----+----+----+----+ |T_00 T_01 T_02 T_03 T_04 T_05 T_06 T_07| hash table (8 slots) +----+----+----+----+----+----+----+----+ SS10 SS19 SS12 SS13 SS14 SS25 SS16 SS17 sessions hashed to each slot SS18 SS22 Figure 3. A Digest Tree on top of a Hash Table Suppose a node currently shares 10 sessions with a neighbor and it uses a small hash table with 8 slots. For simplicity, the sessions' addresses are of the format 1.2.3.j (j = 10, 12, 13, 14, 16, 17, 18, 19, 22, 25) and we call the signature of session j SSj. The node first maps the sessions into the hash table (Figure 3 shows the result of the map- ping). It then computes a signature for each slot over the session sig- natures contained in that slot. For example, T_00 is obtained by com- pressing the concatenation of SS10 and SS18. Now we have signatures T_00 to T_07, but our goal is to reduce these eight signatures to two signatures assuming N is 2. Therefore, we separate them into groups of 2 and compute a signature for each group. This procedure is repeated twice to produce the final digest (T_20 and T_21). Now we look at the procedure to update this structure when this node receives a path message with a destination address of 1.2.3.15 which creates a new path state. In addition to forward this path message to draft-wang-rsvp-state-compression-02.txt [Page 9] INTERNET-DRAFT June 1999 its neighbor, the node calculates the signature of this session (SS15) and inserts the new session into the hash table. Let's assume this ses- sion is mapped to slot 5. Since the new session's address is smaller than that of session 25, it's inserted before session 25. Next, the node recalculates T_05, T_12 and T_21, i.e. all the signatures on the path from the leaf signature (slot 5's signature) to the corresponding root. To delete a session, the node first needs to locate the session's slot and remove it from the slot. The procedure to update the digest tree is the same as that of adding a session. 3.4. Neighbor Data Structure An RSVP node needs to compute two digests for each neighbor, one digest for the states that are refreshed by messages from that neighbor and one for the states that the local node sends refresh messages to that neigh- bor. We refer to the two digests as InDigest and OutDigest, respec- tively. OutDigest is used in the Digest message and the InDigest is sent in a DigestErr message to the corresponding neighbor. Section 4 specifies the format of the new RSVP message types proposed for our scheme. To avoid having to traverse all the RSVP sessions to compute the digest for a specific neighbor, a router maintains a data structure containing pointers to all the sessions shared with the neighbor. An example of the per neighbor structure is shown in Figure 4. The Neighbor structure holds all the information needed to send and receive digests to/from an RSVP neighbor node. A description of each field in the Neighbor structure follows: IP Address Holds the IP address of the interface this neighbor uses to communi- cate with the local node. OutSession This field will point to the hash table holding the sessions to be included in the computation of the digest the local node is sending to this neighbor. The sessions included are those that have PSBs whose next downstream hop is the neighbor in question and RSBs that have that neighbor as the immediate upstream hop. OutDigest draft-wang-rsvp-state-compression-02.txt [Page 10] INTERNET-DRAFT June 1999 A set of pointers that point to the top level signatures of the Digest tree for the neighbor. RefreshOutTimer Contains the next time a digest should be sent to this neighbor. Neighbor Struct +----------------- + | IP Address | +----------------- + | OutSession |--->Hash table for outgoing sessions +----------------- + | OutDigest | +----------------- + | RefreshOutTimer | +----------------- + | IDLastSent | +------------------+ | OutDigestTimeout | +------------------+ | InSession |--->Hash table for incoming sessions +------------------+ | InDigest | +------------------+ | CleanupInTimer | +------------------+ | ... | +------------------+ Figure 4. Neighbor Structure IDLastSent Contains the Message_ID of the latest digest sent to this neighbor. OutDigestTimeout Contains the timeout period in milliseconds for digests sent to this neighbor. If an acknowledgment is not received by the end of this period, the digest should be retransmitted. draft-wang-rsvp-state-compression-02.txt [Page 11] INTERNET-DRAFT June 1999 InSession This field will point to the hash table holding the sessions that are refreshed by digests received from this neighbor. Specifically, the table includes sessions whose PSBs have this neighbor as previous hop and sessions whose RSBs have this neighbor as next hop. InDigest A set of pointers that point to the top level signatures of the Digest tree for the neighbor. CleanupInTimer If no digest is received from this neighbor by the time contained in the CleanupInTimer field, all associated state should be cleaned up. 3.4.1. SessDigest Structure The SessDigest structure contains information about a session that has a specific neighbor as a Next Hop. The SessDigest structure has the fol- lowing fields: Session* A pointer to the portion of the RSVP state that holds information for the session represented by this structure. Next SessDigest When multiple sessions map to the same hash table bin, this field is used to point to the next object in the linked list of SessDigest objects. Otherwise this field SHOULD be NULL. P/R This field indicates whether the PSB or the RSB of a Session will be used in the signature operation. Signature Contains the computed Signature for this session. draft-wang-rsvp-state-compression-02.txt [Page 12] INTERNET-DRAFT June 1999 Session Struct ^ | | +-----------------+ | Session* | +-----------------+ | Next SessDigest |---->.. +-----------------+ | P/R | +-----------------+ | Signature | +-----------------+ Figure 5. The SessDigest structure 3.5. Neighbor Discovery To use the compressed refresh scheme, a router needs to discover all the neighbors that are capable of exchanging digests (compression-capable). There are two ways this can be achieved: 1. If a node receives a raw RSVP message containing a MESSAGE_ID object with the D flag turned on from an unknown neighbor, that neighbor is assumed to be compression capable and a new Neighbor data structure is created for it. 2. When an RSVP router starts sending (raw) RSVP messages of a ses- sion, it should request that the neighbor(s) acknowledge the mes- sages by including a MESSAGE_ID object with the ACK_Requested flag on (see section 4.1). If the local router receives an ACK message with the D bit on (see Section 4.1) from a neighbor whose address is not currently in the list of Neighbor structures, it has discov- ered a new compression-capable neighbor. If it receives an error message from a neighbor who does not understand the new MESSAGE_ID object or an ACK message containing a MESSAGE_ID object with the D bit turned off, it has instead discovered a neighbor that is com- pression-incapable. When a non-RSVP cloud exists between two RSVP neighbor nodes, although they can discover each other using the acknowledgments during initial message exchanges, the upstream node may not be able to detect a change of the downstream neighbor caused by a unicast route change. As shown in Figure 6, node A originally has a downstream neighbor B for a partic- ular unicast RSVP session. After a route change, node C becomes A's downstream neighbor. However, since A's outgoing interface is unchanged, A will not notice the route change, hence it will continue to draft-wang-rsvp-state-compression-02.txt [Page 13] INTERNET-DRAFT June 1999 include the path state of sender S when calculating the digest to B. Node B will not be aware of the change either as long as A sends B the same digest. C may never get a path message from A. Therefore, the digest scheme cannot be used crossing non-RSVP clouds until an effective way of detecting route changes is found. RSVP routers can detect the existence of non-RSVP clouds between neighbors by comparing the TTL value in the IP header with the Send_TTL in the RSVP common header, as described in [RFC2209]. original route ___________ ---> -> ->| |-- B ----| -> S - - --- A --| non-RSVP | |------- D | cloud | | |___________|-- C ----| new route ---> Figure 6. An RSVP Session with a non-RSVP cloud 3.6. Digest Refreshes RSVP states are refreshed by regular RSVP messages as well as by Digest messages. However, different from a regular RSVP message which refreshes the state of one RSVP session, a digest message may carry mul- tiple Signatures, each representing the compressed state of multiple sessions. If an individual signature in the digest matches the corre- sponding local value, all the RSVP states covered by that signature should be refreshed. That is the local node should carry out the same operation as if one refresh message for each of all the covered sessions has been received. The refresh rate of digest messages can be different from the refresh rates of individual sessions covered by that digest. For digest mes- sages, the default refresh rate is 30 seconds as RFC 2205 suggests. If a different refresh rate is used, it SHOULD be carried in the TIME_VAL- UES object of the digest message. 4. New RSVP Message Types and Objects draft-wang-rsvp-state-compression-02.txt [Page 14] INTERNET-DRAFT June 1999 4.1. MESSAGE_ID object MESSAGE_ID Class = Value to be assigned by IANA of form 0bbbbbbb MESSAGE_ID object Class = MESSAGE_ID Class, C_Type = 1 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |A|D| Flags | Epoch | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message_ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Flags: 8 bits The A bit indicates that the sender is requesting a message acknowl- edgment. This flag MUST be set when the message is a Digest message. It MUST not be set in an ACK or DigestErr message. Compression capable nodes set the D bit on in the MESSAGE_ID objects they send. The receiver of a MESSAGE_ID object that has the D bit turned on can assume that the neighbor which sent the message is com- pression capable and willing to receive Digest messages as refreshes. Epoch: 24 bits This field is used to detect node reboots and contains a random value initialized at boot time. All messages sent between reboots should contain the same value. When a node receives two messages containing different Epoch values it can safely deduce that the node on the other side has rebooted and that it has to purge all the information related to the digest received from that node. Furthermore all ses- sions to this node will have to be refreshed using raw RSVP messages, before digests can be exchanged again. Message_ID: 32 bits When combined with the message generator's IP address, the Message_ID field uniquely identifies a message. This field is ordered and only decreases in value when the Epoch changes or the value wraps. draft-wang-rsvp-state-compression-02.txt [Page 15] INTERNET-DRAFT June 1999 4.2. DIGEST Object DIGEST Class = Value to be assigned by IANA of form 10bbbbbb DIGEST object Class = DIGEST Class, C_Type = 1 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Level | Group | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Reserved | Number of Signatures | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | // signature list // | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Level: 8 bits This is the level of this set of signatures in the corresponding digest tree. The level of a signature is defined as follows: 1. Signatures belonging to individual sessions inside a hash table bin are assigned level -1; 2. Leaf signatures are assigned level 0; 3. The level of a non-leaf signature is (1 + the level of its imme- diate children). Group: 24 bits This is the group this set of signatures belong to (recall we divide signatures into groups of size N and compute a signature for each group to build a tree). We number the groups at the same level of a digest tree increasingly from left to right and the left-most group is assigned group 0. Level and group uniquely identify the location of a set of signatures in a digest tree. Reserved: 16 bits This field is reserved. It MUST be set to zero on transmission and MUST be ignored on receipt. draft-wang-rsvp-state-compression-02.txt [Page 16] INTERNET-DRAFT June 1999 Number of Signatures: 16 bits Number of Signatures contained in this object. Signature: The result of applying the compression algorithm to a session or a group of sessions. 4.3. Digest Message The format of a Digest message is as follows: ::= [ ] [ ] A Digest message has a common RSVP header with the message type set to 14. It MUST contain a MESSAGE_ID object with the ACK_Requested flag set and a digest object. The TIME_VALUES object may be included so that a router can inform its neighbor what timer value to use to time out states when using a digest scheme. 4.4. ACK Message The format of an ACK message is as follows: ::= [ ] [ ... ] An ACK message has a message type of 15. It MUST contain at least one MESSAGE_ID object. With multiple MESSAGE_ID objects, the ACK message can be used to acknowledge multiple messages. 4.5. DigestErr Message A DigestErr message has the following format: draft-wang-rsvp-state-compression-02.txt [Page 17] INTERNET-DRAFT June 1999 ::= [ ] The message type of DigestErr message is 16. It MUST contain a MES- SAGE_ID object and a DIGEST object. A DigestErr message acts as a nega- tive acknowledgment to a Digest message. The MESSAGE_ID object identi- fies the Digest message being acknowledged. The DIGEST object carries the local digest value of the sender of the DigestErr message. 4.6. Backward Compatibility The class number of MESSAGE_ID object is of the form 0bbbbbbb. Accord- ing to RFC 2205, the receiver of a message containing this object should send back an error message (PathErr or ResvErr depending on the original message) if it does not understand the object. Upon receiving the error message, the sender should do the following things: 1. Withhold sending digests to the receiver; 2. continue regular RSVP refreshes. 3. withhold sending MESSAGE_ID object in future messages until it receives from that neighbor a message containing a MESSAGE_ID object; Regarding Digest messages, a node SHOULD start sending Digest messages only when it discovers that its particular peer is compression-capable using the procedure outlined in Section 3.5. 5. RSVP Message Operations 5.1. Non-digest RSVP Messages A node sends non-digest RSVP messages in the same way as before, with the new option of asking for an ACK. To request an acknowledgment, the node sends the message with a MESSAGE_ID object and sets the ACK_Requested bit in the object. It also sets a retransmission timer. When the neighbor receives a message that requires acknowledgment, it processes the message as usual. If the processing does not generate an error message, it sends back an ACK message which contains the MES- SAGE_ID object of the message being acknowledged. If there is any change in the state, it then updates the signature. If no ACK is received after timeout, the node retransmits the message up to a configured number of times. The retransmission period SHOULD be shorter than regular refresh period and be proportional to the round- draft-wang-rsvp-state-compression-02.txt [Page 18] INTERNET-DRAFT June 1999 trip time between the two neighboring nodes. Each retransmission MUST carry the same MESSAGE_ID as that in the original message. If the cor- responding state is changed before an ACK is received, the original mes- sage becomes obsolete and no longer needs to be retransmitted. If the node does not receive an ACK after all the retransmissions, then incon- sistencies caused by the possible loss of this message will be resolved later during the recovery process discussed in Section 6. If a node receives an error message, it needs to check if the error is triggered by a message that is expecting an ACK. If so, any pending retransmission of this message should be canceled. To facilitate iden- tification of the original message, an error message may include the same MESSAGE_ID as carried in the original message. 5.2. Digest Message Once a compression-capable node discovers a compression-capable neigh- bor, instead of sending one refresh message for each PATH/RESV state, it sends a digest, in one RSVP message per refresh period to each of the neighbors. Special treatments for RSVP states containing ADSPEC and POLICY_DATA objects are discussed in section 5.3. Each digest MUST carry a MESSAGE_ID object with the ACK_Requested flag set. When the neighbor receives the Digest message, it first locates the cor- responding set of signatures in the local digest tree and then performs a binary comparison between each signature in the message and that in the local digest tree. o If there is a mismatch between any of the signatures received and the corresponding local signature, the node should respond with a DigestErr message containing the local digest value. Note that if there is any matching signature, RSVP states represented by that signature should be refreshed. o Otherwise, the receiver is in a consistent state with the sender, so it sends an ACK message and refreshes all the states shared with the sender. Digest messages are also retransmitted for a maximum number of times in the absence of ACK messages. However, following the original RSVP design where an RSVP node never stops sending refresh messages for each active session, a node should not stop sending digest refreshes even if it fails to receive an acknowledgment in the previous refresh interval. If the peer node crashed and becomes alive again, it will find the digest value different from its own and the two routers will start the re-syn- chronization process. When the digest value is changed, the node needs to cancel any pending retransmission of the obsolete Digest message and promptly send a Digest message with the new digest value. draft-wang-rsvp-state-compression-02.txt [Page 19] INTERNET-DRAFT June 1999 5.3. Refresh of ADSPEC and POLICY_DATA An RSVP message may carry an ADSPEC object that contains resource infor- mation used by receivers to predict the end-to-end service. This object is updated hop-by-hop to gather information along a route. Since it is opaque to RSVP, RSVP simply records the received ADSPEC in the corre- sponding state and, whenever it needs to send a refresh message for a state, it calls the traffic control module with the recorded ADSPEC to get an updated ADSPEC object. Therefore, the ADSPEC object sent by a node may be different from that received by the node. POLICY_DATA object is handled in the same way except that it is submitted to the policy control module. To handle ADSPEC or POLICY_DATA objects correctly with the digest scheme, we need to solve the following two problems. First, for each session with ADSPEC or POLICY_DATA object, we must trigger periodic updates of ADSPEC or POLICY_DATA objects. Second, since ADSPEC and POL- ICY_DATA are opaque to RSVP, RSVP cannot tell whether the objects returned by the traffic control (or policy control) module changes after each call. For the first problem, we treat digest refreshes the same way as regular RSVP refresh messages. In other words, before a node sends a digest, the node calls corresponding modules to update these objects. If the traffic and/or policy control modules provide trigger mechanisms that asyn- chronously notify the RSVP module when changes occur then this step will be avoided. There are three possible solutions to the second problem. First, a node can simply assume that the object returned by the traffic control (or policy control) module changes after each call at refresh period, which represents a change in RSVP state and hence a regular RSVP PATH state should be sent to carry the new object value to the next hop. This approach requires no changes to RSVP specification but it essentially falls back to the original RSVP refresh mechanism whenever ADSPEC or POLICY_DATA object exists in an RSVP session. The second option is to modify the RSVP/traffic control and RSVP/policy control interfaces so that they return a change bit that indicates whether the updated object is different from the previously returned one. The previously returned object can be an argument passed to the interface. For example, the original RSVP/traffic control interface for ADSPEC is Call: TC_Advertise( Interface, Adspec, Non_RSVP_Hop_flag ) -> New_Adspec, draft-wang-rsvp-state-compression-02.txt [Page 20] INTERNET-DRAFT June 1999 where Adspec is the ADSPEC from received RSVP messages. We can change it to Call: TC_Advertise( Interface, Adspec, Old_Adspec Non_RSVP_Hop_flag ) -> New_Adspec, change bit, and Old_Adspec is the ADSPEC object returned by the previous call. When the changed bit is set, a regular RSVP refresh must be sent. The third option is to let RSVP perform a binary comparison between the previously and newly returned objects after each call to traffic control (policy control) module to find out whether the object has changed. This third option can be a short-term fix while we work on the proposed changes in RSVP interfaces to other modules. If POLICY_DATA objects are encoded, for replay attack prevention reasons, in a way that makes iden- tical POLICY_DATA objects look as different binary objects this third option will deteriorate to the first option, i.e. a refresh message will be sent, for every POLICY_DATA object returned by the policy control module. 6. RSVP State Re-synchronization Two RSVP neighbors may become out-of-sync due to a number of reasons. - A state-changing RSVP message is lost, and the sender did not ask for ACK. - A neighbor crashed and lost part or all of its state. - Other unknown reasons. Receipt of a DigestErr message indicates inconsistency between two nodes. The MESSAGE_ID and digest value in the DigestErr message help the two neighbors to localize the problem. o If the Message_ID acknowledged is smaller than the Message_ID of the last Digest message sent (IDLastSent), this error message is for an obsolete message. This message should be ignored since it may not represent the current state of the neighbor. o If the Message_IDs are equal, the node should start the recovery process we describe next. The node then compares the signatures contained in the DigestErr mes- sage. When it finds the first mismatching signature (call it S_1), it sends a Digest message containing the lower level signatures in the digest tree used to compute S_1. A DigestErr is expected for this Digest message since at least one of the children signatures should not draft-wang-rsvp-state-compression-02.txt [Page 21] INTERNET-DRAFT June 1999 match. The node again looks for the first mismatching signature (S_2) in the second DigestErr message and sends the children of S_2 in a Digest message. This procedure is repeated until the leaf signature (S_h) (h=log_N(M), see Figure 2) causing the problem is found. Now, the node knows that one or more of the sessions in that hash table slot (represented by S_h) must be inconsistent with those in the neighbor. It can then either locate these sessions by exchanging the session signa- tures with the receiver or it can simply send raw refreshes for all the sessions in that particular bin. After refreshing these sessions, the node re-examines S_(h-1) (the parent of S_h) for other inconsistencies and continues to traverse the tree until all the mismatching sessions are located and refreshed. Notice there is a tradeoff between the latency of the recovery procedure and the transmission efficiency. For example, if the tree has many lev- els, many RTTs are needed to exchange the digests at all the tree levels in order to find the leaf-level sessions that contribute to the incon- sistency. However, if speed of convergence is more important than effi- ciency, one can stop at an intermediate tree level and refresh all the states represented by the mismatching signature at that level. 7. Computation Costs We have presented in Section 3.2 the structure used to support efficient and incremental computation of digests. This structure consists of two parts: a hash table that stores the Signatures of sessions shared with a particular neighbor and an N-ary tree used to reduce the number of sig- natures to a number that can be sent inside a single packet. In this section we focus on the operations applied to these data structures and analyze their requirements in terms of processing. We begin with some definitions. Let the number of sessions be T, the size of the hash table be M and the number of Signatures inside the digest message be N. Let's further define the cost of computing the com- pression function on a message of size x to be f(x). To determine the behavior of f(x), we have to study each algorithm's behavior. Summariz- ing the description in RFC 1321 [MD5], the MD5 algorithm divides the input message to 64-byte blocks and applies a four-step process to each one of these 64-byte blocks. In each of the four steps, sixty-four bit- wise logical operations are applied to that 64-byte block. The results of the computation on the n-th block are used as input for the computa- tion of the (n+1)-th block. After all the blocks have been processed, the message's digest is produced. From this description, one can see that f(x) is a linear function of x, the size of the input message mea- sured in bytes. A similar analysis applies for CRC-32, so the cost of computing the CRC-32 checksum of a message with size x is also linear on draft-wang-rsvp-state-compression-02.txt [Page 22] INTERNET-DRAFT June 1999 the size of x but with smaller constants. When a session is modified, a new signature for that session as well as a new digest has to be computed. To illustrate this procedure, imagine that we want to update session D_1 inside the hash table of Figure 1. First, we look up the session inside the hash table (a possible opti- mization here is to store the hash table index where the session is stored rather than re-computing the hash function every time the session is modified). In our example, we would come up with the index i. If mul- tiple sessions map to the same hash table slot, we traverse the linked list of sessions until we find the session in question. Once the session is found and its new Signature is computed, we have to compute the new Signature stored at the base of the linked list which represents all the sessions mapped to that hash table slot. On the average Ceiling[T/M] sessions will occupy the same slot. The total time needed for this oper- ation is therefore f(D* Ceiling[T/M]), where D is the size of the signa- ture (16 bytes for MD5 and 4 bytes for CRC-32). We assume that the linked-list lookup time which is O(Ceiling[T/M]) is small compared to the time needed to update the Signatures and therefore we don't include it in our calculations. The next step is to update the values on the digest tree. We begin by computing the Signature of the contents of slot i concatenated with its N-1 siblings which will be stored in their par- ent node on the digest tree. We continue this procedure until we reach the top of the tree. Since there are log_N(M) levels on the tree and at each level we apply the compression algorithm on a message of size D*N (the combined size of N Signatures), the time spent during this step is N*(log_N(M)-1)*f(D). Notice that the term is log_N(M)-1 since we do not calculate a Signature out of the N topmost Signatures. Also f(D*N) = N*f(D) since f(x) is linear on x. From the discussion above, we can conclude that the total time needed to calculate the new digest after a session is modified is given by the following formula, where S is the size of a session in bytes: f(S) + f(D * Ceiling[T/M]) + N*(log_N(M)-1) * f(D) [1] When a new session has to be inserted in the hash table, we locate the slot this session hashes to and insert the session to that slot's linked list, if one exists. Given that the list is ordered, the new session has to be inserted in order inside the list, which means traversing the list until we find a session whose destination address is larger than the destination address of the session we want to add and inserting the new session before that session. Deleting a session, involves finding the slot it hashes to, searching for it inside the linked list, and ``splic- ing'' its predecessor to its successor on the list. draft-wang-rsvp-state-compression-02.txt [Page 23] INTERNET-DRAFT June 1999 The computation cost for the creation of the new digest after an inser- tion or deletion operation, is almost identical to the update cost. The only difference is that in the case of deletion we don't calculate the Signature of the session (since we are deleting it) but only calculate the combined Signature of the rest of the sessions on the list. Equa- tions 2 and 3 respectively, show the insertion and deletion costs. f(S) + f(D * Ceiling[T/M]) + N*(log_N(M)-1) * f(D) [2] f(D * Ceiling[T/M]) + N*(log_N(M)-1) * f(D) [3] We can see from Equations 1, 2 and 3 that when the size M of the hash table is small compared to the number of sessions T, the cost of updat- ing the linked list of sessions will be linear to T. In this case, updating the linked list becomes the most expensive operation, forcing the total cost to also be linear to T. The size M of the hash table should therefore be comparable to the expected T to avoid increased update times. 8. Compression Mechanism Comparison We mentioned earlier that two possible candidates for the compression function are the CRC-32 checksum algorithm and the MD-5 digest algorithm [MD5]. The main task of the compression function is to produce a one-to- one mapping between sets of sessions and digests. It is therefore important that the probability of two different configurations producing the same digest (collision probability) is negligible. The compression scheme fails when the digest trees created at the two neighbors contain different sessions but produce the same digest, i.e the same set of N top level Signatures. In this section we compute the probability of failure as a function of the size of signatures produced by the compression algorithm. Specifi- cally we compute the probability a single bit difference in the RSVP state shared between the two neighbors is not detected. Let's assume that the maximum packet size that can be sent un-fragmented is L bits. Then if the size of the signature is x bits, N, the number of Signatures in a digest, is L/x. As before, let's further assume that the size of the hash table is M. To make the calculations easier we actu- ally compute the probability that the error will be detected, P(suc- cess). The failure probability is of course 1-P(success). The probability that the error will be detected is the probability that the modified session will produce a different Signature AND the list of draft-wang-rsvp-state-compression-02.txt [Page 24] INTERNET-DRAFT June 1999 other sessions contained in the same slot with the modified session will create a different Signature AND all the nodes on the path of the digest tree from that hash table bin to one of the roots of the digest tree will create different Signatures. Let's calculate calculate each of these probabilities first. Assuming that the compression function is perfect, i.e. it distributes messages evenly over the set of Signatures, the probability that the changed session will create a different Signature is (1-1/2^x). In a similar fashion the probability that at each level of the digest tree a different Signature will be produced is again (1-1/2^x). [Note: one might observe that CRC-32 detects all single bit differences and there- fore in this case the probability of detection in the first step is one. While this is correct, the goal of our simplified analysis is to compare the relative performance of the two schemes without focusing on the specifics of each compression algorithm.] Since the compression function produces a (pseudo-)random set of bits, the events described above are independent and therefore to compute the desired probability we take their product. Hence the probability that the error will be detected is: P(success) = (1-1/2^x)^(h+1) [4] and P(failure) = 1 -P(success) = 1 - (1-1/2^x)^(h+1) [5] Where h is the height of the digest tree, h = log_N(M). To get a better understanding, we substitute the parameters in Equation [5] with some values. For M=10000, L=10000 bits and x=32 (CRC-32), P(failure) = 3.74*10^-9 while for x=128 (MD5) P(failure)=0. From this numerical example we can see that for single-bit differences, MD5 has infinitesimal probability of failure while at the same time CRC-32 provides high assurance that errors will be detected. Multiple bit changes present a slightly different picture. While the detection probability in the case of MD5 is independent of the number of changed bits, for CRC-32 the probability of detection decreases when the number of changed bits increases. The conjecture is therefore that CRC-32 will perform worse than MD5 for multiple bit differences. On the other hand the smaller size of the CRC-32 Signature means that N will be larger and the height h of the digest tree will be significantly shorter. Finally, computing the CRC-32 checksum is a lot faster than the draft-wang-rsvp-state-compression-02.txt [Page 25] INTERNET-DRAFT June 1999 MD5 computation. 9. Summary Due to the scaling requirements from setting up large numbers of RSVP reservations, such as the case in MPLS traffic engineering, a few recent proposals suggested to slow down or stop periodic state refreshes in RSVP. We believe that stopping refreshes entirely represents a funda- mental departure from the soft-state approach adopted in RSVP design. As we have argued at a few places in this draft, reliable RSVP message delivery alone is neither necessary nor sufficient for assuring consis- tent reservation state inside the network. To keep the soft-state approach while making the protocol scale with the number of sessions, the proposed state compression mechanism described in this draft pays the cost of computing and maintaining a tree of ses- sion signatures. When sessions or session parameters change, maintain- ing the tree of session signatures has a computational overhead propor- tional to T/M (T is the total number of sessions and M is a constant comparable to T). When large numbers of RSVP sessions exist in a network, we believe that the proposed scheme brings substantial performance gain from reduced refresh overhead, especially when most of the sessions are relatively stable. However the feasibility of using this approach in a highly dynamic environment where sessions are constantly added and deleted remains an open issue. 10. IANA Considerations IANA must provide three new Msg Types for the three new messages defined in this draft: Digest, ACK and DigestErr. Also IANA must provide Class Numbers for the objects defined here. Specifically: The MESSAGE_ID object requires a Class-Num value of the form 0bbbbbbb. The DIGEST object requires a a Class-Num value of the form 10bbbbbb. 11. Security Considerations No new security issues are raised in this document. See [RFC2205] for a general discussion on RSVP security issues. draft-wang-rsvp-state-compression-02.txt [Page 26] INTERNET-DRAFT June 1999 12. Acknowledgments The phrase "overhead reduction by state compression" was suggested by Van Jacobson of cisco. Vern Paxson suggested the use of hashing to sim- plify Digest computation. Our design also benefited from discussion with Steve McCanne of UC Berkeley regarding the roles of acknowledgment in soft-state protocol design. We thank Hilarie Orman for her insight- full comments on MD5 and CRC-32. We also received valuable feedback from Bob Braden and Steven Berson of ISI. 13. References [REDUCT] Lou Berger et al, "RSVP Refresh Reduction Extensions", draft- berger-rsvp-refresh-reduct-01.txt, April 1999. [MD5] R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321, April 1992. [RFC2205] Braden, R. Ed., Zhang, L., et al, "Resource ReserVation Proto- col -- Version 1 Functional Specification", RFC 2205, September 1997. [RFC2209] Braden, R., Zhang, L, "Resource ReSerVation Protocol (RSVP) -- Version 1 Message Processing Rules", RFC 2209, September 1997. 14. Authors' Addresses Lan Wang UCLA 4805 Boelter Hall Los Angeles, CA 90095 Phone: 310-267-2190 Email: lanw@cs.ucla.edu Andreas Terzis UCLA 4677 Boelter Hall Los Angeles, CA 90095 Phone: 310-267-2190 Email: terzis@cs.ucla.edu Lixia Zhang UCLA draft-wang-rsvp-state-compression-02.txt [Page 27] INTERNET-DRAFT June 1999 4531G Boelter Hall Los Angeles, CA 90095 Phone: 310-825-2695 EMail: lixia@cs.ucla.edu draft-wang-rsvp-state-compression-02.txt [Page 28]