Internet Engineering Task Force                                 A. Lange
INTERNET DRAFT                                          Cable & Wireless
                                                               June 2003
                                                   Expires December 2003

                        Issues in Revising BGP-4

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document records the issues discussed and the consensus reached
   in the Interdomain Routing (IDR) Working Group during its efforts to
   revise and bring up to date the base specification for the BGP-4
   protocol.

Table of Contents

   Status of this Memo
   Abstract
   1. Introduction
   2. The Issues from -17 to -18
      2.1 IDR WG Charter
      2.2 TCP Port
      2.3 FSM wording for what state BGP accepts connections in
      2.4 BGP Identifier/Router ID
      2.5 Direct EBGP Peers

Lange                                                           [Page 1]

INTERNET DRAFT                                                  May 2003

      2.6 Disallow Private Addresses
      2.7 Renumber Appendix Sections
      2.8 Jitter Text
      2.9 Reference to RFC904 - EGP Protocol
      2.10 Extending AS_PATH Attribute
      2.11 Rules for routes from Loc-RIB to Adj-RIB-Out - Section 9.1
         2.11.1 Rules for routes from Loc-RIB to Adj-RIB-Out -
                Section 9.1.3
         2.11.2 Rules for routes from Loc-RIB to Adj-RIB-Out -
                Section 2
         2.11.3 Documenting IBGP Multipath
      2.12 TCP Behavior Wording
      2.13 Next Hop for Originated Route
      2.14 NEXT_HOP to Internal Peer
      2.15 Grammer Fix
      2.16 Need ToC, Glossary and Index
      2.17 Add References to other RFC-status BGP docs to base spec
      2.18 IP Layer Fragmentation
      2.19 Appendix Section 6.2: Processing Messages on a Stream
           Protocol
      2.20 Wording fix in Section 4.3
      2.21 Authentication Text Update
      2.22 Scope of Path Attribute Field
      2.23 Withdrawn and Updated routes in the same UPDATE message
      2.24 Addition or Deletion of Path Attributes
      2.25 NEXT_HOP Semantics
      2.26 Attributes with Multiple Prefixes
      2.27 Allow All Non-Destructive Messages to Refresh Hold Timer
      2.28 BGP Identifier as Variable Quantity
      2.29 State Why Unresolveable Routes Should Be Kept in Adj-RIB-In
      2.30 Mention Other Message Types
      2.31 Add References to Additional Options
      2.32 Clarify EGP Reference
         2.32.1 EGP ORIGIN Clarification
         2.32.2 BGP Destination-based Forwarding Paradigm
      2.33 Add "Optional Non-Transitive" to the MED Section
      2.34 Timer & Counter Definition
      2.35 Fix Typo
      2.36 Add Adj-RIB-In, Adj-RIB-Out and Loc-RIB to the Glossary
      2.37 Combine "Unfeasible Routes" and "Withdrawn Routes"
      2.38 Clarify Outbound Route Text
      2.39 Redundant Sentence Fragments
      2.40 Section 9.2.1.1 - Per Peer vs. Per Router
           MinRouteAdvertisementInterval
      2.41 Mention FSM Internal Timers
      2.42 Delete the FSM Section
      2.43 Clarify the NOTIFICATION Section
      2.44 Section 6.2: OPEN message error handling
      2.45 Consistent References to BGP Peers/Connections/Sessions
      2.46 FSM Connection Collision Detection
      2.47 FSM - Add Explicit State Change Wording
      2.48 Explicitly Define Processing of Incoming Connections
      2.49 Explicitly Define Event Generation
      2.50 FSM Timers
      2.51 FSM ConnectRetryCnt
      2.52 Section 3: Keeping routes in Adj-RIB-In
      2.53 Section 4.3 - Routes v. Destinations - Advertise
      2.54 Section 4.3 - Routes v. Destinations - Withdraw
      2.55 Section 4.3 - Description of AS_PATH length
      2.56 Section 6 - BGP Error Handling
      2.57 Section 6.2 - Hold timer as Zero
      2.58 Deprecation of ATOMIC_AGGREGATE
      2.59 Section 4.3 - Move text
      2.60 Section 4.3 - Path Attributes
      2.61 Next Hop for Redistributed Routes
      2.62 Deprecate BGP Authentication Optional Parameter from
           RFC1771
      2.63 Clarify MED Removal Text
      2.64 MED for Originated Routes
      2.65 Rules for Aggregating with MED and NEXT_HOP
      2.66 Complex AS Path Aggregating
      2.67 Counting AS_SET/AS_CONFED_*
      2.68 Outbound Loop Detection
      2.69 Appendix A - Other Documents
   3. The Issues from -18 to -19
      3.1 Reference to RFC 1772
      3.2 MUST/SHOULD Capitalization
      3.3 Fix Update Error Subcode 7 -- accidently removed
      3.4 Section 5.1.4 - Editorial Comment
      3.5 Section 9.1 - Change "all peers" to "peers"
      3.6 AS Loop Detection & Implicit Withdraws
      3.7 Standardize FSM Timer Descriptions
      3.8 FSM MIB enumerations
      3.9 Make "delete routes" language consistent
      3.10 Correct OpenSent and OpenConfirm delete wording
      3.11 Incorrect next state when the delay open timer expires
      3.12 Entering OpenConfirm / Adding "Stop OpenDelay" action
      3.13 FSM Missing Next States
         3.13.1 FSM Missing Next States - Event 15 or 16
                (Connect State)
         3.13.2 FSM Missing Next States - Event 14 (Connect State)
         3.13.3 FSM Missing Next States - Event 15 or 16
                (Active State)
         3.13.4 FSM Missing Next States - Event 13-17
                (TCP Connection)
         3.13.5 FSM Missing Next States - Event 17 (Connect State)
         3.13.6 FSM Missing Next States - Event 18 (Open Confirm)
      3.14 FSM - Peer Oscillation Damping
      3.15 FSM - Consistent FSM Event Names
      3.16 Many Editorial Comments
      3.17 Section 3, Page 8, Paragraph 3 - Obsolete?
      3.18 MED Removal Text
      3.19 Security Considerations
      3.20 Peer Oscillation Damping
      3.21 Session Attributes - IdleHold Timer
      3.22 Specify New Attributes (Accept Connections/Peer
           Oscillation Damping)
      3.23 Event1/Event2 Clean Up
      3.24 Events 3, 5, 6 & 7 Give Examples
      3.25 Event 4 & 5 Session Initiation Text
      3.26 Event 4 & 5 - bgp_stop_flap option
      3.27 Event 5 Clarification
      3.28 Timer Events Definition - Make Consistent
      3.29 Event 8 - Clean Up
      3.30 Hold Timer - Split?
      3.31 OpenDelay Timer Definition
      3.32 Definition of TCP Connection Accept (Event 13)
      3.33 Event 13 & 14 - Valid Addresses & Ports
      3.34 Event 17 - TCP Connection Fails to TCP Connection
           Termination
      3.35 Making Definition Style Consistent
      3.36 Event 19 - Definition Cleanup
      3.37 Event 22 - Cleanup
      3.38 FSM Description - ConnectRetry Count
      3.39 Handling Event 7 (Auto Stop to Idle State processing)
      3.40 Clearing the Connection Retry Timer
      3.41 Handling of Event 14 in the Connect State
      3.42 Handling events 20, 21 in the Connect State and Active
           State
      3.43 Handling the default events in the Connect state
      3.44 Handling Event 23 in Connect and OpenSent
      3.45 Event 17 in the Connect state
      3.46 Handling of Event 17 in Active state
      3.47 Handling of Event 19 in Active state
      3.48 Handling of Event 2 in Active state
      3.49 Default Event handling in Active state
      3.50 Clearing Hold timer in OpenSent, OpenConfirm and
           Established State
      3.51 Clearing Keepalive timer in OpenConfirm and Established
           State
      3.52 Handling Event 18 in the OpenSent state (Keepalive Timer)
      3.53 Established State MIB
      3.54 State impact of not supporting Optional Events
      3.55 New DelayOpen State
      3.56 Clarify what is covered in the base document
   4. References
   5. Author's Address

Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1. Introduction

   This document records the issues discussed and the consensus reached
   in the Interdomain Routing (IDR) Working Group during its efforts to
   revise and bring up to date the base specification for the BGP-4
   protocol.

   The rationale for doing this is simple: experience has demonstrated
   that the same issues and questions tend to come up again and again.
   This memo documents not only the decisions on these issues but also
   how and why the working group reached those conclusions.  We hope
   that this will help make future discussions more fruitful by
   providing them with a historical context.

   This document traces the evolution of the BGP-4 base specification
   from its incarnation as draft-ietf-idr-bgp4-17.txt through the big
   revision and update push culminating in draft-ietf-idr-bgp4-19.txt.
   It is divided into two main sections.  The first deals with the
   issues discussed between -17 and -18, and the second deals with the
   issues discussed between -18 and -19.

   N.B. There is no rhyme or reason to the numbering scheme; the numbers
   are only unique tags used to refer to the issues.

2. The Issues from -17 to -18

   This section lists the issues discussed on the list from late August
   to late October 2002.

2.1 IDR WG Charter

   Status: Consensus
   Change: Yes
   Summary: New charter adopted.

   Discussion:

   A variety of discussions surrounded the new charter.  The rough
   consensus is to accept the new charter that the ADs have proposed,
   and to push as hard as possible to get the base spec to RFC status so
   that other drafts that depend on it can also move forward.

   For our information, Alex has provided these approximate timelines:

      Stage            Anticipated delay     Comment
      -----------------------------------------------------------------
      AD-review        1-4 weeks depending   The document may go back
                       on the workload       to the WG for AD-review
                                             comments to be addressed;
                                             this would introduce
                                             additional delay.

      IETF LC          2 weeks               Same as above

      IESG review &    1-2 weeks depending   Same as above
      telechat         on when the IETF LC
                       ends
      -----------------------------------------------------------------

      Note that if the document is sent back to the WG at some stage,
      required changes may warrant an additional WG Last Call.

      I can personally commit to a 2-week upper bound for the AD-review
      period.  Bill may have a different timer granularity.

   The opinions expressed on this were 7 in favor, 4 against.

   This thread has message subjects of "BGP spec and IDR WG charter"
   and "IDR WG charter".

2.2 TCP Port

   Status: Consensus
   Change: Yes
   Summary: Change:

      "BGP uses TCP port 179 for establishing its connections."

   To:

      "BGP listens on TCP port 179."

   Discussion:

   There has been a discussion on clarifying the wording in Section 2
   about which port BGP uses.  The original text was:

      "BGP uses TCP port 179 for establishing its connections."

   The proposed new text is:

      "BGP listens on TCP port 179."

   There seems to be a rough consensus that the new text is better.

   This thread has a message subject of "Review: Section 2, TCP Port
   179".

2.3 FSM wording for what state BGP accepts connections in

   Status: Consensus
   Change: No
   Summary: No change necessary

   Discussion:

   An issue was brought up later in the "Review: Section 2, TCP Port
   179" thread about the wording in the FSM describing the state in
   which BGP accepts connections.  The consensus is that the existing
   wording is clear.

2.4 BGP Identifier/Router ID

   Status: Consensus
   Change: No
   Summary: No change necessary to base draft.  Perhaps in a BCP.

   Discussion:

   The "admin dist/gp spec proposal", "Router ID" and "bgp spec
   proposal" threads discussed the BGP Identifier and how close it is,
   or is not, to the IGP's Router ID.  The consensus was that this
   discussion is better saved for a BCP draft, and that it does not need
   to be contained in the base spec.

2.5 Direct EBGP Peers

   Status: Consensus
   Change: No
   Summary: A recollection that ebgp peers must be direct.
      No text proposed, no discussion.

   Discussion:

   Jonathan recalled something that stated that ebgp peers must be
   direct.  No specific sections were quoted.

   Yakov responded to this with:

      Section 5.1.3 talks about both the case where ebgp peers are 1 IP
      hop away from each other:

         2) When sending a message to an external peer X, and the peer
            is one IP hop away from the speaker:

      as well as the case where they are multiple IP hops away from each
      other:

         3) When sending a message to an external peer X, and the peer
            is multiple IP hops away from the speaker (aka "multi hop
            EBGP"):

   He also emphasized that multi hop EBGP does exist.

   This came up in the "bgp draft review" thread.

2.6 Disallow Private Addresses

   Status: Consensus
   Change: No
   Summary: No change necessary

   Discussion:

   The thread entitled "bgp draft review" mentioned explicitly
   disallowing private addresses.  The consensus was that there is no
   reason to disallow them.  Which IP addresses peers use is an
   operational issue.

2.7 Renumber Appendix Sections

   Status: Consensus
   Change: Yes
   Summary: Rename/renumber appendix sections so they do not have the
      same numbers as sections of the main text.

   Discussion:

   The thread entitled "bgp draft review" brought up renaming sections
   in the appendix to avoid confusion with sections of the same number
   in the main text.  Yakov responded that he would do so in the next
   edition.

2.8 Jitter Text

   Status: Consensus
   Change: Yes
   Summary: Get rid of section 9.2.1.3 ("Jitter").  Move the text to an
      Appendix: "BGP Timers".  Expand the text to indicate that jitter
      applies to all timers, including ConnectRetry.  The text for the
      appendix is listed at the end of the discussion.

   Discussion:

   The thread entitled "bgp draft review" also proposed that:

      "jitter should be applied to the timers associated with
      MinASOriginationInterval, Keepalive, and
      MinRouteAdvertisementInterval"

   be changed to:

      "jitter should be applied to the timers associated with
      ConnectRetry timer"

   Yakov agreed with making some changes and suggested that we make sure
   that jitter is applied to all timers.  Specifically, he proposed that
   we get rid of section 9.2.1.3 ("Jitter"), move the text of this
   section into Appendix "BGP Timers", and expand the text to indicate
   that jitter applies to the ConnectRetry timer as well.

   Jonathan, the original commenter, agreed with Yakov's suggestion.

   In a follow-up to this issue, a question was raised about the values
   we have specified for timers in the document.  Specifically:

      The ConnectRetry timer should have a value that is 'sufficiently
      large to allow TCP initialization'.  Application of jitter can
      reduce this value (by up to 25%).  A configuration in which the
      ConnectRetry timer has been pegged at a value close to the TCP
      connection time may cause a connection to be terminated as a
      result of this jitter.  Is this a cause for concern?

      The default value suggested for ConnectRetry (120 seconds) is
      sufficiently large that even with a jitter of 0.75, it will be
      greater than TCP's connection establishment timer.

      Is adding a jitter to the ConnectRetry timer a standard practice?
      What benefit does this provide?

   Curtis responded to this with:

      The TCP connection establishment timer is 75 seconds (sysctl
      yields "net.inet.tcp.keepinit: 75000" in BSD-oids).  The
      ConnectRetry determines when to make a second attempt after a
      prior attempt to connect has failed.
      It is to avoid a rapid succession of retries on immediate failures
      (for example "Connection refused" if the peer was in the middle of
      a reboot, "Network Unreachable" if you can't get there from here,
      etc.) but also covers the case where the TCP SYN goes off and is
      never heard from again.

   And Jonathan replied with this information about current practice:

      It seems to me that if you bring up all bgp peers at once it may
      lead to load spikes on the network.  Cisco seems to wait 27.5 +/-
      2.5 seconds for IBGP, and 40 +/- 5 seconds for EBGP--20 sec. from
      config time to the "open active, delay" jittered delay assignment
      plus the jittered delay (5 to 10 sec. for IBGP, and 15 to 25 sec.
      for EBGP).  This would also apply for "no neighbor x.x.x.x
      shutdown".  Their value of ConnectRetry is 60sec. though, not sure
      how this value is used (based on above).  Maybe some Cisco folks
      can chime in on this one???  I did not check Juniper.

      Also, interestingly, they do not apply jitter to the other timers
      (as far as I can tell), but I don't see a problem with this.

      Another timer that they use that is not mentioned in the draft/rfc
      is the next hop resolution timer which is 30 seconds.  Although it
      would be nice to have this in the spec, I will concede that it is
      out of scope and/or implementation dependent.

   So the question that arises from this follow-up is how it affects the
   text of the appendix on jitter.

   Curtis replied that we need only state that jitter should be applied
   to all timers.  Whether a vendor does so or not is a minor deficiency
   and does not bear on interoperability.  Therefore, specifying exact
   details is not necessary.

   After Jonathan's response, Curtis and Jonathan agreed that jitter
   should be added to all timers and that we should state so in the
   text.
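
   The concern raised earlier in this discussion, that jitter might push
   ConnectRetry below the TCP connection establishment timer, can be
   checked with a few lines of arithmetic.  The Python sketch below is
   illustrative only: the constant and function names are made up for
   this example, the 75-second TCP timer is the BSD value Curtis quotes,
   and 0.75 is the smallest jitter factor mentioned in the question.

```python
# Illustrative arithmetic only; the names are hypothetical and the values
# are the ones quoted in the discussion above.
TCP_CONNECT_TIMEOUT_S = 75.0     # BSD net.inet.tcp.keepinit, per Curtis
CONNECT_RETRY_DEFAULT_S = 120.0  # suggested ConnectRetry value
MIN_JITTER_FACTOR = 0.75         # jitter can reduce a timer by up to 25%


def worst_case_timer(base_seconds: float) -> float:
    """Shortest value a jittered timer can take for a given base value."""
    return base_seconds * MIN_JITTER_FACTOR


def smallest_safe_base(tcp_timeout_seconds: float) -> float:
    """Base value whose worst-case jittered value equals the TCP timeout."""
    return tcp_timeout_seconds / MIN_JITTER_FACTOR


if __name__ == "__main__":
    print(worst_case_timer(CONNECT_RETRY_DEFAULT_S))   # 90.0
    print(smallest_safe_base(TCP_CONNECT_TIMEOUT_S))   # 100.0
```

   Under these assumptions the suggested value stays 15 seconds above
   the TCP timer even in the worst case; only a ConnectRetry configured
   below 100 seconds could expire before TCP gives up.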

   Yakov proposed the following text for the appendix to discuss jitter:

      I'd like to propose the following text for "BGP Timers" section:

      BGP employs five timers: ConnectRetry (see Section 8), Hold Time
      (see Section 4.2), KeepAlive (see Section 8),
      MinASOriginationInterval (see Section 9.2.1.2), and
      MinRouteAdvertisementInterval (see Section 9.2.1.1).

      The suggested value for the ConnectRetry timer is 120 seconds.

      The suggested value for the Hold Time is 90 seconds.

      The suggested value for the KeepAlive timer is 1/3 of the Hold
      Time.

      The suggested value for the MinASOriginationInterval is 15
      seconds.

      The suggested value for the MinRouteAdvertisementInterval is 30
      seconds.

      An implementation of BGP MUST allow the Hold Time timer to be
      configurable, and MAY allow the other timers to be configurable.

      To minimize the likelihood that the distribution of BGP messages
      by a given BGP speaker will contain peaks, jitter should be
      applied to the timers associated with MinASOriginationInterval,
      Keepalive, MinRouteAdvertisementInterval, and ConnectRetry.  A
      given BGP speaker shall apply the same jitter to each of these
      quantities regardless of the destinations to which the updates are
      being sent; that is, jitter will not be applied on a "per peer"
      basis.

      The amount of jitter to be introduced shall be determined by
      multiplying the base value of the appropriate timer by a random
      factor which is uniformly distributed in the range from 0.75 to
      1.0.

   Jeff & Ben agreed with this.

   Justin suggested that we move the range from 0.75 to 1.25 to ensure
   that the average is around the configured value.  Yakov agreed with
   Justin's changes.  Jonathan disagreed, arguing that it was out of
   scope for the task of clarifying the text only.  Justin agreed and
   withdrew his comment.
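
   As a rough illustration of the proposed text, the sketch below
   applies a uniformly distributed factor in the range 0.75 to 1.0 to
   the suggested timer values.  It is not taken from any implementation;
   the names SUGGESTED_SECONDS, JITTERED_TIMERS, jittered and
   mean_factor are invented for this example.  It also shows the
   arithmetic behind Justin's remark: with the proposed range the
   average timer value is 87.5% of the configured value, whereas a range
   of 0.75 to 1.25 would average 100%.

```python
import random

# Suggested values from the proposed "BGP Timers" text, in seconds.
# KeepAlive is 1/3 of the Hold Time.
SUGGESTED_SECONDS = {
    "ConnectRetry": 120.0,
    "HoldTime": 90.0,
    "KeepAlive": 90.0 / 3,
    "MinASOriginationInterval": 15.0,
    "MinRouteAdvertisementInterval": 30.0,
}

# Timers the proposed text says jitter should be applied to.
JITTERED_TIMERS = (
    "MinASOriginationInterval",
    "KeepAlive",
    "MinRouteAdvertisementInterval",
    "ConnectRetry",
)


def jittered(base_seconds, low=0.75, high=1.0, rng=random):
    """Multiply the base value by a factor drawn uniformly from [low, high]."""
    return base_seconds * rng.uniform(low, high)


def mean_factor(low, high):
    """Expected value of a factor that is uniform on [low, high]."""
    return (low + high) / 2


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs print the same values
    for name in JITTERED_TIMERS:
        print(name, round(jittered(SUGGESTED_SECONDS[name], rng=rng), 1))
    print(mean_factor(0.75, 1.0))    # 0.875
    print(mean_factor(0.75, 1.25))   # 1.0
```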

   Curtis liked the general text, but suggested these modifications:

      minor improvement (not really an objection) --

         s/suggested value/suggested default value/g

      Also

         s/shall apply the same jitter/may apply the same jitter/
         (to each of these quantities regardless of ...).

         s/jitter will not be applied/jitter need not be configured/
         (on a "per peer" basis).

   He stated that Avici's implementation allows a lot of granularity in
   timer settings, so this reflects current practice.

   Curtis also suggested changing the last paragraph:

      The suggested default amount of jitter shall be determined by
      multiplying the base value of the appropriate timer by a random
      factor which is uniformly distributed in the range from 0.75 to
      1.0.  A new random value should be picked each time the timer is
      set.  The range of the jitter random value MAY be configurable.

   This would make it clear that it is possible to have this timer as
   configurable and still be within spec.

   Other comments on Yakov's text pointed out that IOS uses 5 seconds as
   the default IBGP MinRouteAdvertisementInterval.

   Tom pointed out that there seems to be a discrepancy between this
   text and the FSM: the FSM has an OpenDelay timer, and the FSM
   suggests a HoldTimer of 4 minutes.

   In following up on this issue, Yakov stated:

      Here is the final text for the BGP Timers section:

      BGP employs five timers: ConnectRetry (see Section 8), Hold Time
      (see Section 4.2), KeepAlive (see Section 8),
      MinASOriginationInterval (see Section 9.2.1.2), and
      MinRouteAdvertisementInterval (see Section 9.2.1.1).

      The suggested default value for the ConnectRetry timer is 120
      seconds.

      The suggested default value for the Hold Time is 90 seconds.

      The suggested default value for the KeepAlive timer is 1/3 of the
      Hold Time.

      The suggested default value for the MinASOriginationInterval is 15
      seconds.

      The suggested default value for the MinRouteAdvertisementInterval
      is 30 seconds.

      An implementation of BGP MUST allow the Hold Time timer to be
      configurable, and MAY allow the other timers to be configurable.

      To minimize the likelihood that the distribution of BGP messages
      by a given BGP speaker will contain peaks, jitter should be
      applied to the timers associated with MinASOriginationInterval,
      Keepalive, MinRouteAdvertisementInterval, and ConnectRetry.  A
      given BGP speaker may apply the same jitter to each of these
      quantities regardless of the destinations to which the updates are
      being sent; that is, jitter need not be configured on a "per peer"
      basis.

      The suggested default amount of jitter shall be determined by
      multiplying the base value of the appropriate timer by a random
      factor which is uniformly distributed in the range from 0.75 to
      1.0.  A new random value should be picked each time the timer is
      set.  The range of the jitter random value MAY be configurable.

      With this in mind, I would suggest we mark this issue as closed.

   Jonathan suggested adding "per peer" to the text.  Yakov responded
   with this text:

      An implementation of BGP MUST allow the Hold Time timer to be
      configurable on a per peer basis, and MAY allow the other timers
      to be configurable.

   This proposal met with general agreement.  This issue is at
   consensus.

2.9 Reference to RFC904 - EGP Protocol

   Status: Consensus
   Change: Yes
   Summary: Add a reference to RFC904

   Discussion:

   The "Review Comment: Origin Attribute pg 14" thread suggested adding
   a reference to RFC904(?), to refer to the EGP protocol.  There was no
   discussion.  Yakov agreed to this, and Jonathan seconded it.

2.10 Extending AS_PATH Attribute

   Status: Consensus
   Change: Yes
   Summary: Add this to 9.2:

      If due to the limits on the maximum size of an UPDATE message (see
      Section 4) a single route doesn't fit into the message, the BGP
      speaker MUST NOT advertise the route to its peers and may choose
      to log an error locally.

   Discussion:

   The "Extending AS_PATH attribute length en route" thread brought up
   the issue of what action we should specify when we receive a route
   with an AS_PATH that exceeds the defined maximum length.  There was
   some discussion, and it was suggested that, after logging the error,
   the route not be propagated.

   Yakov stated that:

      The real issue here is how to handle the case when a route (a
      single address prefix + path attributes) doesn't fit into 4K bytes
      (as the max BGP message size is 4 K).  To address this issue I
      would suggest to add the following to 9.2:

   After some discussion, the last sentence of Yakov's proposed text was
   dropped and we arrived at:

      If due to the limits on the maximum size of an UPDATE message (see
      Section 4) a single route doesn't fit into the message, the BGP
      speaker may choose not to advertise the route to its peers.

   In response to Andrew's clarification question to the list, Curtis
   responded:

      Wording would be more like:

      If the attributes for a specific prefix become too large to fit
      the prefix into the maximum sized BGP UPDATE message, the prefix
      should not be advertised further.  Truncation or omission of
      attributes should not occur unless policies for such modifications
      are specifically configured.  Such policies may contribute to the
      formation of route loops and are not within the scope of this
      protocol specification.

   After some additional discussion, it was decided that we add "and may
   choose to log an error locally." to the end of Yakov's text.  Also,
   we agreed to change "may choose not to advertise..." to "MUST NOT
   advertise...".

   So the text on the table right now is:

      If due to the limits on the maximum size of an UPDATE message (see
      Section 4) a single route doesn't fit into the message, the BGP
      speaker MUST NOT advertise the route to its peers and may choose
      to log an error locally.

   This met with one agreement and no disagreements.  We have a
   consensus.
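
   The consensus rule can be sketched as a size check performed before a
   route is advertised.  The Python below is an illustration under
   stated assumptions, not text from the draft: the function names are
   hypothetical, the 4096-byte limit is the 4K maximum message size
   mentioned above, and the fixed overheads (19-octet message header,
   2-octet Withdrawn Routes Length, 2-octet Total Path Attribute Length,
   one length octet per NLRI prefix) follow the UPDATE message layout in
   the base specification.

```python
import logging

MAX_MESSAGE_SIZE = 4096   # maximum BGP message size (the "4K" above)
HEADER_SIZE = 19          # marker (16) + length (2) + type (1)
UPDATE_FIXED_FIELDS = 4   # Withdrawn Routes Length (2) + Total Path Attribute Length (2)

log = logging.getLogger("bgp.sketch")


def nlri_size(prefix_length_bits: int) -> int:
    """Octets needed to encode one prefix: a length octet plus the prefix octets."""
    return 1 + (prefix_length_bits + 7) // 8


def route_fits(path_attributes_size: int, prefix_length_bits: int) -> bool:
    """True if a single route (one prefix plus its attributes) fits in an UPDATE."""
    total = (HEADER_SIZE + UPDATE_FIXED_FIELDS
             + path_attributes_size + nlri_size(prefix_length_bits))
    return total <= MAX_MESSAGE_SIZE


def should_advertise(path_attributes_size: int, prefix_length_bits: int) -> bool:
    """Apply the rule: never advertise an oversized route; optionally log it."""
    if route_fits(path_attributes_size, prefix_length_bits):
        return True
    log.error("route does not fit in an UPDATE message; not advertising")
    return False
```

   Under these assumptions a /24 prefix leaves 4069 octets for path
   attributes; one octet more and the route is suppressed and logged
   rather than truncated.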

2.11 Rules for routes from Loc-RIB to Adj-RIB-Out - Section 9.1

   Status: Consensus
   Change: Yes
   Summary: Add this text:

      The local speaker SHALL then install that route in the Loc-RIB,
      replacing any route to the same destination that is currently
      being held in the Loc-RIB.  When the new BGP route is installed in
      the Routing Table, care must be taken to ensure that existing
      routes to the same destination that are now considered invalid are
      removed from the Routing Table.  Whether or not the new BGP route
      replaces an existing non-BGP route in the Routing Table depends on
      the policy configured on the BGP speaker.

   Discussion:

   The "Proxy: comments on section 9.1.3" thread brought up some lack of
   clarity in the section discussing the rules for which routes get
   propagated from the Loc-RIB into the Adj-RIB-Out.  These discussions
   resulted in a number of suggestions for new text.

   The first new text was proposed to clarify the issue that the thread
   first brought up:

      I agree that this could use some clarification.  How about adding
      to b) in section 9.1:

         The Loc-RIB must contain only one route per destination;
         further, it must include all routes that the BGP speaker is
         using.

      changing c) in section 9.1.2 to:

         c) is selected as a result of the Phase 2 tie breaking rules
            specified in 9.1.2.2, or

      and adding

         d) when routing protocols other than BGP are in use, is
            determined by some other local means to be the route that
            will actually be used to reach a particular destination.

   This text was never discussed, and no consensus was formed on putting
   it in the document.

   This modification to 9.1.2 was also proposed to address the same
   concern:

      How about changing the paragraph after c) in 9.1.2 to:

         The local speaker SHALL then install that route in the Loc-RIB,
         replacing any route to the same destination that is currently
         being held in the Loc-RIB.  This route SHALL then also be
         installed in the BGP speaker's forwarding table.

   There was one response in the negative to this change, arguing that
   it is not necessary.

   Yakov replied to this that:

      Wrt "adding to b) in section 9.1", the second part (after ";") is
      redundant as this point is already stated in 3.2.  Wrt the first
      point about Loc-RIB containing just one route per destination, I
      would suggest to add it to section 3.2, where Loc-RIB is first
      introduced, rather than adding it to 9.1.

      Wrt "changing c)... and adding...", I have no objections to
      add/modify the text, as suggested above.

      I am not sure though that changing the paragraph after c) in 9.1.2
      is really necessary though, so I would prefer to keep it as is.

   The "issue 11" thread in which this was being discussed then
   digressed to the topic now covered in issue 11.3.

   Ben re-addressed the original issue with this input:

      I have somewhat of an issue with the paragraph after item c
      section 9.1.2 as discussed, which is =>

         "The local speaker SHALL then install that route in the
         Loc-RIB, replacing any route to the same destination that is
         currently being held in the Loc-RIB.  If the new BGP route is
         installed in the Routing Table (as a result of the local policy
         decision), care must be taken to ensure that invalid BGP routes
         to the same destination are removed from the Routing Table.
         Whether or not the new route replaces an already existing
         non-BGP route in the routing table depends on the policy
         configured on the BGP speaker."

      Can we assume that it's OK to have a route present in the Loc-RIB
      and possibly in the Adj-RIB-Out but not in the Routing table due
      to some policy?  Won't we violate rule number 1?  Only advertise
      what you use.

   As conversely implied in this sentence =>

   "If the new BGP route is installed in the Routing Table (as a result of the local policy decision), care must be taken to ensure that invalid BGP routes to the same destination are removed from the Routing Table"

   I would rephrase the paragraph as follows =>

   "The local speaker SHALL then install that route in the Loc-RIB, replacing any route to the same destination that is currently being held in the Loc-RIB. When the new BGP route is installed in the Routing Table, care must be taken to ensure that existing routes to the same destination that are now considered invalid are removed from the Routing Table. Whether or not the new BGP route replaces an existing non-BGP route in the routing table depends on the policy configured on the BGP speaker."

Jeff replied:

   With the exception that Routing Table should be capitalized throughout, I'd suggest we take this as consensus.

Yakov agreed. We are at consensus.

2.11.1 Rules for routes from Loc-RIB to Adj-RIB-Out - Section 9.1.3

Status: Consensus
Change: Yes
Summary: The text below will be added to the -18 version.

Discussion:

In further discussions around this issue, this text was also proposed:

   How about adding to section 9.1.3, at the end:

   Any local-policy which results in reachability being added to an Adj-RIB-Out without also being added to the local BGP speaker's forwarding table is beyond the scope of this document.

This suggestion received one response that agreed to this change.

This text will be added to the -18 version, and since there were no objections, this issue has been moved to consensus.
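
The install rule agreed in 2.11, together with the scoping statement in 2.11.1, can be illustrated with a small sketch. This is not taken from the draft or from any implementation: the names Route, Speaker, bgp_may_replace, and advertisable are hypothetical, the Routing Table is simplified to one entry per destination, and the policy decision is reduced to a single callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Route:
    prefix: str     # destination, e.g. "198.51.100.0/24"
    next_hop: str
    protocol: str   # "bgp" or a non-BGP source such as "ospf" (hypothetical labels)


@dataclass
class Speaker:
    # Hypothetical policy hook: may the new BGP route displace an
    # existing non-BGP route in the Routing Table?
    bgp_may_replace: Callable[[Route, Route], bool]
    loc_rib: Dict[str, Route] = field(default_factory=dict)
    routing_table: Dict[str, Route] = field(default_factory=dict)

    def install(self, best: Route) -> None:
        # Loc-RIB: replace any route held for the same destination.
        self.loc_rib[best.prefix] = best
        current = self.routing_table.get(best.prefix)
        if current is None or current.protocol == "bgp":
            # Either nothing is installed yet, or an older BGP route is
            # now invalid and must not linger in the Routing Table.
            self.routing_table[best.prefix] = best
        elif self.bgp_may_replace(best, current):
            # Replacing a non-BGP route is a local policy decision.
            self.routing_table[best.prefix] = best

    def advertisable(self) -> List[Route]:
        # Assumption for this sketch: only Loc-RIB routes that are also
        # in the Routing Table are candidates for an Adj-RIB-Out; any
        # other policy is out of scope per 2.11.1.
        return [r for p, r in self.loc_rib.items()
                if self.routing_table.get(p) == r]
```

With a policy that never displaces non-BGP routes, a BGP route for a destination already served by the IGP stays in the Loc-RIB but is neither installed in the Routing Table nor offered for advertisement.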

2.11.2 Rules for routes from Loc-RIB to Adj-RIB-Out - Section 2

Status: Consensus
Change: Yes
Summary: Add this text:

   In the context of this document we assume that a BGP speaker advertises to its peers only those routes that it itself uses (in this context a BGP speaker is said to "use" a BGP route if it is the most preferred BGP route and is used in forwarding). All other cases are outside the scope of this document.

Discussion:

Additionally this thread produced this section of new text, in section 2:

   "one must focus on the rule that a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses."

   Should be changed to

   "one must focus on the rule that a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only routes whose NLRIs are locally reachable."

   "one must focus on the rule that a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only routes which are locally reachable. Local reachability can be achieved by having any protocol route to the given destination in the routing table."

There were a lot of emails exchanged on this topic with a variety of texts proposed (see early in the "Active Route" thread).

This issue was reopened by Jonathan, who brought up the issue originally, stating that:

   The issue I raised, and would like to be [re-]considered is with:

   "one must focus on the rule that a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses."

Curtis replied that:

   That is called route origination and it is allowed by:

   9.4 Originating BGP routes

   A BGP speaker may originate BGP routes by injecting routing information acquired by some other means (e.g. via an IGP) into BGP. [...]

   The decision whether to distribute non-BGP acquired routes within an AS via BGP or not depends on the environment within the AS (e.g. type of IGP) and should be controlled via configuration.

   Advice on what to put in the AS_PATH and NEXT_HOP is in the document.

He continued with:

   I don't think there was ever consensus on what to do with the statement "a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses". Some reasonable choices are:

   1. Omit it (the implied consensus of the rewrite of the paragraph in 32.2).

   2. Leave it as is and put it in another paragraph to separate it from the destination based routing statement.

   3. Clean up the wording and put it in another paragraph to separate it from the destination based routing statement.

   The separate paragraph for 2 would be the exact sentence we now have.

      A BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses.

   In possibility 3 we (try to) clear up the ambiguity about the meaning of the word "use" in this sentence.

      A BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses. In this context a BGP speaker is said to "use" a BGP route if it is the most preferred BGP route and is either directly used in forwarding or in a specifically configured case where the BGP route would be forwarded internally but IGP forwarding information is used. The latter case reflects a usage in which the IGP is used for forwarding but BGP is originated to IBGP to carry attributes that cannot be carried by the IGP (for example, BGP communities [N]).

      Other special cases such as virtual routers and multiple instances of BGP on a single router are beyond the scope of this document but for each of these the statement "a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses" can (and should in the definition of the extension) be made true with an appropriate definition of the word "use".

   Unless someone volunteers better wording this may be a good starting point. I think the last sentence borders on ridiculous in a protocol spec but may be necessary to address specific objections raised on this mailing list. If we want to elaborate on the meaning of the word "use" and address the objections this is what we end up with.

   Of course looking at what we ended up with, I'd also go along with the other two options (leave it out or put the one sentence in a separate paragraph as is).

After some additional discussion (in the "issue 11.2" thread), we have come to a consensus on this text:

   In the context of this document we assume that a BGP speaker advertises to its peers only those routes that it itself uses (in this context a BGP speaker is said to "use" a BGP route if it is the most preferred BGP route and is used in forwarding). All other cases are outside the scope of this document.

This issue is at consensus.

2.11.3 Documenting IBGP Multipath

Status: Consensus
Change: Yes
Summary: The documenting of IBGP Multipath is left to another Internet Draft. The consensus is that it should not be in the base spec.

Discussion:

This thread began in the "issue 11" discussion. In it, it was proposed that:

   There is support in some router vendors to allow more than one BGP route to be installed, for the purpose of load balancing. Given that this is a current practice, and seems to be a useful feature as well, should we insist that only one route be installed in the Loc-RIB?

   I would like to suggest that all sections which use MUST in the context of only one route in Loc-RIB be relaxed a little to a SHOULD, and a section added that states that it is possible for an implementation to add more than one route to the Loc-RIB for the purposes of load balancing.

   While it will be useful to describe how this situation is handled, it is perhaps sufficient to even state that handling of this situation is outside the scope of this RFC.

   I am including some proposed text for this purpose:

   For the part:

   > The Loc-RIB must contain only one route per destination;

   consider instead,

   % The Loc-RIB SHOULD contain only one route per destination.
   % An implementation may choose to install multiple routes to
   % a destination (for the purposes of load balancing). The
   % handling of such a configuration, however, is outside the
   % scope of this RFC.

   Perhaps, this can be in section 3.2 instead.

After much discussion back and forth, it was agreed that documenting IBGP Multipath behavior is a good thing. However, it is something that belongs in another draft.

Alex opened this issue up again. There was a flurry of responses, almost all of them agreeing with the original consensus that we should document this feature in a different draft, since it doesn't affect the core interoperability requirements, and we want to advance the spec in a timely manner.

Alex persisted in his assertion that this belongs in the base specification. Right now, the issue is still open.

This discussion later expanded in scope to include all BGP multipath. Curtis laid out a good description of the various flavors of multipath:

   In addition to IGP multihop, there are two cases of BGP multipath. In IGP multihop there is one BGP advertisement but two ways to reach the BGP NEXT_HOP via the IGP.

   In one case of BGP multihop, two (or more) IBGP routers peering with the same external AS have equal routes to a destination and are an equal cost away from a third router. BGP multihop is applicable to that third router. Without BGP multihop, BGP would normally pick the BGP NEXT_HOP of the advertisement from only one of those IBGP peers (using BGP Identifier) and use that. The IGP lookup would yield one next hop. With BGP multihop, BGP uses the BGP NEXT_HOP of both advertisements. Each BGP NEXT_HOP has a different IGP next hop (one or more IGP next hop).

   The second case is where all of the candidate routes for BGP multipath are external. Seldom does IGP multipath come into play for EBGP (odd tunneled EBGP multihop cases maybe). Typically the load is split among two (or more) routers in the same AS. If in EBGP multipath you split among routers in different ASs, an aggregate should be formed. This is still prior to the IGP cost rule in the route selection.

   Normally one would not combine IBGP and EBGP in multihop given that the decision point for multihop is after "d" in 9.1.2.2. If the multihop decision was prior to "d", then two routers each with an external peering would forward some of the traffic to each other and for some src/dst pairs, they'd form a loop. [So don't do that!]

   This is getting to be a lot to add to the base spec. I hope we've convinced you that we should put it in another document.

Curtis later added specific text that could serve as a start for the new document (or be added to the base spec if the consensus ended up going the other way):

   BGP specifies how to select the single best route. OSPF specifically defines procedures for handling equal cost multipath (ECMP) [cite OSPF]. The same technique has been applied to ISIS. A similar technique has been used with BGP. Variations exist but the decision to support BGP multipath, the specific variation of BGP multipath, or not to support it, does not affect interoperability.

   A naive implementation of ECMP can cause severe performance degradation for TCP flows. To avoid this, implementations of BGP multipath SHOULD maintain packet ordering within microflows as described in [cite rfc2991, rfc2992].

   BGP multipath, if implemented, SHOULD be disabled by default.

   In addition to IGP multipath (OSPF ECMP and ISIS equivalent), there are two variations of BGP multipath described here. A BGP implementation may offer both, either one, or neither variation of BGP multipath. Other variations of BGP multipath may exist, but no guarantees can be made in this protocol specification of their properties or impact on interoperability.

   Where IGP multipath is used, there is an interaction with BGP learned routes. The lookup of a BGP NEXT_HOP in the IGP can result in the selection of an IGP multipath entry. This is not a variation of BGP multipath. When this occurs, one BGP route is selected as the best but there is more than one way to reach the BGP NEXT_HOP via the IGP.

   In one variation of BGP multipath, a set of more than one IBGP routers peering with the same external AS have equal routes to a destination and are an equal IGP cost away from a second set of one or more routers. BGP multipath is applicable to the latter set of routers. Without BGP multipath, BGP would pick the BGP NEXT_HOP of the advertisement from only one of those IBGP peers (using BGP Identifier) and use only that BGP route. With BGP multipath, BGP uses the BGP NEXT_HOP of more than one of these equal cost advertisements, yielding more than one BGP NEXT_HOP. Each BGP NEXT_HOP has a different IGP next hop (one or more IGP next hop if IGP multipath is in use).

   The second case is where all of the candidate routes for BGP multipath are external and learned by a single BGP peer. Without BGP multipath this peer would select only one of the BGP routes and obtain only one BGP NEXT_HOP.
   With BGP multipath, more than one equal cost route is selected yielding more than one BGP NEXT_HOP. Seldom does IGP multipath come into play when looking up an EBGP NEXT_HOP but could in principle be applicable.

   If in EBGP multipath traffic is split among routers in different ASs, an aggregate SHOULD be formed so as to propagate a route with an accurate AS_PATH. If the resulting aggregate is not more specific than the components, the AS_SET SHOULD NOT be dropped.

   The decision point for multipath is after step "d" in Section 9.1.2.2 (prefer externally learned routes). IBGP learned and EBGP learned routes MUST NOT be combined in multipath. If the multipath decision is prior to "d", then two routers each with an external peering would form a routing loop.

   The decision point for multipath is generally after step "e" in Section 9.1.2.2. Some relaxation of the "equal cost" rule (also applicable to IGP multipath) is possible. In addition to the equal cost BGP NEXT_HOPS available at BGP route selection, if the IGP next hop for other BGP NEXT_HOPs are of lower cost, then those may be used as well. This relaxation of the step "e" is possible but is not widely implemented (and may not be implemented at all).

The consensus of the majority of the IDR WG is to keep this in a separate draft and out of the base spec.

2.12 TCP Behavior Wording

Status: Consensus
Change: No
Summary: In issue 19 we decided to remove this section entirely. As a result the previous consensus on this issue (no change) is now moot.

Discussion:

The subject-less "your mail" thread discussed a wording clarification from:

   "An implementation that would "hang" the routing information process while trying to read from a peer could set up a message buffer (4096 bytes) per peer and fill it with data as available until a complete message has been received."

To something that is more TCP-correct, such as:

   "An implementation that would "hang" the routing information process while trying to received from a peer could set up a message buffer (4096 bytes) per peer and fill it with data as available until a complete message has been received."

(only change: "read" to "received". This was one of a couple of suggested changes.)

This suggestion was quite contentious, and although there were a variety of alternate texts proposed, the only consensus was that this was a very minor issue, and probably not worth changing.

In issue 19 we decided to remove this section entirely.

2.13 Next Hop for Originated Route

Status: Consensus
Change: No
Summary: No responses, assumed consensus to keep things the same.

Discussion:

There was a one-message thread entitled "next hop for originated route". This message received no response, so the assumption is that there is a consensus to keep things as they are.

For related discussion see issue 61.

2.14 NEXT_HOP to Internal Peer

Status: Consensus
Change: No
Summary: Closed in favor of issue 61.

Discussion:

The thread entitled "NEXT_HOP to internal peer" starts with this question:

   When sending a locally originated route to an internal peer, what should NEXT_HOP be set to?

One response suggested that we add a line stating that the NEXT_HOP address originates from the IGP.

Since this issue and issue 61 are basically the same, except 61 proposes text, we'll close this issue in favor of 61.

2.15 Grammar Fix

Status: Consensus
Change: Yes
Summary: Change: "The Prefix field contains IP address prefixes ..." To: "contains an IP address prefix ..."

Discussion:

The thread entitled "Review comment: bottom of page 16" corrects a grammar mistake by suggesting we change:

   "The Prefix field contains IP address prefixes ..."

to:

   "contains an IP address prefix ..."

Yakov responded that this will be fixed in -18.

The consensus seems to be to correct this, and go with the new text.

2.16 Need ToC, Glossary and Index

Status: Consensus
Change: Yes
Summary: Need to add a Table of Contents (ToC), Glossary and Index to the draft. Will be added in draft -18.

Discussion:

The "Review Comments: draft-ietf-idr-bgp4-17.txt" thread suggests:

   1. Document needs, Table of Contents, Glossary, and Index

   2. Paths, Routes, and Prefixes need to be defined in the spec early on (like in a glossary), so it is obvious what is implied.

Yakov responded that draft -18 will have a ToC and definition of commonly used terms.

2.17 Add References to other RFC-status BGP docs to base spec.

Status: Consensus
Change: Yes
Summary: Add references to other RFC-status BGP docs to the base spec.

Discussion:

The "Review Comments: draft-ietf-idr-bgp4-17.txt" thread then changes titles to: "Review of draft-ietf-idr-bgp4-17.txt" and goes on to suggest:

   3. All BGP Extensions described in other documents that made it to RFC status should be at least referenced in the Reference section P.64. This is justifiable since it's the core BGP standard spec.

Yakov responded that this will be added to the -18 review. Jonathan agreed.

2.18 IP Layer Fragmentation

Status: Consensus
Change: No
Summary: No need to mention IP Layer Fragmentation in the BGP specification, since this is taken care of at the TCP level.

Discussion:

   1. P.6 section 4. Message Formats, it's possible for the source BGP peer IP layer to fragment a message such that the receiving BGP peer socket layer would have to reassemble it. Need to mention this, since BGP implementations are required to do this.

The response to this was that, while true, reassembly is something that is inherent in the TCP layer that BGP rides over. Therefore, this is something that is in the TCP spec, and needn't be repeated in the BGP spec. This comment was reaffirmed.

There seems to be consensus that this isn't something that needs to be in the BGP spec.

2.19 Appendix Section 6.2: Processing Messages on a Stream Protocol

Status: Consensus
Change: Yes
Summary: Remove the section entirely, as this is something that does not belong in the base spec.

Discussion:

This first came up in response to Issue 17:

There was one comment suggesting that section 6.2 ("Processing Messages on a Stream Protocol") mentioned this.

The original reviewer responded that the out-of-scope comment was out-of-place and referred the responder to section 6.2 (appendix 6).

The original reviewer stated that he is happy with just adding a reference to section 6.2 in appendix 6 and leaving it at that.

Curtis suggested we just add a reference to Stevens in appendix 6, section 6.2, and be done with it. Specifically:

   6.2 Processing Messages on a Stream Protocol

   BGP uses TCP as a transport mechanism. If you are unsure as to how to handle asynchronous reads and writes on TCP sockets please refer to Unix Network Programming [RWStevens] or other introductory text for programming techniques for the operating system and TCP implementation that you are using.

There were further suggestions to remove the section entirely as out-of-scope. At least 3 people agreed with this.

Alex responded that he sees no reason to remove it, but wouldn't have a problem if the WG decides to do so.

There seems to be general agreement that this section should be removed.

N.B. This also affects issue 12.

2.20 Wording fix in Section 4.3

Status: Consensus
Change: Yes
Summary: A small change for clarity in section 4.3

Discussion:

This suggestion grew out of the discussion on Issue 18. The following change was suggested in section 4.3, second line of the first paragraph:

   s/UPDATE packet/UPDATE message/

Yakov agreed to this change and updated the draft.
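
For readers unfamiliar with the per-peer message buffer discussed in 2.12 and 2.19, the idea can be sketched as follows. This is illustrative only and not text from the base spec, which no longer carries the section. It assumes the BGP-4 message header layout (16-octet marker, 2-octet length, 1-octet type) and the 4096-octet maximum message size; the class name MessageBuffer and its feed method are hypothetical.

```python
import struct

HEADER_LEN = 19      # 16-octet marker + 2-octet length + 1-octet type
MAX_MESSAGE = 4096   # per-peer buffer size mentioned in the quoted text


class MessageBuffer:
    """Accumulates TCP stream data and yields complete BGP messages."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        # TCP delivers a byte stream, so one read may hold a partial
        # message, exactly one message, or several messages.
        self._buf += data
        messages = []
        while len(self._buf) >= HEADER_LEN:
            # The Length field sits right after the 16-octet marker.
            (length,) = struct.unpack_from("!H", self._buf, 16)
            if not HEADER_LEN <= length <= MAX_MESSAGE:
                raise ValueError(f"bad BGP message length: {length}")
            if len(self._buf) < length:
                break  # wait for the rest of this message
            messages.append(bytes(self._buf[:length]))
            del self._buf[:length]
        return messages
```

A caller would pass whatever a non-blocking read returned to feed() and process the returned messages in order; nothing here blocks while waiting for the remainder of a message.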

2.21 Authentication Text Update

Status: Consensus
Change: No
Summary: The consensus is that additional references to RFC2385 are not necessary.

Discussion:

   P. 10, "Authentication Data:" section you might want to add this,

   It is also possible to use MD5 (RFC2385) at the transport layer to validate the entire BGP message.

Yakov replied to this:

   There is already text that covers this:

   "Any authentication scheme used by TCP (e.g., RFC2385 [RFC2385]) may be used in addition to BGP's own authentication mechanisms."

   ....

   "In addition, BGP supports the ability to authenticate its data stream by using [RFC2385]."

   So, I see no need to add the text proposed above.

Ishi agreed with Yakov.

Jonathan disagreed since he thought no one uses BGP auth. Ishi replied that there are lots of people who do use it. Jonathan replied with a clarification question: "Who uses *BGP's own* authentication mechanisms???" Ron Bonica replied that they use BGP auth.

There was some additional discussion over who implements simple password authentication vs. MD5.

After further discussion, the consensus seems to be that we should leave the text as it is for the reasons Yakov pointed out. There was some discussion over opening a new issue to discuss deprecating the BGP auth mechanism discussed in RFC1771 in favor of the mechanism in RFC2385.

The issue of Deprecating BGP AUTH is discussed in issue 62.

2.22 Scope of Path Attribute Field

Status: Consensus
Change: Yes
Summary: This is already being covered by text that has been added to the -18 draft.

Discussion:

   P. 12, right after "Path Attributes". The following sentence should be added to this section to clarify the scope of the Path Attribute field.

   "All attributes in the Path Attribute field represent the characteristics of all the route prefixes defined in the NLRI field of the message".

Yakov replied to this that:

   This will be covered by the following text in 3.1 that will be in the -18 version (see also issue 54).

   Routes are advertised between BGP speakers in UPDATE messages. Multiple routes that have the same path attributes can be advertised in a single UPDATE message by including multiple prefixes in the NLRI field of the UPDATE message.

   Therefore there is no need to add the sentence proposed above.

There were no objections to this statement, so this issue has been moved into consensus.

2.23 Withdrawn and Updated routes in the same UPDATE message

Status: Consensus
Change: No
Summary: For various reasons, not least of which is compatibility with existing implementations, the decision was made to keep things the way they are.

Discussion:

   4. P.16, last paragraph in section 4.3 as stated,

   "An UPDATE message should not include the same address prefix in the WITHDRAWN ROUTES and Network Layer Reachability Information fields, however a BGP speaker MUST be able to process UPDATE messages in this form. A BGP speaker should treat an UPDATE message of this form as if the WITHDRAWN ROUTES doesn't contain the address prefix."

   This complexity could have been avoided if withdrawn routes and NLRI prefixes with their attributes were mutually exclusive of each other and appeared in different update messages. If that was the case, the priority of which field to process first would have been as simple as using "first come, first served" message processing approach.

Yakov commented that this would make the case where they are both in the same message unspecified.

John commented that this is something we don't want to change this late in the game, although it was acknowledged that this might be a good change if we were working from a clean slate.

Ben acceded that this was somewhat wishful thinking on his part.

Curtis's comment seems to coincide with this message, stating:

   The existing rules are very clear.

   Summarized:

   If an UPDATE contains only a withdraw for a prefix, then withdraw whatever route the peer had previously sent.

   If an UPDATE contains the prefix only in the NLRI section, replace whatever route had previously been advertised by the peer or add a route if there was no previous route, in both cases adding a route with the current attributes.

   Don't put the same prefix in both the withdraw and NLRI sections of the same update.

   If you receive an UPDATE with the same prefix in both the withdraw and NLRI, ignore the withdraw. [Some older implementations thought this was a good way to say "delete then add".]

   Process UPDATEs from the same peer in the order received.

He goes on to say that, to him, these rules are clear from the existing text.

Consensus is that while this would be nice, we need to stick with what we have, and move on.

2.24 Addition or Deletion of Path Attributes

Status: Consensus
Change: Yes
Summary: Add the following to section 3.1:

   Changing the attributes of a route is accomplished by advertising a replacement route. The replacement route carries new (changed) attributes and has the same NLRI as the original route.

Discussion:

   5. P. 20 It's not stated how we delete or modify Path Attributes associated with NLRI prefixes.

A response to this comment said that this is implicit in the definition of "route" and the general withdraw/replace behavior and therefore doesn't need to be repeated.

Ben responded saying that, while there was an assumption, there was no well defined mechanism, and this leads to ambiguity.

John responded that there is no need to define everything explicitly, or we'll be here forever.

Picking this thread up again, Yakov argued:

   By *definition* a route is a pair.
   From that definition it follows that changing one or more path attributes of a route means changing a route, which means withdrawing the old route (route with the old attributes) from service and advertising a new route (route with the new attributes). Procedures for doing this are well-defined (see section 3.1), and therefore no new text to cover this is needed.

Jonathan agreed with this statement, but Ben argued that the text in section 3 is insufficient the way it is currently written.

After two iterations, Ben and Yakov agreed on this formulation for an update to section 3.1:

   Changing the attributes of a route is accomplished by advertising a replacement route. The replacement route carries new (changed) attributes and has the same NLRI as the original route.

Jeff objected somewhat to the wording, since, because a BGP route is defined as a pair, changing either part of that pair, by definition, changes the route. He acknowledged that this might fall under the category of implementation detail.

Yakov presented the view that he thought we were at consensus with the text he proposed above. Jonathan agreed. There were no objections, so this is moved to Consensus.

2.25 NEXT_HOP Semantics

Status: Consensus
Change: No
Summary: After responders pointed out another sentence, this comment was resolved. Things will stay the way they are.

Discussion:

   1. P.28, 2nd to last paragraph. The line that reads,

   "To be semantically correct, the IP address in the NEXT_HOP must not be the IP address of the receiving speaker, and the NEXT_HOP IP address must either be the sender's IP address (used to establish the BGP session), or the interface associated with the NEXT_HOP IP address must share a common subnet with the receiving BGP speaker..."

   This is not always true, what if the current ASBR BGP router is advertising an external AS route (to an IBGP Peer) whose NEXT_HOP IP address is the IP address of the EBGP peer in the other AS?

A response to this pointed out that right before this is a sentence stating that this only applies to EBGP links, and only when the peers are one hop from each other, so a modification is unnecessary.

This response was confirmed with another. The original reviewer acknowledged this and withdrew the comment.

The consensus is to leave things the way they are.

2.26 Attributes with Multiple Prefixes

Status: Consensus
Change: No
Summary: After some discussion, the consensus is to keep things the same since the suggested behavior is defined in the message format.

Discussion:

   2. P. 29, Section 6.3. Add this rule near the attribute rules.

   "Multiple prefixes that require the same attribute type with different values must never appear in the same update message".

A response to this suggested that this text is unnecessary since this behavior is ruled out by the way the message format is defined.

The original commenter agreed with the responder. The consensus is to leave things the way they are.

2.27 Allow All Non-Destructive Messages to Refresh Hold Timer

Status: Consensus
Change: No
Summary: It is agreed that this is a change that exceeds the original goal of this draft revision. This goal is to document existing practice in an interoperable way.

Discussion:

   3. P. 29, Section 6.5, Please rewrite this sentence from:

   "If a system does not receive successive KEEPALIVE and/or UPDATE and/or NOTIFICATION messages within the period specified in the Hold Time field of the OPEN message ..."

   To This:

   "If a system does not receive successive KEEPALIVE and/or UPDATE and/or any other BGP message within the period specified in the Hold Time field of the OPEN message ..."

There is disagreement on this change. It has been discussed in other threads.

The original commenter acknowledged that this is something that would be "adding a new feature" as opposed to the stated goal of "documenting what exists."
He suggested that the ADs decide if we should open the door for new features or not.

Yakov replied to this that he would suggest we keep things as is, since the purpose is to document current implementations.

This did not meet with any objections, so this issue has been moved into consensus.

2.28 BGP Identifier as Variable Quantity

Status: Consensus
Change: No
Summary: The consensus is that changing the BGP Identifier in the base draft is out-of-scope at this point in the draft evolution.

Discussion:

   4. P. 31, section 6.8, Please rewrite this sentence from:

   "Comparing BGP Identifiers is done by treating them as (4-octet long) unsigned integers."

   To This:

   "Comparing BGP Identifiers is done by treating them as large numbers based on their IP Address type (e.g. IPv4, IPv6, etc.)."

A response to this was that since BGP Identifier is defined in the base spec as a 4 byte unsigned integer, and not a variable quantity, the sentence as written is acceptable. This was also confirmed by another response.

The original commenter was thinking of IPv6, and providing sufficient space to allow a full v6 address to be used.

Again, responders said that this is out-of-scope for the current draft.

2.29 State Why Unresolveable Routes Should Be Kept in Adj-RIB-In

Status: Consensus
Change: Yes
Summary: Add: "in case they become resolvable" after the last sentence on p. 46.

Discussion:

   5. P.46, last sentence, "However, corresponding unresolvable routes SHOULD be kept in the Adj-RIBs-In." It would be helpful if the author states why unresolvable routes should be kept in Adj-RIBs-In?

A response to this stated "In case they become resolvable".

Yakov responded that:

   I suggest we add "in case they become resolvable" after the last sentence on p. 46.
The original commenter stated that:

   Then the point that the peer will not refresh the route if we drop them (unless we use Route Refresh) because they are unreachable should be made.

Yakov also responded that:

   This should be clear from the following text in Section 3:

      The initial data flow is the portion of the BGP routing table that is allowed by the export policy, called the Adj-Ribs-Out (see 3.2). Incremental updates are sent as the routing tables change. BGP does not require periodic refresh of the routing table.

Jonathan, who was the original commenter, agreed with both the changed text and the clarity of section 3.

2.30 Mention Other Message Types

Status: Consensus
Change: Yes

Summary: Add a reference to RFC2918 at the end of the type code list.

Discussion:

1. P. 7 Type: Need to add the new message types such as, Capability Negotiations (RFC2842), Route Refresh, etc.

One response argued that these are out-of-scope of the base document. One response agreed, but thought that it should be capability and not message type. The original commenter responded about Message type from the capability draft. Sue mentioned this would be added in the second round.

Yakov replied that:

   The only new message type that is covered by an RFC (rather than just an Internet Draft) is the Refresh message. With this in mind how about replacing the following:

      The following type codes are defined:

         1 - OPEN
         2 - UPDATE
         3 - NOTIFICATION
         4 - KEEPALIVE

   with

      This document defines the following type codes:

         1 - OPEN
         2 - UPDATE
         3 - NOTIFICATION
         4 - KEEPALIVE

      [RFC2918] defines one more type code.

Jonathan agreed with this change. This issue has been moved to consensus.

2.31 Add References to Additional Options

Status: Consensus
Change: Yes

Summary: Consensus to add:

   [RFC2842] defines another Optional Parameter.

Discussion:

2. P.
9, right after "This document defines the following optional parameters:" Need to mention possible options, such as: Capabilities (RFC2842), Multiprotocol extensions (RFC2858), Route Refresh (RFC2918).

One response agreed that adding references would be fine. A second response agreed.

Yakov replied that:

   Please note that only rfc2842 defines an OPEN optional parameter. Neither rfc2858 nor rfc2918 defines an OPEN optional parameter.

   With this in mind I would suggest to add the following text:

      [RFC2842] defines another Optional Parameter.

The original poster agreed with this modification. This issue is at consensus.

2.32 Clarify EGP Reference

Status: Consensus
Change: No

Summary: The consensus is that this was addressed in 32.1 (see 2.32.1), so we can close this.

Discussion:

3. P. 13, EGP, are there other EGP protocols other than BGP that are in use? If not, change EGP to BGP.

A response to this suggested that we add a reference to [1] (the EGP spec) here. Another response clarified that this refers to EGP-the-protocol and NOT the class.

Another response disagreed, but suggested that:

   IGP = network was explicitly introduced into bgp (network cmd)
   INCOMPLETE = network was implicitly introduced into bgp (redistribute)
   EGP = other

The original commenter thought that this referred to EGP-the-class of protocols, and asked why, in that case, BGP should not be used instead, as the only EGP.

There was some discussion over whether or not we should mention something that is historical.

Jeff suggested a footnote in the Origin section about EGP. Curtis suggested that we state that the EGP in ORIGIN is deprecated, but retain the value to document what it used to mean.

This reviewer thinks a statement about whether this "EGP" origin refers to the protocol or the class of protocols would be useful.

Yakov replied that an EGP reference will be added (see issue 9). Yakov also stated that he doesn't see what is wrong with the current text, and suggested we keep it.
This includes leaving out any reference to the status of the EGP spec. He sees that it is clear from context that we are talking about "the EGP" [RFC904]. Jeff noted that this issue has been sufficiently addressed in the solution to 32.1. This met with agreement. We are at consensus. 2.32.1 EGP ORIGIN Clarification Lange [Page 37] INTERNET DRAFT May 2003 Status: Consensus Change: Yes Summary: Change section 5.1.1 to read: ORIGIN is a well-known mandatory attribute. The ORIGIN attribute shall be generated by the speaker that originates the associated routing information. Its value SHOULD NOT be changed by any other speaker." Consensus to change: 1 EGP - Network Layer Reachability Information learned via the EGP protocol to: 1 EGP - Network Layer Reachability Information learned via the EGP protocol [RFC904] Discussion: This discussion is picked up again in the "Review of draft-ietf-idr- bgp4-17" thread, where specific text is proposed: Old: "ORIGIN is a well-known mandatory attribute that defines the origin of the path information. The data octet can assume the following values: Value Meaning 0 IGP - Network Layer Reachability Information is interior to the originating AS 1 EGP - Network Layer Reachability Information learned via the EGP protocol 2 INCOMPLETE - Network Layer Reachability Information learned by some other means" New: "ORIGIN is a well-known mandatory attribute that defines the origin of the path information. The data octet can assume the following values: Value Meaning Lange [Page 38] INTERNET DRAFT May 2003 0 IGP - NLRI was explicitly introduced into bgp 1 EGP - this value was administratively configured to affect policy decisions or NLRI was learned via the EGP protocol [1] 2 INCOMPLETE - NLRI was implicitly introduced into bgp" since: 1) The network command sets the origin to IGP and I remember seeing somewhere that only static routes should be set to IGP. 
2) The primary use of EGP value is policy 3) EGP seems to still exist, anyway even if it does not it is not worth re-writing the world. Also, change: "5.1.1 ORIGIN ORIGIN is a well-known mandatory attribute. The ORIGIN attribute shall be generated by the autonomous system that originates the associated routing information. It shall be included in the UPDATE messages of all BGP speakers that choose to propagate this information to other BGP speakers." to: "5.1.1 ORIGIN The value of the ORIGIN attribute shall be set by the speaker that originates the associated NLRI. Its value shall not be changed by any other speaker unless the other speaker is administratively configured to do so to affect policy decisions." since: 1) It is already defined as well-known mandatory attribute. 2) It may be set differently within the same AS (not saying this is good). 3) It is commonly used for policy, but by default does not get changed. 4) Speakers have no choice, it is mandatory. After much continued discussion on this in the "issue 32.1" thread, we seem to have come to a consensus that section 5.1.1 should read: ORIGIN is a well-known mandatory attribute. The ORIGIN attribute shall be generated by the speaker that originates the associated routing information. Its value should not be changed by any other speaker unless the other speaker is administratively configured to do so to affect policy decisions." This text met with a number of agreements, and one disagreement stating that we shouldn't have the "unless administratively configured" portion. Lange [Page 39] INTERNET DRAFT May 2003 After some further discussion, we have this text on the table: ORIGIN is a well-known mandatory attribute. The ORIGIN attribute is generated by the BGP speaker that originates the associated BGP routing information. The attribute shall be included in the UPDATE messages of all BGP speakers that choose to propagate this information to other BGP speakers. 
Jonathan suggested that we change "propagate this information" to "forward this route". He also mentioned that he would prefer something more explicit instead of/in addition to "The attribute shall be included in the UPDATE messages of all BGP speakers that choose to propagate this information to other BGP speakers." such as "other speakers do not change the ORIGIN value." On the issue of making the EGP ORIGIN type more clear Andrew proposed: To me, there seems to be sufficient confusion around the "EGP" reference to merit some sort of clarification. The simplest modification would be to change: 1 EGP - Network Layer Reachability Information learned via the EGP protocol to: 1 EGP - Network Layer Reachability Information learned via the EGP protocol [RFC904] That would clarify that we're talking about the protocol, and not the class-of-protocols, or EBGP. It would leave unstated that this could theoretically be used to muck with route selection. I think that is ok. If operators want to override ORIGIN to affect some hoho magic, they are welcome to do so, but I don't think it needs to be documented in the base spec. This met with a number of agreements. On the second text section we are working on, Jonathan objected to the current working text below and suggested an alternate: CHANGE: "ORIGIN is a well-known mandatory attribute. The ORIGIN attribute is generated by the BGP speaker that originates the associated BGP routing information. The attribute shall be included in the UPDATE messages of all BGP speakers that choose to propagate this Lange [Page 40] INTERNET DRAFT May 2003 information to other BGP speakers." TO: "ORIGIN is a well-known mandatory attribute. The ORIGIN attribute shall be generated by the speaker that originates the associated routing information. Its value should not be changed by any other speaker unless the other speaker is administratively configured to do so to affect policy decisions." -or- "ORIGIN is a well-known mandatory attribute. 
The ORIGIN attribute shall be generated by the speaker that originates the associated routing information. Its value should not be changed by any other speaker." Jonathan cited a recent example of someone who was still confused by this section of the text in -17 (not specifically the working text). Yakov proposed this as final text: In 4.3: a) ORIGIN (Type Code 1): ORIGIN is a well-known mandatory attribute that defines the origin of the path information. The data octet can assume the following values: Value Meaning 0 IGP - Network Layer Reachability Information is interior to the originating AS 1 EGP - Network Layer Reachability Information learned via the EGP protocol [RFC904] 2 INCOMPLETE - Network Layer Reachability Information learned by some other means Usage of this attribute is defined in 5.1.1. In 5.1.1: ORIGIN is a well-known mandatory attribute. The ORIGIN attribute shall be generated by the speaker that originates the associated routing information. Its value SHOULD NOT be changed by any other Lange [Page 41] INTERNET DRAFT May 2003 speaker." This met with agreement. This issue is at consensus. 2.32.2 BGP Destination-based Forwarding Paradigm Status: Consensus Change: Yes Summary: After much discussion, this is the consensus: This text in the current draft: To characterize the set of policy decisions that can be enforced using BGP, one must focus on the rule that a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses. This rule reflects the "hop-by-hop" routing paradigm generally used throughout the current Internet. Note that some policies cannot be supported by the "hop-by-hop" routing paradigm and thus require techniques such as source routing (aka explicit routing) to enforce. For example, BGP does not enable one AS to send traffic to a neighboring AS intending that the traffic take a different route from that taken by traffic originating in the neighboring AS. 
On the other hand, BGP can support any policy conforming to the "hop-by-hop" routing paradigm. Since the current Internet uses only the "hop-by-hop" inter-AS routing paradigm and since BGP can support any policy that conforms to that paradigm, BGP is highly applicable as an inter-AS routing protocol for the current Internet. will be replaced in -18 with the following text: Routing information exchanged via BGP supports only the destination- based forwarding paradigm, which assumes that a router forwards a packet based solely on the destination address carried in the IP header of the packet. This, in turn, reflects the set of policy decisions that can (and can not) be enforced using BGP. Note that some policies cannot be supported by the destination-based forwarding paradigm, and thus require techniques such as source routing (aka explicit routing) to be enforced*. Such policies can not be enforced using BGP either. For example, BGP does not enable one AS to send traffic to a neighboring AS for forwarding to some destination (reachable through but) beyond that neighboring AS intending that the traffic take a different route to that taken by the traffic originating in the neighboring AS (for that same destination). On the other hand, BGP can support any policy conforming to the destination-based forwarding paradigm. Discussion: Lange [Page 42] INTERNET DRAFT May 2003 In response to these proposals, Yakov proposed that the real problem is that it is not clear that BGP is build to support the destination- based forwarding paradigm. To fix this, it was proposed that: To characterize the set of policy decisions that can be enforced using BGP, one must focus on the rule that a BGP speaker advertises to its peers (other BGP speakers which it communicates with) in neighboring ASs only those routes that it itself uses. This rule reflects the "hop-by-hop" routing paradigm generally used throughout the current Internet. 
Note that some policies cannot be supported by the "hop-by-hop" routing paradigm and thus require techniques such as source routing (aka explicit routing) to enforce. For example, BGP does not enable one AS to send traffic to a neighboring AS intending that the traffic take a different route from that taken by traffic originating in the neighboring AS. On the other hand, BGP can support any policy conforming to the "hop-by-hop" routing paradigm. Since the current Internet uses only the "hop-by-hop" inter-AS routing paradigm and since BGP can support any policy that conforms to that paradigm, BGP is highly applicable as an inter-AS routing protocol for the current Internet. Routing information exchanged via BGP supports only the destination- based forwarding paradigm, which assumes that a router forwards a packet based solely on the destination address carried in the IP header of the packet. This, in turn reflects the set of policy decisions that can (and can not) be enforced using BGP. Note that some policies cannot be supported by the destination-based forwarding paradigm and thus require techniques such as source routing (aka explicit routing) to enforce. Such policies can not be enforced using BGP either. For example, BGP does not enable one AS to send traffic to a neighboring AS intending that the traffic take a different route from that taken by traffic originating in the neighboring AS. On the other hand, BGP can support any policy conforming to the destination-based forwarding paradigm. Curtis thinks the newer text here is more clear. In response to the new text, Christian Martin proposed a slightly different new text: Routing information exchanged via BGP supports only the destination- based forwarding paradigm, which assumes that a router forwards a packet based solely on the destination address carried in the IP header of the packet. 
This, in turn reflects the set of policy Lange [Page 43] INTERNET DRAFT May 2003 decisions that can (and can not) be enforces using BGP. Note that some policies cannot be supported by the destination-based forwarding paradigm and thus require techniques such as source routing (aka explicit routing) to enforce. Such policies can not be enforced using BGP either. For example, BGP does not enable one AS to send traffic to a neighboring AS based on prefixes originating from the local AS. On the other hand, BGP can support any policy conforming to the destination-based forwarding paradigm. To which Yakov replied: Routing information exchanged via BGP supports only the destination- based forwarding paradigm, which assumes that a router forwards a packet based solely on the destination address carried in the IP header of the packet. This, in turn, reflects the set of policy decisions that can (and can not) be enforces using BGP. Note that some policies cannot be supported by the destination-based forwarding paradigm, and thus require techniques such as source routing (aka explicit routing) to enforce. Such policies can not be enforced using BGP either. For example, BGP does not enable one AS to send traffic through a neighboring AS to some destination (which is outside of the neighboring AS, but is reachable through the neighboring AS) intending that the traffic take a different route from that taken by the traffic to the same destination that originating in the neighboring AS. On the other hand, BGP can support any policy conforming to the destination-based forwarding paradigm. And Chris responded: Routing information exchanged via BGP supports only the destination- based forwarding paradigm, which assumes that a router forwards a packet based solely on the destination address carried in the IP header of the packet. This, in turn, reflects the set of policy decisions that can (and can not) be enforces using BGP. 
Note that some policies cannot be supported by the destination-based forwarding paradigm, and thus require techniques such as source routing (aka explicit routing) to enforce. Such policies can not be enforced using BGP either. For example, BGP does not enable one AS to send traffic through a neighboring AS to some destination beyond the neighboring AS intending that the traffic take a different route from that taken by traffic to the same destination which originates in the neighboring AS. In other words, the BGP policy of a local AS cannot affect the downstream (aka, away from the local AS) forwarding policy of a remote AS. On the other hand, BGP can support any policy conforming to the destination-based forwarding paradigm. Tom Petch preferred Yakov's second formulation, with these changes: Lange [Page 44] INTERNET DRAFT May 2003 policies can not be enforced using BGP either. For example, BGP does not enable one AS to send traffic ! to a neighboring AS for forwarding to some destination (reachable through but) beyond ! that neighboring AS intending that ! the traffic take a different route to that taken by the traffic ! originating in the neighboring AS (for that same destination). On the other hand, BGP can support any policy conforming to the destination-based forwarding paradigm. Yakov agreed to Tom's suggested changes. 2.33 Add "Optional Non-Transitive" to the MED Section Status: Consensus Change: Yes Summary: Add "Optional Non-Transitive" to MED Section Add "well-known mandatory" to the NEXT_HOP Section Discussion: 4. P.23, change the following: "The MULTI_EXIT_DISC attribute may be used on external (inter-AS) links to discriminate among multiple exit or entry points to the same neighboring AS ..." To the following: "The MULTI_EXIT_DISC is an optional non-transitive attribute which may be used on external (inter-AS) links to discriminate among multiple exit or entry points to the same neighboring AS ..." 
A responder disagreed, and stated reasons "covered elsewhere". The original commenter asked for reasons, since the modification seemed obvious to him.

Yakov agreed to make this change in -18.

Jonathan replied that:

   5.1.3 NEXT_HOP also, it is missing "well-known mandatory".

Yakov also agreed to make this change.

2.34 Timer & Counter Definition

Status: Consensus
Change: No

Summary: No discussion, no text proposed, defaults to consensus for no change.

Discussion:

5. In section 8, there are a number of Timers, Counters, etc. that need to be explicitly defined before they are used by the FSM. Perhaps these definitions should go in the Glossary section.

There has been no further discussion on this issue. Unless it is brought up again, this issue is in consensus, with no change.

2.35 Fix Typo

Status: Consensus
Change: Yes

Summary: Fix a typo. No discussion, but this seems clear.

Discussion:

1. P. 41. Typing error, "Each time time the local system...".

2.36 Add Adj-RIB-In, Adj-RIB-Out and Loc-RIB to the Glossary

Status: Consensus
Change: Yes

Summary: This change requires a glossary. Yakov has committed to having a section where commonly used terms are defined in draft 18, so this issue is at consensus.

Discussion:

2. Section 9.1, Need to have Adj-RIB-In, Adj-RIB-Out, and Loc-RIB in the glossary, so when they are used in section 9.1, it is well understood what they are.

Yakov replied:

   will be added to the section "Definition of commonly used terms" in -18 version.

2.37 Combine "Unfeasible Routes" and "Withdrawn Routes"

Status: Consensus
Change: Yes

Summary: Add the following terms to the "commonly used terms" section:

   Feasible route
      A route that is available for use.

   Unfeasible route
      A previously advertised feasible route that is no longer available for use.

Discussion:

3. P. 45, Phase I, There is no definition of what unfeasible routes are. Are they the same as withdrawn routes?
If so, the two should be combined to one name. Ishi replied to this that he thought that we could combine the two terms, since there is limited difference from an implementation standpoint. Yakov replied: The routes are withdrawn from service because they are unfeasible, not because they are "withdrawn". So, we need to keep the term "unfeasible" to indicate the *reason* why a route could be withdrawn. On the other hand, "withdrawn" is used as a verb, and to the best of my knowledge "unfeasible" can't be used as a verb. With this in mind, I don't think that we can combine the two into a single term. Ishi replied that he was convinced, and that the terms should stay separate. Andrew asked the list if we should define these terms in the "commonly used terms" section in draft -18. Ben replied that if we use them a lot, we should define them, and if not local definitions will suffice. There was some back and forth about the necessity of defining terms which should be obvious. mrr actually checked the doc to see if we were consistently using the terms, and found: It turns out there there is an inconsistency in the usage of the word withdrawn. Section 3.1: There are three methods by which a given BGP speaker can indicate that a route has been withdrawn from service: ... Lange [Page 47] INTERNET DRAFT May 2003 b) a replacement route with the same NLRI can be advertised, or ... Later, in the definition of Withdrawn Routes Length, we have: A value of 0 indicates that no routes are being withdrawn from service, Taken together, this could be construed as meaning that a Withdrawn Routes Length of 0 indicates that all routes included in the UPDATE represent newly feasible routes... not replacement routes. Now, it's possible that this problem has been removed by changes to the text that have not yet been incorporated in to a new draft; however, it arose because the text, for the most part, does _not_ use "withdrawn" in the standard way. 
Instead, it refers to routes included in the WITHDRAWN ROUTES field of an UPDATE message.

   Consequently, I propose defining a "withdrawn route" as follows:

      Withdrawn route: a route included in the WITHDRAWN ROUTES field of an UPDATE message.

   Regardless of whether or not this definition is included, Section 3.1 should be changed from:

      There are three methods by which a given BGP speaker can indicate that a route has been withdrawn from service:

   to:

      There are three methods by which a given BGP speaker can indicate that a route has been removed from service:

   or:

      There are three methods by which a given BGP speaker can indicate that a route is now unfeasible:

After some further off-list discussion, mrr agreed that this inconsistency is extremely minor, and withdrew his comment.

Feasible and unfeasible routes will be defined in the "commonly used terms" section to clear up any confusion.

2.38 Clarify Outbound Route Text

Status: Consensus
Change: No

Summary: Consensus that the issue was sufficiently minor to leave things alone.

Discussion:

4. P. 50, line, "If a route in Loc-RIB is excluded from a particular Adj-RIB-Out the previously advertised route in that Adj-RIB-Out must be withdrawn from service by means of an UPDATE message (see 9.2)."

Would like to rephrase the sentence for clarity, "If a route in Loc-RIB is excluded from a particular Adj-RIB-Out and was previously advertised via Adj-RIB-Out, it must be withdrawn from service by means of an UPDATE message (see 9.2)."

One comment suggested either leave it alone, or remove "via Adj-RIB-Out".

The original commenter withdrew the comment.

2.39 Redundant Sentence Fragments

Status: Consensus
Change: Yes

Summary: Fix typo & parentheses.

Discussion:

5. P. 50, section 9.1.4, The two fragments of this sentence are redundant and don't say anything new or simplify the content. Just keep one fragment.
"A route describing a smaller set of destinations (a longer prefix) is said to be more specific than a route describing a larger set of destinations (a shorted prefix); similarly, a route describing a larger set of destinations (a shorter prefix) is said to be less specific than a route describing a smaller set of destinations (a longer prefix)." There was a comment that disagreed, thinking that both "more specific" and "less specific" need to be defined. And suggested that only the third and forth parentheses need to be dropped. The original commenter agreed with the parentheses changes. Yakov agreed to drop the third and fourth parentheses in the -18 version. Jonathan replied to this: Lange [Page 49] INTERNET DRAFT May 2003 Disagree, the text if fine the way it is, except you need to change "shorted" to "shorter". After minimal further discussion, it was decided we are at a consensus on this issue to fix the typo and drop the third and fourth parentheses. 2.40 Section 9.2.1.1 - Per Peer vs. Per Router MinRouteAdvertisementInterval Status: Consensus Change: No Summary: The consensus is that current practice allows for the MinRouteAdvertisementInterval to be set per peer, so the text should be kept the same. Discussion: 6. P. 52, section 9.2.1.1 Change this sentence for clarity, "This rate limiting procedure applies on a per-destination basis, although the value of MinRouteAdvertisementInterval is set on a per BGP peer basis." To This: "This rate limiting procedure applies on a per-destination basis, although the value of MinRouteAdvertisementInterval is set on a BGP router (same value for all peers) basis." There was a comment disagreeing with this proposal. It was later elaborated on to include that the reason for disagreement was that the proposed changes changed the protocol and not just a practice clarification. Ben responded asking for how this is a protocol change, he saw it as a clarification. Perhaps there is something deeper that needs to be clarified? 
Again, the response to this was that current implementations allow the MinRouteAdvertisementInterval to be set per-peer, not per-router.

The original reviewer conceded the point.

There was some additional discussion on this point. Most of it was along the lines of extracting what was really implemented and supported among various vendors. The conclusion was the same.

2.41 Mention FSM Internal Timers

Status: Consensus
Change: No

Summary: No discussion on this issue. No text proposed. Perhaps this is in the FSM section of the draft? Either way, it defaults to consensus with no change.

Discussion:

7. P. 61, item 6.4. Although all the BGP protocol interfacing timers are mentioned, there are a few FSM internal timers mentioned in the spec that need to be covered here as well.

There has been no discussion on this, so it now defaults to consensus with no change.

2.42 Delete the FSM Section

Status: Consensus
Change: No

Summary: There was some confusion on the question: Is the FSM draft going to be a separate document, or incorporated into the base draft? The consensus is that it is going to become part of the base draft, so the FSM section will be kept, and elaborated on.

Discussion:

8. Since there is going to be an FSM spec, do we need to have FSM descriptions in this spec? Maybe the FSM section should be deleted.

There was one response agreeing with this. One response asked for clarification: Was this a move to remove section 8, Finite State Machine, from the base draft?

The original reviewer said yes: when Sue's FSM draft becomes a WG document, we should remove section 8 from the base draft.

Yakov asked that the ADs provide input on this suggestion.

Alex responded saying that the FSM draft is going to be part of the base spec, and not another document, once the FSM words are approved.
2.43 Clarify the NOTIFICATION Section

Status: Consensus
Change: Yes

Summary: Replace:

   "If a peer sends a NOTIFICATION message, and there is an error in that message, there is unfortunately no means of reporting this error via a subsequent NOTIFICATION message."

With:

   If a peer sends a NOTIFICATION message, and the receiver of the message detects an error in that message, the receiver can not use a NOTIFICATION message to report this error back to the peer.

Discussion:

The "NOTIFICATION message error handling" thread proposed:

Please change:

   "If a peer sends a NOTIFICATION message, and there is an error in that message, there is unfortunately no means of reporting this error via a subsequent NOTIFICATION message."

To:

   "If a peer receives a NOTIFICATION message, and there is an error in that message, there is unfortunately no means of reporting this error via a subsequent NOTIFICATION message."

This reversal of meaning met with disagreement, and this text was proposed instead:

   All errors detected while processing the NOTIFICATION message cannot be indicated by sending subsequent NOTIFICATION message back to originating peer, therefore there is no means of reporting NOTIFICATION message processing errors. Any error, such as an unrecognized Error Code or Error Subcode, should be noticed, logged locally, and brought to the attention of the administration of the peer that has sent the message. The means to do this, however, lies outside the scope of this document.

The original poster agreed with the intent of the respondent's text and thought it was too wordy, but did not propose alternate text.

Yakov replied with this proposed text:

   If a peer sends a NOTIFICATION message, and the receiver of the message detects an error in that message, the receiver can not use a NOTIFICATION message to report this error back to the peer.

Two responses liked this new text. Unless there are objections, we'll consider that a consensus.
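As a rough illustration of the agreed wording (a sketch for this issue list, not text from the draft), the receiver of a malformed NOTIFICATION records the problem locally and never answers with a NOTIFICATION of its own. The function name handle_notification, the result dictionary and the logger name are assumptions made for this example. The error code range 1 to 6 follows the NOTIFICATION error codes of the base spec.

```python
import logging

log = logging.getLogger("bgp.notification")

# Error codes the base spec defines for NOTIFICATION messages (1..6).
KNOWN_ERROR_CODES = {1, 2, 3, 4, 5, 6}


def handle_notification(body: bytes) -> dict:
    """Process the body of a received NOTIFICATION (the part after the header).

    Hypothetical helper: it returns a summary and never builds a reply,
    because an error in a NOTIFICATION cannot be reported with another one.
    """
    result = {"valid": True, "reply": None, "close_connection": True}
    if len(body) < 2:
        # Error code and subcode are one octet each; less than that is malformed.
        log.warning("truncated NOTIFICATION: %d octet(s)", len(body))
        result["valid"] = False
        return result
    code, subcode = body[0], body[1]
    result.update(code=code, subcode=subcode, data=body[2:])
    if code not in KNOWN_ERROR_CODES:
        # Log locally; how the peer's administrator is told is out of scope.
        log.warning("unrecognized NOTIFICATION error code %d (subcode %d)",
                    code, subcode)
        result["valid"] = False
    return result
```

In this sketch the "reply" entry is always None, whatever the input, which is the point of the replacement text: the error is handled on the receiving side only.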
2.44 Section 6.2: OPEN message error handling

Status: Consensus
Change: No

Summary: One commenter observed that the spec seems to specify behavior that doesn't seem to be observed by extant implementations, and suggested modifications to the spec. They were later reminded that the base behavior is acceptable, and agreed.

Discussion:

The "BGP4 draft ; section 6.2" thread began with a discussion of section 6.2: OPEN message error handling. Specifically:

   "If one of the optional parameters in the Open message is not recognized, then the error subcode is set to 'unsupported optional parameters'"

   We have hit on this line when we were testing a BGP connection between a speaker that supported capability negotiation and a speaker that did not.

   The speaker that did not support the negotiation closed down the peering session using the error clause mentioned above. Sometimes this led the other router to repeat the OPEN message with the Capability optional parameter; a game that went on for minutes.

   This router manufacturer stated in a reply to this that:

   "One should not close down the connection if an optional parameter is unrecognized. That would make this parameter basically mandatory. This is a well known error in the BGP spec. Neither Cisco nor Juniper do this"

   If this is true it might be good to adapt the text.

The response to this quoted RFC2842, Capabilities Advertisement with BGP-4:

   A BGP speaker determines that its peer doesn't support capabilities advertisement, if in response to an OPEN message that carries the Capabilities Optional Parameter, the speaker receives a NOTIFICATION message with the Error Subcode set to Unsupported Optional Parameter. In this case the speaker should attempt to re-establish a BGP connection with the peer without sending to the peer the Capabilities Optional Parameter.
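The fallback that the RFC2842 text describes can be sketched roughly as follows. This is a minimal illustration and not taken from any implementation: next_open_params and the tuple layouts are hypothetical names chosen for the example. Error code 2 (OPEN Message Error), subcode 4 (Unsupported Optional Parameter) and parameter type 2 (Capabilities) are the values from the base spec and RFC2842.

```python
OPEN_MESSAGE_ERROR = 2              # NOTIFICATION error code
UNSUPPORTED_OPTIONAL_PARAMETER = 4  # OPEN Message Error subcode
CAPABILITIES_PARAM_TYPE = 2         # Optional Parameter type from RFC2842


def next_open_params(sent_params, notification):
    """Choose the Optional Parameters for the next OPEN attempt.

    sent_params: list of (param_type, value) tuples sent in the last OPEN.
    notification: (error_code, error_subcode) received in reply, or None.
    Hypothetical helper, for illustration only.
    """
    if notification == (OPEN_MESSAGE_ERROR, UNSUPPORTED_OPTIONAL_PARAMETER):
        # The peer does not support capabilities advertisement: retry
        # without the Capabilities Optional Parameter so that the
        # OPEN/NOTIFICATION exchange does not repeat.
        return [p for p in sent_params if p[0] != CAPABILITIES_PARAM_TYPE]
    return list(sent_params)
```

A speaker that keeps resending the same OPEN after this NOTIFICATION, as in the test report quoted above, corresponds to skipping the filtering step.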
The original poster responded:

   This section from the Capabilities Advertisement RFC is indeed in
   line with section 6.2 of the BGP4 specification. For me however the
   question remains if most implementations do not simply ignore
   optional parameters that are unknown. And if so, if the text stated
   above reflects what is implemented by routers that do not have
   capability advertisement at all.

Yakov replied to this with:

   RFC2842 assumes that a router (that doesn't implement RFC2842)
   would close the BGP session when the router receives an OPEN
   message with an unrecognized Optional Parameter.

   Therefore the text in the spec should be left unmodified.

The original poster, Jonathan, agreed with this. This issue moves to
consensus.

2.45 Consistent References to BGP Peers/Connections/Sessions

Status: Consensus
Change: Yes
Summary: Stick with "BGP Connection" as the consistent term.

Discussion:

Ben proposed and Yakov responded:

   > 1. Throughout the document we have various ways of naming the BGP
   > peering communication. 1) BGP Session, 2) BGP Peering Session,

   I'll replace "session" with "connection".

   > 3) TCP Connection,

   The spec doesn't name BGP peering communication as "TCP
   connection"; TCP connection is used to establish BGP connection.
   So, TCP connection and BGP connection are two different things.

   > 4) BGP Connection,

   The spec is going to use this term (see above).

   > 5) BGP Peering Connection,

   I'll replace "BGP peering connection" with "BGP connection".

   > 6) Connection,

   The text uses "connection" whenever it is clear from the context
   that it refers to "BGP connection" (or "TCP connection").

   > 7) BGP Speaker Connection.

   I'll replace "BGP Speaker Connection" with "BGP connection".

   >
   > BGP router: 1) BGP Speaker, 2) speaker, 3) local speaker

   The term "speaker" is used when it is clear from the context that
   we are talking about "BGP speaker".

   > 2. Change Internal peer to IBGP Peer.
   IBGP stands for "BGP connection between internal peers". Therefore
   the term "IBGP Peer" would mean "BGP connection between internal
   peers peer". That doesn't seem appropriate.

This issue has had some discussion, and section 3 was referenced,
specifically:

   Refer to Section 3 - Summary of operations which clearly states
   that

   " .. a peer in a different AS is referred to as an external peer,
   while a peer in the same AS may be described as an internal peer.
   Internal BGP and external BGP are commonly abbreviated IBGP and
   EBGP"

After more discussion it was decided that we should modify a
paragraph on page 4 to read:

   If a particular AS has multiple BGP speakers and is providing
   transit service for other ASs, then care must be taken to ensure a
   consistent view of routing within the AS. A consistent view of the
   interior routes of the AS is provided by the IGP used within the
   AS. For the purpose of this document, it is assumed that a
   consistent view of the routes exterior to the AS is provided by
   having all BGP speakers within the AS maintain IBGP with each
   other. Care must be taken to ensure that the interior routers have
   all been updated with transit information before the BGP speakers
   announce to other ASs that transit service is being provided.

This change has consensus.

   > 3. Change External peer to EBGP Peer.

   Ditto.

Alex responded that having explicit definitions would be nice. This
ties into the general glossary suggestion (see issues 16, 34 & 36).
He also suggested that:

   "BGP session" which works over a "TCP connection" would be closer
   to the terminology we're actually using now and would avoid
   possible confusions when people read terms like "Connection
   collision".

This was discussed in the "Generial Editorial Comment" thread.

After some further discussion, it was decided that, due to existing
implementations, we should go with "BGP connection" as the consistent
term.

We are at consensus.
2.46 FSM Connection Collision Detection

Status: Consensus
Change: Yes
Summary: Add this to section 8:

   There is one FSM per connection. Prior to determining what peer a
   connection is associated with there may be two connections for a
   given peer. There should be no more than one connection per peer.
   The collision detection identifies the case where there is more
   than one connection per peer and provides guidance for which
   connection to get rid of. When this occurs, the corresponding FSM
   for the connection that is closed should be disposed of.

Discussion:

The original reviewer (Tom) commented that the base draft, FSM
section, could use some clarification around the area of connection
collision detection. Specifically, he argued that it seems like there
are actually two FSMs, depending on which one backs off in the case of
a collision. He proposed this text to clear things up:

   "8 BGP Finite State Machine

   This section specifies BGP operation - between a BGP speaker and
   its peer over a single TCP connection - in terms of a Finite State
   Machine (FSM). Following is a brief summary ... "(as before)

Instead of just

   "This section specifies BGP operation in terms of a Finite State
   Machine (FSM). Following is a brief summary ... "(as before).

Curtis responded:

   There is one FSM per connection. Prior to determining what peer a
   connection is associated with there may be two connections for a
   given peer. There should be no more than one connection per peer.
   The collision detection identifies the case where there is more
   than one connection per peer and provides guidance for which
   connection to get rid of. When this occurs, the corresponding FSM
   for the connection that is closed should be disposed of.

   I'm not sure which document containing an FSM we should be reading
   at this point, but we could add the above paragraph if we need to
   explicitly state that the extra connection and its FSM is disposed
   of when a collision is detected.
   When a TCP accept occurs, a connection is created and an FSM is
   created. Prior to the point where the peer associated with the
   connection is known the FSM cannot be associated with a peer. The
   collision is a transient condition in which the rule of "one BGP
   session per peer" is temporarily violated and then corrected.

This is discussed in the "FSM but FSM of what?" thread.

Sue responded that she would be happy to add Curtis' text to section
8 and solicited any additional comments. There was only one, on
capitalization, so this issue is at consensus.

2.47 FSM - Add Explicit State Change Wording

Status: Consensus
Change: No
Summary: A desire for explicit state change wording was expressed. No
text was proposed. The assumption is that this issue has reached a
happy conclusion.

Discussion:

The initial reviewer:

   In most places, the actions taken on the receipt of an event
   include what the new state will be or that it remains unchanged.
   But there are a significant number of places where this is not done
   (eg Connect state events 14, 15, 16). I would like to see
   consistency, always specify the new/unchanged state. Else I may be
   misreading it.

There was a response asking for specific text, and offering to take
the discussion private.

This is discussed in the "FSM words - state changes" thread.

There has been no further discussion on this. The assumption is that
it has reached a happy conclusion privately.

2.48 Explicitly Define Processing of Incoming Connections

Status: Consensus
Change: Yes
Summary: Add text that is at the end of the discussion to section 8.

Discussion:

Alex suggested we explicitly define:

   - processing of incoming TCP connections (peer lookup, acceptance,
     FSM creation, collision control)

Curtis later proposed this text:

   BGP must maintain separate FSM for each configured peer. Each BGP
   peer paired in a potential connection will attempt to connect to
   the other.
   For the purpose of this discussion, the active or connect side of
   a TCP connection (the side sending the first TCP SYN packet) is
   called outgoing. The passive or listening side (the sender of the
   first SYN ACK) is called an incoming connection.

   A BGP implementation must connect to and listen on TCP port 179
   for incoming connections in addition to trying to connect to peers.
   For each incoming connection, a state machine must be instantiated.
   There exists a period in which the identity of the peer on the
   other end of an incoming connection is not known with certainty.
   During this time, both an incoming and outgoing connection for the
   same peer may exist. This is referred to as a connection collision
   (see Section x.x, was 6.8).

   A BGP implementation will have at most one FSM for each peer plus
   one FSM for each incoming TCP connection for which the peer has not
   yet been identified. Each FSM corresponds to exactly one TCP
   connection.

Jonathan pointed out that there was an inaccuracy in the proposed
text. Curtis replied with this:

   You're correct in that you must have a collision of IP addresses on
   the TCP connections and that the BGP Identifier is used only to
   resolve which gets dropped.

   The FSM stays around as long as "BGP Identifier" is not known.
   Replace "not known with certainty" with "known but the BGP
   identifier is not known" and replace "for the same peer" with "for
   the same configured peering".

   The first paragraph is unchanged:

   BGP must maintain separate FSM for each configured peer. Each BGP
   peer paired in a potential connection will attempt to connect to
   the other. For the purpose of this discussion, the active or
   connect side of a TCP connection (the side sending the first TCP
   SYN packet) is called outgoing. The passive or listening side (the
   sender of the first SYN ACK) is called an incoming connection.
   The second paragraph becomes:

   A BGP implementation must connect to and listen on TCP port 179
   for incoming connections in addition to trying to connect to peers.
   For each incoming connection, a state machine must be instantiated.
   There exists a period in which the identity of the peer on the
   other end of an incoming connection is known but the BGP identifier
   is not known. During this time, both an incoming and outgoing
   connection for the same configured peering may exist. This is
   referred to as a connection collision (see Section x.x, was 6.8).

   The next paragraph then needs to get fixed. Changed "for each peer"
   to "for each configured peering".

   A BGP implementation will have at most one FSM for each configured
   peering plus one FSM for each incoming TCP connection for which the
   peer has not yet been identified. Each FSM corresponds to exactly
   one TCP connection.

   Add a paragraph to further clarify the point you made.

   There may be more than one connection between a pair of peers if
   the connections are configured to use a different pair of IP
   addresses. This is referred to as multiple "configured peerings" to
   the same peer.

   > So multiple simultaneous BGP connection are allowed between the
   > same two peers, and this behavior is implemented, for example to
   > do load balancing.

   Good point. I hope the corrections above cover your (entirely
   valid) objections. If you see any more errors please let me know.

Tom replied that:

   I take issue with the 'will attempt to connect' which goes too
   far. The FSM defines events 4 and 5, 'with passive Transport
   establishment', so the system may wait and not attempt to connect.
   The exit from this state is either the receipt of an incoming TCP
   connection (SYN) or timer expiry.

   So we may have an FSM attempting to transport connect for a given
   source/destination IP pair or we may have an FSM not attempting to
   connect. (In the latter case, I do not think we can get a
   collision).
   In the latter case, an incoming connection should not generate an
   additional FSM.

   I do not believe the concept of active and passive is helpful since
   a given system can flip from one to the other and it does not help
   us to clarify the number of FSM involved.

And Curtis suggested that:

   Could this be corrected by replacing "will attempt to connect" with
   "unless configured to remain in the idle state, or configured to
   remain passive, will attempt to connect". We could also shorten
   that to "will attempt to connect unless configured otherwise".

   Clarification (perhaps an item for a glossary entry):

   The terms active and passive have been in our vocabulary for almost
   a decade and have proven useful. The words active and passive have
   slightly different meanings applied to a TCP connection or applied
   to a peer. There is only one active side and one passive side to
   any one TCP connection as per the definition below. When a BGP
   speaker is configured passive it will never attempt to connect. If
   a BGP speaker is configured active it may end up on either the
   active or passive side of the connection that eventually gets
   established. Once the TCP connection is completed, it doesn't
   matter which end was active and which end was passive and the only
   difference is which side of the TCP connection has port number 179.

Tom agreed with Curtis, saying that he liked the "will attempt to
connect unless configured otherwise" verbiage.

This was discussed in the "Generial Editorial Comment" thread.

Sue proposed we add the text above in section 8.2. It is summarized
here for clarity:

   8.2) Description of FSM

   8.2.1) FSM connections (text below)

   8.2.2) FSM Definition (text now in 8.2)

   BGP must maintain a separate FSM for each configured peer. Each BGP
   peer paired in a potential connection unless configured to remain
   in the idle state, or configured to remain passive, will attempt to
   connect to the other.
   For the purpose of this discussion, the active or connect side of
   the TCP connection (the side sending the first TCP SYN packet) is
   called outgoing. The passive or listening side (the sender of the
   first SYN ACK) is called an incoming connection. [See section on
   the terms active and passive below.]

   A BGP implementation must connect to and listen on TCP port 179
   for incoming connections in addition to trying to connect to peers.
   For each incoming connection, a state machine must be instantiated.
   There exists a period in which the identity of the peer on the
   other end of an incoming connection is known but the BGP identifier
   is not known. During this time, both an incoming and an outgoing
   connection for the same configured peering may exist. This is
   referred to as a connection collision (see Section x.x, was 6.8).

   A BGP implementation will have at most one FSM for each configured
   peering plus one FSM for each incoming TCP connection for which the
   peer has not yet been identified. Each FSM corresponds to exactly
   one TCP connection.

   There may be more than one connection between a pair of peers if
   the connections are configured to use a different pair of IP
   addresses. This is referred to as multiple "configured peerings" to
   the same peer.

   8.2.1.1) Terms "active" and "passive"

   The terms active and passive have been in our vocabulary for almost
   a decade and have proven useful. The words active and passive have
   slightly different meanings applied to a TCP connection or applied
   to a peer. There is only one active side and one passive side to
   any one TCP connection per the definition above [and the state
   machine below.] When a BGP speaker is configured active it may end
   up on either the active or passive side of the connection that
   eventually gets established. Once the TCP connection is completed,
   it doesn't matter which end was active and which end was passive
   and the only difference is which side of the TCP connection has
   port number 179.
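The FSM bookkeeping in the proposed 8.2.1 text can be illustrated with a small sketch. All names here (PeeringConfig, Fsm, FsmTable, keep_incoming) are hypothetical and are not taken from the base spec. Two simplifications are assumed: an FSM object is created only once a connection is attempted or accepted, and the collision rule follows the convention of section 6.8 as it is understood here, namely that the connection initiated by the speaker with the higher BGP Identifier is retained.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Peering = Tuple[str, str]  # (local IP, remote IP): one "configured peering"


@dataclass
class PeeringConfig:
    passive: bool = False    # configured to remain passive
    hold_idle: bool = False  # configured to remain in the idle state

    def will_attempt_connect(self) -> bool:
        # "will attempt to connect unless configured otherwise"
        return not (self.passive or self.hold_idle)


@dataclass
class Fsm:
    peering: Peering
    direction: str  # "outgoing" or "incoming"; one FSM per TCP connection


def keep_incoming(local_id: str, remote_id: str) -> bool:
    """True when the peer-initiated connection survives a collision."""
    return (int(ipaddress.IPv4Address(local_id))
            < int(ipaddress.IPv4Address(remote_id)))


@dataclass
class FsmTable:
    config: Dict[Peering, PeeringConfig]
    primary: Dict[Peering, Fsm] = field(default_factory=dict)  # at most one per peering
    pending: List[Fsm] = field(default_factory=list)           # BGP Identifier not yet known

    def start(self) -> None:
        for peering, cfg in self.config.items():
            if cfg.will_attempt_connect():
                self.primary[peering] = Fsm(peering, "outgoing")

    def accept_incoming(self, peering: Peering) -> Fsm:
        if peering not in self.config:
            raise KeyError("no configured peering for this address pair")
        fsm = Fsm(peering, "incoming")
        if peering in self.primary:
            self.pending.append(fsm)     # possible connection collision
        else:
            self.primary[peering] = fsm  # no extra FSM when none was connecting
        return fsm

    def resolve(self, fsm: Fsm, local_id: str, remote_id: str) -> None:
        """Dispose of the FSM whose connection loses the collision."""
        self.pending.remove(fsm)
        if keep_incoming(local_id, remote_id):
            self.primary[fsm.peering] = fsm  # the outgoing FSM is dropped

    def fsm_count(self) -> int:
        return len(self.primary) + len(self.pending)
```

At every step fsm_count() stays at or below the number of configured peerings plus the number of incoming connections still awaiting a BGP Identifier, which is the bound stated in the proposed text.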
For additional text, see issue 46.

Sue solicited additional comments; the only one was on capitalization,
so it would appear we are at consensus with this issue.

2.49 Explicitly Define Event Generation

Status: Consensus
Change: No
Summary: Suggested that we explicitly define BGP message processing.
No text proposed. There has been no further discussion on this issue;
it is assumed that the consensus is that things are ok the way they
are.

Discussion:

Alex suggested we explicitly define:

   - generation of events while processing BGP messages, i.e., the
     text describing message processing should say where needed that
     a specific event for the BGP session should be generated.

No text was proposed.

This discussion has received no further comment. Unless someone wants
to reopen it, it is assumed it has reached a happy ending.

This was discussed in the "Generial Editorial Comment" thread.

2.50 FSM Timers

Status: Consensus
Change: No
Summary: Discussion tabled, because new document version rendered the
discussion moot.

Discussion:

This discussion began with a suggestion that the timers currently in
the FSM:

   In the 26 Aug text, I find the timer terminology still confusing.
   Timers can, I find,
      stop
      start
      restart
      clear
      set
      reset
      expire

Can be cleaned up and simplified to:

      start with initial value (spell it out just to be sure)
      stop
      expire

A response to this proposal was that the existing set is clear, and
that the proposed set is insufficiently rich to describe a concept
like "reset", which encompasses: "Stop the timer, and reset it to its
initial value."

This discussion reached an impasse when Sue pointed out that the text
had been updated, and asked that the new text be reviewed.

This was discussed in the "FSM more words" thread.

2.51 FSM ConnectRetryCnt

Status: Consensus
Change: No
Summary: Discussion tabled, because new document version rendered the
discussion moot.

Discussion:

This started with the ob