bgpd: allow batch handling of peer shutdown/failure #17505

mjstapp · 2024-11-25T21:20:29Z

When a peer connection fails or is closed, bgp does cleanup processing on a per-peer basis. At scale, this can become a problem: bgp can be forced to make a complete rib walk for each peer involved. This PR makes peer error-handling more visible at the bgp object level, and then adds a batching path for the case where multiple peers need cleanup/clearing processing at the same time.

Replace the per-peer connection error with a per-bgp event and a list. The io pthread enqueues peers per-bgp-instance, and the error-handling code can process multiple peers if there have been multiple failures.
When peer connections encounter errors, attempt to batch some of the clearing processing that occurs. Add a new batch object and, if possible, add multiple peers to it. Do one rib walk for the batch, rather than one walk per peer. Use a handler callback per batch to check and remove peers' path-infos, rather than a work-queue and callback per peer. The original clearing code remains; it's used for single peers.
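
To illustrate the "one rib walk per batch" idea, here is a minimal sketch in C. It is not the code from this PR: `struct path`, `struct clear_batch`, `batch_add_peer`, `batch_has_peer`, `clear_batch_walk`, and the `MAX_PEERS` bound are all hypothetical names, and the rib is modelled as a flat array rather than bgpd's real tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bound so that a whole batch fits in one 64-bit mask. */
#define MAX_PEERS 64

/* Hypothetical stand-in for a path-info: one route learned from one peer. */
struct path {
	int peer_id;  /* index of the peer that advertised this path */
	bool removed; /* set once the path has been cleared */
};

/* Hypothetical batch object: the set of peers that need clearing. */
struct clear_batch {
	unsigned long long peers; /* bit i set => peer i is in the batch */
};

/* Add a peer to the batch; returns false if the id is out of range. */
static bool batch_add_peer(struct clear_batch *b, int peer_id)
{
	if (peer_id < 0 || peer_id >= MAX_PEERS)
		return false;
	b->peers |= 1ULL << peer_id;
	return true;
}

/* Membership test used by the per-batch handler. */
static bool batch_has_peer(const struct clear_batch *b, int peer_id)
{
	if (peer_id < 0 || peer_id >= MAX_PEERS)
		return false;
	return (b->peers >> peer_id) & 1ULL;
}

/*
 * Walk the table once and clear every path that belongs to any peer in
 * the batch. Returns the number of paths cleared by this walk.
 */
static size_t clear_batch_walk(struct path *rib, size_t n,
			       const struct clear_batch *b)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!rib[i].removed && batch_has_peer(b, rib[i].peer_id)) {
			rib[i].removed = true;
			removed++;
		}
	}
	return removed;
}
```

With P failed peers and N paths, a per-peer approach visits roughly P × N entries, while the batch walk above visits N entries and does a constant-time membership check for each one.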

Replace the per-peer connection error with a per-bgp event and a list. The io pthread enqueues peers per-bgp-instance, and the error-handling code can process multiple peers if there have been multiple failures. Signed-off-by: Mark Stapp <[email protected]>
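
The hand-off described in this commit message can be sketched as a mutex-protected list plus a single pending event per instance. This is only an illustration under assumptions, not the PR's implementation: `struct conn`, `struct instance`, `instance_init`, `enqueue_conn_error`, and `drain_conn_errors` are hypothetical names, and real event scheduling is reduced to a boolean flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical peer connection; only the fields this sketch needs. */
struct conn {
	int peer_id;
	bool on_err_list;      /* prevents queuing the same connection twice */
	struct conn *err_next; /* link in the per-instance error list */
};

/* Hypothetical per-instance state shared by the I/O and main threads. */
struct instance {
	pthread_mutex_t err_mtx;
	struct conn *err_head;
	struct conn *err_tail;
	bool event_scheduled; /* one pending event covers all queued conns */
};

static void instance_init(struct instance *inst)
{
	pthread_mutex_init(&inst->err_mtx, NULL);
	inst->err_head = NULL;
	inst->err_tail = NULL;
	inst->event_scheduled = false;
}

/*
 * I/O thread side: append the failed connection to the instance list.
 * Returns true only when the caller must schedule the per-instance
 * event, i.e. when no event is already pending.
 */
static bool enqueue_conn_error(struct instance *inst, struct conn *c)
{
	bool need_event = false;

	pthread_mutex_lock(&inst->err_mtx);
	if (!c->on_err_list) {
		c->on_err_list = true;
		c->err_next = NULL;
		if (inst->err_tail)
			inst->err_tail->err_next = c;
		else
			inst->err_head = c;
		inst->err_tail = c;
		if (!inst->event_scheduled) {
			inst->event_scheduled = true;
			need_event = true;
		}
	}
	pthread_mutex_unlock(&inst->err_mtx);
	return need_event;
}

/*
 * Main thread side: pop up to max queued connections in FIFO order.
 * If the list is emptied, the pending-event flag is reset; otherwise it
 * stays set and the caller is expected to run the handler again.
 */
static size_t drain_conn_errors(struct instance *inst, struct conn **out,
				size_t max)
{
	size_t n = 0;

	pthread_mutex_lock(&inst->err_mtx);
	while (inst->err_head && n < max) {
		struct conn *c = inst->err_head;

		inst->err_head = c->err_next;
		c->err_next = NULL;
		c->on_err_list = false;
		out[n++] = c;
	}
	if (!inst->err_head) {
		inst->err_tail = NULL;
		inst->event_scheduled = false;
	}
	pthread_mutex_unlock(&inst->err_mtx);
	return n;
}
```

Because only the first enqueue requests an event, several failures that arrive close together are handled by one run of the error handler, which is what makes it possible to group them into a batch.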

Remove a couple of apis that don't exist. Signed-off-by: Mark Stapp <[email protected]>

ton31337

Very nice improvement ahead!

ton31337 · 2024-11-26T07:19:43Z

tests/topotests/bgp_peer_shut/r1/bgpd.conf

Can we switch to frr.conf (unified config)?

When peer connections encounter errors, attempt to batch some of the clearing processing that occurs. Add a new batch object, add multiple peers to it, if possible. Do one rib walk for the batch, rather than one walk per peer. Use a handler callback per batch to check and remove peers' path-infos, rather than a work-queue and callback per peer. The original clearing code remains; it's used for single peers. Signed-off-by: Mark Stapp <[email protected]>

Move the peer connection error list to the peer_connection struct; that seems to line up better with the way that struct works. Signed-off-by: Mark Stapp <[email protected]>

Add a simple topotest using multiple bgp peers; based on the ecmp_topo1 test. Signed-off-by: Mark Stapp <[email protected]>

mjstapp · 2024-11-26T13:20:27Z

Pushed to try to clean up the build problem

riw777

looks good ... waiting on @ton31337 's one comment

Mark Stapp added 2 commits November 25, 2024 14:13

bgpd: remove apis from bgp_route.h

affc54a

Remove a couple of apis that don't exist. Signed-off-by: Mark Stapp <[email protected]>

frrbot bot added the bgp, tests (Topotests, make check, etc), and zebra labels Nov 25, 2024

github-actions bot added the size/XXL and master labels Nov 25, 2024

ton31337 reviewed Nov 26, 2024

Mark Stapp added 3 commits November 26, 2024 08:18

zebra: move peer conn error list to connection struct

cdd3a61

Move the peer connection error list to the peer_connection struct; that seems to line up better with the way that struct works. Signed-off-by: Mark Stapp <[email protected]>

tests: add bgp peer-shutdown topotest

0d2605a

Add a simple topotest using multiple bgp peers; based on the ecmp_topo1 test. Signed-off-by: Mark Stapp <[email protected]>

mjstapp force-pushed the bgp_peer_shut branch from 912adc5 to 0d2605a on November 26, 2024 13:19

riw777 approved these changes Nov 26, 2024
