Too frustrating: converting from ExaBGP 3.4 to 4.2.21 #1179
Comments
You have quite eloquently expressed how bad I am at being responsible for the maintenance of this code and I will gladly accept some help: I gave you write access to the repository years ago.
Now if you wish to know why these decisions were made, even bone-headed ones, I am happy to explain. Many came from trying to be nice to users, a mistake I do not make as much anymore, and from trying not to add too much complexity to a code base which was authored in 2009, before async, is a long way from ideal, and would require weeks of work to bring to modern Python standards and even more to reach the level of quality of rustyBGP. Some people find it useful and I am doing my best to keep it working. I worked on many prototypes over the years: one in Go (with some brain-dead ideas), one in V (which has decoding and the start of a Yang-based CLI), and even one within this repo for my first attempt at Yang support. I am also not as young as I used to be. I have a business to run which 50 people rely on for their living. I also need time to relax to remain healthy and time for my family, so it is a case of https://xkcd.com/2347/ still open from you: and some other motivational issue: |
Well, now, I realize that was deserved, and I'm sorry if my rant rubbed you the wrong way.
Getting write access to e.g. keep the exabgp.conf man page updated is one thing, decoding the actual supported grammar from the code is quite another, and me being a Python newbie doesn't exactly help in that endeavour. I'll see what I can come up with on that front. |
I have removed some of the "marketing" blurb on the README. I will look at |
I appreciate and appreciated your help over the years. I currently have code in 4.2.22 which I believe is not correct for all users (following the acceptance of a patch from a user) and can not release 4.2.22 without a rollback of his contribution or figuring out why my own testing led me to believe there is an issue but the main branch should be fine. |
I am surprised to hear that normal announce did not work for you, I have a test to make sure it did: ❯ ./qa/bin/functional encoding 2
✖ ✖ ✓ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖
❯ ./qa/bin/functional encoding --list
The available tests are:
0 api-add-remove L api-no-respawn g conf-ipv46routes4family
1 api-announce-star M api-notification h conf-ipv46routes6family
2 api-announce N api-open i conf-ipv6grouping
3 api-announcement O api-reload j conf-l2vpn
4 api-api P api-rib k conf-largecommunity
5 api-attributes-path Q api-rr-rib l conf-name
6 api-attributes-vpn R api-teardown m conf-new-v4
7 api-attributes S api-vpls n conf-new-v6
8 api-broken-flow T api-vpnv4 o conf-no-asn4
9 api-check U conf-addpath p conf-parity
A api-eor V conf-aggregator q conf-path-information
B api-fast W conf-attributes r conf-prefix-sid
C api-flow X conf-ebgp s conf-split
D api-ipv4 Y conf-extended-attributes t conf-srv6-mup
E api-ipv6 Z conf-flow-redirect u conf-template
F api-manual-eor a conf-flow v conf-unknowncap
G api-multi-neighbor b conf-generic-attribute w conf-vpn
H api-multiple-api c conf-group-limit x conf-watchdog
J api-nexthop-self e conf-ipself4
|
ah! you have two sessions, it may be why ... |
Do you want to try this totally untested patch:
Will do first thing after tomorrow's morning meeting.
Thanks!
- Håvard
|
Hm, the first obstacle I butted into was that healthcheck.py didn't want to accept I'm wondering if
However, with that I still get no matching neighbors (with tweaked logging):
So I expanded logging in announce.py to say
and with that I get logged
So... This, unfortunately, comes back to the documentation of |
Does this refer to this change (#1128) ? We have been running this for quite a while without issue. If you can add a ticket with the issues you see I can try and find time to help have a look to try and root cause what's happening. |
Just for reference, here's a redacted
and my slightly redacted
|
More... I've found
However, I have not been able to find out how anything ends up in
but that is way too much magic in one sentence for me to decipher and translate into how that's configured. I'm reasonably good with |
@he32 Sorry, I deleted the patch I posted here, added two commits on the same evening on the master and 4.2 branch with And for some reason I did not get emails from GH for your posts. |
@longmalx oh, thank you for following issues! Yes, I performed a test (I can not recall what) which showed that in some scenario something was off with what ended up in (or out of) the RIB - as I was in between things I did not document what caused it and was later not able to remember what it was 🤦 It may be a bug in the RIB code itself which is really ugly, right now I do not know. I am pleased to hear that you have no issue. I wrote when merging "I have merged the patch but now wonder if there will be an issue with API routes? I need to check." .. it may have been what I checked ... |
Hmm... In my case the output from the healthcheck program looks reasonably sane:
So the logging I extended is inside exabgp itself (in And ... this goes back to the configuration which I posted above -- what (if anything) is missing to make
That part I cannot explain, sorry. |
Never mind, found what's needed to tie a neighbor to a process:
inside the The issue I then get stuck on is that when processing the two lines (which occur about the same time):
The IPv4 route gets announced to the IPv4 neighbor, but the IPv6 route does
So, it stops doing anything on the first occurrence of that error for the neighbor? Hm, that pushes me in the direction of two processes, one for IPv4, one for IPv6, |
Thank you for this investigation, please give me a few days to look into fixing this. |
I looked at the code and this string, while reported as an error, does not stop any processing. The route with the wrong family for the peer will be ignored. So if the neighbours have explicit ipv4 only, and ipv6 only, it will be verbose but should work as expected. |
That did not match with what I observed at the time on the neighboring BGP speaker (a Juniper router). However, I will re-test to confirm, and look a bit closer at the issue. To be continued. |
The behaviour was to return an error if any of the peers could not take the RIB. With the new `neighbor *` feature, bulk sending of routes should now be accepted: if any peer can accept the route because its family was negotiated, the announcement is a success; it is a failure only if no peer could take it.
@he32 I have changed the behaviour on error returning as it made sense, but the patch above may not resolve your issue. I am still looking into it. |
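To make the before/after semantics described above concrete, here is a minimal sketch. It is not ExaBGP's code: `announce_result`, the `strict` flag, and the peer names are hypothetical, and each peer is modelled simply as the set of families it negotiated.

```python
def announce_result(route_family, peers, strict=False):
    """Model the outcome of announcing one route to several peers.

    peers maps a (hypothetical) peer name to the set of negotiated families.
    strict=True models the old behaviour: an error if any peer cannot take it.
    strict=False models the new behaviour: an error only if no peer can.
    """
    # Peers whose negotiated families include the route's family.
    takers = sorted(name for name, families in peers.items()
                    if route_family in families)
    if strict:
        ok = bool(peers) and len(takers) == len(peers)
    else:
        ok = bool(takers)
    return ok, takers


if __name__ == "__main__":
    peers = {"router-v4": {"ipv4 unicast"}, "router-v6": {"ipv6 unicast"}}
    print(announce_result("ipv6 unicast", peers, strict=True))
    print(announce_result("ipv6 unicast", peers))
```

With one IPv4-only and one IPv6-only peer, the strict variant reports a failure for every route, while the relaxed variant reports success as long as one peer negotiated the family.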
Allow "neighbor *" in route announce command, to match "all configured neighbors". Ref. Exa-Networks/exabgp#1179 Adapt the healthcheck module to allow this argument. Bump PKGREVISION.
But the lack of documentation for the migration from 3.4 to 4.0 (and the need for a 5.0 release soon and the similar associated pain). Is there anything left here? |
The `exabgp.conf` syntax changed, but the documentation (exabgp.conf(5)) didn't. No "group" at outer layer anymore. Apparently, nobody took the hint to "please keep this updated" and "please make this reasonably complete" when I submitted it earlier.
Also, ExaBGP 4 did not complain when I asked it to validate the old 3.4-based configuration file(!)
Running with
and adding the new
because I use the healthcheck module and I don't think it consumes any input, I earlier got only
in the log. Nothing else.
Commenting out "daemonize" and "destination" and running with `-d` gets more detailed information: it turns out that `/var/run/exabgp/exabgp.in` and `exabgp.out` were missing; they apparently need to exist as named pipes and be writable by `nobody`.
Now, I use ExaBGP to set up two BGP sessions, one for IPv4 and one for IPv6, to announce members of a resolver cluster to our routers.
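For anyone hitting the same missing-pipe problem mentioned above, here is a minimal sketch of creating the two named pipes from Python. The helper name `ensure_fifo` and the 0600 mode are my own assumptions, not something ExaBGP ships or documents; the pipe names and the `nobody` user come from the description above.

```python
import os
import stat
import tempfile


def ensure_fifo(path, mode=0o600, uid=None, gid=None):
    """Create a named pipe at path unless one is already there."""
    if os.path.exists(path):
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise RuntimeError(f"{path} exists but is not a named pipe")
    else:
        os.mkfifo(path, mode)
    # mkfifo applies the umask, so set the permissions explicitly.
    os.chmod(path, mode)
    if uid is not None and gid is not None:
        # Changing ownership normally requires root.
        os.chown(path, uid, gid)
    return path


if __name__ == "__main__":
    # Demonstrate in a scratch directory instead of /var/run/exabgp.
    base = tempfile.mkdtemp()
    for name in ("exabgp.in", "exabgp.out"):
        print(ensure_fifo(os.path.join(base, name)))
```

On a real installation one would presumably run this as root against `/var/run/exabgp`, passing the `pw_uid` and `pw_gid` returned by `pwd.getpwnam("nobody")`.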
The healthcheck module outputs (when run interactively)
This used to work nicely with ExaBGP 3.4, but apparently will not with ExaBGP 4.2.21. Running ExaBGP in the foreground with debug reveals
So... ExaBGP is now bone-headed enough to insist on an explicit neighbor specification
from the healthcheck script, even if there is only a single address-family-matching BGP session?
That makes it impossible to share the healthcheck config file between installations, because the neighbor
address (which is "of course" site-specific) creeps over from the ExaBGP config file itself into
the healthcheck configuration file, which is unfortunate.
`--help` output of healthcheck that one can specify
However, how `NEIGHBOR` goes from singular to plural (`neighbors`) is not explained. What is the syntax for supplying multiple `NEIGHBOR`s?
And ... possibly the "neighbor" parameter can also be specified in the healthcheck configuration file?
But ... one probably does not want an IPv4 route to be announced over an IPv6 BGP session, and I'm
assuming there is no logic in healthcheck to match address families for route and neighborships?
So this pushes me in the direction of having two separate healthcheck processes, one for IPv4 and
one for IPv6? That ups the config complexity and the cost for the health checking, which is again unfortunate.
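To illustrate the kind of address-family matching asked about above, here is a small sketch using only the standard `ipaddress` module. It is not taken from the healthcheck module; `neighbors_for` is a hypothetical helper, and it assumes that each BGP session only carries routes of the same IP version as the neighbor address (as in the one-IPv4, one-IPv6 setup described here). The addresses are documentation prefixes.

```python
import ipaddress


def neighbors_for(prefix, neighbors):
    """Return the neighbor addresses whose IP version matches the prefix."""
    version = ipaddress.ip_network(prefix, strict=False).version
    return [n for n in neighbors
            if ipaddress.ip_address(n).version == version]


if __name__ == "__main__":
    neighbors = ["198.51.100.1", "2001:db8::1"]
    for prefix in ("192.0.2.53/32", "2001:db8:53::1/128"):
        print(prefix, "->", neighbors_for(prefix, neighbors))
```

A single health-checking process could use a filter like this to decide which session each service address belongs to, instead of running one process per family.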
Because I think the healthcheck script does not consume input
(true? I've not been able to find a definitive answer to that...),
I think I need to run with
That has the unfortunate side-effect that `exabgpcli` gets partially stuck after executing a command, as has been noted already in #1164.
So, sorry, more of a rant to vent my frustration of the process, which is so far not finished.
I've already read https://github.com/Exa-Networks/exabgp/wiki/Migration-from-3.4-to-4.0
I also tried (and failed) to make sense of
https://github.com/Exa-Networks/exabgp/wiki/Configuration-:-Process
because this is, IMHO, to be charitable, "light on explanation of actual semantics / behaviour modifications". I think I figured out that ExaBGP in 4.x needs "encoder text" added to the process section if using the "healthcheck" module, but that is by now more of a guess than anything else.
Answers or hints would be greatly appreciated.