-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: Import the kernel's notion of connected routes #16418
Conversation
When importing routes from the kernel, the zebra daemon ignores any routes marked as 'proto kernel', such as the link-scoped routes that the kernel generates for addresses assigned to interfaces. Instead, zebra implements its own logic to synthesise routes for each address assignment, installing them into the RIB with the ZEBRA_ROUTE_CONNECT proto set. This behaviour requires zebra to mirror the logic of the kernel, to avoid having the kernel FIB diverge from the FRR RIB, which can cause routing loops or other failures. One example of this was the recent addition of support for the 'noprefixroute' flag to zebra[0]. However, attempting to mirror the kernel behaviour this way causes problems when the mirroring is imperfect. An example of this was seen as a result of the change mentioned above, where zebra honouring the noprefixroute flag leads to routes missing from the RIB in some cases. Specifically, this happens when network management daemons set the noprefixroute on the address assignment, but subsequently installs a link-scoped route into the kernel identical to the prefix route the kernel would have installed automatically. The use case for this is enable the network management daemon to atomically change route attributes (such as route metric) on the prefix route, but otherwise keep the behaviour identical to the case where the kernel creates the prefix route itself. The failure described above was noticed for NetworkManager and reported as a NetworkManager bug[1] as well as an FRR issue[2]. Other network management daemons use the noprefixroute flag for similar purposes (e.g., systemd-networkd[3]). [0] FRRouting#14957 [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1452 [2] FRRouting#16101 [3] https://github.com/systemd/systemd/blob/main/src/network/networkd-dhcp4.c#L962 To resolve this discrepancy between the kernel FIB and the FRR RIB, this patch changes zebra's behaviour to import 'proto kernel' instead of ignoring them, and to treat routes with 'scope link' as ZEBRA_ROUTE_CONNECT routes, just like the ones synthesised by zebra itself. This allows the noprefixroute flag to work correctly, while still playing nice with network management daemons that install a different link-scope route for installed addresses. The change in behaviour can be seen from the following example: Kernel config: 5: veth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether fe:da:bb:eb:74:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 10.11.1.2/24 scope global veth0 valid_lft forever preferred_lft forever inet 10.12.0.0/24 scope global noprefixroute veth0 valid_lft forever preferred_lft forever 10.11.0.0/16 via 10.11.1.1 dev veth0 10.11.1.0/24 dev veth0 proto kernel scope link src 10.11.1.2 10.12.0.0/24 dev veth0 proto kernel scope link metric 100 The 10.12.0.0/24 route was manually added with: Running zebra, pre-patch: Codes: K - kernel route, C - connected, L - local, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR, f - OpenFabric, t - Table-Direct, > - selected route, * - FIB route, q - queued, r - rejected, b - backup t - trapped, o - offload failure K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:22 C>* 10.11.1.0/24 is directly connected, veth0, 00:00:22 L>* 10.11.1.2/32 is directly connected, veth0, 00:00:22 L>* 10.12.0.0/32 is directly connected, veth0, 00:00:22 Notice that the 10.12.0.0/24 route is missing from the RIB. After the patch: Codes: K - kernel route, C - connected, L - local, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR, f - OpenFabric, t - Table-Direct, > - selected route, * - FIB route, q - queued, r - rejected, b - backup t - trapped, o - offload failure K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:05 C * 10.11.1.0/24 is directly connected, veth0, 00:00:05 C>* 10.11.1.0/24 is directly connected, veth0, 00:00:05 L>* 10.11.1.2/32 is directly connected, veth0, 00:00:05 C>* 10.12.0.0/24 [0/100] is directly connected, veth0, 00:00:05 L>* 10.12.0.0/32 is directly connected, veth0, 00:00:05 The prefix is now shown as connected (C>) as it should. Note also that the other prefix (10.11.1.0/24, without the noprefix flag) now appears twice, because it's both created by zebra from the interface config, and imported from the kernel. This is harmless as the routes are identical, and an arbitrary one just ends up being selected. Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Indeed this has been discussed for a while now; the current behavior essentially creates a potential for desync for no good reason.
I'm 85% sure there are situations where the kernel doesn't send a |
Kernel does not Always send a delroute when a route is removed. As such we need to handle that case as well. |
Right, played around with |
After writing this up, I noticed that there is already a similar PR open as #16300 - however, I don't quite grok why the additional logic in that PR is needed (instead of just relying on the kernel to send DELROUTEs when an interface goes down). So I'm opening this as a draft PR just to solicit feedback on whether this approach might be viable as an alternative with a bit less code. If not, feel free to just close this and proceed with #16300 instead :)
Cc @ffmancera
Patch description
When importing routes from the kernel, the zebra daemon ignores any routes marked as 'proto kernel', such as the link-scoped routes that the kernel generates for addresses assigned to interfaces. Instead, zebra implements its own logic to synthesise routes for each address assignment, installing them into the RIB with the ZEBRA_ROUTE_CONNECT proto set.
This behaviour requires zebra to mirror the logic of the kernel, to avoid having the kernel FIB diverge from the FRR RIB, which can cause routing loops or other failures. One example of this was the recent addition of support for the 'noprefixroute' flag to zebra[0].
However, attempting to mirror the kernel behaviour this way causes problems when the mirroring is imperfect. An example of this was seen as a result of the change mentioned above, where zebra honouring the noprefixroute flag leads to routes missing from the RIB in some cases. Specifically, this happens when network management daemons set the noprefixroute on the address assignment, but subsequently installs a link-scoped route into the kernel identical to the prefix route the kernel would have installed automatically. The use case for this is enable the network management daemon to atomically change route attributes (such as route metric) on the prefix route, but otherwise keep the behaviour identical to the case where the kernel creates the prefix route itself.
The failure described above was noticed for NetworkManager and reported as a NetworkManager bug[1] as well as an FRR issue[2]. Other network management daemons use the noprefixroute flag for similar purposes (e.g., systemd-networkd[3]).
[0] #14957
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1452
[2] #16101
[3] https://github.com/systemd/systemd/blob/main/src/network/networkd-dhcp4.c#L962
To resolve this discrepancy between the kernel FIB and the FRR RIB, this patch changes zebra's behaviour to import 'proto kernel' instead of ignoring them, and to treat routes with 'scope link' as ZEBRA_ROUTE_CONNECT routes, just like the ones synthesised by zebra itself. This allows the noprefixroute flag to work correctly, while still playing nice with network management daemons that install a different link-scope route for installed addresses. The change in behaviour can be seen from the following example:
Kernel config:
5: veth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether fe:da:bb:eb:74:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.11.1.2/24 scope global veth0
valid_lft forever preferred_lft forever
inet 10.12.0.0/24 scope global noprefixroute veth0
valid_lft forever preferred_lft forever
10.11.0.0/16 via 10.11.1.1 dev veth0
10.11.1.0/24 dev veth0 proto kernel scope link src 10.11.1.2 10.12.0.0/24 dev veth0 proto kernel scope link metric 100
The 10.12.0.0/24 route was manually added with:
Running zebra, pre-patch:
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:22 C>* 10.11.1.0/24 is directly connected, veth0, 00:00:22 L>* 10.11.1.2/32 is directly connected, veth0, 00:00:22 L>* 10.12.0.0/32 is directly connected, veth0, 00:00:22
Notice that the 10.12.0.0/24 route is missing from the RIB.
After the patch:
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:05 C * 10.11.1.0/24 is directly connected, veth0, 00:00:05 C>* 10.11.1.0/24 is directly connected, veth0, 00:00:05 L>* 10.11.1.2/32 is directly connected, veth0, 00:00:05 C>* 10.12.0.0/24 [0/100] is directly connected, veth0, 00:00:05 L>* 10.12.0.0/32 is directly connected, veth0, 00:00:05
The prefix is now shown as connected (C>) as it should. Note also that the other prefix (10.11.1.0/24, without the noprefix flag) now appears twice, because it's both created by zebra from the interface config, and imported from the kernel. This is harmless as the routes are identical, and an arbitrary one just ends up being selected.