Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: Import the kernel's notion of connected routes #16418

Closed
wants to merge 1 commit into from

Conversation

tohojo
Copy link

@tohojo tohojo commented Jul 17, 2024

After writing this up, I noticed that there is already a similar PR open as #16300 - however, I don't quite grok why the additional logic in that PR is needed (instead of just relying on the kernel to send DELROUTEs when an interface goes down). So I'm opening this as a draft PR just to solicit feedback on whether this approach might be viable as an alternative with a bit less code. If not, feel free to just close this and proceed with #16300 instead :)

Cc @ffmancera

Patch description

When importing routes from the kernel, the zebra daemon ignores any routes marked as 'proto kernel', such as the link-scoped routes that the kernel generates for addresses assigned to interfaces. Instead, zebra implements its own logic to synthesise routes for each address assignment, installing them into the RIB with the ZEBRA_ROUTE_CONNECT proto set.

This behaviour requires zebra to mirror the logic of the kernel, to avoid having the kernel FIB diverge from the FRR RIB, which can cause routing loops or other failures. One example of this was the recent addition of support for the 'noprefixroute' flag to zebra[0].

However, attempting to mirror the kernel behaviour this way causes problems when the mirroring is imperfect. An example of this was seen as a result of the change mentioned above, where zebra honouring the noprefixroute flag leads to routes missing from the RIB in some cases. Specifically, this happens when network management daemons set the noprefixroute on the address assignment, but subsequently installs a link-scoped route into the kernel identical to the prefix route the kernel would have installed automatically. The use case for this is enable the network management daemon to atomically change route attributes (such as route metric) on the prefix route, but otherwise keep the behaviour identical to the case where the kernel creates the prefix route itself.

The failure described above was noticed for NetworkManager and reported as a NetworkManager bug[1] as well as an FRR issue[2]. Other network management daemons use the noprefixroute flag for similar purposes (e.g., systemd-networkd[3]).

[0] #14957
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1452
[2] #16101
[3] https://github.com/systemd/systemd/blob/main/src/network/networkd-dhcp4.c#L962

To resolve this discrepancy between the kernel FIB and the FRR RIB, this patch changes zebra's behaviour to import 'proto kernel' instead of ignoring them, and to treat routes with 'scope link' as ZEBRA_ROUTE_CONNECT routes, just like the ones synthesised by zebra itself. This allows the noprefixroute flag to work correctly, while still playing nice with network management daemons that install a different link-scope route for installed addresses. The change in behaviour can be seen from the following example:

Kernel config:
5: veth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether fe:da:bb:eb:74:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.11.1.2/24 scope global veth0
valid_lft forever preferred_lft forever
inet 10.12.0.0/24 scope global noprefixroute veth0
valid_lft forever preferred_lft forever

10.11.0.0/16 via 10.11.1.1 dev veth0
10.11.1.0/24 dev veth0 proto kernel scope link src 10.11.1.2 10.12.0.0/24 dev veth0 proto kernel scope link metric 100

The 10.12.0.0/24 route was manually added with:

Running zebra, pre-patch:

Codes: K - kernel route, C - connected, L - local, S - static,
R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure

K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:22 C>* 10.11.1.0/24 is directly connected, veth0, 00:00:22 L>* 10.11.1.2/32 is directly connected, veth0, 00:00:22 L>* 10.12.0.0/32 is directly connected, veth0, 00:00:22

Notice that the 10.12.0.0/24 route is missing from the RIB.

After the patch:

Codes: K - kernel route, C - connected, L - local, S - static,
R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure

K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:05 C * 10.11.1.0/24 is directly connected, veth0, 00:00:05 C>* 10.11.1.0/24 is directly connected, veth0, 00:00:05 L>* 10.11.1.2/32 is directly connected, veth0, 00:00:05 C>* 10.12.0.0/24 [0/100] is directly connected, veth0, 00:00:05 L>* 10.12.0.0/32 is directly connected, veth0, 00:00:05

The prefix is now shown as connected (C>) as it should. Note also that the other prefix (10.11.1.0/24, without the noprefix flag) now appears twice, because it's both created by zebra from the interface config, and imported from the kernel. This is harmless as the routes are identical, and an arbitrary one just ends up being selected.

@frrbot frrbot bot added the zebra label Jul 17, 2024
When importing routes from the kernel, the zebra daemon ignores any routes
marked as 'proto kernel', such as the link-scoped routes that the kernel
generates for addresses assigned to interfaces. Instead, zebra implements its
own logic to synthesise routes for each address assignment, installing them into
the RIB with the ZEBRA_ROUTE_CONNECT proto set.

This behaviour requires zebra to mirror the logic of the kernel, to avoid having
the kernel FIB diverge from the FRR RIB, which can cause routing loops or other
failures. One example of this was the recent addition of support for the
'noprefixroute' flag to zebra[0].

However, attempting to mirror the kernel behaviour this way causes problems when
the mirroring is imperfect. An example of this was seen as a result of the
change mentioned above, where zebra honouring the noprefixroute flag leads to
routes missing from the RIB in some cases. Specifically, this happens when
network management daemons set the noprefixroute on the address assignment, but
subsequently installs a link-scoped route into the kernel identical to the
prefix route the kernel would have installed automatically. The use case for
this is enable the network management daemon to atomically change route
attributes (such as route metric) on the prefix route, but otherwise keep the
behaviour identical to the case where the kernel creates the prefix route
itself.

The failure described above was noticed for NetworkManager and reported as a
NetworkManager bug[1] as well as an FRR issue[2]. Other network management
daemons use the noprefixroute flag for similar purposes (e.g.,
systemd-networkd[3]).

[0] FRRouting#14957
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1452
[2] FRRouting#16101
[3] https://github.com/systemd/systemd/blob/main/src/network/networkd-dhcp4.c#L962

To resolve this discrepancy between the kernel FIB and the FRR RIB, this patch
changes zebra's behaviour to import 'proto kernel' instead of ignoring them, and
to treat routes with 'scope link' as ZEBRA_ROUTE_CONNECT routes, just like the
ones synthesised by zebra itself. This allows the noprefixroute flag to work
correctly, while still playing nice with network management daemons that install
a different link-scope route for installed addresses. The change in behaviour
can be seen from the following example:

Kernel config:
5: veth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen
1000
    link/ether fe:da:bb:eb:74:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.11.1.2/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet 10.12.0.0/24 scope global noprefixroute veth0
       valid_lft forever preferred_lft forever

10.11.0.0/16 via 10.11.1.1 dev veth0
10.11.1.0/24 dev veth0 proto kernel scope link src 10.11.1.2
10.12.0.0/24 dev veth0 proto kernel scope link metric 100

The 10.12.0.0/24 route was manually added with:

Running zebra, pre-patch:

Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:22
C>* 10.11.1.0/24 is directly connected, veth0, 00:00:22
L>* 10.11.1.2/32 is directly connected, veth0, 00:00:22
L>* 10.12.0.0/32 is directly connected, veth0, 00:00:22

Notice that the 10.12.0.0/24 route is missing from the RIB.

After the patch:

Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:05
C * 10.11.1.0/24 is directly connected, veth0, 00:00:05
C>* 10.11.1.0/24 is directly connected, veth0, 00:00:05
L>* 10.11.1.2/32 is directly connected, veth0, 00:00:05
C>* 10.12.0.0/24 [0/100] is directly connected, veth0, 00:00:05
L>* 10.12.0.0/32 is directly connected, veth0, 00:00:05

The prefix is now shown as connected (C>) as it should. Note also that the other
prefix (10.11.1.0/24, without the noprefix flag) now appears twice, because it's
both created by zebra from the interface config, and imported from the kernel.
This is harmless as the routes are identical, and an arbitrary one just ends up
being selected.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
@tohojo tohojo force-pushed the zebra-scope-link branch from da672b6 to a3a2902 Compare July 17, 2024 22:04
@eqvinox
Copy link
Contributor

eqvinox commented Jul 22, 2024

noticed that there is already a similar PR open

Indeed this has been discussed for a while now; the current behavior essentially creates a potential for desync for no good reason.

however, I don't quite grok why the additional logic in that PR is needed (instead of just relying on the kernel to send DELROUTEs when an interface goes down)

I'm 85% sure there are situations where the kernel doesn't send a DELROUTE. It might've been address and/or interface deletion rather than change to down. Unfortunately there's no good specification for this 😞

@donaldsharp
Copy link
Member

Kernel does not Always send a delroute when a route is removed. As such we need to handle that case as well.

@donaldsharp donaldsharp self-requested a review July 23, 2024 15:17
@riw777 riw777 self-requested a review July 23, 2024 15:18
@tohojo
Copy link
Author

tohojo commented Jul 24, 2024

Kernel does not Always send a delroute when a route is removed. As such we need to handle that case as well.

Right, played around with ip monitor a bit and I see your point. I'll close this PR then, and go comment on #16300 instead :)

@tohojo tohojo closed this Jul 24, 2024
@tohojo tohojo mentioned this pull request Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants