FRR 7 on Debian Stretch not loading route-map "set src <address>" entries at first startup #4249

Closed
thehonker opened this issue May 3, 2019 · 13 comments

@thehonker

[x] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce

  1. Install a minimal Debian Stretch (I'm reproducing this on an OVMF qemu/kvm vm, but I'm also seeing this issue on a Dell PE R900... old lab is old)
  2. Install FRR from the deb https://deb.frrouting.org/frr stretch frr-7 repo and systemctl enable frr
  3. Create a route-map with a set src <some local address> directive
  4. Restart the system
  5. Check the FRR running-config: the set src directive was not loaded
  6. systemctl restart frr and check running-config again: the set src directive is loaded this time

Expected behavior
The set src directive should be loaded at boot.

Versions

  • OS/Kernel: Debian Stretch Linux frr7-test 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64 GNU/Linux
  • FRR Version 7.0 installed from the deb.frrouting.org repos (7.0-1~deb9u1)
frr7-test# sh version 
FRRouting 7.0 (frr7-test).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--enable-exampledir=/usr/share/doc/frr/examples/' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-systemd=yes' '--enable-rpki' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu'

Additional context
We're using the set src directive of a route-map to set the source address of ospf-learned routes to the loopback address.

When frr is first started at boot, it's not loading this set src route-map entry that is defined in /etc/frr/frr.conf
If I systemctl restart frr after logging in, then these entries are loaded and applied correctly.
If I systemctl disable frr, and then manually systemctl start frr after boot is complete and I log in, it works properly.
Other route-map entries (such as match ip address 1 to tie the route-map to an access-list) seem to work properly.

route-map created, system rebooted:

First let's look at the FRR configuration file:

root@frr7-test:~# cat /etc/frr/frr.conf
frr version 7.0
frr defaults traditional
hostname frr7-test
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
interface ens18
 ip ospf area 10.66.20.120
!
interface lo
 ip ospf area 10.66.66.10
!
router ospf
 ospf router-id 10.66.66.10
 area 10.66.20.120 stub
!
access-list 1 permit any
!
route-map set_src_addr permit 1
 set src 10.66.66.10  <<-- the route-map entry in question
!
route-map map2 permit 1  <<-- this route-map entry works
 match ip address 1
!
ip protocol ospf route-map set_src_addr
!
line vty
!

And now let's check running-config as loaded at boot:

root@frr7-test:~# vtysh

Hello, this is FRRouting (version 7.0).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

frr7-test# sh ru
Building configuration...

Current configuration:
!
frr version 7.0
frr defaults traditional
hostname frr7-test
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
interface ens18
 ip ospf area 10.66.20.120
!
interface lo
 ip ospf area 10.66.66.10
!
router ospf
 ospf router-id 10.66.66.10
 area 10.66.20.120 stub
!
access-list 1 permit any
!
route-map set_src_addr permit 1  <<-- our set src entry is missing here
!
route-map map2 permit 1
 match ip address 1
!
ip protocol ospf route-map set_src_addr
!
line vty
!
end

Now let's restart frr and look at the running config again:

root@frr7-test:~# systemctl restart frr
root@frr7-test:~# vtysh

Hello, this is FRRouting (version 7.0).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

frr7-test# sh ru
Building configuration...

Current configuration:
!
frr version 7.0
frr defaults traditional
hostname frr7-test
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
interface ens18
 ip ospf area 10.66.20.120
!
interface lo
 ip ospf area 10.66.66.10
!
router ospf
 ospf router-id 10.66.66.10
 area 10.66.20.120 stub
!
access-list 1 permit any
!
route-map set_src_addr permit 1
 set src 10.66.66.10  <<-- the route-map entry in question, now properly loaded and applied
!
route-map map2 permit 1
 match ip address 1
!
ip protocol ospf route-map set_src_addr
!
line vty
!
end

Further reboots of the system show the same behavior - the set src entry isn't loaded when FRR is started at boot, but it comes back as expected if FRR is restarted.
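
The by-hand comparison above (saved /etc/frr/frr.conf versus the `sh ru` output) could be sketched as a small shell check. This is only a rough sketch and not part of FRR: the name `missing_set_src` is made up for illustration, and it assumes both configs are available as plain-text files with `set src` lines indented identically in each.

```shell
#!/bin/sh
# missing_set_src SAVED RUNNING  (hypothetical helper, not an FRR tool)
# Prints each "set src" line from SAVED that has no identical line in RUNNING.
# Limitation: lines are compared verbatim, so the route-map a line
# belongs to is not taken into account.
missing_set_src() {
    grep -E '^[[:space:]]*set src ' "$1" | while IFS= read -r line; do
        # -x: whole-line match, -F: fixed string, -q: quiet
        grep -qxF -- "$line" "$2" || printf '%s\n' "$line"
    done
}
```

One possible use after a reboot, assuming vtysh is on the PATH: `vtysh -c 'show running-config' > /tmp/running.conf`, then `missing_set_src /etc/frr/frr.conf /tmp/running.conf`. Empty output would suggest that no set src line went missing.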

And here's /etc/network/interfaces (/e/n/i), just for more context (using ifupdown2):

root@frr7-test:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback
  address 10.66.66.10/32

allow-hotplug ens18
iface ens18 inet static
  address 10.66.20.125/29
# vmbr310 on host is 10.66.20.126/29

Let me know if I can provide any other logs or do any other testing.
As a workaround for now, I've created a systemd timer that restarts frr one minute after multi-user.target is reached.

@thehonker thehonker added the triage Needs further investigation label May 3, 2019
@thehonker
Author

thehonker commented May 6, 2019

Okay, so this is weird.... I systemctl disable apt-daily.timer && systemctl disable apt-daily-upgrade.timer and now it loads that entry on boot just fine.

Maybe there's a race condition where these two timers are keeping lo from coming up, and FRR doesn't load that line of the config because that address doesn't exist on the system yet?? But that doesn't make any sense, apt-daily-* shouldn't be affecting networking... I'd think.

EDIT - it only works on the vm I set up to repro this problem. Back on the lab system this doesn't work.

@qlyoung
Member

qlyoung commented May 6, 2019

I agree that apt timers should not affect this at all. You say you can reliably recreate it; does systemctl reload frr succeed?

@thehonker
Author

thehonker commented May 6, 2019

Both systemctl reload frr and /usr/lib/frr/frr-reload work, yes.
While testing that I noticed that it loads set src on boot maybe 4/10 times (this is on the vm), but on the R900 it never loads on boot. The vm (with one virtio nic) boots a lot faster than the R900 (4x bnx2 nics), so I'm thinking maybe ifupdown2 hasn't finished doing its thing when frr starts up?

Although that's still a WAG, and I'm not seeing any errors in the logs below (in fact, it looks like frr finds the /32 on lo before the /29 on ens18)

I added these lines to /etc/frr/frr.conf:

log file /var/log/frr/debug.log
debug zebra events
debug zebra kernel

And added these launch options to zebra in /etc/frr/daemons:
--log file:/var/log/frr/zebra_debug.log --log-level debug

And in /etc/default/networking I set both VERBOSE="yes" and DEBUG="yes"

But I didn't see anything too interesting in any of the output.
These logs are from a boot of the vm where the set src entry wasn't loaded:
/var/log/frr/debug.log
/var/log/frr/zebra_debug.log
ifupdown2 debug log

I'm going to try it on the R900 with the same debugging options, and I'll keep rebooting that vm until I get a boot where it does load, and then compare those logs.

EDIT - looking closer at debug.log, I see this:

2019/05/06 17:19:24 ZEBRA: Event driven route-map update triggered
2019/05/06 17:19:24 ZEBRA: Event handler for route-map: set_src_addr

Not sure what to make of it. Looks like frr sees the config entry and passes it off somewhere else to be applied? I don't know much about how the guts of frr work.

EDIT 2 - I see in zebra/zebra_routemap.c#L626 that the not a local address error message is only sent to vty in interactive mode.
Is there a way I can have this logged to a file, either by enabling another debug option or by changing/adding to that line and recompiling?

@thehonker
Author

Update - If I switch from ifupdown2 to the classic ifupdown, it works just fine every time.
To comply with classic ifupdown's syntax, I made this change:

root@frr7-test:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto lo:1
iface lo:1 inet static
  address 10.66.66.10/32

Here are some logs:
Working at boot, frr debug
Working at boot, ifupdown verbose

I then reinstalled ifupdown2 and rebooted with that same /e/n/i syntax for lo, and no toast.

Still thinking it's a race condition of some sort, I started looking closer at the systemd unit files for ifupdown, ifupdown2, and both frr version 6.0.2 and 7.0.

Classic ifupdown and ifupdown2 have some pretty big differences in their dependency trees:

### Classic ifupdown:
[Unit]
<snip>
DefaultDependencies=no
Wants=network.target
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service
Before=network.target shutdown.target network-online.target
Conflicts=shutdown.target
[Install]
WantedBy=multi-user.target
WantedBy=network-online.target

### ifupdown2:
### no deps listed in [Unit] section
[Install]
WantedBy=basic.target network.target multi-user.target

And between frr 6.0.2 and 7.0 we had this change:

### 6.0.2:
[Unit]
After=networking.service
OnFailure=heartbeat-failed@%n.service

### 7.0
[Unit]
Wants=network.target
After=network-pre.target systemd-sysctl.service
Before=network.target
OnFailure=heartbeat-failed@%n.service

If I replace the 7.0 unit file with the 6.0.2 unit file, then it starts working just fine with ifupdown2, every time.

Working boot with ifupdown2 and 6.0.2 unit file - frr debug log
Working boot with ifupdown2 and 6.0.2 unit file - ifupdown2 debug log
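
To repeat the unit-file comparison above on another system, a small filter like the following might help. It is a sketch under assumptions: `unit_ordering` is a made-up name, and it only looks at the directives quoted in this comment plus `Requires=`.

```shell
#!/bin/sh
# unit_ordering  (hypothetical helper)
# Reads unit-file text on stdin and prints only the dependency and
# ordering directives, sorted so two units are easy to compare.
unit_ordering() {
    grep -E '^(After|Before|Wants|Requires|WantedBy)=' | sort
}
```

For example, `systemctl cat frr.service | unit_ordering` and `systemctl cat networking.service | unit_ordering` would print just the lines that decide which unit starts first.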

@pguibert6WIND
Member

The 'set src' command just checks whether the address is a local address available on the system.
If it is not, the command is simply not taken into account.

Some improvement needs to be done on that command.
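
Since the command depends on the address already being present when zebra reads the config, it can be useful to test for that from the shell. The sketch below is not how zebra performs its check; `has_addr` and `wait_for_addr` are hypothetical helpers that parse the one-line output of `ip -o addr show`, and the one-second poll interval and default of 30 tries are arbitrary assumptions.

```shell
#!/bin/sh
# has_addr ADDRESS  (hypothetical helper)
# Reads `ip -o addr show` output on stdin and succeeds if ADDRESS
# (given without a prefix length) is configured on any interface.
has_addr() {
    awk -v want="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "inet" || $i == "inet6") {
                    # The field after inet/inet6 is ADDRESS/PREFIXLEN
                    split($(i + 1), parts, "/")
                    if (parts[1] == want) found = 1
                }
            }
        }
        END { exit !found }
    '
}

# wait_for_addr ADDRESS [TRIES]  (hypothetical helper)
# Polls once per second until ADDRESS shows up; gives up after TRIES attempts.
wait_for_addr() {
    addr=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        ip -o addr show | has_addr "$addr" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

With the loopback address from this issue, `wait_for_addr 10.66.66.10 60 && systemctl restart frr` would restart FRR only once the address exists, which is roughly what the timer workaround approximates with a fixed delay.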

@thehonker
Author

When we went into production with FRR-7 we ended up doing this:

# make frr.service dep on networking.service
mkdir -p /etc/systemd/system/frr.service.d/
cat >/etc/systemd/system/frr.service.d/override.conf <<'EOF'
[Unit]
After=networking.service
EOF
systemctl daemon-reload

This works fine in our environment, but I'm sure there are use cases out there where you'd want frr to come up before ifupdown/ifupdown2, so this isn't anything more than a band-aid.
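
To confirm that a drop-in like the one above took effect, the effective After= list of the unit can be inspected. A minimal sketch, assuming a systemd new enough to support `systemctl show --value`; `ordered_after` is a made-up helper name.

```shell
#!/bin/sh
# ordered_after UNIT  (hypothetical helper)
# Reads a space-separated After= list on stdin and succeeds if UNIT is in it.
ordered_after() {
    tr -s ' ' '\n' | grep -qxF -- "$1"
}
```

Possible usage: `systemctl show -p After --value frr.service | ordered_after networking.service && echo "frr starts after networking"`.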

@kooky
Contributor

kooky commented May 11, 2020

I have a very similar problem with classic ifupdown, for IPv4 and IPv6 as well.

set src does not come back after a reboot.

@thehonker
Author

@kooky is this with 7.1 or 7.3? Kernel 4.x or 5.x?
We've run into a (probably) unrelated problem where frr 7.3 + kernel 5.x will not apply set src even when the address exists before frr starts.
I've been meaning to open a new issue for it but haven't had time to reproduce the problem to verify.

@kooky
Contributor

kooky commented May 11, 2020

@epers I'm on frr 7.2 with kernel 4.19.0-9 (Debian buster)

I'll test making systemd bring up the network interfaces before frr.
And if I get a chance, will test with frr only bringing up the internal interfaces (not in interfaces file at all)

(Can't upgrade to 7.3 because of a ospf6 problem.)

kooky added a commit to kooky/frr that referenced this issue May 12, 2020
Include an IPv6 example for set src

And a note that the IP address has to exist.   This is to try and make
people aware to avoid things like issue FRRouting#4249
FRRouting#4249

Signed-off-by: Tim Bray <[email protected]>
@kooky
Contributor

kooky commented May 12, 2020

@epers

@kooky is this with 7.1 or 7.3? Kernel 4.x or 5.x?
We've run into a (probably) unrelated problem where frr 7.3 + kernel 5.x will not apply set src even when the address exists before frr starts.

This problem is fixed in the latest git and also in 7.3.1.

After some testing:
FRR 7.3 with Linux kernel 5.4.11 (or newer) is broken for set src. (I tested with Linux 5.6.12, 5.4.11, 5.2.13, 4.19)

On Slack, @sworleys identified that the fix is in e85c67d.

@sworleys
Member

sworleys commented May 12, 2020 via email

@ton31337
Member

@polychaeta autoclose in 1 day.

@thehonker
Author

This would be broken on any kernel with nexthop objects functionality (> 5.3) and only FRR version 7.3
...
This problem is fixed in the latest git and also in 7.3.1.

Oh cool, thank you!

idryzhov added a commit to idryzhov/frr that referenced this issue Jul 29, 2021
1. This check is absolutely useless. Nothing keeps user from deleting
   the address right after this check.
2. This check prevents zebra from correctly reading the user config with
   "set src" because of a race with interface startup (see FRRouting#4249).
3. NO OPERATIONAL DATA USAGE ON VALIDATION STAGE.

Fixes FRRouting#7319.

Signed-off-by: Igor Ryzhov <[email protected]>
mergify bot pushed a commit that referenced this issue Aug 3, 2021
1. This check is absolutely useless. Nothing keeps user from deleting
   the address right after this check.
2. This check prevents zebra from correctly reading the user config with
   "set src" because of a race with interface startup (see #4249).
3. NO OPERATIONAL DATA USAGE ON VALIDATION STAGE.

Fixes #7319.

Signed-off-by: Igor Ryzhov <[email protected]>
(cherry picked from commit 1f74d96)