ZEBRA: kernel warnings logs when using PPP interface route #6089

Closed
EasyNetDev opened this issue Mar 25, 2020 · 19 comments
Labels: triage (Needs further investigation), zebra

@EasyNetDev
Contributor

With a PPPoE connection up, adding a default route that points at the PPP interface makes the kernel log warnings like this:

Mar 25 12:16:35 R02 kernel: [ 8633.839816] ------------[ cut here ]------------
Mar 25 12:16:35 R02 kernel: [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
Mar 25 12:16:35 R02 kernel: [ 8633.839819] Modules linked in: mpls_iptunnel mpls_router ip_tunnel pppoe pppox ppp_generic slhc macvlan 8021q garp stp mrp llc vmw_vsock_vmci_transport vsock dummy nft_counter vmw_balloon nft_chain_nat joydev serio_raw pcspkr vmwgfx ttm xt_MASQUERADE sg drm_kms_helper evdev drm xt_nat nf_nat nf_conntrack vmw_vmci nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_policy nft_compat button ac nf_tables nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sr_mod cdrom ata_generic sd_mod psmouse ahci libahci ata_piix uhci_hcd ehci_pci ehci_hcd libata vmw_pvscsi usbcore vmxnet3 scsi_mod usb_common i2c_piix4
Mar 25 12:16:35 R02 kernel: [ 8633.839843] CPU: 0 PID: 1719 Comm: dnsdist/healthC Tainted: G        W         5.4.0-4-amd64 #1 Debian 5.4.19-1
Mar 25 12:16:35 R02 kernel: [ 8633.839844] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Mar 25 12:16:35 R02 kernel: [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
Mar 25 12:16:35 R02 kernel: [ 8633.839847] Code: 48 85 c0 75 41 48 8d 87 80 00 00 00 49 89 44 24 10 66 41 89 73 20 e9 6b fd ff ff 4d 3b 4c 24 18 75 9c 49 89 eb e9 e6 fe ff ff <0f> 0b e9 61 fe ff ff 80 7a 56 00 75 1f 48 8b 52 70 48 83 c2 20 eb
Mar 25 12:16:35 R02 kernel: [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
Mar 25 12:16:35 R02 kernel: [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
Mar 25 12:16:35 R02 kernel: [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
Mar 25 12:16:35 R02 kernel: [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
Mar 25 12:16:35 R02 kernel: [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
Mar 25 12:16:35 R02 kernel: [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
Mar 25 12:16:35 R02 kernel: [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
Mar 25 12:16:35 R02 kernel: [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 25 12:16:35 R02 kernel: [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
Mar 25 12:16:35 R02 kernel: [ 8633.839867] Call Trace:
Mar 25 12:16:35 R02 kernel: [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
Mar 25 12:16:35 R02 kernel: [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
Mar 25 12:16:35 R02 kernel: [ 8633.839876]  ip_route_output_flow+0x1a/0x50
Mar 25 12:16:35 R02 kernel: [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
Mar 25 12:16:35 R02 kernel: [ 8633.839880]  ip4_datagram_connect+0x28/0x40
Mar 25 12:16:35 R02 kernel: [ 8633.839882]  __sys_connect+0xd6/0x100
Mar 25 12:16:35 R02 kernel: [ 8633.839885]  ? syscall_trace_enter+0x131/0x2c0
Mar 25 12:16:35 R02 kernel: [ 8633.839887]  __x64_sys_connect+0x16/0x20
Mar 25 12:16:35 R02 kernel: [ 8633.839889]  do_syscall_64+0x52/0x160
Mar 25 12:16:35 R02 kernel: [ 8633.839891]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 25 12:16:35 R02 kernel: [ 8633.839893] RIP: 0033:0x7fcad034507b
Mar 25 12:16:35 R02 kernel: [ 8633.839895] Code: 83 ec 18 89 54 24 0c 48 89 34 24 89 7c 24 08 e8 ab fa ff ff 8b 54 24 0c 48 8b 34 24 41 89 c0 8b 7c 24 08 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 08 e8 e1 fa ff ff 8b 44
Mar 25 12:16:35 R02 kernel: [ 8633.839896] RSP: 002b:00007fcac2e00e30 EFLAGS: 00000293 ORIG_RAX: 000000000000002a
Mar 25 12:16:35 R02 kernel: [ 8633.839897] RAX: ffffffffffffffda RBX: 000055ee7761d380 RCX: 00007fcad034507b
Mar 25 12:16:35 R02 kernel: [ 8633.839898] RDX: 0000000000000010 RSI: 000055ee77623960 RDI: 000000000000001a
Mar 25 12:16:35 R02 kernel: [ 8633.839899] RBP: 000055ee77623960 R08: 0000000000000000 R09: 00007fcacff816f0
Mar 25 12:16:35 R02 kernel: [ 8633.839900] R10: 00007fcab0000c00 R11: 0000000000000293 R12: 000000000000001a
Mar 25 12:16:35 R02 kernel: [ 8633.839901] R13: 0000000000000000 R14: 0000000000000001 R15: 00007fcac2e011b0
Mar 25 12:16:35 R02 kernel: [ 8633.839903] ---[ end trace d64b745aea08a0ea ]---

The logs are flooded with this same warning.
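
To gauge how badly the logs are being flooded, the warning lines can be grouped by source location. This is only a minimal sketch, assuming syslog-formatted kernel lines like the ones above; `count_warning_sites` is a hypothetical helper name, not part of FRR or any kernel tooling:

```python
import re
from collections import Counter

# Matches the "WARNING: ... at <file>:<line> <symbol>+<offset>/<size>" line of a kernel splat.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<site>\S+:\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def count_warning_sites(lines):
    """Count kernel WARNING lines per (source location, function)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("site"), match.group("func"))] += 1
    return counts

# Sample lines copied from the logs in this issue.
sample = [
    "Mar 25 12:16:35 R02 kernel: [ 8633.839816] ------------[ cut here ]------------",
    "Mar 25 12:16:35 R02 kernel: [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381",
    "Mar 27 13:21:41 R02 kernel: [185329.362971] WARNING: CPU: 1 PID: 8832 at include/net/nexthop.h:251 fib_select_path+0x303/0x381",
]
warning_counts = count_warning_sites(sample)
for (site, func), count in warning_counts.items():
    print(f"{count} x {func} at {site}")
```

Feeding it the lines of the syslog file instead of `sample` would show whether every warning comes from the same place in `fib_select_path`.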

(put "x" in "[ ]" if you already tried following)
[X] Did you check if this is a duplicate issue?
[X] Did you test it on the latest FRRouting/frr master branch?

To Reproduce
Steps to reproduce the behavior:

  1. Start a PPPoE connection
  2. Enter vtysh
  3. Add a default static route: ip route 0.0.0.0/0 ppp0 or ip route 0.0.0.0/0 ppp0 20
  4. See error in syslog/dmesg
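
Steps 2 and 3 can probably be scripted instead of typed interactively, since vtysh accepts repeated `-c` options. A minimal sketch that only builds the command line; `static_default_route_cmd` is a hypothetical helper and the interface name is whatever pppd created on your system:

```python
import shlex

def static_default_route_cmd(ifname, distance=None):
    """Build the vtysh argument list that adds a static default route via ifname."""
    route = f"ip route 0.0.0.0/0 {ifname}"
    if distance is not None:
        # Optional administrative distance, as in "ip route 0.0.0.0/0 ppp0 20".
        route += f" {distance}"
    return ["vtysh", "-c", "configure terminal", "-c", route]

# Print the two variants from step 3 as shell command lines.
print(shlex.join(static_default_route_cmd("ppp0")))
print(shlex.join(static_default_route_cmd("ppp0", 20)))
```

The returned list can be handed to `subprocess.run` on a host where FRR is running.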

Expected behavior
No errors in logs

Versions

@EasyNetDev EasyNetDev added the triage Needs further investigation label Mar 25, 2020
@sworleys
Member

This is in the nexthop group kernel code. I will take a look and discuss with the kernel maintainer as well.

@sworleys sworleys self-assigned this Mar 25, 2020
@qlyoung qlyoung added the zebra label Mar 25, 2020
@sworleys
Member

I tried to recreate this by setting up a PPPoE connection across two veths in different namespaces, but have had no luck. I even generated some traffic across the PPPoE connection via pings, nmap -PS and curl (for TCP traffic).

According to ftrace, the kernel does hit that function, but no warning shows up in dmesg.

[root@alfred tracing]# ip ro
default nhid 8 dev ppp0 proto 196 metric 20 
4.4.4.2 dev ppp0 proto kernel scope link src 4.4.4.1 
[root@alfred tracing]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: veth1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff link-netns ns2
    inet6 fe80::5054:ff:fe00:1/64 scope link 
       valid_lft forever preferred_lft forever
4: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 3
    link/ppp 
    inet 4.4.4.1 peer 4.4.4.2/32 scope global ppp0
       valid_lft forever preferred_lft forever
[root@alfred tracing]# 
[root@alfred linux]# ip ro
4.4.4.1 dev ppp0 proto kernel scope link src 4.4.4.2 
[root@alfred linux]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 2.2.2.1/32 scope global lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: veth2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff link-netns ns1
    inet6 fe80::5054:ff:fe00:2/64 scope link 
       valid_lft forever preferred_lft forever
3: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 3
    link/ppp 
    inet 4.4.4.2 peer 4.4.4.1/32 scope global ppp0
       valid_lft forever preferred_lft forever
[root@alfred linux]# 

output from:

echo 1 > tracing_on; ping -c 1 2.2.2.1; echo 0 > tracing_on; less trace
           <...>-3185927 [003] .... 1234610.733231: ip_route_output_key_hash <-__ip4_datagram_connect
           <...>-3185927 [003] .... 1234610.733231: ip_route_output_key_hash_rcu <-ip_route_output_key_hash
           <...>-3185927 [003] .... 1234610.733231: fib_table_lookup <-ip_route_output_key_hash_rcu
           <...>-3185927 [003] .... 1234610.733234: fib_select_path <-ip_route_output_key_hash_rcu
           <...>-3185927 [003] .... 1234610.733234: fib_result_prefsrc <-fib_select_path
           <...>-3185927 [003] .... 1234610.733235: find_exception <-ip_route_output_key_hash_rcu
           <...>-3185927 [003] .... 1234610.733235: dst_release <-__ip4_datagram_connect
           <...>-3185927 [003] .... 1234610.733235: security_sk_classify_flow <-__ip4_datagram_connect
           <...>-3185927 [003] .... 1234610.733235: selinux_sk_getsecid <-security_sk_classify_flow
           <...>-3185927 [003] .... 1234610.733236: ip_route_output_flow <-__ip4_datagram_connect
           <...>-3185927 [003] .... 1234610.733236: ip_route_output_key_hash <-ip_route_output_flow
           <...>-3185927 [003] .... 1234610.733236: ip_route_output_key_hash_rcu <-ip_route_output_key_hash
           <...>-3185927 [003] .... 1234610.733236: __ip_dev_find <-ip_route_output_key_hash_rcu
           <...>-3185927 [003] .... 1234610.733236: inet_lookup_ifaddr_rcu <-__ip_dev_find

But I am not getting an error message.

[root@alfred linux]# uname -a
Linux alfred 5.3.16-300.fc31.x86_64 #1 SMP Fri Dec 13 17:59:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@alfred linux]# 
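
The `nhid 8` in the first `ip ro` output above means the default route references a kernel nexthop object, which is the code path under discussion. A minimal sketch for spotting such routes, assuming iproute2 prints `nhid <id>` the way it does above; `routes_using_nexthop_objects` is a hypothetical helper name:

```python
def routes_using_nexthop_objects(ip_route_output):
    """Return (prefix, nexthop id) pairs for routes that carry an 'nhid' attribute."""
    found = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if "nhid" in tokens:
            idx = tokens.index("nhid")
            # The nexthop id is the token right after "nhid".
            if idx + 1 < len(tokens):
                found.append((tokens[0], int(tokens[idx + 1])))
    return found

# Sample copied from the "ip ro" output in this comment.
sample = """\
default nhid 8 dev ppp0 proto 196 metric 20
4.4.4.2 dev ppp0 proto kernel scope link src 4.4.4.1
"""
nexthop_routes = routes_using_nexthop_objects(sample)
print(nexthop_routes)
```

Running the same check on the affected routers would confirm whether their default route is also installed through a nexthop object.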

@sworleys
Member

sworleys commented Mar 26, 2020

@adrianban can you provide the exact configuration for your PPPoE connection?

Also, what does your frr.conf look like?
The output of show ip ro would help as well, if possible.

@sworleys
Member

The output of ip link show too, if possible.

@sworleys
Member

If you can add exact repro steps for setting up that pppoe connection that would be helpful too.

@EasyNetDev
Contributor Author

Hi sworleys,

These are the logs for PPPoE:

Mar 25 13:38:14 R01-ROM pppd[25297]: Plugin rp-pppoe.so loaded.
Mar 25 13:38:14 R01-ROM pppd[25298]: pppd 2.4.7 started by root, uid 0
Mar 25 13:38:14 R01-ROM pppd[25298]: Send PPPOE Discovery V1T1 PADI session 0x0 length 4
Mar 25 13:38:14 R01-ROM pppd[25298]:  dst ff:ff:ff:ff:ff:ff  src b4:96:91:01:1e:7c
Mar 25 13:38:14 R01-ROM pppd[25298]:  [service-name]
Mar 25 13:38:14 R01-ROM pppd[25298]: Recv PPPOE Discovery V1T1 PADO session 0x0 length 43
Mar 25 13:38:14 R01-ROM pppd[25298]:  dst b4:96:91:01:1e:7c  src 3c:fd:fe:9c:c9:7d
Mar 25 13:38:14 R01-ROM pppd[25298]:  [AC-name pppoe-3] [service-name] [AC-cookie  ca 68 29 50 b6 2e f2 4e 7d 97 2a 7c b3 ab ae cf 18 46 35 c8 b9 9f 32 08]
Mar 25 13:38:14 R01-ROM pppd[25298]: Send PPPOE Discovery V1T1 PADR session 0x0 length 32
Mar 25 13:38:14 R01-ROM pppd[25298]:  dst 3c:fd:fe:9c:c9:7d  src b4:96:91:01:1e:7c
Mar 25 13:38:14 R01-ROM pppd[25298]:  [service-name] [AC-cookie  ca 68 29 50 b6 2e f2 4e 7d 97 2a 7c b3 ab ae cf 18 46 35 c8 b9 9f 32 08]
Mar 25 13:38:14 R01-ROM pppd[25298]: Recv PPPOE Discovery V1T1 PADS session 0x4542 length 15
Mar 25 13:38:14 R01-ROM pppd[25298]:  dst b4:96:91:01:1e:7c  src 3c:fd:fe:9c:c9:7d
Mar 25 13:38:14 R01-ROM pppd[25298]:  [AC-name pppoe-3] [service-name]
Mar 25 13:38:14 R01-ROM pppd[25298]: PADS: Service-Name: ''
Mar 25 13:38:14 R01-ROM pppd[25298]: PPP session is 17730
Mar 25 13:38:14 R01-ROM pppd[25298]: Connected to 3c:fd:fe:9c:c9:7d via interface gi0-0.5
Mar 25 13:38:14 R01-ROM pppd[25298]: using channel 17
Mar 25 13:38:14 R01-ROM pppd[25298]: Renamed interface ppp0 to tfiber0
Mar 25 13:38:14 R01-ROM pppd[25298]: Using interface tfiber0
Mar 25 13:38:14 R01-ROM pppd[25298]: Connect: tfiber0 <--> gi0-0.5
Mar 25 13:38:14 R01-ROM pppd[25298]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xcc1bc09e>]
Mar 25 13:38:14 R01-ROM pppd[25298]: rcvd [LCP ConfReq id=0x6f <auth pap> <mru 1492> <magic 0x34a0a46b>]
Mar 25 13:38:14 R01-ROM pppd[25298]: sent [LCP ConfAck id=0x6f <auth pap> <mru 1492> <magic 0x34a0a46b>]
Mar 25 13:38:14 R01-ROM pppd[25298]: rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0xcc1bc09e>]
Mar 25 13:38:14 R01-ROM pppd[25298]: sent [LCP EchoReq id=0x0 magic=0xcc1bc09e]
Mar 25 13:38:14 R01-ROM pppd[25298]: sent [PAP AuthReq id=0x1 user="USER" password="PASS"]
Mar 25 13:38:14 R01-ROM pppd[25298]: rcvd [LCP EchoRep id=0x0 magic=0x34a0a46b]
Mar 25 13:38:14 R01-ROM pppd[25298]: rcvd [PAP AuthAck id=0x1 "Authentication succeeded"]
Mar 25 13:38:14 R01-ROM pppd[25298]: Remote message: Authentication succeeded
Mar 25 13:38:14 R01-ROM pppd[25298]: PAP authentication succeeded
Mar 25 13:38:14 R01-ROM pppd[25298]: peer from calling number 3C:FD:FE:9C:C9:7D authorized
Mar 25 13:38:14 R01-ROM pppd[25298]: sent [IPCP ConfReq id=0x1 <addr 0.0.0.0>]
Mar 25 13:38:14 R01-ROM pppd[25298]: rcvd [IPCP ConfReq id=0x46 <addr 10.0.0.3>]
Mar 25 13:38:14 R01-ROM pppd[25298]: sent [IPCP ConfAck id=0x46 <addr 10.0.0.3>]
Mar 25 13:38:14 R01-ROM pppd[25298]: rcvd [IPCP ConfNak id=0x1 <addr 93.A.B.255>]
Mar 25 13:38:14 R01-ROM pppd[25298]: sent [IPCP ConfReq id=0x2 <addr 93.A.B.255>]
Mar 25 13:38:14 R01-ROM pppd[25298]: rcvd [IPCP ConfAck id=0x2 <addr 93.A.B.255>]
Mar 25 13:38:14 R01-ROM pppd[25298]: replacing old default route to gi0-1 [10.50.1.194]
Mar 25 13:38:14 R01-ROM pppd[25298]: del old default route ioctl(SIOCDELRT): No such process(3)
Mar 25 13:38:14 R01-ROM pppd[25298]: local  IP address 93.A.B.255
Mar 25 13:38:14 R01-ROM pppd[25298]: remote IP address 10.0.0.3
Mar 25 13:38:14 R01-ROM pppd[25298]: Script /etc/ppp/ip-up started (pid 25309)
Mar 25 13:38:14 R01-ROM kernel: [69547.586726] CPU: 1 PID: 25298 Comm: pppd Tainted: G        W         5.4.0-4-amd64 #1 Debian 5.4.19-1
Mar 25 13:38:14 R01-ROM pppd[25298]: Script /etc/ppp/ip-up finished (pid 25309), status = 0x0

PPP configuration:

# Configuration file for PPP, using PPP over Ethernet 
# to connect to a DSL provider.
#
# See the manual page pppd(8) for information on all the options.

##
# Section 1
#
# Stuff to configure...

# MUST CHANGE: Uncomment the following line, replacing the [email protected]
# by the DSL user name given to your by your DSL provider.
# (There should be a matching entry in /etc/ppp/pap-secrets with the password.)
#user [email protected]

# Use the pppoe program to send the ppp packets over the Ethernet link
# This line should work fine if this computer is the only one accessing
# the Internet through this DSL connection. This is the right line to use
# for most people.
#pty "/usr/sbin/pppoe -I eth0 -T 80 -m 1452"

# An even more conservative version of the previous line, if things
# don't work using -m 1452... 
#pty "/usr/sbin/pppoe -I eth0 -T 80 -m 1412"

# If the computer connected to the Internet using pppoe is not being used
# by other computers as a gateway to the Internet, you can try the following
# line instead, for a small gain in speed:
#pty "/usr/sbin/pppoe -I eth0 -T 80"


# The following two options should work fine for most DSL users.

# Assumes that your IP address is allocated dynamically
# by your DSL provider...
noipdefault
# Try to get the name server addresses from the ISP.
# Use this connection as the default route.
# Comment out if you already have the correct default route installed.
defaultroute
replacedefaultroute

##
# Section 2
#
# Uncomment if your DSL provider charges by minute connected
# and you want to use demand-dialing. 
#
# Disconnect after 300 seconds (5 minutes) of idle time.

#demand
#idle 300

##
# Section 3
#
# You shouldn't need to change these options...

#hide-password
show-password
lcp-echo-interval 20
lcp-echo-failure 3
# Override any connect script that may have been set in /etc/ppp/options.
connect /bin/true
noauth
persist
maxfail 0
mtu 1492

# RFC 2516, paragraph 7 mandates that the following options MUST NOT be
# requested and MUST be rejected if requested by the peer:
# Address-and-Control-Field-Compression (ACFC)
noaccomp
# Asynchronous-Control-Character-Map (ACCM)
default-asyncmap

plugin rp-pppoe.so
nic-gi0-0.5
user "USER"
debug
kdebug 1

ifname tfiber0

The FRR config is empty apart from a single command. I started from an empty FRR config and added lines one by one; the warning appears as soon as I add the static route. I have 2 routers: one is a real PC and the second is a VMware ESXi 6.5 VM.

frr version 7.4-dev-20200324-17-g9d7bc42a4
frr defaults traditional
hostname R01-ROM
log syslog informational
log timestamp precision 3
log file /var/log/frr/frr.log
service integrated-vtysh-config

ip route 0.0.0.0/0 tfiber0

Maybe the problem is caused by renaming the interface in PPP, so that it is named tfiber0 instead of ppp0?
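
To double-check which name the static route has to reference, the rename can be pulled out of the pppd log. A minimal sketch, assuming the "Renamed interface X to Y" message format shown in the log above; `renamed_interfaces` is a hypothetical helper name:

```python
import re

# Matches pppd messages such as "pppd[25298]: Renamed interface ppp0 to tfiber0".
RENAME_RE = re.compile(r"pppd\[\d+\]: Renamed interface (\S+) to (\S+)")

def renamed_interfaces(log_lines):
    """Map each original pppd interface name to the name it was renamed to."""
    renames = {}
    for line in log_lines:
        match = RENAME_RE.search(line)
        if match:
            renames[match.group(1)] = match.group(2)
    return renames

# Sample lines copied from the pppd log above.
sample = [
    "Mar 25 13:38:14 R01-ROM pppd[25298]: using channel 17",
    "Mar 25 13:38:14 R01-ROM pppd[25298]: Renamed interface ppp0 to tfiber0",
    "Mar 25 13:38:14 R01-ROM pppd[25298]: Using interface tfiber0",
]
renames = renamed_interfaces(sample)
print(renames)
```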

@EasyNetDev
Contributor Author

Found another situation:

FRR config:

Building configuration...

Current configuration:
!
frr version 7.4-dev-20200324-17-g9d7bc42a4
frr defaults traditional
hostname R02
log file /var/log/frr/frr.log
service integrated-vtysh-config
!
interface gi0-0
 bandwidth 1000
 description R02-SW02
!
interface gi0-0.2001
 ip ospf cost 50
!
interface gi0-1
 bandwidth 1000
 description R02-CHECKPOINT
 ip address 10.50.1.197/30
 ip ospf cost 25
!
interface gi0-1.2002
 ip ospf cost 50
!
interface gi0-2
 bandwidth 1000
!
interface gi0-3
 bandwidth 1000
 description R01-R02 interco
 ip ospf bfd
 ip ospf network point-to-point
!
interface lo0
 ip address 172.30.1.4/32
!
interface lo1
 description Backup IP
!
interface tfiber0
 description ISP;TFIBER
!
interface tun1
 description ABTELECOM;BACKUP VIA ORANGE;
 ip address 10.31.0.142/30
!
interface vrrp4-gi0-0.10
 description ATHENA LL; VRRP IPv4;
!
router-id 172.30.1.4
!
router ospf
 ospf router-id 172.30.1.4
 redistribute connected
 network 10.50.1.0/30 area 0
!
ip prefix-list pl-DEFAULT-ROUTE seq 5 permit 0.0.0.0/0
ip prefix-list pl-DEFAUTL-ROUTE seq 5 permit 0.0.0.0/0
ip prefix-list pl-ROMCOLOR seq 5 permit 89.A.B.192/28
ip prefix-list pl-SAP-in description SAP subnet
ip prefix-list pl-SAP-in seq 10 permit 10.1.64.128/28
ip prefix-list pl-SAP-in seq 5 permit 10.1.65.0/24
ip prefix-list pl-SAP-out description ROMCOLOR internal subnet
ip prefix-list pl-SAP-out seq 10 permit 192.168.16.0/23
ip prefix-list pl-SAP-out seq 5 permit 192.168.10.0/24
!
route-map rm-BGP-default-route deny 10000
!
route-map rm-BGP-default-route permit 1000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-BGP-deny-all deny 10000
!
route-map rm-OSPF-default deny 10000
!
route-map rm-OSPF-default permit 1000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-OSPF-to-BGP deny 10000
!
route-map rm-OSPF-to-BGP permit 100
 match ip address prefix-list pl-ROMCOLOR
!
route-map rm-OSPF-to-BGP-routes deny 10000
!
route-map rm-OSPF-to-BGP-routes permit 100
 match ip address prefix-list pl-SAP-out
!
route-map rm-OSPF-to-BGP-routes permit 200
 match ip address prefix-list pl-ROMCOLOR
!
route-map rm-R01-in deny 40000
!
route-map rm-R01-in permit 10000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-R01-out deny 40000
!
route-map rm-R01-out permit 1000
 match ip address prefix-list pl-SAP-in
!
route-map rm-R01-out permit 10000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-ROMCOLOR-in deny 10000
!
route-map rm-ROMCOLOR-in permit 100
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-ROMCOLOR-out deny 10000
!
route-map rm-ROMCOLOR-out permit 100
 match ip address prefix-list pl-ROMCOLOR
 set as-path prepend 65509
!
route-map rm-SAP-main-in deny 10000
!
route-map rm-SAP-main-in permit 100
 match ip address prefix-list pl-SAP-in
 set local-preference 150
!
route-map rm-SAP-main-out deny 10000
!
route-map rm-SAP-main-out permit 100
 match ip address prefix-list pl-SAP-out
!
route-map rm-SAP-sec-in deny 10000
!
route-map rm-SAP-sec-in permit 100
 match ip address prefix-list pl-SAP-in
 set local-preference 90
!
route-map rm-SAP-sec-out deny 10000
!
route-map rm-SAP-sec-out permit 100
 match ip address prefix-list pl-SAP-out
 set local-preference 90
 set metric 50
!
line vty
!
bfd
!
end

When I add this command:

 network 10.50.1.0/30 area 0

under OSPF, the warnings start to appear. This issue is on the ESXi machine.

System interfaces:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: gi0-3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:9f:9f:45 brd ff:ff:ff:ff:ff:ff
    inet 10.50.1.2/30 brd 10.50.1.3 scope global gi0-3
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe9f:9f45/64 scope link 
       valid_lft forever preferred_lft forever
3: gi0-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:9f:9f:27 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::20c:29ff:fe9f:9f27/64 scope link 
       valid_lft forever preferred_lft forever
4: gi0-1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:9f:9f:31 brd ff:ff:ff:ff:ff:ff
    inet 10.50.1.197/30 brd 10.50.1.199 scope global gi0-1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe9f:9f31/64 scope link 
       valid_lft forever preferred_lft forever
5: gi0-2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:0c:29:9f:9f:3b brd ff:ff:ff:ff:ff:ff
6: lo0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 65535 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether aa:4d:98:30:6c:e7 brd ff:ff:ff:ff:ff:ff
    inet 172.30.1.4/32 brd 172.30.1.4 scope global lo0
       valid_lft forever preferred_lft forever
    inet6 fe80::a84d:98ff:fe30:6ce7/64 scope link 
       valid_lft forever preferred_lft forever
7: lo1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 65535 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 6a:04:56:70:06:2a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::6804:56ff:fe70:62a/64 scope link 
       valid_lft forever preferred_lft forever
8: gi0-0.2002@gi0-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:0c:29:9f:9f:27 brd ff:ff:ff:ff:ff:ff
    inet 10.50.1.69/30 brd 10.50.1.71 scope global gi0-0.2002
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe9f:9f27/64 scope link 
       valid_lft forever preferred_lft forever
9: gi0-0.10@gi0-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:0c:29:9f:9f:27 brd ff:ff:ff:ff:ff:ff
    inet 172.25.10.45/27 brd 172.25.10.63 scope global gi0-0.10
       valid_lft forever preferred_lft forever
    inet 172.25.9.45/27 brd 172.25.9.63 scope global gi0-0.10
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe9f:9f27/64 scope link 
       valid_lft forever preferred_lft forever
10: [email protected]: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff protodown on 
11: [email protected]: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 00:00:5e:00:02:01 brd ff:ff:ff:ff:ff:ff protodown on 
12: gi0-0.5@gi0-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:0c:29:9f:9f:27 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::20c:29ff:fe9f:9f27/64 scope link 
       valid_lft forever preferred_lft forever

and the logs:

Mar 27 13:21:41 R02 kernel: [185329.362969] ------------[ cut here ]------------
Mar 27 13:21:41 R02 kernel: [185329.362971] WARNING: CPU: 1 PID: 8832 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
Mar 27 13:21:41 R02 kernel: [185329.362971] Modules linked in: mpls_iptunnel mpls_router ip_tunnel pppoe pppox ppp_generic slhc macvlan 8021q garp stp mrp llc vmw_vsock_vmci_transport vsock dummy nft_counter vmw_balloon nft_chain_nat joydev serio_raw pcspkr vmwgfx ttm xt_MASQUERADE sg drm_kms_helper evdev drm xt_nat nf_nat nf_conntrack vmw_vmci nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_policy nft_compat button ac nf_tables nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sr_mod cdrom ata_generic sd_mod psmouse ahci libahci ata_piix uhci_hcd ehci_pci ehci_hcd libata vmw_pvscsi usbcore vmxnet3 scsi_mod usb_common i2c_piix4
Mar 27 13:21:41 R02 kernel: [185329.362988] CPU: 1 PID: 8832 Comm: dnsdist/healthC Tainted: G        W         5.4.0-4-amd64 #1 Debian 5.4.19-1
Mar 27 13:21:41 R02 kernel: [185329.362988] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Mar 27 13:21:41 R02 kernel: [185329.362990] RIP: 0010:fib_select_path+0x303/0x381
Mar 27 13:21:41 R02 kernel: [185329.362991] Code: 48 85 c0 75 41 48 8d 87 80 00 00 00 49 89 44 24 10 66 41 89 73 20 e9 6b fd ff ff 4d 3b 4c 24 18 75 9c 49 89 eb e9 e6 fe ff ff <0f> 0b e9 61 fe ff ff 80 7a 56 00 75 1f 48 8b 52 70 48 83 c2 20 eb
Mar 27 13:21:41 R02 kernel: [185329.362992] RSP: 0018:ffffb04d40fcfd00 EFLAGS: 00010282
Mar 27 13:21:41 R02 kernel: [185329.362993] RAX: 0000000000000000 RBX: ffff9460bbd1e568 RCX: 00000000000000fe
Mar 27 13:21:41 R02 kernel: [185329.362994] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
Mar 27 13:21:41 R02 kernel: [185329.362994] RBP: ffff9460b79537a8 R08: 0000000059263af4 R09: ffff9460b6e71d80
Mar 27 13:21:41 R02 kernel: [185329.362995] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d40fcfdc0
Mar 27 13:21:41 R02 kernel: [185329.362996] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
Mar 27 13:21:41 R02 kernel: [185329.362998] FS:  00007f9815ffb700(0000) GS:ffff9460bdd00000(0000) knlGS:0000000000000000
Mar 27 13:21:41 R02 kernel: [185329.362999] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 27 13:21:41 R02 kernel: [185329.363000] CR2: 00007f772006b218 CR3: 0000000076472000 CR4: 00000000000006e0
Mar 27 13:21:41 R02 kernel: [185329.363005] Call Trace:
Mar 27 13:21:41 R02 kernel: [185329.363008]  ip_route_output_key_hash_rcu+0x421/0x890
Mar 27 13:21:41 R02 kernel: [185329.363009]  ip_route_output_key_hash+0x5e/0x80
Mar 27 13:21:41 R02 kernel: [185329.363011]  ip_route_output_flow+0x1a/0x50
Mar 27 13:21:41 R02 kernel: [185329.363013]  __ip4_datagram_connect+0x154/0x310
Mar 27 13:21:41 R02 kernel: [185329.363014]  ip4_datagram_connect+0x28/0x40
Mar 27 13:21:41 R02 kernel: [185329.363016]  __sys_connect+0xd6/0x100
Mar 27 13:21:41 R02 kernel: [185329.363017]  ? syscall_trace_enter+0x131/0x2c0
Mar 27 13:21:41 R02 kernel: [185329.363019]  __x64_sys_connect+0x16/0x20
Mar 27 13:21:41 R02 kernel: [185329.363020]  do_syscall_64+0x52/0x160
Mar 27 13:21:41 R02 kernel: [185329.363022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 27 13:21:41 R02 kernel: [185329.363023] RIP: 0033:0x7f983613d07b
Mar 27 13:21:41 R02 kernel: [185329.363024] Code: 83 ec 18 89 54 24 0c 48 89 34 24 89 7c 24 08 e8 ab fa ff ff 8b 54 24 0c 48 8b 34 24 41 89 c0 8b 7c 24 08 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 08 e8 e1 fa ff ff 8b 44
Mar 27 13:21:41 R02 kernel: [185329.363025] RSP: 002b:00007f9815ff9e30 EFLAGS: 00000293 ORIG_RAX: 000000000000002a
Mar 27 13:21:41 R02 kernel: [185329.363026] RAX: ffffffffffffffda RBX: 0000562bc500f480 RCX: 00007f983613d07b
Mar 27 13:21:41 R02 kernel: [185329.363027] RDX: 0000000000000010 RSI: 0000562bc502f590 RDI: 0000000000000024
Mar 27 13:21:41 R02 kernel: [185329.363028] RBP: 0000562bc502f590 R08: 0000000000000000 R09: 00007f9835d796f0
Mar 27 13:21:41 R02 kernel: [185329.363028] R10: 00007f97e8021900 R11: 0000000000000293 R12: 0000000000000024
Mar 27 13:21:41 R02 kernel: [185329.363029] R13: 0000000000000000 R14: 0000000000000001 R15: 00007f9815ffa1b0
Mar 27 13:21:41 R02 kernel: [185329.363030] ---[ end trace d64b745aea08a9f5 ]---

@sworleys
Member

Sadly I'm still not able to reproduce it, and I am a little confused about your configs.

# Assumes that your IP address is allocated dynamically
# by your DSL provider...
noipdefault
# Try to get the name server addresses from the ISP.
# Use this connection as the default route.
# Comment out if you already have the correct default route installed.
defaultroute
replacedefaultroute

Doesn't this option mean that the PPPoE connection will set up a default route itself, pointing out the PPPoE device? At least on my system that's what it does when I add that config.

So I don't think adding that static route would cause this issue, since FRR would just ignore it in favor of the kernel default route installed by pppd and therefore never interact with the kernel.

eva(config)# ip route 0.0.0.0/0 ppp0
eva(config)# end
eva# show ip ro
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

S   0.0.0.0/0 [1/0] is directly connected, ppp0, 00:00:04
K>* 0.0.0.0/0 [0/0] is directly connected, ppp0, 00:00:15   <---- Already installed by pppd
C>* 4.4.4.2/32 is directly connected, ppp0, 00:00:15
eva# 

So... unless our order of operations is wrong, something else is triggering that, not adding the static route.
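
The reasoning above can be sketched roughly as follows: among the candidates for a prefix, the one with the lowest administrative distance wins, so the kernel route `[0/0]` beats the static route `[1/0]`. This is a simplified model for illustration only, not zebra's actual selection code; `preferred_route` is a hypothetical helper that parses `show ip ro` style lines:

```python
import re

# Matches route lines that carry [distance/metric], e.g.
# "K>* 0.0.0.0/0 [0/0] is directly connected, ppp0, 00:00:15"
ROUTE_RE = re.compile(
    r"^(?P<code>[A-Za-z])[>* ]*(?P<prefix>\S+) \[(?P<distance>\d+)/(?P<metric>\d+)\]"
)

def preferred_route(show_ip_route_output, prefix):
    """Return (distance, metric, code) of the lowest-distance candidate for prefix."""
    candidates = []
    for line in show_ip_route_output.splitlines():
        match = ROUTE_RE.match(line)
        if match and match.group("prefix") == prefix:
            candidates.append(
                (int(match.group("distance")), int(match.group("metric")), match.group("code"))
            )
    # Tuples compare by distance first, then metric.
    return min(candidates) if candidates else None

# Sample copied from the "show ip ro" output above.
sample = """\
S   0.0.0.0/0 [1/0] is directly connected, ppp0, 00:00:04
K>* 0.0.0.0/0 [0/0] is directly connected, ppp0, 00:00:15
C>* 4.4.4.2/32 is directly connected, ppp0, 00:00:15
"""
best = preferred_route(sample, "0.0.0.0/0")
print(best)
```

In this model the static route never becomes the selected one while pppd's kernel default route is present, which matches the `K>*` marker in the output.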

@EasyNetDev
Contributor Author

I've noticed that OSPF triggers the same issue as well, when I activate the neighbor interface:

R02(config-router)# do sh interface description
Interface       Status  Protocol  Description
gi0-0           up      up        R02-SW02
gi0-0.5         up      up
gi0-0.10        up      up        ATHENA LL
gi0-0.2001      down    down
gi0-0.2002      up      up        R02-SW02
gi0-1           up      up        R02-CHECKPOINT
gi0-1.2002      down    down
gi0-2           down    down
gi0-3           up      up        R01-R02 interco
lo              up      up
lo0             up      up
lo1             up      up        Backup IP - EASYNET

R02(config-router)# do sh interface gi0-0.10
Interface gi0-0.10 is up, line protocol is up
  Link ups:       1    last: 2020/04/05 15:49:21.78
  Link downs:     1    last: 2020/04/05 15:48:10.08
  vrf: default
  Description: ATHENA LL
  index 17 metric 0 mtu 1500 speed 10000
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 00:0c:29:9f:9f:27
  inet 172.25.9.45/27
  inet 172.25.10.45/27
  inet6 fe80::20c:29ff:fe9f:9f27/64
  Interface Type Vlan
  VLAN Id 10
  Parent interface: gi0-0
R02(config-router)# do sh interface gi0-0.2002
Interface gi0-0.2002 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: default
  Description: R02-SW02
  index 8 metric 0 mtu 1500 speed 10000
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 00:0c:29:9f:9f:27
  inet 10.50.1.69/30
  inet6 fe80::20c:29ff:fe9f:9f27/64
  Interface Type Vlan
  VLAN Id 2002
  Parent interface: gi0-0
R02(config-router)# do sh interface gi0-1
Interface gi0-1 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: default
  Description: R02-CHECKPOINT
  index 4 metric 0 mtu 1500 speed 10000
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 00:0c:29:9f:9f:31
  bandwidth 1000 Mbps
  inet 10.50.1.197/30
  inet6 fe80::20c:29ff:fe9f:9f31/64
  Interface Type Other
R02(config-router)# do sh interface gi0-3
Interface gi0-3 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: default
  Description: R01-R02 interco
  index 2 metric 0 mtu 1500 speed 10000
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 00:0c:29:9f:9f:45
  bandwidth 1000 Mbps
  inet 10.50.1.2/30
  inet6 fe80::20c:29ff:fe9f:9f45/64
  Interface Type Other
R02(config-router)# do sh interface lo
Interface lo is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: default
  index 1 metric 0 mtu 65536 speed 0
  flags: <UP,LOOPBACK,RUNNING>
  Type: Loopback
  Interface Type Other
R02(config-router)# do sh interface lo0
Interface lo0 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: default
  index 6 metric 0 mtu 65535 speed 0
  flags: <UP,BROADCAST,RUNNING,NOARP>
  Type: Ethernet
  HWaddr: aa:4d:98:30:6c:e7
  inet 172.30.1.4/32 unnumbered
  inet6 fe80::a84d:98ff:fe30:6ce7/64
  Interface Type Other
R02(config-router)# do sh interface lo1
Interface lo1 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: default
  Description: Backup IP - EASYNET
  index 7 metric 0 mtu 65535 speed 0
  flags: <UP,BROADCAST,RUNNING,NOARP>
  Type: Ethernet
  HWaddr: 6a:04:56:70:06:2a
  inet6 fe80::6804:56ff:fe70:62a/64
  Interface Type Other
R02(config-router)# do sh run
Building configuration...

Current configuration:
!
frr version 7.4-dev-20200324-17-g9d7bc42a4
frr defaults traditional
hostname R02
log file /var/log/frr/frr.log
service integrated-vtysh-config
!
interface gi0-0
 bandwidth 1000
 description R02-SW02
!
interface gi0-0.10
 description ATHENA LL
 ip address 172.25.10.45/27
 ip address 172.25.9.45/27
!
interface gi0-0.2001
 ip ospf cost 50
!
interface gi0-0.2002
 description R02-SW02
 ip ospf cost 50
 ip ospf dead-interval minimal hello-multiplier 10
 ip ospf network point-to-point
!
interface gi0-1
 bandwidth 1000
 description R02-CHECKPOINT
 ip address 10.50.1.197/30
 ip ospf cost 25
!
interface gi0-1.2002
 ip ospf cost 50
!
interface gi0-2
 bandwidth 1000
!
interface gi0-3
 bandwidth 1000
 description R01-R02 interco
 ip ospf bfd
 ip ospf network point-to-point
!
interface lo0
 ip address 172.30.1.4/32
!
interface lo1
 description Backup IP - EASYNET
!
router-id 172.30.1.4
!
router ospf
 ospf router-id 172.30.1.4
 redistribute connected
!
ip prefix-list pl-DEFAULT-ROUTE seq 5 permit 0.0.0.0/0
ip prefix-list pl-DEFAUTL-ROUTE seq 5 permit 0.0.0.0/0
ip prefix-list pl-CLINET seq 5 permit 89.X.Y.192/28
ip prefix-list pl-SAP-in description SAP subnet
ip prefix-list pl-SAP-in seq 10 permit 10.1.64.128/28
ip prefix-list pl-SAP-in seq 5 permit 10.1.65.0/24
ip prefix-list pl-SAP-out description ROMCOLOR internal subnet
ip prefix-list pl-SAP-out seq 10 permit 192.168.16.0/23
ip prefix-list pl-SAP-out seq 5 permit 192.168.10.0/24
!
route-map rm-BGP-default-route deny 10000
!
route-map rm-BGP-default-route permit 1000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-BGP-deny-all deny 10000
!
route-map rm-OSPF-default deny 10000
!
route-map rm-OSPF-default permit 1000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-OSPF-to-BGP deny 10000
!
route-map rm-OSPF-to-BGP permit 100
 match ip address prefix-list pl-ROMCOLOR 
!
route-map rm-OSPF-to-BGP-routes deny 10000
!
route-map rm-OSPF-to-BGP-routes permit 100
 match ip address prefix-list pl-SAP-out
!
route-map rm-OSPF-to-BGP-routes permit 200
 match ip address prefix-list pl-ROMCOLOR
!
route-map rm-R01-in deny 40000
!
route-map rm-R01-in permit 10000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-R01-out deny 40000
!
route-map rm-R01-out permit 1000
 match ip address prefix-list pl-SAP-in
!
route-map rm-R01-out permit 10000
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-ROMCOLOR-in deny 10000
!
route-map rm-ROMCOLOR-in permit 100
 match ip address prefix-list pl-DEFAULT-ROUTE
!
route-map rm-ROMCOLOR-out deny 10000
!
route-map rm-ROMCOLOR-out permit 100
 match ip address prefix-list pl-ROMCOLOR
 set as-path prepend 65509
!
route-map rm-SAP-main-in deny 10000
!
route-map rm-SAP-main-in permit 100
 match ip address prefix-list pl-SAP-in
 set local-preference 150
!
route-map rm-SAP-main-out deny 10000
!
route-map rm-SAP-main-out permit 100
 match ip address prefix-list pl-SAP-out
!
route-map rm-SAP-sec-in deny 10000
!
route-map rm-SAP-sec-in permit 100
 match ip address prefix-list pl-SAP-in
 set local-preference 90
!
route-map rm-SAP-sec-out deny 10000
!
route-map rm-SAP-sec-out permit 100
 match ip address prefix-list pl-SAP-out
 set local-preference 90
 set metric 50
!
line vty
!
bfd
!
end

Up to this point everything is OK; there are no OSPF neighbors, static routes, or BGP yet:

R02(config-router)# do sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

C>* 10.50.1.0/30 is directly connected, gi0-3, 00:11:14
C>* 10.50.1.68/30 is directly connected, gi0-0.2002, 00:11:14
C>* 10.50.1.196/30 is directly connected, gi0-1, 00:11:14
C>* 172.25.9.32/27 is directly connected, gi0-0.10, 00:06:40
C>* 172.25.10.32/27 is directly connected, gi0-0.10, 00:06:40
C>* 172.30.1.4/32 is directly connected, lo0, 00:11:14

R02(config-router)# do sh ip ospf neighbor

Neighbor ID     Pri State           Dead Time Address         Interface                        RXmtL RqstL DBsmL

R02(config-router)# do sh ip bgp summary
% BGP instance not found

Then I start OSPF on interface gi0-3:

R02(config-router)# network 10.50.1.0/30 area 0
R02(config-router)# do sh ip ospf neighbor

Neighbor ID     Pri State           Dead Time Address         Interface                        RXmtL RqstL DBsmL
172.30.1.1        1 Full/DROther      36.870s 10.50.1.1       gi0-3:10.50.1.2                      2     0     0
R02(config-router)# do sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

O>* 0.0.0.0/0 [110/10] via 10.50.1.1, gi0-3, 00:01:17
O>* 10.0.0.3/32 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O   10.50.1.0/30 [110/100] is directly connected, gi0-3, 00:01:29
C>* 10.50.1.0/30 is directly connected, gi0-3, 00:13:57
O>* 10.50.1.64/30 [110/150] via 10.50.1.1, gi0-3, 00:01:18
O   10.50.1.68/30 [110/176] via 10.50.1.1, gi0-3, 00:01:18
C>* 10.50.1.68/30 is directly connected, gi0-0.2002, 00:13:57
O>* 10.50.1.128/30 [110/127] via 10.50.1.1, gi0-3, 00:01:18
O>* 10.50.1.132/30 [110/127] via 10.50.1.1, gi0-3, 00:01:18
O>* 10.50.1.136/30 [110/127] via 10.50.1.1, gi0-3, 00:01:18
O>* 10.50.1.140/30 [110/128] via 10.50.1.1, gi0-3, 00:01:18
O>* 10.50.1.192/30 [110/125] via 10.50.1.1, gi0-3, 00:01:18
O   10.50.1.196/30 [110/126] via 10.50.1.1, gi0-3, 00:01:18
C>* 10.50.1.196/30 is directly connected, gi0-1, 00:13:57
O>* 10.50.1.200/30 [110/126] via 10.50.1.1, gi0-3, 00:01:18
O>* 10.50.1.204/30 [110/126] via 10.50.1.1, gi0-3, 00:01:18
O>* 10.50.50.4/30 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 10.50.50.128/27 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O   172.25.9.32/27 [110/20] via 10.50.1.1, gi0-3, 00:01:17
C>* 172.25.9.32/27 is directly connected, gi0-0.10, 00:09:23
C>* 172.25.10.32/27 is directly connected, gi0-0.10, 00:09:23
O>* 172.30.1.1/32 [110/110] via 10.50.1.1, gi0-3, 00:01:18
C>* 172.30.1.4/32 is directly connected, lo0, 00:13:57
O>* 172.30.1.9/32 [110/127] via 10.50.1.1, gi0-3, 00:01:18
O>* 172.30.1.10/32 [110/127] via 10.50.1.1, gi0-3, 00:01:18
O>* 172.30.1.11/32 [110/128] via 10.50.1.1, gi0-3, 00:01:18
O>* 172.30.1.12/32 [110/129] via 10.50.1.1, gi0-3, 00:01:18
O>* 172.30.1.15/32 [110/125] via 10.50.1.1, gi0-3, 00:01:18
O>* 172.30.1.128/27 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 172.30.2.0/24 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 172.30.2.1/32 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.1.0/24 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.9.0/24 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.10.0/24 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.10.252/32 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.16.0/24 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.17.0/25 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.17.128/25 [110/20] via 10.50.1.1, gi0-3, 00:01:17
O>* 192.168.60.0/24 [110/20] via 10.50.1.1, gi0-3, 00:01:17

Warnings then appear in the kernel log:

Apr  5 15:57:53 R02 kernel: [ 1703.698794] WARNING: CPU: 1 PID: 1915 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
Apr  5 15:57:53 R02 kernel: [ 1703.698794] Modules linked in: xfrm_user xfrm_algo twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ctr des_generic libdes algif_skcipher camellia_generic crypto_simd cryptd camellia_x86_64 glue_helper xcbc md4 algif_hash af_alg pppoe pppox ppp_generic slhc macvlan 8021q garp stp mrp llc vmw_vsock_vmci_transport vsock dummy nft_counter vmwgfx nft_chain_nat ttm vmw_balloon joydev serio_raw pcspkr drm_kms_helper xt_MASQUERADE evdev sg xt_nat drm nf_nat vmw_vmci nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_policy nft_compat nf_tables button ac nfnetlink mpls_iptunnel mpls_router ip_tunnel ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sr_mod cdrom ata_generic sd_mod ahci libahci ata_piix uhci_hcd ehci_pci psmouse libata ehci_hcd usbcore vmw_pvscsi vmxnet3 scsi_mod usb_common i2c_piix4
Apr  5 15:57:53 R02 kernel: [ 1703.698826] CPU: 1 PID: 1915 Comm: dnsdist/healthC Tainted: G        W         5.4.0-4-amd64 #1 Debian 5.4.19-1
Apr  5 15:57:53 R02 kernel: [ 1703.698827] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Apr  5 15:57:53 R02 kernel: [ 1703.698829] RIP: 0010:fib_select_path+0x303/0x381
Apr  5 15:57:53 R02 kernel: [ 1703.698830] Code: 48 85 c0 75 41 48 8d 87 80 00 00 00 49 89 44 24 10 66 41 89 73 20 e9 6b fd ff ff 4d 3b 4c 24 18 75 9c 49 89 eb e9 e6 fe ff ff <0f> 0b e9 61 fe ff ff 80 7a 56 00 75 1f 48 8b 52 70 48 83 c2 20 eb
Apr  5 15:57:53 R02 kernel: [ 1703.698831] RSP: 0018:ffffba1ec0763d00 EFLAGS: 00010286
Apr  5 15:57:53 R02 kernel: [ 1703.698833] RAX: 0000000000000000 RBX: ffff96877a510368 RCX: 00000000000000fe
Apr  5 15:57:53 R02 kernel: [ 1703.698834] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
Apr  5 15:57:53 R02 kernel: [ 1703.698835] RBP: ffff96877845fa10 R08: 0000000000000000 R09: ffff968778ef1580
Apr  5 15:57:53 R02 kernel: [ 1703.698836] R10: 0000000000000014 R11: 0000000000000000 R12: ffffba1ec0763dc0
Apr  5 15:57:53 R02 kernel: [ 1703.698837] R13: ffffffff8b4e3240 R14: 0000000000000000 R15: ffff968735227ea0
Apr  5 15:57:53 R02 kernel: [ 1703.698840] FS:  00007f213dffb700(0000) GS:ffff96877dd00000(0000) knlGS:0000000000000000
Apr  5 15:57:53 R02 kernel: [ 1703.698841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  5 15:57:53 R02 kernel: [ 1703.698842] CR2: 00007f0f5fed5390 CR3: 000000007a408000 CR4: 00000000000006e0
Apr  5 15:57:53 R02 kernel: [ 1703.698850] Call Trace:
Apr  5 15:57:53 R02 kernel: [ 1703.698853]  ? __fib_lookup+0x6b/0xb0
Apr  5 15:57:53 R02 kernel: [ 1703.698856]  ip_route_output_key_hash_rcu+0x421/0x890
Apr  5 15:57:53 R02 kernel: [ 1703.698858]  ip_route_output_key_hash+0x5e/0x80
Apr  5 15:57:53 R02 kernel: [ 1703.698860]  ip_route_output_flow+0x1a/0x50
Apr  5 15:57:53 R02 kernel: [ 1703.698863]  __ip4_datagram_connect+0x154/0x310
Apr  5 15:57:53 R02 kernel: [ 1703.698865]  ip4_datagram_connect+0x28/0x40
Apr  5 15:57:53 R02 kernel: [ 1703.698867]  __sys_connect+0xd6/0x100
Apr  5 15:57:53 R02 kernel: [ 1703.698869]  ? syscall_trace_enter+0x131/0x2c0
Apr  5 15:57:53 R02 kernel: [ 1703.698872]  __x64_sys_connect+0x16/0x20
Apr  5 15:57:53 R02 kernel: [ 1703.698873]  do_syscall_64+0x52/0x160
Apr  5 15:57:53 R02 kernel: [ 1703.698876]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr  5 15:57:53 R02 kernel: [ 1703.698877] RIP: 0033:0x7f21533c507b
Apr  5 15:57:53 R02 kernel: [ 1703.698879] Code: 83 ec 18 89 54 24 0c 48 89 34 24 89 7c 24 08 e8 ab fa ff ff 8b 54 24 0c 48 8b 34 24 41 89 c0 8b 7c 24 08 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 08 e8 e1 fa ff ff 8b 44
Apr  5 15:57:53 R02 kernel: [ 1703.698880] RSP: 002b:00007f213dff9df0 EFLAGS: 00000293 ORIG_RAX: 000000000000002a
Apr  5 15:57:53 R02 kernel: [ 1703.698882] RAX: ffffffffffffffda RBX: 0000564d48c25660 RCX: 00007f21533c507b
Apr  5 15:57:53 R02 kernel: [ 1703.698883] RDX: 0000000000000010 RSI: 0000564d48c45680 RDI: 0000000000000022
Apr  5 15:57:53 R02 kernel: [ 1703.698884] RBP: 0000564d48c45680 R08: 0000000000000000 R09: 00007f2153001760
Apr  5 15:57:53 R02 kernel: [ 1703.698885] R10: 00007f212c000c00 R11: 0000000000000293 R12: 0000000000000022
Apr  5 15:57:53 R02 kernel: [ 1703.698886] R13: 0000000000000000 R14: 00007f213dffa020 R15: 0000000000000001
Apr  5 15:57:53 R02 kernel: [ 1703.698888] ---[ end trace ea68b4377d80ff52 ]---

How can I debug this issue? :)
It seems this is not related only to the PPP connection.
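
To compare occurrences across logs like the ones above, here is a minimal Python sketch that pulls out the warning site, the function, and the process name (the Comm field) from each trace. The regular expressions are only an assumption based on the log lines pasted in this thread, and `summarize_warnings` is a hypothetical helper name, not part of FRR or the kernel tooling.

```python
import re
from collections import Counter

# Matches the kernel WARNING header line, e.g.
# "WARNING: CPU: 1 PID: 1915 at include/net/nexthop.h:251 fib_select_path+0x303/0x381"
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: (?P<pid>\d+) at (?P<site>\S+:\d+) (?P<func>[\w.]+)\+0x"
)
# Matches the "Comm:" field on the "CPU: ... Comm: ..." line that follows the header.
COMM_RE = re.compile(r"Comm: (?P<comm>.+?) (?:Tainted|Not tainted)")


def summarize_warnings(lines):
    """Count (source location, function, process name) triples across a log."""
    counts = Counter()
    pending = None  # (site, func) of the most recent WARNING header seen
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            pending = (m.group("site"), m.group("func"))
            continue
        m = COMM_RE.search(line)
        if m and pending:
            counts[pending + (m.group("comm"),)] += 1
            pending = None
    return counts
```

Feeding it the saved kernel log (one line per list element) gives a quick view of whether every warning comes from the same location in include/net/nexthop.h and which processes happen to trigger it.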

@EasyNetDev
Contributor Author

I ran a test with only Zebra enabled:

cat /etc/frr/daemons
# This file tells the frr package which daemons to start.
#
# Sample configurations for these daemons can be found in
# /usr/share/doc/frr/examples/.
#
# ATTENTION:
#
# When activating a daemon for the first time, a config file, even if it is
# empty, has to be present *and* be owned by the user and group "frr", else
# the daemon will not be started by /etc/init.d/frr. The permissions should
# be u=rw,g=r,o=.
# When using "vtysh" such a config file is also needed. It should be owned by
# group "frrvty" and set to ug=rw,o= though. Check /etc/pam.d/frr, too.
#
# The watchfrr and zebra daemons are always started.
#
bgpd=no
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no
pimd=no
ldpd=no
nhrpd=no
eigrpd=no
babeld=no
sharpd=no
pbrd=no
bfdd=no
fabricd=no
vrrpd=no

#
# If this option is set the /etc/init.d/frr script automatically loads
# the config via "vtysh -b" when the servers are started.
# Check /etc/pam.d/frr if you intend to use "vtysh"!
#
vtysh_enable=yes
zebra_options="  -A 127.0.0.1 -s 90000000"
bgpd_options="   -A 127.0.0.1"
ospfd_options="  -A 127.0.0.1"
ospf6d_options=" -A ::1"
ripd_options="   -A 127.0.0.1"
ripngd_options=" -A ::1"
isisd_options="  -A 127.0.0.1"
pimd_options="   -A 127.0.0.1"
ldpd_options="   -A 127.0.0.1"
nhrpd_options="  -A 127.0.0.1"
eigrpd_options=" -A 127.0.0.1"
babeld_options=" -A 127.0.0.1"
sharpd_options=" -A 127.0.0.1"
pbrd_options="   -A 127.0.0.1"
staticd_options="-A 127.0.0.1"
bfdd_options="   -A 127.0.0.1"
fabricd_options="-A 127.0.0.1"
vrrpd_options="  -A 127.0.0.1"

# configuration profile
#
#frr_profile="traditional"
#frr_profile="datacenter"

#
# This is the maximum number of FD's that will be available.
# Upon startup this is read by the control files and ulimit
# is called.  Uncomment and use a reasonable value for your
# setup if you are expecting a large number of peers in
# say BGP.
MAX_FDS=1024

# The list of daemons to watch is automatically generated by the init script.
#watchfrr_options=""

# for debugging purposes, you can specify a "wrap" command to start instead
# of starting the daemon directly, e.g. to use valgrind on ospfd:
#   ospfd_wrap="/usr/bin/valgrind"
# or you can use "all_wrap" for all daemons, e.g. to use perf record:
#   all_wrap="/usr/bin/perf record --call-graph -"
# the normal daemon command is added to this at the end.

R02# show daemons
 zebra watchfrr staticd

Same issue.
Another test: shut down the PPPoE interface and add a simple static route:

R02(config)# ip route 0.0.0.0/0 gi0-3

Same issue.

@sworleys
Member

I spoke to the maintainer of the kernel nexthop group code, and he was able to find the problem.

https://lore.kernel.org/netdev/[email protected]/T/#u

Closing this issue.
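
The commit message for the kernel fix says the WARN_ON is triggered in fib_select_default, which is invoked when there are multiple default routes. For anyone who wants to check whether a box is in that situation, here is a minimal Python sketch. It assumes the input is the text printed by `ip -4 route show`; `default_routes` is a hypothetical helper name, and the sample routes (including the interface name ppp0) are made up for illustration rather than taken from the reporter's system.

```python
def default_routes(ip_route_output):
    """Return the lines of `ip -4 route show` output that describe a default route."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        # iproute2 prints the IPv4 default route as "default"; accept
        # "0.0.0.0/0" as well in case the prefix is shown numerically.
        if fields and fields[0] in ("default", "0.0.0.0/0"):
            routes.append(line.strip())
    return routes


# Hypothetical sample output, for illustration only.
SAMPLE = """\
default via 10.50.1.1 dev gi0-3 proto ospf metric 20
default dev ppp0 scope link
10.50.1.0/30 dev gi0-3 proto kernel scope link src 10.50.1.2
"""

if len(default_routes(SAMPLE)) > 1:
    print("more than one default route is installed")
```

More than one line returned means the kernel holds several default routes, which is the condition the commit message describes. This only detects that condition; it does not confirm that the running kernel is affected.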

fengguang pushed a commit to 0day-ci/linux that referenced this issue Apr 23, 2020
A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.

Add test case that covers the affected code path.

[0] FRRouting/frr#6089

Fixes: 493ced1 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <[email protected]>
hohoxu pushed a commit to hohoxu/n5kernel that referenced this issue Apr 26, 2020
A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.

Add test case that covers the affected code path.

[0] FRRouting/frr#6089

Fixes: 493ced1 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Whissi pushed a commit to Whissi/linux-stable that referenced this issue Apr 29, 2020
[ Upstream commit 7c74b0b ]

A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.

Add test case that covers the affected code path.

[0] FRRouting/frr#6089

Fixes: 493ced1 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@k0ste
Contributor

k0ste commented May 8, 2020

FRR 7.3.1
This happens when I run systemctl restart systemd-networkd:

May 08 15:55:52 novosiboffice.opentech.local ospfd[498]: LSA[Type5:0.0.0.0]: Not originate AS-external-LSA for default
May 08 15:55:52 novosiboffice.opentech.local ospfd[493]: LSA[Type5:0.0.0.0]: Not originate AS-external-LSA for default
May 08 15:55:52 novosiboffice.opentech.local ospfd[493]: LSA[Type5:0.0.0.0]: Not originate AS-external-LSA for default
May 08 15:55:52 novosiboffice.opentech.local ospfd[498]: LSA[Type5:0.0.0.0]: Not originate AS-external-LSA for default

And in the kernel log:

[Fri May  8 15:55:52 2020] ------------[ cut here ]------------
[Fri May  8 15:55:52 2020] WARNING: CPU: 0 PID: 1405575 at include/net/nexthop.h:251 fib_nh_match+0x210/0x400
[Fri May  8 15:55:52 2020] Modules linked in: sch_sfq pppoe pppox ppp_generic slhc tun 8021q garp mrp stp llc bonding xt_nat iptable_nat ipt_NETFLOW(OE) xt_state xt_addrtype xt_conntrack xt_comment ipt_REJECT xt_multiport iptable_filter xt_TCPMSS xt_tcpudp xt_mark xt_set iptable_mangle ip_set_hash_net ip_set_bitmap_ipmac ip_set ext4 crc16 mbcache jbd2 amd64_edac_mod edac_mce_amd kvm_amd ccp rng_core kvm irqbypass hid_generic crct10dif_pclmul crc32_pclmul gpu_sched ghash_clmulni_intel i2c_algo_bit ttm drm_kms_helper aesni_intel cec crypto_simd tg3 usbhid cryptd glue_helper hid rc_core pcspkr syscopyarea sp5100_tco fam15h_power sysfillrect k10temp sysimgblt i2c_piix4 fb_sys_fops libphy evdev 8250_dw mac_hid pinctrl_amd acpi_cpufreq nf_conntrack_pptp nf_reject_ipv4 nf_nat nf_conntrack_netlink nfnetlink drm nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_htb agpgart ip_tables x_tables xfs libcrc32c crc32c_generic crc32c_intel xhci_pci xhci_hcd ehci_pci ehci_hcd
[Fri May  8 15:55:52 2020] CPU: 0 PID: 1405575 Comm: systemd-network Tainted: G        W  OE     5.6.10-arch1-1-nfcustom #1
[Fri May  8 15:55:52 2020] Hardware name: HPE ProLiant MicroServer Gen10/ProLiant MicroServer Gen10, BIOS 5.12 06/26/2018
[Fri May  8 15:55:52 2020] RIP: 0010:fib_nh_match+0x210/0x400
[Fri May  8 15:55:52 2020] Code: 7c 24 18 48 8b b5 90 00 00 00 e8 cb 97 f6 ff 48 8b 7c 24 18 41 89 c4 e8 7e 93 f6 ff 45 85 e4 0f 85 43 fe ff ff e9 66 ff ff ff <0f> 0b e9 2e ff ff ff 3c 0a 0f 85 72 fe ff ff 48 8b 8d 98 00 00 00
[Fri May  8 15:55:52 2020] RSP: 0018:ffffbe6640633a68 EFLAGS: 00010286
[Fri May  8 15:55:52 2020] RAX: 0000000000000028 RBX: ffffbe6640633bc8 RCX: 0000000000000000
[Fri May  8 15:55:52 2020] RDX: ffffbe6640633ce8 RSI: ffff9f02348ba880 RDI: ffffbe6640633b08
[Fri May  8 15:55:52 2020] RBP: ffff9f02348ba880 R08: 00000000000000fe R09: ffffbe6640633ce8
[Fri May  8 15:55:52 2020] R10: 0000000000000000 R11: ffff9f02267050e0 R12: ffffbe6640633a88
[Fri May  8 15:55:52 2020] R13: 0000000000000000 R14: ffffbe6640633bc8 R15: ffff9f02f35f6080
[Fri May  8 15:55:52 2020] FS:  00007f7fc33f9a80(0000) GS:ffff9f02f7400000(0000) knlGS:0000000000000000
[Fri May  8 15:55:52 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri May  8 15:55:52 2020] CR2: 00007fa92c2c74d8 CR3: 00000000d1a5a000 CR4: 00000000001406f0
[Fri May  8 15:55:52 2020] Call Trace:
[Fri May  8 15:55:52 2020]  fib_table_delete+0x1a7/0x310
[Fri May  8 15:55:52 2020]  inet_rtm_delroute+0x96/0x100
[Fri May  8 15:55:52 2020]  rtnetlink_rcv_msg+0x137/0x3c0
[Fri May  8 15:55:52 2020]  ? rtnl_calcit.isra.0+0x120/0x120
[Fri May  8 15:55:52 2020]  netlink_rcv_skb+0x78/0x150
[Fri May  8 15:55:52 2020]  netlink_unicast+0x19c/0x240
[Fri May  8 15:55:52 2020]  netlink_sendmsg+0x243/0x480
[Fri May  8 15:55:52 2020]  sock_sendmsg+0x5e/0x60
[Fri May  8 15:55:52 2020]  __sys_sendto+0x120/0x190
[Fri May  8 15:55:52 2020]  __x64_sys_sendto+0x25/0x30
[Fri May  8 15:55:52 2020]  do_syscall_64+0x4e/0x150
[Fri May  8 15:55:52 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Fri May  8 15:55:52 2020] RIP: 0033:0x7f7fc377c92a
[Fri May  8 15:55:52 2020] Code: 48 c7 c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
[Fri May  8 15:55:52 2020] RSP: 002b:00007ffcd08e65e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[Fri May  8 15:55:52 2020] RAX: ffffffffffffffda RBX: 00007ffcd08e666c RCX: 00007f7fc377c92a
[Fri May  8 15:55:52 2020] RDX: 0000000000000034 RSI: 000056514b4724e0 RDI: 0000000000000003
[Fri May  8 15:55:52 2020] RBP: 000056514b3e6dd0 R08: 00007ffcd08e65f0 R09: 0000000000000010
[Fri May  8 15:55:52 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 000056514b5101f0
[Fri May  8 15:55:52 2020] R13: 0000000000157287 R14: 000056514b46ffe0 R15: 000056514b3965b8
[Fri May  8 15:55:52 2020] ---[ end trace 04fb233d5d78bf6c ]---
[Fri May  8 15:55:52 2020] ------------[ cut here ]------------
[Fri May  8 15:55:52 2020] WARNING: CPU: 0 PID: 1405575 at include/net/nexthop.h:251 fib_nh_match+0x210/0x400
[Fri May  8 15:55:52 2020] Modules linked in: sch_sfq pppoe pppox ppp_generic slhc tun 8021q garp mrp stp llc bonding xt_nat iptable_nat ipt_NETFLOW(OE) xt_state xt_addrtype xt_conntrack xt_comment ipt_REJECT xt_multiport iptable_filter xt_TCPMSS xt_tcpudp xt_mark xt_set iptable_mangle ip_set_hash_net ip_set_bitmap_ipmac ip_set ext4 crc16 mbcache jbd2 amd64_edac_mod edac_mce_amd kvm_amd ccp rng_core kvm irqbypass hid_generic crct10dif_pclmul crc32_pclmul gpu_sched ghash_clmulni_intel i2c_algo_bit ttm drm_kms_helper aesni_intel cec crypto_simd tg3 usbhid cryptd glue_helper hid rc_core pcspkr syscopyarea sp5100_tco fam15h_power sysfillrect k10temp sysimgblt i2c_piix4 fb_sys_fops libphy evdev 8250_dw mac_hid pinctrl_amd acpi_cpufreq nf_conntrack_pptp nf_reject_ipv4 nf_nat nf_conntrack_netlink nfnetlink drm nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_htb agpgart ip_tables x_tables xfs libcrc32c crc32c_generic crc32c_intel xhci_pci xhci_hcd ehci_pci ehci_hcd
[Fri May  8 15:55:52 2020] CPU: 0 PID: 1405575 Comm: systemd-network Tainted: G        W  OE     5.6.10-arch1-1-nfcustom #1
[Fri May  8 15:55:52 2020] Hardware name: HPE ProLiant MicroServer Gen10/ProLiant MicroServer Gen10, BIOS 5.12 06/26/2018
[Fri May  8 15:55:52 2020] RIP: 0010:fib_nh_match+0x210/0x400
[Fri May  8 15:55:52 2020] Code: 7c 24 18 48 8b b5 90 00 00 00 e8 cb 97 f6 ff 48 8b 7c 24 18 41 89 c4 e8 7e 93 f6 ff 45 85 e4 0f 85 43 fe ff ff e9 66 ff ff ff <0f> 0b e9 2e ff ff ff 3c 0a 0f 85 72 fe ff ff 48 8b 8d 98 00 00 00
[Fri May  8 15:55:52 2020] RSP: 0018:ffffbe6640633a68 EFLAGS: 00010286
[Fri May  8 15:55:52 2020] RAX: 0000000000000028 RBX: ffffbe6640633bc8 RCX: 0000000000000000
[Fri May  8 15:55:52 2020] RDX: ffffbe6640633ce8 RSI: ffff9f02348ba880 RDI: ffffbe6640633b08
[Fri May  8 15:55:52 2020] RBP: ffff9f02348ba880 R08: 00000000000000fe R09: ffffbe6640633ce8
[Fri May  8 15:55:52 2020] R10: 0000000000000000 R11: ffff9f01bedadc80 R12: ffffbe6640633a88
[Fri May  8 15:55:52 2020] R13: 0000000000000000 R14: ffffbe6640633bc8 R15: ffff9f02f35f6080
[Fri May  8 15:55:52 2020] FS:  00007f7fc33f9a80(0000) GS:ffff9f02f7400000(0000) knlGS:0000000000000000
[Fri May  8 15:55:52 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri May  8 15:55:52 2020] CR2: 00007fa92c2c74d8 CR3: 00000000d1a5a000 CR4: 00000000001406f0
[Fri May  8 15:55:52 2020] Call Trace:
[Fri May  8 15:55:52 2020]  fib_table_delete+0x1a7/0x310
[Fri May  8 15:55:52 2020]  inet_rtm_delroute+0x96/0x100
[Fri May  8 15:55:52 2020]  rtnetlink_rcv_msg+0x137/0x3c0
[Fri May  8 15:55:52 2020]  ? rtnl_calcit.isra.0+0x120/0x120
[Fri May  8 15:55:52 2020]  netlink_rcv_skb+0x78/0x150
[Fri May  8 15:55:52 2020]  netlink_unicast+0x19c/0x240
[Fri May  8 15:55:52 2020]  netlink_sendmsg+0x243/0x480
[Fri May  8 15:55:52 2020]  sock_sendmsg+0x5e/0x60
[Fri May  8 15:55:52 2020]  __sys_sendto+0x120/0x190
[Fri May  8 15:55:52 2020]  __x64_sys_sendto+0x25/0x30
[Fri May  8 15:55:52 2020]  do_syscall_64+0x4e/0x150
[Fri May  8 15:55:52 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Fri May  8 15:55:52 2020] RIP: 0033:0x7f7fc377c92a
[Fri May  8 15:55:52 2020] Code: 48 c7 c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
[Fri May  8 15:55:52 2020] RSP: 002b:00007ffcd08e65e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[Fri May  8 15:55:52 2020] RAX: ffffffffffffffda RBX: 00007ffcd08e666c RCX: 00007f7fc377c92a
[Fri May  8 15:55:52 2020] RDX: 000000000000003c RSI: 000056514b512290 RDI: 0000000000000003
[Fri May  8 15:55:52 2020] RBP: 000056514b3e6dd0 R08: 00007ffcd08e65f0 R09: 0000000000000010
[Fri May  8 15:55:52 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 000056514b5101f0
[Fri May  8 15:55:52 2020] R13: 0000000000157287 R14: 000056514b47b580 R15: 000056514b3965b8
[Fri May  8 15:55:52 2020] ---[ end trace 04fb233d5d78bf6d ]---
[Fri May  8 15:55:52 2020] ipt_NETFLOW: sendmsg[0] error -101: data loss 200 pkt, 124391 bytes: network is unreachable.
[Fri May  8 15:55:52 2020] ------------[ cut here ]------------
[Fri May  8 15:55:52 2020] WARNING: CPU: 1 PID: 1405575 at include/net/nexthop.h:251 fib_nh_match+0x210/0x400
[Fri May  8 15:55:52 2020] Modules linked in: sch_sfq pppoe pppox ppp_generic slhc tun 8021q garp mrp stp llc bonding xt_nat iptable_nat ipt_NETFLOW(OE) xt_state xt_addrtype xt_conntrack xt_comment ipt_REJECT xt_multiport iptable_filter xt_TCPMSS xt_tcpudp xt_mark xt_set iptable_mangle ip_set_hash_net ip_set_bitmap_ipmac ip_set ext4 crc16 mbcache jbd2 amd64_edac_mod edac_mce_amd kvm_amd ccp rng_core kvm irqbypass hid_generic crct10dif_pclmul crc32_pclmul gpu_sched ghash_clmulni_intel i2c_algo_bit ttm drm_kms_helper aesni_intel cec crypto_simd tg3 usbhid cryptd glue_helper hid rc_core pcspkr syscopyarea sp5100_tco fam15h_power sysfillrect k10temp sysimgblt i2c_piix4 fb_sys_fops libphy evdev 8250_dw mac_hid pinctrl_amd acpi_cpufreq nf_conntrack_pptp nf_reject_ipv4 nf_nat nf_conntrack_netlink nfnetlink drm nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_htb agpgart ip_tables x_tables xfs libcrc32c crc32c_generic crc32c_intel xhci_pci xhci_hcd ehci_pci ehci_hcd
[Fri May  8 15:55:52 2020] CPU: 1 PID: 1405575 Comm: systemd-network Tainted: G        W  OE     5.6.10-arch1-1-nfcustom #1
[Fri May  8 15:55:52 2020] Hardware name: HPE ProLiant MicroServer Gen10/ProLiant MicroServer Gen10, BIOS 5.12 06/26/2018
[Fri May  8 15:55:52 2020] RIP: 0010:fib_nh_match+0x210/0x400
[Fri May  8 15:55:52 2020] Code: 7c 24 18 48 8b b5 90 00 00 00 e8 cb 97 f6 ff 48 8b 7c 24 18 41 89 c4 e8 7e 93 f6 ff 45 85 e4 0f 85 43 fe ff ff e9 66 ff ff ff <0f> 0b e9 2e ff ff ff 3c 0a 0f 85 72 fe ff ff 48 8b 8d 98 00 00 00
[Fri May  8 15:55:52 2020] RSP: 0018:ffffbe6640633a68 EFLAGS: 00010282
[Fri May  8 15:55:52 2020] RAX: 0000000000000028 RBX: ffffbe6640633bc8 RCX: 0000000000000000
[Fri May  8 15:55:52 2020] RDX: ffffbe6640633ce8 RSI: ffff9f0232212a80 RDI: ffffbe6640633b08
[Fri May  8 15:55:52 2020] RBP: ffff9f0232212a80 R08: 00000000000000fe R09: ffffbe6640633ce8
[Fri May  8 15:55:52 2020] R10: 0000000000000000 R11: ffff9f02267050e0 R12: ffffbe6640633a88
[Fri May  8 15:55:52 2020] R13: 0000000000000000 R14: ffffbe6640633bc8 R15: ffff9f02f35f6080
[Fri May  8 15:55:52 2020] FS:  00007f7fc33f9a80(0000) GS:ffff9f02f7480000(0000) knlGS:0000000000000000
[Fri May  8 15:55:52 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri May  8 15:55:52 2020] CR2: 00007fa92d320850 CR3: 00000000d1a5a000 CR4: 00000000001406e0
[Fri May  8 15:55:52 2020] Call Trace:
[Fri May  8 15:55:52 2020]  ? memcg_check_events+0x40/0x220
[Fri May  8 15:55:52 2020]  fib_table_delete+0x1a7/0x310
[Fri May  8 15:55:52 2020]  inet_rtm_delroute+0x96/0x100
[Fri May  8 15:55:52 2020]  rtnetlink_rcv_msg+0x137/0x3c0
[Fri May  8 15:55:52 2020]  ? rtnl_calcit.isra.0+0x120/0x120
[Fri May  8 15:55:52 2020]  netlink_rcv_skb+0x78/0x150
[Fri May  8 15:55:52 2020]  netlink_unicast+0x19c/0x240
[Fri May  8 15:55:52 2020]  netlink_sendmsg+0x243/0x480
[Fri May  8 15:55:52 2020]  sock_sendmsg+0x5e/0x60
[Fri May  8 15:55:52 2020]  __sys_sendto+0x120/0x190
[Fri May  8 15:55:52 2020]  __x64_sys_sendto+0x25/0x30
[Fri May  8 15:55:52 2020]  do_syscall_64+0x4e/0x150
[Fri May  8 15:55:52 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Fri May  8 15:55:52 2020] RIP: 0033:0x7f7fc377c92a
[Fri May  8 15:55:52 2020] Code: 48 c7 c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
[Fri May  8 15:55:52 2020] RSP: 002b:00007ffcd08e65c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[Fri May  8 15:55:52 2020] RAX: ffffffffffffffda RBX: 00007ffcd08e664c RCX: 00007f7fc377c92a
[Fri May  8 15:55:52 2020] RDX: 0000000000000034 RSI: 000056514b47dff0 RDI: 0000000000000003
[Fri May  8 15:55:52 2020] RBP: 000056514b3e6dd0 R08: 00007ffcd08e65d0 R09: 0000000000000010
[Fri May  8 15:55:52 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 000056514b5101f0
[Fri May  8 15:55:52 2020] R13: 0000000000157287 R14: 000056514b443060 R15: 000056514b3965b8
[Fri May  8 15:55:52 2020] ---[ end trace 04fb233d5d78bf6e ]---

@sworleys, what do you think: is this a different warning? This kernel is 5.6.10.
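To compare warnings like the ones in this thread, it can help to pull the source location and function name out of each `WARNING:` header. The following is a minimal sketch, not part of FRR or the kernel tooling. The regular expression and the helper name `warning_sites` are assumptions made for illustration, and the two sample lines are copied from the logs posted in this issue.

```python
import re

# Matches kernel WARNING headers such as:
#   WARNING: CPU: 1 PID: 1405575 at include/net/nexthop.h:251 fib_nh_match+0x210/0x400
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def warning_sites(log_text):
    """Return a sorted list of distinct (file, line, symbol) WARNING sites."""
    sites = set()
    for match in WARNING_RE.finditer(log_text):
        sites.add((match.group("file"), int(match.group("line")), match.group("symbol")))
    return sorted(sites)


# Two WARNING headers taken from the logs in this issue.
SAMPLE = """\
Mar 25 12:16:35 R02 kernel: [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
[Fri May  8 15:55:52 2020] WARNING: CPU: 1 PID: 1405575 at include/net/nexthop.h:251 fib_nh_match+0x210/0x400
"""

if __name__ == "__main__":
    for file, line, symbol in warning_sites(SAMPLE):
        print(f"{file}:{line} in {symbol}")
```

Both warnings fire at the same `WARN_ON` in `include/net/nexthop.h`, but from different callers (`fib_select_path` versus `fib_nh_match`), which is the distinction being asked about here.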

@sworleys
Member

sworleys commented May 8, 2020

@k0ste it looks to be the same. The patch looks like it's gonna be in 5.7, unlikely to make it into 5.6. I don't know for sure though.

In the meantime you can disable nexthop objects in FRR with this hidden command:

no zebra nexthop kernel enable

@sworleys
Member

sworleys commented May 8, 2020

That will just return it to a traditional route installation.
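As a rough sketch of how the workaround above could be scripted, the snippet below builds a `vtysh` invocation that enters configuration mode and issues the hidden command. It assumes that `vtysh` is on the PATH and that the command is accepted in `configure terminal` mode; neither is confirmed in this thread. The helper name `vtysh_argv` is hypothetical.

```python
import shlex


def vtysh_argv(config_lines):
    """Build a vtysh command line that enters config mode and runs each line.

    Assumption: the given lines are valid in 'configure terminal' mode.
    """
    argv = ["vtysh", "-c", "configure terminal"]
    for line in config_lines:
        argv += ["-c", line]
    return argv


if __name__ == "__main__":
    argv = vtysh_argv(["no zebra nexthop kernel enable"])
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(argv))
```

The command is only printed here; passing `argv` to `subprocess.run` would execute it on a host running FRR.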

@sworleys
Member

sworleys commented May 8, 2020

torvalds/linux@7c74b0b

here is the commit in the linux tree if you want to track it.

looks to be only in the 5.7 branches

@k0ste
Contributor

k0ste commented May 9, 2020

@sworleys, patch applied to 5.6.8: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/net/ipv4/fib_semantics.c?h=linux-5.6.y

And this actually resolves the original issue (yours and mine) with fib_select_path. The current warning is in fib_nh_match.
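The version information mentioned so far can be turned into a small check. This is a sketch based only on what this thread states: the fix (upstream commit 7c74b0b) is in the 5.7 branches and was applied to stable 5.6.8. For any other series the function returns `None` rather than guessing. The function names are hypothetical.

```python
import re


def parse_release(release):
    """Extract (major, minor, patch) from a release string like '5.6.10-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3) or 0)


def has_fib_select_default_fix(release):
    """Return True/False where this thread gives an answer, None otherwise."""
    version = parse_release(release)
    if version >= (5, 7, 0):
        return True  # stated in the thread: present in the 5.7 branches
    if version[:2] == (5, 6):
        return version >= (5, 6, 8)  # stated in the thread: applied to 5.6.8
    return None  # other series: not established in this thread


if __name__ == "__main__":
    for release in ("5.6.10-arch1-1-nfcustom", "5.4.0-4-amd64"):
        print(release, has_fib_select_default_fix(release))
```

A `True` result only covers the `fib_select_path` warning; as noted above, the `fib_nh_match` warning still appears on 5.6.10.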

@sworleys
Member

ah... yeah, it's different.

can you open another ticket with that one and a bit more details on the routes needed to repro?

@sworleys
Member

@k0ste ^^^ see previous message

@k0ste
Contributor

k0ste commented May 15, 2020

@sworleys, yeah, will do.

jpuhlman pushed a commit to MontaVista-OpenSourceTechnology/linux-mvista that referenced this issue May 15, 2020
Source: Kernel.org
MR: 103311
Type: Integration
Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable  linux-5.4.y
ChangeID: 382f57b996aa90dcfb13b02fcadeeb3c264cb996
Description:

[ Upstream commit 7c74b0b ]

A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.

Add test case that covers the affected code path.

[0] FRRouting/frr#6089

Fixes: 493ced1 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Armin Kuster <[email protected]>
cesfahani pushed a commit to cesfahani/ubuntu-focal-ga401i that referenced this issue Jun 24, 2020
BugLink: https://bugs.launchpad.net/bugs/1876361

[ Upstream commit 7c74b0bec918c1e0ca0b4208038c156eacf8f13f ]

A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.

Add test case that covers the affected code path.

[0] FRRouting/frr#6089

Fixes: 493ced1 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>