Unstable tcp ldp session when using explicit-null #8313

Closed
Viktort-rf opened this issue Mar 23, 2021 · 4 comments · Fixed by #16226
Labels
mpls triage Needs further investigation

Comments

@Viktort-rf

Describe the bug
When "label local advertise explicit-null" is enabled on both routers, the TCP LDP session cannot be maintained. It drops when the KeepAlive timer expires and is then immediately re-established.
iptables is disabled.
No KeepAlive messages are visible in the captured packet dumps.
Note that if "label local advertise explicit-null" is turned off on at least one side, the session is stable.

Put "x" in "[ ]" if you have already tried the following:

[ ] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce

  1. Log in to the routers.
  2. Run the following commands (on both routers):
    conf t
    mpls ldp
    address-family ipv4
    label local advertise explicit-null
  3. Wait for the LDP session to come up.
  4. Wait the standard 3 minutes (default timers).
  5. Observe the LDP session drop and renegotiate (timers are reset to zero).

Expected behavior
The TCP LDP session should remain stable when the explicit-null label is advertised.

Versions

  • OS Version:
    CentOS7
  • Kernel:
    5.10.13-1.el7.elrepo.x86_64
  • FRR Version:
    7.7-dev_git
@Viktort-rf Viktort-rf added the triage Needs further investigation label Mar 23, 2021
@qlyoung qlyoung added the mpls label Mar 23, 2021
@anp135

anp135 commented Mar 29, 2021

The same behaviour occurs with FRR 7.5 and 7.5.1.

@EasyNetDev
Contributor

EasyNetDev commented Jan 19, 2024

Hi,

I've noticed the same behavior with the latest 9.2-dev version. At exactly 3 minutes the LDP session goes down and is re-established:

R01:

2024-01-19T17:04:11.094434+02:00 R01 ldpd[2140917]: msg[in]: notification: lsr-id 10.100.2.1, status KeepAlive Timer Expired (fatal error)
2024-01-19T17:04:11.094484+02:00 R01 ldpd[2140917]: nbr_fsm: event SESSION CLOSE resulted in action CLOSE SESSION and changing state for lsr-id 10.100.2.1 from OPERATIONAL to PRESENT
2024-01-19T17:04:11.094536+02:00 R01 ldpd[2140917]: session_close: closing session with lsr-id 10.100.2.1
2024-01-19T17:04:14.409014+02:00 R01 ldpd[2140917]: discovery[recv]: iface lan0.3001 lsr-id 10.100.2.1 transport-address 10.100.2.1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:14.409084+02:00 R01 ldpd[2140917]: discovery[recv]: iface wan0.3000 lsr-id 10.100.2.1 transport-address 10.100.2.1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:14.409173+02:00 R01 ldpd[2140917]: discovery[recv]: iface lan0.3001 lsr-id 10.100.2.1 transport-address fc00:0:0:2::1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:14.409233+02:00 R01 ldpd[2140917]: discovery[recv]: iface wan0.3000 lsr-id 10.100.2.1 transport-address fc00:0:0:2::1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.890825+02:00 R01 ldpd[2140917]: discovery[send]: iface gre1301 (ipv4) holdtime 15
2024-01-19T17:04:15.890974+02:00 R01 ldpd[2140917]: discovery[send]: iface lan0.3001 (ipv4) holdtime 15
2024-01-19T17:04:15.891128+02:00 R01 ldpd[2140917]: discovery[send]: iface wan0.3000 (ipv4) holdtime 15
2024-01-19T17:04:15.891251+02:00 R01 ldpd[2140917]: discovery[send]: iface gre1301 (ipv6) holdtime 15
2024-01-19T17:04:15.891308+02:00 R01 ldpd[2140917]: discovery[send]: iface lan0.3001 (ipv6) holdtime 15
2024-01-19T17:04:15.891383+02:00 R01 ldpd[2140917]: discovery[send]: iface wan0.3000 (ipv6) holdtime 15
2024-01-19T17:04:15.891438+02:00 R01 ldpd[2140917]: discovery[recv]: iface lan0.3001 lsr-id 10.100.2.1 transport-address 10.100.2.1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.891545+02:00 R01 ldpd[2140917]: discovery[recv]: iface lan0.3001 lsr-id 10.100.2.1 transport-address fc00:0:0:2::1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.891649+02:00 R01 ldpd[2140917]: discovery[recv]: iface wan0.3000 lsr-id 10.100.2.1 transport-address 10.100.2.1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.891729+02:00 R01 ldpd[2140917]: discovery[recv]: iface wan0.3000 lsr-id 10.100.2.1 transport-address fc00:0:0:2::1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.891791+02:00 R01 ldpd[2140917]: nbr_fsm: event ADJACENCY MATCHED resulted in action NOTHING and changing state for lsr-id 10.100.2.1 from PRESENT to INITIALIZED
2024-01-19T17:04:16.098875+02:00 R01 ldpd[2140917]: msg[in]: initialization: lsr-id 10.100.2.1
2024-01-19T17:04:16.099001+02:00 R01 ldpd[2140917]: recv_init: lsr-id 10.100.2.1 announced the Dynamic Capability Announcement capability
2024-01-19T17:04:16.099057+02:00 R01 ldpd[2140917]: recv_init: lsr-id 10.100.2.1 announced the Typed Wildcard FEC capability
2024-01-19T17:04:16.099105+02:00 R01 ldpd[2140917]: recv_init: lsr-id 10.100.2.1 announced the Unrecognized Notification capability
2024-01-19T17:04:16.099155+02:00 R01 ldpd[2140917]: nbr_fsm: event INIT RECEIVED resulted in action SEND INIT AND KEEPALIVE and changing state for lsr-id 10.100.2.1 from INITIALIZED to OPENREC
2024-01-19T17:04:16.099203+02:00 R01 ldpd[2140917]: msg[out]: initialization: lsr-id 10.100.2.1
2024-01-19T17:04:16.099978+02:00 R01 ldpd[2140917]: nbr_fsm: event KEEPALIVE RECEIVED resulted in action START NEIGHBOR SESSION and changing state for lsr-id 10.100.2.1 from OPENREC to OPERATIONAL

R02:

2024-01-19T17:04:11.085389+02:00 R02 ldpd[52209]: nbr_ktimeout: lsr-id 10.100.1.1
2024-01-19T17:04:11.085748+02:00 R02 ldpd[52209]: msg[out]: notification: lsr-id 10.100.1.1, status KeepAlive Timer Expired (fatal error)
2024-01-19T17:04:11.085871+02:00 R02 ldpd[52209]: nbr_fsm: event SESSION CLOSE resulted in action CLOSE SESSION and changing state for lsr-id 10.100.1.1 from OPERATIONAL to PRESENT
2024-01-19T17:04:11.086036+02:00 R02 ldpd[52209]: session_close: closing session with lsr-id 10.100.1.1
2024-01-19T17:04:14.410304+02:00 R02 ldpd[52209]: discovery[send]: iface lan0.3001 (ipv4) holdtime 15
2024-01-19T17:04:14.410464+02:00 R02 ldpd[52209]: discovery[send]: iface wan0.3000 (ipv4) holdtime 15
2024-01-19T17:04:14.410573+02:00 R02 ldpd[52209]: discovery[send]: iface lan0.3001 (ipv6) holdtime 15
2024-01-19T17:04:14.410660+02:00 R02 ldpd[52209]: discovery[send]: iface wan0.3000 (ipv6) holdtime 15
2024-01-19T17:04:15.892818+02:00 R02 ldpd[52209]: discovery[recv]: iface wan0.3000 lsr-id 10.100.1.1 transport-address 10.100.1.1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.893060+02:00 R02 ldpd[52209]: discovery[send]: iface lan0.3001 (ipv4) holdtime 15
2024-01-19T17:04:15.893171+02:00 R02 ldpd[52209]: discovery[send]: iface wan0.3000 (ipv4) holdtime 15
2024-01-19T17:04:15.893275+02:00 R02 ldpd[52209]: discovery[send]: iface lan0.3001 (ipv6) holdtime 15
2024-01-19T17:04:15.893378+02:00 R02 ldpd[52209]: discovery[send]: iface wan0.3000 (ipv6) holdtime 15
2024-01-19T17:04:15.893491+02:00 R02 ldpd[52209]: discovery[recv]: iface lan0.3001 lsr-id 10.100.1.1 transport-address 10.100.1.1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.893595+02:00 R02 ldpd[52209]: discovery[recv]: iface lan0.3001 lsr-id 10.100.1.1 transport-address fc00:0:0:1::1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.893686+02:00 R02 ldpd[52209]: discovery[recv]: iface wan0.3000 lsr-id 10.100.1.1 transport-address fc00:0:0:1::1 holdtime 15 (dual stack TLV present)
2024-01-19T17:04:15.893770+02:00 R02 ldpd[52209]: nbr_fsm: event CONNECTION UP resulted in action SETUP NEIGHBOR CONNECTION and changing state for lsr-id 10.100.1.1 from PRESENT to INITIALIZED
2024-01-19T17:04:15.893879+02:00 R02 ldpd[52209]: msg[out]: initialization: lsr-id 10.100.1.1
2024-01-19T17:04:15.893996+02:00 R02 ldpd[52209]: nbr_fsm: event INIT SENT resulted in action NOTHING and changing state for lsr-id 10.100.1.1 from INITIALIZED to OPENSENT
2024-01-19T17:04:16.100525+02:00 R02 ldpd[52209]: msg[in]: initialization: lsr-id 10.100.1.1
2024-01-19T17:04:16.100730+02:00 R02 ldpd[52209]: recv_init: lsr-id 10.100.1.1 announced the Dynamic Capability Announcement capability
2024-01-19T17:04:16.100861+02:00 R02 ldpd[52209]: recv_init: lsr-id 10.100.1.1 announced the Typed Wildcard FEC capability
2024-01-19T17:04:16.100986+02:00 R02 ldpd[52209]: recv_init: lsr-id 10.100.1.1 announced the Unrecognized Notification capability
2024-01-19T17:04:16.101125+02:00 R02 ldpd[52209]: nbr_fsm: event INIT RECEIVED resulted in action SEND KEEPALIVE and changing state for lsr-id 10.100.1.1 from OPENSENT to OPENREC
2024-01-19T17:04:16.101286+02:00 R02 ldpd[52209]: nbr_fsm: event KEEPALIVE RECEIVED resulted in action START NEIGHBOR SESSION and changing state for lsr-id 10.100.1.1 from OPENREC to OPERATIONAL
2024-01-19T17:04:16.101420+02:00 R02 ldpd[52209]: msg[out]: address: lsr-id 10.100.1.1, address 10.100.100.2

My configs:
R01:

mpls ldp
 router-id 10.100.1.1
 dual-stack cisco-interop
 neighbor 10.100.2.1 password XXXXXXXX
 neighbor 10.100.13.1 password XXXXXXXX
 !
 address-family ipv4
  discovery transport-address 10.100.1.1
  label local advertise explicit-null
  !
  interface gre1301
  exit
  !
  interface lan0.3001
  exit
  !
  interface wan0.3000
  exit
  !
 exit-address-family
 !
 address-family ipv6
  discovery transport-address fc00:0:0:1::1
  label local advertise explicit-null
  !
  interface gre1301
  exit
  !
  interface lan0.3001
  exit
  !
  interface wan0.3000
  exit
  !
 exit-address-family
 !
exit

R02:

mpls ldp
 router-id 10.100.2.1
 dual-stack cisco-interop
 neighbor 10.100.1.1 password XXXXXXXX
 !
 address-family ipv4
  discovery transport-address 10.100.2.1
  label local advertise explicit-null
  !
  interface lan0.3001
  exit
  !
  interface wan0.3000
  exit
  !
 exit-address-family
 !
 address-family ipv6
  discovery transport-address fc00:0:0:2::1
  label local advertise explicit-null
  !
  interface lan0.3001
  exit
  !
  interface wan0.3000
  exit
  !
 exit-address-family
 !
exit

Are there any workarounds to avoid this issue? Should I disable explicit-null for the moment?

@anlancs
Contributor

anlancs commented Jun 12, 2024

I have run into the same issue. Is it a known limitation? Any help would be appreciated. Thanks! @donaldsharp @qlyoung

@rwestphal
Member

@anlancs The problem is likely that MPLS label processing isn't enabled on your interfaces via sysctl.

You can find instructions on how to enable it here: https://docs.frrouting.org/en/stable-5.0/installation.html#linux-sysctl-settings-and-kernel-modules
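
As a rough illustration of the check suggested above, the sketch below reads the per-interface MPLS input setting from procfs. It assumes the Linux sysctl `net.mpls.conf.<ifname>.input`, exposed as `/proc/sys/net/mpls/conf/<ifname>/input`; the helper names `read_sysctl_int` and `mpls_input_enabled` are hypothetical and are not part of FRR.

```c
#include <stdio.h>

/*
 * Read a single integer from a procfs-style file, for example
 * /proc/sys/net/mpls/conf/<ifname>/input.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 */
static int read_sysctl_int(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Returns 1 if MPLS input is enabled on ifname, 0 if it is disabled,
 * and -1 if the setting cannot be read (for example, the mpls_router
 * module is not loaded or the interface does not exist).
 */
static int mpls_input_enabled(const char *ifname)
{
	char path[128];
	long val;
	int n = snprintf(path, sizeof(path),
			 "/proc/sys/net/mpls/conf/%s/input", ifname);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (read_sysctl_int(path, &val) != 0)
		return -1;
	return val != 0;
}
```

Calling `mpls_input_enabled("lan0.3001")` on each LDP-enabled interface from the configs above would show whether this setting is a factor on a given router. It is only a diagnostic aid and does not change any kernel state.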

anlancs added a commit to anlancs/frr that referenced this issue Jun 15, 2024
In the Linux kernel networking stack, a received MPLS packet is processed
by the host *twice*, once as an MPLS packet and once as an IP packet,
so its TTL is decreased by 1.

So we must relax the `IP_MINTTL` value if GTSM is enabled, to allow for the
MPLS packets caused by the command `label local advertise explicit-null`.

This change makes the GTSM mechanism deviate slightly from its strict behavior.

Fix PR FRRouting#8313

Signed-off-by: anlan_cs <[email protected]>
anlancs added a commit to anlancs/frr that referenced this issue Jun 15, 2024
In the Linux networking stack, a received MPLS packet is processed
by the host *twice*, once as an MPLS packet and once as an IP packet, so
its TTL is decreased by 1.

So we need to relax the `IP_MINTTL` value if GTSM is enabled, to allow for the
MPLS packets of the neighbor session caused by the command:
`label local advertise explicit-null`.

This change makes the GTSM mechanism deviate slightly from its strict behavior.

Fix PR FRRouting#8313

Signed-off-by: anlan_cs <[email protected]>
mergify bot pushed a commit that referenced this issue Jul 2, 2024
In the Linux networking stack, a received MPLS packet is processed
by the host *twice*, once as an MPLS packet and once as an IP packet, so
its TTL is decreased by 1.

So we need to relax the `IP_MINTTL` value if GTSM is enabled, to allow for the
MPLS packets of the neighbor session caused by the command:
`label local advertise explicit-null`.

This change makes the GTSM mechanism deviate slightly from its strict behavior.

Fix PR #8313

Signed-off-by: anlan_cs <[email protected]>
(cherry picked from commit 1919df3)
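
The commit messages above describe relaxing `IP_MINTTL` by one when GTSM is in use. The following is a minimal sketch of that idea, not the actual ldpd patch: it assumes the usual GTSM convention that the sender transmits with TTL 255, and the names `GTSM_SEND_TTL`, `gtsm_min_ttl` and `gtsm_apply` are hypothetical. Only `setsockopt()` with `IPPROTO_IP` and `IP_MINTTL` is a real Linux interface here.

```c
#include <netinet/in.h>  /* IPPROTO_IP, IP_MINTTL (Linux) */
#include <sys/socket.h>  /* setsockopt */

#define GTSM_SEND_TTL 255 /* assumed: GTSM senders transmit with the maximum TTL */

/*
 * Minimum TTL to accept from a neighbor that is `hops` hops away.
 * `extra_decrements` is 1 when session packets may arrive MPLS-encapsulated
 * (explicit-null) and therefore lose one more TTL before reaching the socket.
 */
static int gtsm_min_ttl(int hops, int extra_decrements)
{
	int min_ttl = GTSM_SEND_TTL + 1 - hops - extra_decrements;

	return min_ttl < 1 ? 1 : min_ttl;
}

/* Apply the limit to an IPv4 socket; returns 0 on success, -1 on error. */
static int gtsm_apply(int fd, int min_ttl)
{
	return setsockopt(fd, IPPROTO_IP, IP_MINTTL, &min_ttl, sizeof(min_ttl));
}
```

For a directly connected neighbor, `gtsm_min_ttl(1, 0)` gives the strict value 255, while `gtsm_min_ttl(1, 1)` gives 254, which tolerates the extra decrement described in the commit message. The trade-off is the one the commit itself notes: the check becomes slightly looser than strict GTSM.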
6 participants