Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VPP requests hang because of VPP Liveness Check #792

Closed
NikitaSkrynnik opened this issue Jan 12, 2024 · 3 comments
Closed

VPP requests hang because of VPP Liveness Check #792

NikitaSkrynnik opened this issue Jan 12, 2024 · 3 comments
Assignees

Comments

@NikitaSkrynnik
Copy link
Contributor

Description

VPP API requests hang until context deadline if we get ping.WantPingFinishedEvents after we made the request but before we get the response.

Example (Close request in cmd-nsc-vpp):

  1. We make WantPingFinishedEvents request
  2. We make IPRouteAddDel request
  3. We get PingFinishedEvent
  4. For some reason we wait for IPRouteAddDel response until context deadline and get the error context deadline exceeded

If we wait until we get PingFinishedEvent from vpp it fixes the problem

@NikitaSkrynnik
Copy link
Contributor Author

NikitaSkrynnik commented Jan 18, 2024

1. Experiment with two different connections to VPP API (To exclude concurrency problems in govpp)

Description

Create another connection to VPP API for liveness check.

Resutls

govpp creates completely new connection with its read and write channels but the behavior is the same. govpp doesn't receive the reply for IPRouteAddDel request.

Logs

Jan 18 00:21:08.217 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (9)         ⎆ sdk/pkg/networkservice/common/clientinfo/clientInfo.Close()
Jan 18 00:21:08.218 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (10)          ⎆ sdk/pkg/networkservice/common/upstreamrefresh/upstreamRefreshClient.Close()
Jan 18 00:21:08.218 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (11)           ⎆ sdk/pkg/networkservice/core/next/nextClient.Close()
Jan 18 00:21:08.218 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (12)            ⎆ sdk-vpp/pkg/networkservice/up/peerup/peerupClient.Close()
Jan 18 00:21:08.218 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (13)             ⎆ sdk-vpp/pkg/networkservice/up/upClient.Close()
Jan 18 00:21:08.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (14)              ⎆ sdk-vpp/pkg/networkservice/up/ipsecup/ipsecUpClient.Close()
Jan 18 00:21:08.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (15)               ⎆ sdk/pkg/networkservice/core/next/nextClient.Close()
Jan 18 00:21:08.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (16)                ⎆ sdk-vpp/pkg/networkservice/connectioncontext/mtu/mtuClient.Close()
Jan 18 00:21:08.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (17)                 ⎆ sdk-vpp/pkg/networkservice/connectioncontext/ipcontext/routes/routesClient.Close()
Jan 18 00:21:08.220 [INFO] [conn:0xc00039e240] [msg:ip_route_add_del] [govpp.core.stream.Invoke] Start
Jan 18 00:21:08.541 [INFO] [cmd:vpp] vpp[17]: memif_plugin: memif1013904223/0: memif_msg_receive_disconnect: disconnect received
Jan 18 00:21:08.541 [INFO] io.ReadAtLeast finish
Jan 18 00:21:08.541 [INFO] reading msg header is successful
Jan 18 00:21:08.541 [INFO] reading msg data
Jan 18 00:21:08.541 [INFO]  -- readMsg done (buffered: 0)
Jan 18 00:21:08.541 [INFO] read msg: [2 202 128 0 0 2 0 0 0 1 0 0 0 1 0 0 0 0 0], err: <nil>
Jan 18 00:21:08.541 [INFO] recvMsg (19) msgID=714 context=2147483650
Jan 18 00:21:08.541 [INFO] call c.msgCallback
Jan 18 00:21:08.541 [INFO] [msgId:714] [govpp.core.request_handler] Got a message
Jan 18 00:21:08.541 [INFO] [msg:sw_interface_event] [msgId:714] [govpp.core.request_handler] Got a message
Jan 18 00:21:08.541 [INFO] [msg:sw_interface_event] [govpp.core.request_handler] Decoded the message Context
Jan 18 00:21:08.541 [INFO] [msg:sw_interface_event] [govpp.core.request_handler] Decoded the message
Jan 18 00:21:08.541 [INFO] [msg:sw_interface_event] [govpp.core.request_handler] Check if the message is a notification message
time="2024-01-18T00:21:08Z" level=info msg="No subscription found for the notification message." msg_id=714 msg_size=19
Jan 18 00:21:08.541 [INFO] reading msg..
Jan 18 00:21:08.541 [INFO] reading msg header
Jan 18 00:21:08.541 [INFO] io.ReadAtLeast start
Jan 18 00:21:11.427 [INFO] io.ReadAtLeast finish
Jan 18 00:21:11.427 [INFO] reading msg header is successful
Jan 18 00:21:11.427 [INFO] reading msg data
Jan 18 00:21:11.427 [INFO]  -- readMsg done (buffered: 0)
Jan 18 00:21:11.427 [INFO] read msg: [2 116 0 0 0 0 0 0 0 4 0 0 0 3], err: <nil>
Jan 18 00:21:11.427 [INFO] recvMsg (14) msgID=628 context=0
Jan 18 00:21:11.427 [INFO] call c.msgCallback
Jan 18 00:21:11.427 [INFO] [msgId:628] [govpp.core.request_handler] Got a message
Jan 18 00:21:11.427 [INFO] [msg:ping_finished_event] [msgId:628] [govpp.core.request_handler] Got a message
Jan 18 00:21:11.427 [INFO] [msg:ping_finished_event] [govpp.core.request_handler] Decoded the message Context
Jan 18 00:21:11.427 [INFO] [msg:ping_finished_event] [govpp.core.request_handler] Decoded the message
Jan 18 00:21:11.427 [INFO] [msg:ping_finished_event] [govpp.core.request_handler] Check if the message is a notification message
Jan 18 00:21:11.427 [INFO] reading msg..
Jan 18 00:21:11.427 [INFO] reading msg header
Jan 18 00:21:11.427 [INFO] io.ReadAtLeast start
Jan 18 00:21:11.427 [INFO] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [dstIP:172.16.1.100] [type:networkService] (7.10)         DEADDEADDEADDEADDEAD Reading Notify Channel DEADDEADDEADDEADDEAD
Jan 18 00:21:11.427 [INFO] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [dstIP:172.16.1.100] [type:networkService] (7.11)         Closing everything
Jan 18 00:21:23.218 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (18)                  ⎆ sdk-vpp/pkg/networkservice/connectioncontext/ipcontext/ipaddress/ipaddressClient.Close()
Jan 18 00:21:23.218 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (19)                   ⎆ sdk/pkg/networkservice/core/next/nextClient.Close()
Jan 18 00:21:23.218 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (20)                    ⎆ sdk-vpp/pkg/networkservice/mechanisms/memif/memifrxmode/memifrxmodeClient.Close()
Jan 18 00:21:23.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (21)                     ⎆ sdk-vpp/pkg/networkservice/mechanisms/memif/memifClient.Close()
Jan 18 00:21:23.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (22)                      ⎆ sdk/pkg/networkservice/common/mechanisms/sendfd/sendFDClient.Close()
Jan 18 00:21:23.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (22.1)                        close-diff={"mechanism":{"parameters":{"inodeURL":"inode://4/4026534201"}}}
Jan 18 00:21:23.219 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (23)                       ⎆ sdk/pkg/networkservice/common/excludedprefixes/excludedPrefixesClient.Close()
Jan 18 00:21:23.220 [DEBU] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [ExcludedPrefixesClient:Close] [type:networkService] (23.1)                         Excluded prefixes after closing connection: []
Jan 18 00:21:23.220 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (24)                        ⎆ sdk/pkg/networkservice/common/null/nullClient.Close()
Jan 18 00:21:23.220 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (25)                         ⎆ sdk/pkg/networkservice/common/trimpath/trimpathClient.Close()
Jan 18 00:21:23.220 [TRAC] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (26)                          ⎆ sdk/pkg/networkservice/common/connect/connectClient.Close()
Jan 18 00:21:23.221 [ERRO] [id:nsc-memif-5bd6f94f9c-mdsk6-0] [type:networkService] (26.1)                            rpc error: code = DeadlineExceeded desc = context deadline exceeded;

@NikitaSkrynnik
Copy link
Contributor Author

2. Experiment: add logs to vl_api_ip_route_add_del_t_handler in VPP

Results

There are no logs from vl_api_ip_route_add_del_t_handler at all if liveness_check is enabled. If it's disabled then all logs from vl_api_ip_route_add_del_t_handler are printed.

@denis-tingaikin
Copy link
Member

Fixed by networkservicemesh/govpp#23

@github-project-automation github-project-automation bot moved this from Under review to Done in Release v1.13.0 Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants