Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vpp crashes when attempting to run in kubernetes Pod #13

Closed
wants to merge 3 commits into from

Conversation

artem-belov
Copy link
Contributor

va = pm->base + (((uword) vec_len (pm->pages)) << pm->def_log2_page_sz);
  if (mmap (va, size, PROT_READ | PROT_WRITE, mmap_flags, a->fd, 0) ==
      MAP_FAILED)
    {
      pm->error = clib_error_return_unix (0, "failed to mmap %u pages at %p "
					  "fd %d numa %d flags 0x%x", n_pages,
					  va, a->fd, numa_node, mmap_flags);
      goto error;
    }

  clib_memset (va, 0, size);

mmap does not fail but writing to mapped memory is causing sigbus.

/usr/bin/vpp[17659]: received signal SIGBUS, PC 0x7fd2f01d58ba
/usr/bin/vpp[17659]: #0  0x00007fd2f0c6697d 0x7fd2f0c6697d
/usr/bin/vpp[17659]: #1  0x00007fd2f07bc890 0x7fd2f07bc890
/usr/bin/vpp[17659]: #2  0x00007fd2f01d58ba 0x7fd2f01d58ba
/usr/bin/vpp[17659]: #3  0x00007fd2f054d4f2 0x7fd2f054d4f2
/usr/bin/vpp[17659]: #4  0x00007fd2f054f673 clib_pmalloc_create_shared_arena + 0x213
/usr/bin/vpp[17659]: #5  0x00007fd2f0c46cf3 vlib_physmem_shared_map_create + 0x23
/usr/bin/vpp[17659]: #6  0x00007fd2f0c1b561 vlib_buffer_main_init + 0x151
/usr/bin/vpp[17659]: #7  0x00007fd2f0c3b70a vlib_main + 0x12a
/usr/bin/vpp[17659]: #8  0x00007fd2f0c65a56 0x7fd2f0c65a56
/usr/bin/vpp[17659]: #9  0x00007fd2f053bfc4 0x7fd2f053bfc4

@edwarnicke
Copy link
Contributor

Please push to
https://gerrit.fd.io/ . The github.com is just a mirror of the Gerrit for fd.io at this time

artem-belov pushed a commit to artem-belov/vpp that referenced this pull request Apr 12, 2019
fdio-github pushed a commit that referenced this pull request Mar 19, 2021
This issue happens if:
- the API client connects via Unix socket
- the client issues the *_dump API call and immediately disconnects

What happens after is that the API handler keeps sending the *_details
messages, however at some point the write fails, and the socket is
deleted.

The attempt of a use of the registration pointer results in interpreting
the socket as a shared memory socket. This results in a crash, because
the data in this structure then does not make sense, like the below:

|
|Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
|__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:67
|67      ../nptl/pthread_mutex_lock.c: No such file or directory.
|(gdb) bt
|#0  __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:67
|#1  0x00007ffff500f957 in svm_queue_lock (q=0x0) at /home/ubuntu/vpp/src/svm/queue.c:101
|#2  svm_queue_add (q=0x0, elem=0x7fffa76c2de0 "\210\365\006\060\001", nowait=0) at /home/ubuntu/vpp/src/svm/queue.c:274
|#3  0x00007ffff6e131e3 in vl_api_send_msg (rp=<optimized out>, elem=<optimized out>) at /home/ubuntu/vpp/src/vlibmemory/api.h:43
|#4  send_sw_interface_details (am=<optimized out>, rp=<optimized out>, swif=0x7fffb957a0bc, interface_name=<optimized out>, context=<optimized out>)
|    at /home/ubuntu/vpp/src/vnet/interface_api.c:353
|#5  0x00007ffff6e0edeb in vl_api_sw_interface_dump_t_handler (mp=<optimized out>) at /home/ubuntu/vpp/src/vnet/interface_api.c:412
|#6  0x00007ffff7daeb48 in msg_handler_internal (am=<optimized out>, the_msg=0x7fffb839a5e0, trace_it=<optimized out>, do_it=1, free_it=0)
|    at /home/ubuntu/vpp/src/vlibapi/api_shared.c:501
|#7  vl_msg_api_socket_handler (the_msg=0x7fffb839a5e0) at /home/ubuntu/vpp/src/vlibapi/api_shared.c:790
|#8  0x00007ffff7d7c608 in vl_socket_process_api_msg (rp=<optimized out>, input_v=0x7fffa76c2de0 "\210\365\006\060\001") at /home/ubuntu/vpp/src/vlibmemory/socket_api.c:212
|#9  0x00007ffff7d89ff1 in vl_api_clnt_process (vm=<optimized out>, node=<optimized out>, f=<optimized out>) at /home/ubuntu/vpp/src/vlibmemory/vlib_api.c:405
|#10 0x00007ffff53bf9a7 in vlib_process_bootstrap (_a=<optimized out>) at /home/ubuntu/vpp/src/vlib/main.c:1490
|#11 0x00007ffff4da0b2c in clib_calljmp () from /home/ayourtch/vpp/build-root/install-vpp-native/vpp/lib/libvppinfra.so.21.06
|#12 0x00007fffa99a4d90 in ?? ()
|#13 0x00007ffff53b6cb2 in vlib_process_startup (vm=0x7ffff56a9880 <vlib_global_main>, p=0x7fffb5d41380, f=0x0) at /home/ubuntu/vpp/src/vlib/main.c:1515
|#14 dispatch_process (vm=0x7ffff56a9880 <vlib_global_main>, p=0x7fffb5d41380, f=0x0, last_time_stamp=<optimized out>) at /home/ubuntu/vpp/src/vlib/main.c:1571
|#15 0x0000000000000000 in ?? ()
|(gdb) frame 3
|#3  0x00007ffff6e131e3 in vl_api_send_msg (rp=<optimized out>, elem=<optimized out>) at /home/ubuntu/vpp/src/vlibmemory/api.h:43
|43            vl_msg_api_send_shmem (rp->vl_input_queue, (u8 *) & elem);
|(gdb) l
|38          {
|39            vl_socket_api_send (rp, elem);
|40          }
|41        else
|42          {
|43            vl_msg_api_send_shmem (rp->vl_input_queue, (u8 *) & elem);
|44          }
|45      }
|46
|47      always_inline int
|(gdb)
|

The approach in this change is to avoid the closing operations "here and
now", but instead mark the the registration as a zombie and place
a forced RPC towards a callback that does the actual cleanup work.

Forced RPC is handled via the API processing loop with barrier sync,
so we are guaranteed not to have any API processing in-process.

Type: fix
Change-Id: I1972d42da620bdb4fd773c83262863c2781d9005
Signed-off-by: Andrew Yourtchenko <[email protected]>
fdio-github pushed a commit that referenced this pull request May 19, 2023
When I setup vpp by netvsc driver, occurs the following crash:

(format_dpdk_device_name) assertion `(i) < vec_len (dm->devices)' fails

vnet[100166]: #6  0x00007f434d651f6a _clib_error + 0x2da
vnet[100166]: #7  0x00007f430b4bef64 format_dpdk_device_name + 0xf4
vnet[100166]: #8  0x00007f434d6555f3 do_percent + 0xee3
vnet[100166]: #9  0x00007f434d654359 va_format + 0xb9
vnet[100166]: #10 0x00007f434d7ac16e vlib_log + 0x3ce
vnet[100166]: #11 0x00007f430b49ebe3 dpdk_device_start + 0x193
vnet[100166]: #12 0x00007f430b4aa233 dpdk_interface_admin_up_down + 0x163
vnet[100166]: #13 0x00007f434d988fc8 vnet_sw_interface_set_flags_helper + 0x378
vnet[100166]: #14 0x00007f434d989338 vnet_sw_interface_set_flags + 0x48

This patch fix it by device_index as a index for devices vec, and not
dpdk port_id.

Type: fix
Change-Id: I84c46616d06117c9ae3b2c7d0473050f1b8ded5f
Signed-off-by: Daniel Ding <[email protected]>
fdio-github pushed a commit that referenced this pull request Jun 19, 2024
Do not mark drop paths as imported to avoid crashes on invalid table lookup.

```
vpp[8478]: /build/Vpp2310/source/src/vnet/fib/fib_table.c:35 (fib_table_get) assertion `! pool_is_free (ip4_main.fibs, _e)' fails
 #9  0x00007ff21785da1d in _clib_error () from /lib/x86_64-linux-gnu/libvppinfra.so.23.10
 #10 0x00007ff218087698 in fib_table_get (index=4294967295, proto=FIB_PROTOCOL_IP4) at /build/Vpp2310/source/src/vnet/fib/fib_table.c:35
 #11 0x00007ff218087a37 in fib_table_lookup_exact_match (fib_index=4294967295, prefix=0x7ff0eae0d354) at /build/Vpp2310/source/src/vnet/fib/fib_table.c:100
 #12 0x00007ff2180bc938 in fib_attached_export_import (fib_entry=0x7ff0eceac3e0, export_fib=4294967295) at /build/Vpp2310/source/src/vnet/fib/fib_attached_export.c:264
 #13 0x00007ff218098ade in fib_entry_post_flag_update_actions (fib_entry=0x7ff0eceac3e0, old_flags=FIB_ENTRY_FLAG_NONE, new_fib_index=4294967295) at /build/Vpp2310/source/src/vnet/fib/fib_entry.c:624
 #14 0x00007ff218098b90 in fib_entry_post_install_actions (fib_entry=0x7ff0eceac3e0, source=FIB_SOURCE_API, old_flags=FIB_ENTRY_FLAG_NONE) at /build/Vpp2310/source/src/vnet/fib/fib_entry.c:674
 #15 0x00007ff218098cce in fib_entry_create (fib_index=1, prefix=0x7ff0d3244d80, source=FIB_SOURCE_API, flags=FIB_ENTRY_FLAG_NONE, paths=0x7ff0eac15ab8) at /build/Vpp2310/source/src/vnet/fib/fib_entry.c:712
 #16 0x00007ff218088db4 in fib_table_entry_update (fib_index=1, prefix=0x7ff0d3244d80, source=FIB_SOURCE_API, flags=FIB_ENTRY_FLAG_NONE, paths=0x7ff0eac15ab8) at /build/Vpp2310/source/src/vnet/fib/fib_table.c:799
 #17 0x00007ff2180c026c in fib_api_route_add_del (is_add=1 '\001', is_multipath=0 '\000', fib_index=1, prefix=0x7ff0d3244d80, src=FIB_SOURCE_API, entry_flags=FIB_ENTRY_FLAG_NONE, rpaths=0x7ff0eac15ab8) at /build/Vpp2310/source/src/vnet/fib/fib_api.c:485
 #18 0x00007ff217d4b6dd in ip_route_add_del_t_handler (mp=0x7ff0eb08b998, stats_index=0x7ff0d3244dc8) at /build/Vpp2310/source/src/vnet/ip/ip_api.c:718
 #19 0x00007ff217d4b986 in vl_api_ip_route_add_del_t_handler (mp=0x7ff0eb08b998) at /build/Vpp2310/source/src/vnet/ip/ip_api.c:789
```

Type: fix
Fixes: 4b08632
Signed-off-by: Dmitry Valter <[email protected]>
Change-Id: I647899533771c35f44c9ecde517a30f111b36ad9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants