Fix segfault in release build with GCC 5. #419

Merged
csukuangfj merged 3 commits into k2-fsa:master from fangjun-upgrade-gcc
Nov 27, 2020


csukuangfj
Collaborator

Release builds compiled with GCC 5 segfault.

With this change, all C++ tests can run in GitHub Actions for both the Debug build and the Release build.

GCC 5 will cause segfault in Release build.
@qindazhu
Collaborator

So the fault only occurs in the GitHub build, and we cannot reproduce it locally with GCC 5?

@csukuangfj
Collaborator Author

We are using GCC 7.5 on the server, so there is no segfault.

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj
Collaborator Author

With GCC 5,

[       OK ] FsaAlgo.AddEpsilonSelfLoopsFsa (72 ms)
[ RUN      ] FsaAlgo.ShortestPath
==13956== Invalid write of size 8
==13956==    at 0x51FFBB0: k2::Index(k2::RaggedShape&, k2::Array1<int> const&, k2::Array1<int>*) (in /home/runner/work/k2/k2/build/lib/libk2context.so)
==13956==    by 0x52009C4: k2::Transpose(k2::RaggedShape&, k2::Array1<int>*) (in /home/runner/work/k2/k2/build/lib/libk2context.so)
==13956==    by 0x519F47B: k2::GetStateBatches(k2::Ragged<k2::Arc>&, bool) (in /home/runner/work/k2/k2/build/lib/libk2context.so)
==13956==    by 0x120F60: k2::FsaAlgo_ShortestPath_Test::TestBody() (in /home/runner/work/k2/k2/build/bin/cu_fsa_algo_test)
==13956==    by 0x4D0F7882: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/k2/k2/build/lib/libgtest.so)
==13956==    by 0x4D0E3512: testing::Test::Run() (in /home/runner/work/k2/k2/build/lib/libgtest.so)
==13956==    by 0x4D0E366C: testing::TestInfo::Run() (in /home/runner/work/k2/k2/build/lib/libgtest.so)
==13956==    by 0x4D0E3794: testing::TestSuite::Run() (in /home/runner/work/k2/k2/build/lib/libgtest.so)
==13956==    by 0x4D0EF2EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/k2/k2/build/lib/libgtest.so)
==13956==    by 0x4D0EF56D: testing::UnitTest::Run() (in /home/runner/work/k2/k2/build/lib/libgtest.so)
==13956==    by 0x4E3E87F: main (in /home/runner/work/k2/k2/build/lib/libgtest_main.so)

For example, for this commit csukuangfj@924b7c1,
the stack trace is available at https://github.com/csukuangfj/k2/runs/1458396149#step:11:336

[ RUN      ] FsaAlgo.ShortestPath
==14105== Invalid write of size 8
==14105==    at 0x51FA4B0: k2::Index(k2::RaggedShape&, k2::Array1<int> const&, k2::Array1<int>*) (in /home/runner/work/k2/k2/build/lib/libk2context.so)
==14105==    by 0x51FB0F4: k2::Transpose(k2::RaggedShape&, k2::Array1<int>*) (in /home/runner/work/k2/k2/build/lib/libk2context.so)
==14105==    by 0x5199FEB: k2::GetStateBatches(k2::Ragged<k2::Arc>&, bool) (in /home/runner/work/k2/k2/build/lib/libk2context.so)
==14105==    by 0x120F60: k2::FsaAlgo_ShortestPath_Test::TestBody() (in /home/runner/work/k2/k2/build/bin/cu_fsa_algo_test)

@csukuangfj
Collaborator Author

You can see that the stack traces are the same before and after replacing short lambdas.

@csukuangfj
Collaborator Author

So the fault only occurs in the GitHub build, and we cannot reproduce it locally with GCC 5?

I want to set up a Docker image with GCC 5 installed so that we can run some experiments locally.

@qindazhu
Collaborator

I use GCC 5.5 on our server. IIRC, I could reproduce the fault before (when we first found this issue).

➜  ~ gcc --version
gcc (Homebrew GCC 5.5.0_7) 5.5.0

@csukuangfj
Collaborator Author

I use GCC 5.5 on our server. IIRC, I could reproduce the fault before (when we first found this issue).

It can be reproduced on our server with the latest master branch using GCC 5.5.0:

# checkout the latest master
cd build
export PATH=/home/linuxbrew/.linuxbrew/bin:$PATH
make clean
rm CMakeCache.txt
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j30 cu_fsa_algo_test
./bin/cu_fsa_algo_test  # you may need to run it several times to see the segfault; it does NOT always happen

@csukuangfj
Collaborator Author

Here is the output from our server

fangjun:~/open-source/k2/build$ ./bin/cu_fsa_algo_test
Running main() from /root/fangjun/open-source/k2/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 15 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from ArcSort
[ RUN      ] ArcSort.EmptyFsa
[       OK ] ArcSort.EmptyFsa (0 ms)
[ RUN      ] ArcSort.NonEmptyFsa
[       OK ] ArcSort.NonEmptyFsa (4242 ms)
[ RUN      ] ArcSort.NonEmptyFsaVec
[       OK ] ArcSort.NonEmptyFsaVec (3 ms)
[----------] 3 tests from ArcSort (4245 ms total)

[----------] 12 tests from FsaAlgo
[ RUN      ] FsaAlgo.LinearFsa
[       OK ] FsaAlgo.LinearFsa (0 ms)
[ RUN      ] FsaAlgo.LinearFsaVec
[       OK ] FsaAlgo.LinearFsaVec (0 ms)
[ RUN      ] FsaAlgo.IntersectFsaVec
[       OK ] FsaAlgo.IntersectFsaVec (0 ms)
[ RUN      ] FsaAlgo.AddEpsilonSelfLoopsFsa
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 0.5 ] [ ] ], fsa1+self-loops = [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] ], fsa1+self-loops = [ [ ] ], arc-map = [ ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 0.5 ] [ ] ] ], fsa1+self-loops = [ [ ] [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 0.5 ] [ ] ], fsa1+self-loops = [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] ], fsa1+self-loops = [ [ ] ], arc-map = [ ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 0.5 ] [ ] ] ], fsa1+self-loops = [ [ ] [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[       OK ] FsaAlgo.AddEpsilonSelfLoopsFsa (2 ms)
[ RUN      ] FsaAlgo.ShortestPath
Segmentation fault
fangjun:~/open-source/k2/build$

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj
Collaborator Author

Here is the stack trace from gdb

fangjun:~/open-source/k2/build$ gdb ./bin/cu_fsa_algo_test
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./bin/cu_fsa_algo_test...(no debugging symbols found)...done.
(gdb) r
Starting program: /root/fangjun/open-source/k2/build/bin/cu_fsa_algo_test
warning: File "/home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/lib/libstdc++.so.6.0.21-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/lib/libstdc++.so.6.0.21-gdb.py
line to your configuration file "/root/fangjun/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/fangjun/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running main() from /root/fangjun/open-source/k2/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 15 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from ArcSort
[ RUN      ] ArcSort.EmptyFsa
[       OK ] ArcSort.EmptyFsa (0 ms)
[ RUN      ] ArcSort.NonEmptyFsa
[New Thread 0x7fffa52ef700 (LWP 1332983)]
[New Thread 0x7fffa4aee700 (LWP 1332984)]
[       OK ] ArcSort.NonEmptyFsa (4364 ms)
[ RUN      ] ArcSort.NonEmptyFsaVec
[       OK ] ArcSort.NonEmptyFsaVec (2 ms)
[----------] 3 tests from ArcSort (4367 ms total)

[----------] 12 tests from FsaAlgo
[ RUN      ] FsaAlgo.LinearFsa
[       OK ] FsaAlgo.LinearFsa (1 ms)
[ RUN      ] FsaAlgo.LinearFsaVec
[       OK ] FsaAlgo.LinearFsaVec (0 ms)
[ RUN      ] FsaAlgo.IntersectFsaVec
[       OK ] FsaAlgo.IntersectFsaVec (0 ms)
[ RUN      ] FsaAlgo.AddEpsilonSelfLoopsFsa
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 0.5 ] $
 ] ], fsa1+self-loops = [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] ], fsa1+self-loops = [ [ ] ], arc-map = [ ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 $
.5 ] [ ] ] ], fsa1+self-loops = [ [ ] [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 0.5 ] $
 ] ], fsa1+self-loops = [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] ], fsa1+self-loops = [ [ ] ], arc-map = [ ]
[I] /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:virtual void k2::FsaAlgo_AddEpsilonSelfLoopsFsa_Test::TestBody():283 fsa1 = [ [ ] [ [ 0 1 1 0.1 0 2 1 0.2 ] [ 1 3 2 0.3 ] [ 2 3 3 0.4 ] [ 3 4 -1 $
.5 ] [ ] ] ], fsa1+self-loops = [ [ ] [ [ 0 0 0 0 0 1 1 0.1 0 2 1 0.2 ] [ 1 1 0 0 1 3 2 0.3 ] [ 2 2 0 0 2 3 3 0.4 ] [ 3 3 0 0 3 4 -1 0.5 ] [ ] ] ], arc-map = [ -1 0 1 -1 2 -1 3 -1 4 ]
[       OK ] FsaAlgo.AddEpsilonSelfLoopsFsa (1 ms)
[ RUN      ] FsaAlgo.ShortestPath
Thread 1 "cu_fsa_algo_tes" received signal SIGSEGV, Segmentation fault.
0x00007ffff75f3f05 in std::_Sp_counted_ptr_inplace<k2::Region, std::allocator<k2::Region>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /root/fangjun/open-source/k2/build/lib/libk2context.so
(gdb) bt
#0  0x00007ffff75f3f05 in std::_Sp_counted_ptr_inplace<k2::Region, std::allocator<k2::Region>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /root/fangjun/open-source/k2/build/lib/libk2context.so
#1  0x000000000041d11a in std::vector<k2::RaggedShapeDim, std::allocator<k2::RaggedShapeDim> >::~vector() ()
#2  0x00007ffff7635491 in k2::GetStateBatches(k2::Ragged<k2::Arc>&, bool) () from /root/fangjun/open-source/k2/build/lib/libk2context.so
#3  0x0000000000417b6f in k2::FsaAlgo_ShortestPath_Test::TestBody() ()
#4  0x00007ffff7ec48b3 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
   from /root/fangjun/open-source/k2/build/lib/libgtest.so
#5  0x00007ffff7eb16e3 in testing::Test::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#6  0x00007ffff7eb183d in testing::TestInfo::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#7  0x00007ffff7eb1935 in testing::TestSuite::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#8  0x00007ffff7ebcb1c in testing::internal::UnitTestImpl::RunAllTests() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#9  0x00007ffff7ebcd91 in testing::UnitTest::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#10 0x00007ffff7ff20db in main () from /root/fangjun/open-source/k2/build/lib/libgtest_main.so
#11 0x00007fffaf6eeb97 in __libc_start_main (main=0x7ffff7ff20a0 <main>, argc=1, argv=0x7fffffffe9b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe9a8)
    at ../csu/libc-start.c:310
#12 0x000000000040855a in _start ()

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj
Collaborator Author

I am adding -g to CMAKE_CXX_FLAGS and recompiling it.
Will add valgrind output soon.

@csukuangfj
Collaborator Author

Output from valgrind


fangjun:~/open-source/k2/build$ valgrind ./bin/cu_fsa_algo_test
................
[       OK ] FsaAlgo.AddEpsilonSelfLoopsFsa (111 ms)
[ RUN      ] FsaAlgo.ShortestPath
==1335287== Invalid write of size 8
==1335287==    at 0x4FEBF28: construct<CUstream_st*, CUstream_st* const&> (stl_vector.h:923)
==1335287==    by 0x4FEBF28: construct<CUstream_st*, CUstream_st* const&> (alloc_traits.h:530)
==1335287==    by 0x4FEBF28: push_back (stl_vector.h:917)
==1335287==    by 0x4FEBF28: Push (context.h:378)
==1335287==    by 0x4FEBF28: With (context.h:398)
==1335287==    by 0x4FEBF28: k2::Index(k2::RaggedShape&, k2::Array1<int> const&, k2::Array1<int>*) (ragged_ops.cu:447)
==1335287==    by 0x4FECB0E: k2::Transpose(k2::RaggedShape&, k2::Array1<int>*) (ragged_ops.cu:902)
==1335287==    by 0x4F97403: Transpose<int> (ragged_ops.h:306)
==1335287==    by 0x4F97403: k2::GetStateBatches(k2::Ragged<k2::Arc>&, bool) (fsa_utils.cu:801)
==1335287==    by 0x417B6E: k2::FsaAlgo_ShortestPath_Test::TestBody() (fsa_algo_test.cu:343)
==1335287==    by 0x41898B2: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /root/fangjun/open-source/k2/build/lib/libgtest.so)
==1335287==    by 0x41766E2: testing::Test::Run() (in /root/fangjun/open-source/k2/build/lib/libgtest.so)
==1335287==    by 0x417683C: testing::TestInfo::Run() (in /root/fangjun/open-source/k2/build/lib/libgtest.so)
==1335287==    by 0x4176934: testing::TestSuite::Run() (in /root/fangjun/open-source/k2/build/lib/libgtest.so)
==1335287==    by 0x4181B1B: testing::internal::UnitTestImpl::RunAllTests() (in /root/fangjun/open-source/k2/build/lib/libgtest.so)
==1335287==    by 0x4181D90: testing::UnitTest::Run() (in /root/fangjun/open-source/k2/build/lib/libgtest.so)
==1335287==    by 0x402A0DA: main (in /root/fangjun/open-source/k2/build/lib/libgtest_main.so)
==1335287==  Address 0x144c5ab10 is 32 bytes before a block of size 16 in arena "client"
==1335287==

@qindazhu
Collaborator

It seems those crashes come from the latest code in ParallelRunner?

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj
Collaborator Author

The following stack trace, produced after compiling k2 with -g, is more informative:

(gdb) bt
#0  0x00007ffff75f3f05 in k2::Region::~Region (this=0x1fdf510, __in_chrg=<optimized out>) at /root/fangjun/open-source/k2/k2/csrc/context.h:315
#1  __gnu_cxx::new_allocator<k2::Region>::destroy<k2::Region> (this=<optimized out>, __p=<optimized out>) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/ext/new_allocator.h:124
#2  std::allocator_traits<std::allocator<k2::Region> >::destroy<k2::Region> (__a=..., __p=<optimized out>) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/alloc_traits.h:542
#3  std::_Sp_counted_ptr_inplace<k2::Region, std::allocator<k2::Region>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x1fdf500)
    at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/shared_ptr_base.h:531
#4  0x000000000041d11a in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x1fdf500) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/shared_ptr_base.h:150
#5  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x760d8378, __in_chrg=<optimized out>)
    at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/shared_ptr_base.h:659
#6  std::__shared_ptr<k2::Region, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x760d8370, __in_chrg=<optimized out>)
    at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/shared_ptr_base.h:925
#7  std::shared_ptr<k2::Region>::~shared_ptr (this=0x760d8370, __in_chrg=<optimized out>) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/shared_ptr.h:93
#8  k2::Array1<int>::~Array1 (this=0x760d8360, __in_chrg=<optimized out>) at /root/fangjun/open-source/k2/k2/csrc/array.h:37
#9  k2::RaggedShapeDim::~RaggedShapeDim (this=0x760d8340, __in_chrg=<optimized out>) at /root/fangjun/open-source/k2/k2/csrc/ragged.h:33
#10 std::_Destroy<k2::RaggedShapeDim> (__pointer=<optimized out>) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/stl_construct.h:93
#11 std::_Destroy_aux<false>::__destroy<k2::RaggedShapeDim*> (__last=<optimized out>, __first=0x760d8340) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/stl_construct.h:103
#12 std::_Destroy<k2::RaggedShapeDim*> (__last=<optimized out>, __first=<optimized out>) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/stl_construct.h:126
#13 std::_Destroy<k2::RaggedShapeDim*, k2::RaggedShapeDim> (__last=0x760d83d0, __first=<optimized out>) at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/stl_construct.h:151
#14 std::vector<k2::RaggedShapeDim, std::allocator<k2::RaggedShapeDim> >::~vector (this=0x7fffffffdd80, __in_chrg=<optimized out>)
    at /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_7/include/c++/5.5.0/bits/stl_vector.h:424
#15 0x00007ffff7635491 in k2::RaggedShape::~RaggedShape (this=0x7fffffffdd80, __in_chrg=<optimized out>) at /root/fangjun/open-source/k2/k2/csrc/ragged.h:62
#16 k2::GetStateBatches (fsas=..., transpose=transpose@entry=true) at /root/fangjun/open-source/k2/k2/csrc/fsa_utils.cu:794
#17 0x0000000000417b6f in k2::FsaAlgo_ShortestPath_Test::TestBody (this=<optimized out>) at /root/fangjun/open-source/k2/k2/csrc/fsa_algo_test.cu:343
#18 0x00007ffff7ec48b3 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
   from /root/fangjun/open-source/k2/build/lib/libgtest.so
#19 0x00007ffff7eb16e3 in testing::Test::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#20 0x00007ffff7eb183d in testing::TestInfo::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#21 0x00007ffff7eb1935 in testing::TestSuite::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#22 0x00007ffff7ebcb1c in testing::internal::UnitTestImpl::RunAllTests() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#23 0x00007ffff7ebcd91 in testing::UnitTest::Run() () from /root/fangjun/open-source/k2/build/lib/libgtest.so
#24 0x00007ffff7ff20db in main () from /root/fangjun/open-source/k2/build/lib/libgtest_main.so
#25 0x00007fffaf6eeb97 in __libc_start_main (main=0x7ffff7ff20a0 <main>, argc=1, argv=0x7fffffffe948, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe938)
    at ../csu/libc-start.c:310
#26 0x000000000040855a in _start ()

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj
Collaborator Author

The segfault is caused by

k2/k2/csrc/context.h

Lines 381 to 384 in 57a3bc6

void Pop(cudaStream_t stream) {
K2_DCHECK(!stack_.empty());
K2_DCHECK_EQ(stack_.back(), stream);
stack_.pop_back();

The debug checks are disabled in the Release build, so pop_back can run on an empty stack_, which is undefined behavior.

You can set

k2/k2/csrc/log.h

Lines 35 to 37 in 57a3bc6

#if defined(NDEBUG)
constexpr bool kDisableDebug = true;
#else

to

constexpr bool kDisableDebug = false; 

to see the error log.
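
To make the point about disabled checks concrete, here is a minimal sketch, assuming (as the comment above suggests) that Pop ran on an empty stack. `CheckedPop` is a hypothetical helper and not part of k2; it keeps the emptiness test in every build and makes only the diagnostic debug-only.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Mirrors the kDisableDebug switch quoted above from log.h.
#if defined(NDEBUG)
constexpr bool kDisableDebug = true;
#else
constexpr bool kDisableDebug = false;
#endif

// Hypothetical helper (not part of k2): pop only when the stack is non-empty.
// std::vector::pop_back() on an empty vector is undefined behavior, so the
// emptiness test stays in every build; only the diagnostic is debug-only.
inline bool CheckedPop(std::vector<int> *stack) {
  if (stack->empty()) {
    if (!kDisableDebug) {
      std::fprintf(stderr, "CheckedPop: stack is empty\n");
    }
    return false;
  }
  stack->pop_back();
  return true;
}
```

A caller can then treat a `false` return as a logic error instead of silently corrupting the vector.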

@qindazhu
Collaborator

There must be something wrong (maybe related to the calling code); we assume the stack will never be empty given the way we call With and ParallelRunner.
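
The expectation that the stack is never empty at Pop comes from pairing every push with exactly one pop. A minimal sketch of that pairing as an RAII guard follows; `StreamStack`, `ScopedStream`, and `StreamId` are hypothetical stand-ins for illustration, not the actual `CudaStreamOverride` and `With` classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stream handle; k2 uses cudaStream_t here.
using StreamId = int;

// Hypothetical stand-in for the stream override stack.
class StreamStack {
 public:
  void Push(StreamId s) { stack_.push_back(s); }
  void Pop() {
    assert(!stack_.empty());  // debug-only check, like K2_DCHECK
    stack_.pop_back();
  }
  std::size_t Depth() const { return stack_.size(); }

 private:
  std::vector<StreamId> stack_;
};

// RAII guard: pushes in the constructor and pops in the destructor, so each
// push is matched by exactly one pop when the scope ends.
class ScopedStream {
 public:
  ScopedStream(StreamStack *stack, StreamId s) : stack_(stack) {
    stack_->Push(s);
  }
  ~ScopedStream() { stack_->Pop(); }
  ScopedStream(const ScopedStream &) = delete;
  ScopedStream &operator=(const ScopedStream &) = delete;

 private:
  StreamStack *stack_;
};
```

With guards like this, nested scopes unwind in reverse order and the depth returns to zero once the outermost guard is destroyed.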

@qindazhu
Collaborator

BTW, @csukuangfj does declaring g_stream_override as extern in context.h and defining it in context.cu fix the problem?

@csukuangfj
Collaborator Author

BTW, @csukuangfj does declaring g_stream_override as extern in context.h and defining it in context.cu fix the problem?

I am trying it.

@csukuangfj
Collaborator Author

The segfault remains with extern thread_local:

diff --git a/k2/csrc/context.cu b/k2/csrc/context.cu
index 86b5fb0..267025d 100644
--- a/k2/csrc/context.cu
+++ b/k2/csrc/context.cu
@@ -15,6 +15,8 @@

 namespace k2 {

+thread_local CudaStreamOverride g_stream_override;
+
 RegionPtr NewRegion(ContextPtr context, std::size_t num_bytes) {
   // .. fairly straightforward.  Sets bytes_used to num_bytes, caller can
   // overwrite if needed.
diff --git a/k2/csrc/context.h b/k2/csrc/context.h
index 5a77228..9410770 100644
--- a/k2/csrc/context.h
+++ b/k2/csrc/context.h
@@ -390,7 +390,7 @@ class CudaStreamOverride {
   std::vector<cudaStream_t> stack_;
 };

-static thread_local CudaStreamOverride g_stream_override;
+extern thread_local CudaStreamOverride g_stream_override;

 class With {
  public:
diff --git a/k2/csrc/log.h b/k2/csrc/log.h
index 25099c0..1ef2f65 100644
--- a/k2/csrc/log.h
+++ b/k2/csrc/log.h
@@ -33,7 +33,7 @@ namespace k2 {
 namespace internal {

 #if defined(NDEBUG)
-constexpr bool kDisableDebug = true;
+constexpr bool kDisableDebug = false;
 #else
 constexpr bool kDisableDebug = false;
 #endif

And the output is

fangjun:~/open-source/k2/build$ ./bin/cu_fsa_algo_test
Running main() from /root/fangjun/open-source/k2/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 15 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from ArcSort
[ RUN      ] ArcSort.EmptyFsa
[       OK ] ArcSort.EmptyFsa (0 ms)
[ RUN      ] ArcSort.NonEmptyFsa
[F] /root/fangjun/open-source/k2/k2/csrc/eval.h:void k2::EvalDevice(cudaStream_t, int32_t, LambdaT&) [with LambdaT = __nv_dl_wrapper_t<__nv_dl_tag<void (k2::Array1<int>::*)(int), &k2::Array1<int>::operator=, 1u>, int*, const int>; cudaStream_t = CUstream_st*; int32_t = int]:139 Check failed: stream != kCudaStreamInvalid


[ Stack-Trace: ]
/root/fangjun/open-source/k2/build/lib/libk2_log.so(k2::internal::GetStackTrace()+0x39) [0x7f27cbf6ffa9]
./bin/cu_fsa_algo_test(k2::internal::Logger::~Logger()+0x28) [0x41c1a8]
/root/fangjun/open-source/k2/build/lib/libk2context.so(void k2::EvalDevice<__nv_dl_wrapper_t<__nv_dl_tag<void (k2::Array1<int>::*)(int), &k2::Array1<int>::operator=, 1u>, int*, int const> >(CUstream_st*, int, __nv_dl_wrapper_t<__nv_dl_tag<void (k2::Array1<int>::*)(int), &k2::Array1<int>::operator=, 1u>, int*, int const>&)+0x697) [0x7f27cb61d557]
/root/fangjun/open-source/k2/build/lib/libk2context.so(k2::RaggedShape::Validate(bool) const+0x72d) [0x7f27cb6e29dd]
/root/fangjun/open-source/k2/build/lib/libk2context.so(k2::RaggedShape::To(std::shared_ptr<k2::Context>) const+0x6a1) [0x7f27cb6e4211]
./bin/cu_fsa_algo_test(k2::Ragged<k2::Arc>::To(std::shared_ptr<k2::Context>) const+0x53) [0x4206a3]
./bin/cu_fsa_algo_test() [0x40e91a]
/root/fangjun/open-source/k2/build/lib/libgtest.so(void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x33) [0x7f27cbf418b3]
/root/fangjun/open-source/k2/build/lib/libgtest.so(testing::Test::Run()+0xc3) [0x7f27cbf2e6e3]
/root/fangjun/open-source/k2/build/lib/libgtest.so(testing::TestInfo::Run()+0x12d) [0x7f27cbf2e83d]
/root/fangjun/open-source/k2/build/lib/libgtest.so(testing::TestSuite::Run()+0xc5) [0x7f27cbf2e935]
/root/fangjun/open-source/k2/build/lib/libgtest.so(testing::internal::UnitTestImpl::RunAllTests()+0x3dc) [0x7f27cbf39b1c]
/root/fangjun/open-source/k2/build/lib/libgtest.so(testing::UnitTest::Run()+0x81) [0x7f27cbf39d91]
/root/fangjun/open-source/k2/build/lib/libgtest_main.so(main+0x3b) [0x7f27cc0700db]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f2783756b97]
./bin/cu_fsa_algo_test() [0x40858a]

Aborted
fangjun:~/open-source/k2/build$
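
For readers unfamiliar with the linkage change tried in the diff above: a `static thread_local` object in a header has internal linkage, so every translation unit that includes the header gets its own per-thread object, whereas an `extern` declaration plus one definition gives all translation units the same per-thread object. The sketch below shows only the declaration/definition pattern in a single file; `OverrideStack`, `g_override`, and the helper functions are hypothetical names, not k2 code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CudaStreamOverride.
struct OverrideStack {
  std::vector<int> stack;
};

// --- would live in the header ---
// Declaration only: all translation units refer to the same variable
// (one instance per thread).
extern thread_local OverrideStack g_override;

// --- would live in exactly one source file ---
// The single definition that the declaration above refers to.
thread_local OverrideStack g_override;

// Helpers that operate on the calling thread's instance.
void PushOverride(int stream) { g_override.stack.push_back(stream); }
std::size_t OverrideDepth() { return g_override.stack.size(); }
```

A single file cannot demonstrate the multi-translation-unit difference itself; it only illustrates the shape of the change.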

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj
Collaborator Author

After adding log statements to the constructor and destructor, the segfault is gone! I ran the binary multiple times; there is no segfault anymore.

Since the segfault disappears with higher versions of GCC, I think there is no problem with the code.

diff --git a/k2/csrc/context.h b/k2/csrc/context.h
index 5a77228..2ca9e17 100644
--- a/k2/csrc/context.h
+++ b/k2/csrc/context.h
@@ -377,14 +377,19 @@ class CudaStreamOverride {
   void Push(cudaStream_t stream) {
     stack_.push_back(stream);
     stream_override_ = stream;
+    K2_LOG(INFO) << "push: size: " << stack_.size();
   }
   void Pop(cudaStream_t stream) {
+    K2_LOG(INFO) << "pop: size: " << stack_.size();
     K2_DCHECK(!stack_.empty());
     K2_DCHECK_EQ(stack_.back(), stream);
     stack_.pop_back();
   }
 
-  CudaStreamOverride() : stream_override_(0x0) {}
+  CudaStreamOverride() : stream_override_(0x0) {
+    K2_LOG(INFO) << "constructor";
+  }
+  ~CudaStreamOverride() { K2_LOG(INFO) << "in destructor"; }
 
   cudaStream_t stream_override_;
   std::vector<cudaStream_t> stack_;
diff --git a/k2/csrc/log.h b/k2/csrc/log.h
index 25099c0..1ef2f65 100644
--- a/k2/csrc/log.h
+++ b/k2/csrc/log.h
@@ -33,7 +33,7 @@ namespace k2 {
 namespace internal {
 
 #if defined(NDEBUG)
-constexpr bool kDisableDebug = true;
+constexpr bool kDisableDebug = false;
 #else
 constexpr bool kDisableDebug = false;
 #endif

@qindazhu
Collaborator

qindazhu commented Nov 27, 2020

I don't know the reason, but it seems that when I add a LOG statement in Push and Pop, the crash disappears.

--- a/k2/csrc/context.h
+++ b/k2/csrc/context.h
@@ -375,10 +375,12 @@ class CudaStreamOverride {
       return stream;
   }
   void Push(cudaStream_t stream) {
+    K2_LOG(INFO) << "Push";
     stack_.push_back(stream);
     stream_override_ = stream;
   }
   void Pop(cudaStream_t stream) {
+    K2_LOG(INFO) << "Pop";
     K2_DCHECK(!stack_.empty());
     K2_DCHECK_EQ(stack_.back(), stream);
     stack_.pop_back();

Not sure whether it helps to find the root cause. (@csukuangfj, I wonder whether you can reproduce this; I have run it more than 10 times.)

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj
Collaborator Author

Not sure whether it helps to find the root cause. (@csukuangfj, I wonder whether you can reproduce this; I have run it more than 10 times.)

Yes, please see the comments I posted just before yours.

@csukuangfj
Collaborator Author

I guess it has something to do with the compiler's inlining optimization.

The following modification prevents the segfault (it does not appear after more than 10 runs of the binary):

diff --git a/k2/csrc/context.h b/k2/csrc/context.h
index 5a77228..8a91424 100644
--- a/k2/csrc/context.h
+++ b/k2/csrc/context.h
@@ -374,11 +374,11 @@ class CudaStreamOverride {
     else
       return stream;
   }
-  void Push(cudaStream_t stream) {
+  __attribute__((noinline)) void Push(cudaStream_t stream) {
     stack_.push_back(stream);
     stream_override_ = stream;
   }
-  void Pop(cudaStream_t stream) {
+  __attribute__((noinline)) void Pop(cudaStream_t stream) {
     K2_DCHECK(!stack_.empty());
     K2_DCHECK_EQ(stack_.back(), stream);
     stack_.pop_back();

@csukuangfj
Collaborator Author

__attribute__((noinline)) is specific to GCC; I will switch to [[gnu::noinline]], which is supported by the C++11 standard.
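
A minimal sketch of the attribute spelling on member functions, written for a host compiler only (whether nvcc accepts it for a .cu file is not confirmed here). `SKETCH_NOINLINE` and `StreamStack` are hypothetical names introduced for illustration; the MSVC branch is an assumption added for portability and is not something the PR discusses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical portability macro (not from k2). GCC and Clang accept the
// [[gnu::noinline]] spelling; MSVC uses __declspec(noinline).
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE [[gnu::noinline]]
#elif defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical stand-in for the stream override stack.
class StreamStack {
 public:
  // Ask the compiler not to inline these two members into their callers.
  SKETCH_NOINLINE void Push(int stream) { stack_.push_back(stream); }
  SKETCH_NOINLINE void Pop() {
    if (!stack_.empty()) stack_.pop_back();
  }
  std::size_t Depth() const { return stack_.size(); }

 private:
  std::vector<int> stack_;
};
```

The attribute changes only how the calls are compiled, not what the functions do, so it works around the symptom without explaining the root cause.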

@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@qindazhu
Collaborator

Moving the implementation of Pop and Push to context.cu works as well; I have run it 10+ times.

Disable inline optimization of `CudaStreamOverride`.
@danpovey
Collaborator

danpovey commented Nov 27, 2020 via email

@csukuangfj csukuangfj changed the title from "Use GCC 6 and GCC 7 in GitHub actions to prevent segfault." to "Fix segfault in release build with GCC 5." Nov 27, 2020
@csukuangfj
Collaborator Author

Moving the implementation of Pop and Push to context.cu works as well; I have run it 10+ times.

Thanks, will move them to context.cu.

It prevents the compiler from inlining them.
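
A minimal sketch of the out-of-line approach: the class only declares Push and Pop, and the bodies are defined separately. Both parts are shown in one file so the sketch compiles on its own; `StreamStack` is a simplified, hypothetical stand-in for the real class, and the file split is indicated in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- header part (context.h in the PR; contents here are a stand-in) ---
class StreamStack {
 public:
  void Push(int stream);  // declared only; no body visible to includers
  void Pop();
  std::size_t Depth() const { return stack_.size(); }

 private:
  std::vector<int> stack_;
};

// --- source part (context.cu in the PR) ---
// Defined in one translation unit; other translation units see only the
// declarations above and must emit real calls.
void StreamStack::Push(int stream) { stack_.push_back(stream); }

void StreamStack::Pop() {
  if (!stack_.empty()) stack_.pop_back();
}
```

The effect on inlining depends on the definitions really living in a separate translation unit and on link-time optimization being off; in a single file like this sketch, the compiler is still free to inline.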
@qindazhu
Collaborator

+2

@csukuangfj
Collaborator Author

Let's wait and see if GitHub actions will segfault or not.

@csukuangfj
Collaborator Author

Tests passed! Merging.

@csukuangfj csukuangfj merged commit d376902 into k2-fsa:master Nov 27, 2020
@csukuangfj csukuangfj deleted the fangjun-upgrade-gcc branch November 27, 2020 11:35
@kkm000

kkm000 commented Dec 5, 2020

  1. Looks like an ODR violation. Requires more looking into. Gimme a day.

  2. Putting the LOG statement simply prevents inlining in this particular compiler.

[[gnu::noinline]] ... is supported by the C++11 standard.

It would be correct to say "allowed by the standard". The standard does not define what the scoped (gnu::) directives do, they are left up to the compiler.

  3. Always compile with -g. It does not affect generated code in the slightest, only retains symbol information in the binary, so that your stack trace is more sensible than without it.
