std::logic_error in SQLite deleter #306

Closed
daniellowell opened this issue Jun 23, 2020 · 5 comments · Fixed by #328

@daniellowell
Contributor

daniellowell commented Jun 23, 2020

On a branch close to the top of tree (TOT) of develop, running this test on gfx900 (I haven't tested other ASICs):

bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --enable-fdb 1 --batch_size 1 --input_channels 32 --output_channels 32 --spatial_dim_elements 28 28 --filter_dims 1 1 --pads_strides_dilations 0 0 1 1 2 2 --trans_output_pads 0 0

Causes:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
Aborted (core dumped)
dlowell@tengu:[reshuffleAndReduceTestCases]~/MIOpen/oclbuild$

Partial frames in GDB:

#2  0x00007ffff680f56c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*> (this=0x7fffffffdcf0, __beg=0x0, __end=0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>)
    at /usr/include/c++/7/bits/basic_string.tcc:212
212               std::__throw_logic_error(__N("basic_string::"
(gdb) list
207           _M_construct(_InIterator __beg, _InIterator __end,
208                        std::forward_iterator_tag)
209           {
210             // NB: Not required, but considered best practice.
211             if (__gnu_cxx::__is_null_pointer(__beg) && __beg != __end)
212               std::__throw_logic_error(__N("basic_string::"
213                                            "_M_construct null not valid"));
214
215             size_type __dnew = static_cast<size_type>(std::distance(__beg, __end));
216
(gdb) up
#3  0x00007ffff6ad9f63 in miopen::SQLite::impl::SQLiteCloser::operator() (this=0x555555e26720, ptr=0x555555d22128)
    at /home/dlowell/MIOpen/src/sqlite_db.cpp:59
59                  std::string filename_(sqlite3_db_filename(ptr, "main"));
(gdb)
#4  0x00007ffff6adb363 in std::unique_ptr<sqlite3, miopen::SQLite::impl::SQLiteCloser>::~unique_ptr (this=0x555555e26720,
    __in_chrg=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:263
263               get_deleter()(__ptr);
(gdb) up
#5  0x00007ffff6add812 in miopen::SQLite::impl::~impl (this=0x555555e26720, __in_chrg=<optimized out>)
    at /home/dlowell/MIOpen/src/sqlite_db.cpp:53
53      class SQLite::impl
(gdb)
#6  0x00007ffff6add838 in std::default_delete<miopen::SQLite::impl>::operator() (this=0x555555d22650, __ptr=0x555555e26720)
    at /usr/include/c++/7/bits/unique_ptr.h:78
78              delete __ptr;
(gdb)

The actual branch I used was reshuffleAndReduceTestCases

Full backtrace attached below.
bt1.txt
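
Frame #3 passes the result of `sqlite3_db_filename(ptr, "main")` straight to the `std::string` constructor, and frame #2 shows `__beg=0x0`, so that call returned a null pointer. The SQLite documentation allows `sqlite3_db_filename` to return NULL, for example for a temporary or in-memory database. A minimal sketch of a guard, assuming a hypothetical helper named `SafeString` (not part of MIOpen or SQLite):

```cpp
#include <string>

// Hypothetical helper (illustration only, not an MIOpen API):
// build a std::string from a C string that may be null.
// Constructing std::string directly from a null pointer is undefined
// behavior; libstdc++ reports it by throwing std::logic_error.
inline std::string SafeString(const char* s)
{
    return s != nullptr ? std::string(s) : std::string();
}
```

With such a helper, the line in frame #3 could read `std::string filename_(SafeString(sqlite3_db_filename(ptr, "main")));`, which yields an empty string instead of throwing. Whether an empty filename is acceptable to the rest of the deleter is not shown in the backtrace.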

@daniellowell
Contributor Author

This is a ROCm 3.5 environment, so perhaps something is wrong with my setup. Let me check another machine.

@JehandadKhan
Contributor

@dlowell can you share the CMake environment?

@daniellowell
Contributor Author

I don't have it anymore. Please ask @asroy, @zjing14, or @TejashShah.

@asroy
Contributor

asroy commented Jul 2, 2020

I saw this issue once, but I can no longer reproduce it. However, the hardware team saw it again today on gfx908:

root@b4ad303ff32c:/MIOpen_trial# export MIOPEN_DEBUG_CONV_FFT=0
root@b4ad303ff32c:/MIOpen_trial# export MIOPEN_DEBUG_CONV_DIRECT=0
root@b4ad303ff32c:/MIOpen_trial# export MIOPEN_DEBUG_CONV_GEMM=0
root@b4ad303ff32c:/MIOpen_trial# export MIOPEN_DEBUG_CONV_SCGEMM=0
root@b4ad303ff32c:/MIOpen_trial# export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=1
root@b4ad303ff32c:/MIOpen_trial# export MIOPEN_FIND_ENFORCE=0
root@b4ad303ff32c:/MIOpen_trial# ./build/bin/MIOpenDriver conv -F 1 -n 256 -g 1 -k 1024 -c 1024 -H  14 -W  14  -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 0 -w 1 -t 1 -i 1
MIOpenDriver conv -F 1 -n 256 -g 1 -k 1024 -c 1024 -H 14 -W 14 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 0 -w 1 -t 1 -i 1
MIOpen(HIP): Error [GetFindEnforceActionImpl] Wrong MIOPEN_FIND_ENFORCE, using default.
MIOpen(HIP): Warning [Prefetch] File is unreadable: /opt/rocm/miopen/share/miopen/db/gfx90878.HIP.fdb.txt
MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:/opt/rocm/miopen/share/miopen/db/miopen.db Performance may degrade
Wall-clock Time Forward Conv. Elapsed: 6.67449 ms, Auxiliary API calls: 6032 ms (GWSS: 1079.66)
MIOpen Forward Conv. Algorithm: 5, Solution: 64/ConvHipImplicitGemmForwardV4R4Xdlops
GPU Kernel Time Forward Conv. Elapsed: 5.952481 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv1x1u1, 256, 1024, 14, 14, 1, 1, 1024,  105226698752, 209715200, 205520896, 17678, 70, 5.952481
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
Aborted (core dumped)

So he commented out all the code inside the operator() function of the SQLite deleter struct and was able to work around the core dump.
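
Commenting out the whole body presumably also skips closing the handle. A less drastic option might be to keep the close call and guard only the filename lookup. The sketch below illustrates that pattern with stand-in types so it compiles without SQLite; `FakeDb`, `FakeDbFilename`, `FakeDbClose`, and `GuardedCloser` are all hypothetical names, and the real deleter's body beyond the line shown in the backtrace is not known here.

```cpp
#include <memory>
#include <string>

// Stand-in for the sqlite3 handle (hypothetical, illustration only).
struct FakeDb
{
    const char* filename; // may be null, e.g. for an in-memory database
    bool* closed;         // set to true when the handle is closed
};

// Stand-ins for sqlite3_db_filename and sqlite3_close.
inline const char* FakeDbFilename(const FakeDb* db) { return db->filename; }
inline void FakeDbClose(FakeDb* db)
{
    *db->closed = true;
    delete db;
}

// Deleter that tolerates a null filename and still closes the handle.
struct GuardedCloser
{
    void operator()(FakeDb* ptr) const
    {
        const char* raw = FakeDbFilename(ptr);
        // Only build the string from a non-null pointer.
        const std::string filename = raw != nullptr ? raw : "";
        (void)filename; // a real deleter might use this for cleanup or logging
        FakeDbClose(ptr);
    }
};

using FakeDbPtr = std::unique_ptr<FakeDb, GuardedCloser>;
```

The guard matters because the deleter runs inside a destructor (frames #4 and #5), where an escaping exception ends in `std::terminate`, which matches the "terminate called after throwing" message in the logs.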

@JehandadKhan
Contributor

@asroy Thanks for the update, I will triage it further and try to reproduce it. I believe it has something to do with Debug vs Release builds. This is not high-priority since the SQLite object only gets created once and is destroyed when the library is unloaded.
