Segfault when loading model from XGBoost binary format #210

Closed
wphicks opened this issue Sep 30, 2020 · 8 comments

Comments

@wphicks
Contributor

wphicks commented Sep 30, 2020

Treelite segfaults when attempting to load a model from XGBoost binary format, as demonstrated here with this test case. I'm working on tracking down the exact circumstances under which this segfault is encountered and will update as I discover more.

@wphicks
Contributor Author

wphicks commented Sep 30, 2020

Should have posted this here rather than on the PR:

Backtrace:

    at /home/nwani/m3/conda-bld/compilers_linux-64_1560109574129/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3901
#1  std::string::append (this=0x7fffffff6cb0, __str=...)
    at /home/nwani/m3/conda-bld/compilers_linux-64_1560109574129/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:777
#2  0x00007fffcac0f16a in dmlc::io::LocalFileSystem::Open(dmlc::io::URI const&, char const*, bool) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/xgboost/lib/libxgboost.so
#3  0x00007fffe58ff7cb in dmlc::Stream::Create(char const*, char const*, bool) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
#4  0x00007fffe58ba7b7 in treelite::frontend::LoadXGBoostModel(char const*, treelite::Model*) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
#5  0x00007fffe5855015 in TreeliteLoadXGBoostModel () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
#6  0x00007ffff65ba9dd in ffi_call_unix64 () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/lib-dynload/../../libffi.so.7
#7  0x00007ffff65ba067 in ffi_call_int () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/lib-dynload/../../libffi.so.7
#8  0x00007ffff65d2517 in _call_function_pointer (argcount=2, resmem=0x7fffffff71e0, restype=<optimized out>, atypes=0x7fffffff71a0, avalues=0x7fffffff71c0, pProc=0x7fffe5854fb0 <TreeliteLoadXGBoostModel>, 
    flags=4353) at /usr/local/src/conda/python-3.7.9/Modules/_ctypes/callproc.c:829
#9  _ctypes_callproc (pProc=0x7fffe5854fb0 <TreeliteLoadXGBoostModel>, argtuple=<optimized out>, flags=4353, argtypes=<optimized out>, restype=0x555555b8c740, checker=0x0)
    at /usr/local/src/conda/python-3.7.9/Modules/_ctypes/callproc.c:1201
#10 0x00007ffff65d2f84 in PyCFuncPtr_call (self=<optimized out>, inargs=<optimized out>, kwds=0x0) at /usr/local/src/conda/python-3.7.9/Modules/_ctypes/_ctypes.c:4025
#11 0x00005555556c0ccb in _PyObject_FastCallKeywords (callable=0x7fffca3c3ae0, stack=0x7fffca69bba0, nargs=2, kwnames=0x0) at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:199
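
One way to read a trace like this is to look for adjacent frames that live in different shared libraries. The sketch below is a hypothetical helper, not part of Treelite or gdb: `FRAME_RE`, `library_crossings`, and `SAMPLE_FRAMES` are names made up for illustration, and the parser only understands frames that end in `from <path>`. Run on frames #2 to #4 copied from the backtrace above, it reports the call from `dmlc::Stream::Create` in libtreelite.so into `dmlc::io::LocalFileSystem::Open` in libxgboost.so.

```python
import re

# Matches gdb frames such as "#3  0x... in symbol(args) () from /path/lib.so".
# Frames that carry source locations ("at file.c:123") are skipped on purpose.
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (.+?) \(.*\) from (\S+)$")


def library_crossings(backtrace):
    """Return (caller_no, caller_lib, callee_no, callee_lib) for each call
    between adjacent frames that belong to different shared libraries."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            number, _symbol, path = match.groups()
            frames.append((int(number), path.rsplit("/", 1)[-1]))
    crossings = []
    # gdb prints the innermost frame first, so frame N + 1 is the caller of N.
    for (callee_no, callee_lib), (caller_no, caller_lib) in zip(frames, frames[1:]):
        if caller_no == callee_no + 1 and caller_lib != callee_lib:
            crossings.append((caller_no, caller_lib, callee_no, callee_lib))
    return crossings


# Frames #2 to #4, copied verbatim from the backtrace in this issue.
SAMPLE_FRAMES = """
#2  0x00007fffcac0f16a in dmlc::io::LocalFileSystem::Open(dmlc::io::URI const&, char const*, bool) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/xgboost/lib/libxgboost.so
#3  0x00007fffe58ff7cb in dmlc::Stream::Create(char const*, char const*, bool) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
#4  0x00007fffe58ba7b7 in treelite::frontend::LoadXGBoostModel(char const*, treelite::Model*) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
"""

print(library_crossings(SAMPLE_FRAMES))
# prints [(3, 'libtreelite.so', 2, 'libxgboost.so')]
```

A crossing like this between two libraries that each bundle dmlc-core is only a hint that a symbol was resolved from the other library's copy; it is not proof on its own.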

@wphicks
Contributor Author

wphicks commented Sep 30, 2020

@hcho3 Unfortunately, it looks like updating the dmlc-core reference did not do the trick. I'm trying to confirm the dmlc-core commit for our current xgboost version now.

@hcho3
Collaborator

hcho3 commented Sep 30, 2020

@wphicks Related issue: apache/tvm#4953. This was fixed by dmlc/xgboost#5590 by hiding all C++ symbols in libxgboost.so. It appears that the fix was not complete.

@hcho3
Collaborator

hcho3 commented Sep 30, 2020

@wphicks Indeed, dmlc/xgboost#5590 was incomplete and fails to hide C++ symbols from dmlc-core:

$ readelf -a --wide  ~/miniconda3/lib/python3.7/site-packages/xgboost/lib/libxgboost.so | grep Open | grep LocalFileSystem
0000000018da3360  0000067800000001 R_X86_64_64            000000000050afa0 _ZN4dmlc2io15LocalFileSystem4OpenERKNS0_3URIEPKcb + 0
0000000018da3368  0000023a00000001 R_X86_64_64            000000000050af50 _ZN4dmlc2io15LocalFileSystem11OpenForReadERKNS0_3URIEb + 0
   570: 000000000050af50    19 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem11OpenForReadERKNS0_3URIEb
  1656: 000000000050afa0  1970 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem4OpenERKNS0_3URIEPKcb
 16252: 000000000050afa0  1970 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem4OpenERKNS0_3URIEPKcb
 16916: 000000000050af50    19 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem11OpenForReadERKNS0_3URIEb
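
The symbol-table rows in output like this can be filtered mechanically. Below is a minimal sketch, assuming `readelf` from GNU binutils is on the PATH. `exported_dmlc_symbols`, `scan_library`, and `SAMPLE_OUTPUT` are hypothetical names introduced here. The `_ZN4dmlc` prefix test is a simplification: it only catches non-const functions nested in namespace `dmlc` under the Itanium C++ ABI name mangling.

```python
import subprocess


def exported_dmlc_symbols(readelf_output):
    """Return mangled names of dmlc:: functions that are defined in the
    library and exported with GLOBAL binding and DEFAULT visibility."""
    found = set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol-table rows have 8 columns: Num: Value Size Type Bind Vis Ndx Name.
        # Relocation rows and headers do not match this shape and are skipped.
        if len(fields) != 8 or not fields[0].endswith(":"):
            continue
        _num, _value, _size, sym_type, bind, vis, ndx, name = fields
        if (sym_type == "FUNC" and bind == "GLOBAL" and vis == "DEFAULT"
                and ndx != "UND" and name.startswith("_ZN4dmlc")):
            found.add(name)
    return found


def scan_library(path):
    """Run readelf on a shared library and filter its symbol tables."""
    result = subprocess.run(
        ["readelf", "--syms", "--wide", path],
        check=True, capture_output=True, text=True,
    )
    return exported_dmlc_symbols(result.stdout)


# One relocation row and the symbol rows, copied from the output above.
SAMPLE_OUTPUT = """
0000000018da3360  0000067800000001 R_X86_64_64            000000000050afa0 _ZN4dmlc2io15LocalFileSystem4OpenERKNS0_3URIEPKcb + 0
   570: 000000000050af50    19 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem11OpenForReadERKNS0_3URIEb
  1656: 000000000050afa0  1970 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem4OpenERKNS0_3URIEPKcb
 16252: 000000000050afa0  1970 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem4OpenERKNS0_3URIEPKcb
 16916: 000000000050af50    19 FUNC    GLOBAL DEFAULT   11 _ZN4dmlc2io15LocalFileSystem11OpenForReadERKNS0_3URIEb
"""

for name in sorted(exported_dmlc_symbols(SAMPLE_OUTPUT)):
    print(name)
```

Calling `scan_library` on a given `libxgboost.so` and getting an empty set back would suggest, under the assumptions above, that these dmlc-core functions are no longer exported from that build.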

@hcho3
Collaborator

hcho3 commented Sep 30, 2020

I submitted dmlc/xgboost#6188 to hide C++ symbols from dmlc-core. XGBoost 1.3.0 won't be out until October, so in the meantime we can work around the issue by using standard C++ file I/O and doing away with dmlc::LocalFileSystem entirely.

@hcho3
Collaborator

hcho3 commented Sep 30, 2020

@wphicks I've verified that dmlc/xgboost#6188 fixes this issue.

@wphicks
Contributor Author

wphicks commented Sep 30, 2020

Wonderful! Thanks, @hcho3. Looks like that PR ran up against a spending limit for Jenkins, but I'll review tomorrow.

@hcho3
Collaborator

hcho3 commented Oct 7, 2020

Fixed by #211

hcho3 closed this as completed on Oct 7, 2020