Segmentation fault when use NeighborSampler. #3835

Closed
gebrahimi91 opened this issue Jan 11, 2022 · 2 comments

gebrahimi91 commented Jan 11, 2022

🐛 Bug

I keep getting a segmentation fault when I use NeighborSampler.

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffec1257e5d in sample_adj_cpu(at::Tensor, at::Tensor, at::Tensor, long, bool) () from /opt/conda/lib/python3.8/site-packages/torch_sparse/_sample_cuda.so

To Reproduce

import matplotlib.pyplot as plt
from torch_geometric.data import NeighborSampler
import torch
import pickle


class Trainer:
    def __init__(self, batch_size, model, gpu_id, lr, lam=0.1, num_neighbors=[-1, -1]):
        # Load the pickled training samples (indexed by keys such as 'adj_t').
        with open("samples_tr_small.pkl", "rb") as file_to_read:
            samples_tr = pickle.load(file_to_read)
        self.batch_size = batch_size
        self.model = model
        self.lr = lr
        self.num_neighbors = num_neighbors
        # Accept either a single GPU id or a list of ids.
        if not isinstance(gpu_id, list):
            gpu_id = [gpu_id, gpu_id]
        self.gpu_id = gpu_id
        if self.gpu_id[0] >= 0:
            self.device = torch.device('cuda:0')
        else:
            self.device = torch.device('cpu:0')

        self.train_loaders = []

        # Print the largest row and column index stored in the adjacency matrix.
        row, col, value = samples_tr['adj_t'].coo()
        print(row.max())
        print(col.max())
        for i in range(1):
            self.train_loader = NeighborSampler(samples_tr['adj_t'],
                                                sizes=self.num_neighbors, batch_size=self.batch_size,
                                                shuffle=True, num_workers=0,
                                                # num_nodes=200000,
                                                return_e_id=False)
            self.train_loaders.append(self.train_loader)

        self.x_train = samples_tr['all_x']
        self.mode_train = samples_tr['modality']
        self.labels_train = samples_tr['node_labels']
        self.group_train = samples_tr['group']
        self.num_samples_train = samples_tr['num_samples']

    def train(self, epochs):
        self.epoch_losses = []
        self.epoch_accuracies = []
        self.val_losses = []
        self.val_accuracies = []

        for epoch in range(epochs):
            # self.model.train()
            cnt = 0
            for train_loader in self.train_loaders:
                # Iterating over the sampler is enough to trigger the crash.
                for batch_size, n_id, adjs in train_loader:
                    print(cnt)
                    cnt += 1
                    continue
        return 0


gpu_id = [0, 1]
if not torch.cuda.is_available():
    gpu_id = -1
lr = 0.0001
lam = 0.1
batch_size = 2
num_neighbors = [3, 3]
num_hops = len(num_neighbors)
model = None
t = Trainer(batch_size=batch_size, model=model, gpu_id=gpu_id, num_neighbors=num_neighbors, lam=lam, lr=lr)
a = t.train(1)

Expected behavior

No seg fault.

Environment

  • PyG version (torch_geometric.__version__): 2.0.3
  • PyTorch version: (torch.__version__): 1.8.1
  • OS (e.g., Linux): Linux
  • Python version (e.g., 3.9): 3.8.8
  • CUDA/cuDNN version: 11.5
  • How you installed PyTorch and PyG (conda, pip, source): pip
  • Any other relevant information (e.g., version of torch-scatter):

Additional context

The pickle file can be downloaded here

gebrahimi91 commented Jan 11, 2022

gdb bt output:

#0 0x00007ffec1257e5d in sample_adj_cpu(at::Tensor, at::Tensor, at::Tensor, long, bool) () from /opt/conda/lib/python3.8/site-packages/torch_sparse/_sample_cuda.so
#1 0x00007ffec1250b69 in sample_adj(at::Tensor, at::Tensor, at::Tensor, long, bool) () from /opt/conda/lib/python3.8/site-packages/torch_sparse/_sample_cuda.so
#2 0x00007ffec1256377 in std::decay<c10::guts::infer_function_traits<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*)(at::Tensor, at::Tensor, at::Tensor, long, bool), std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor, at::Tensor, at::Tensor, long, bool> > >::type::return_type>::type c10::impl::call_functor_with_args_from_stack_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*)(at::Tensor, at::Tensor, at::Tensor, long, bool), std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor, at::Tensor, at::Tensor, long, bool> >, true, 0ul, 1ul, 2ul, 3ul, 4ul>(c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*)(at::Tensor, at::Tensor, at::Tensor, long, bool), std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor, at::Tensor, at::Tensor, long, bool> >*, std::vector<c10::IValue, std::allocator<c10::IValue> >*, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul>) () from /opt/conda/lib/python3.8/site-packages/torch_sparse/_sample_cuda.so
#3 0x00007ffec125673d in c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*)(at::Tensor, at::Tensor, at::Tensor, long, bool), std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor, at::Tensor, at::Tensor, long, bool> >, true>::call(c10::OperatorKernel*, c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) () from /opt/conda/lib/python3.8/site-packages/torch_sparse/_sample_cuda.so
#4 0x00007fff66fd74b5 in torch::jit::(anonymous namespace)::createOperatorFromC10_withTracingHandledHere(c10::OperatorHandle const&)::{lambda(std::vector<c10::IValue, std::allocator<c10::IValue> >*)#1}::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >*) const () from /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#5 0x00007fffba6d6e82 in torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, pybind11::args, pybind11::kwargs const&) () from /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#6 0x00007fffba6afdd2 in torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#115}::operator()(std::string const&) const::{lambda(pybind11::args, {lambda(std::string const&)#115}::kwargs)#1}::operator()(pybind11, pybind11::args) const () from /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#7 0x00007fffba6b04cf in void pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#115}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}, pybind11::object, {lambda(std::string const&)#115}, pybind11::args, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#115}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}&&, pybind11::object (*)({lambda(std::string const&)#115}, pybind11::args), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail) () from /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#8 0x00007fffba2ec35e in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#9 0x00005555556a82d8 in cfunction_call_varargs (kwargs=<optimized out>, args=<optimized out>, func=0x7ffff6db7450) at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:743
#10 PyCFunction_Call () at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:773
#11 0x0000555555697edc in _PyObject_MakeTpCall.localalias.6 () at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:159
#12 0x0000555555723879 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555c885d428, callable=0x7ffff6db7450) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:125
#13 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x5555558f3510) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4963
#14 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:3469
#15 0x00005555556ed760 in PyEval_EvalFrameEx (throwflag=0, f=0x5555c885d260) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:741
#16 _PyEval_EvalCodeWithName.localalias.4 () at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4298
#17 0x00005555556ee970 in _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x5555c7fe7948, func=0x7ffec059fdc0) at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:436
#18 _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=0x5555c7fe7948, callable=0x7ffec059fdc0) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#19 method_vectorcall () at /tmp/build/80754af9/python_1614202678154/work/Objects/classobject.c:60
#20 0x0000555555657562 in _PyObject_Vectorcall (kwnames=0x7ffec0538160, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffeac0488c0) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#21 call_function (kwnames=0x7ffec0538160, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4963
#22 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:3515
#23 0x00005555556ee85b in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:284
#24 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7ffe0a136fd0, func=0x7ffec053a8b0) at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:411
#25 _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=0x7ffe0a136fd0, callable=0x7ffec053a8b0) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#26 method_vectorcall () at /tmp/build/80754af9/python_1614202678154/work/Objects/classobject.c:60
#27 0x00005555556579bd in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffe0a136fd8, callable=0x7ffe329bf0c0) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#28 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x5555558f3510) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4963
#29 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:3469
#30 0x00005555556edd11 in PyEval_EvalFrameEx (throwflag=0, f=0x7ffe0a136e40) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:741
#31 _PyEval_EvalCodeWithName.localalias.4 () at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4298
#32 0x00005555556ee593 in _PyFunction_Vectorcall.localalias.352 () at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:436
#33 0x000055555565799c in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffdc3e56528, callable=0x7ffec9894550) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#34 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x5555558f3510) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4963
#35 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:3486
#36 0x00005555556ee4bb in function_code_fastcall (globals=<optimized out>, nargs=1, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:284
#37 _PyFunction_Vectorcall.localalias.352 () at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:411
#38 0x000055555565799c in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffe0a609d70, callable=0x7ffec98990d0) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#39 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x5555558f3510) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4963
#40 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:3486
#41 0x00005555556ee4bb in function_code_fastcall (globals=<optimized out>, nargs=1, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:284
#42 _PyFunction_Vectorcall.localalias.352 () at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:411
#43 0x000055555569cc4b in _PyObject_Vectorcall (kwnames=0x0, nargsf=1, args=0x7fffffffc930, callable=0x7ffec9894e50) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#44 _PyObject_FastCall () at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:147
#45 _PyObject_FastCall_Prepend () at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:850
#46 0x00005555556ac30e in call_unbound (nargs=0, args=0x0, self=0x7ffff7e78a30, func=0x7ffec9894e50, unbound=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Objects/typeobject.c:1453
#47 call_method (nargs=0, args=0x0, name=0x5555558b1c20 <PyId___next__.15157>, obj=0x7ffff7e78a30) at /tmp/build/80754af9/python_1614202678154/work/Objects/typeobject.c:1485
#48 slot_tp_iternext () at /tmp/build/80754af9/python_1614202678154/work/Objects/typeobject.c:6732
#49 0x000055555571fd89 in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:3202
#50 0x00005555556ee4bb in function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:284
#51 _PyFunction_Vectorcall.localalias.352 () at /tmp/build/80754af9/python_1614202678154/work/Objects/call.c:411
#52 0x000055555565799c in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff7ebd5b0, callable=0x7fffe5dcc310) at /tmp/build/80754af9/python_1614202678154/work/Include/cpython/abstract.h:127
#53 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x5555558f3510) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4963
#54 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:3486
#55 0x00005555556ed760 in PyEval_EvalFrameEx (throwflag=0, f=0x7ffff7ebd440) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:741
#56 _PyEval_EvalCodeWithName.localalias.4 () at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4298
#57 0x00005555557824e3 in PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:4327
#58 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at /tmp/build/80754af9/python_1614202678154/work/Python/ceval.c:718
#59 0x0000555555782584 in run_eval_code_obj () at /tmp/build/80754af9/python_1614202678154/work/Python/pythonrun.c:1165
#60 0x00005555557a87c4 in run_mod () at /tmp/build/80754af9/python_1614202678154/work/Python/pythonrun.c:1187
#61 0x0000555555669620 in pyrun_file (fp=0x55555595b350, filename=0x7ffff7e14a30, start=<optimized out>, globals=0x7ffff7f203c0, locals=0x7ffff7f203c0, closeit=1, flags=0x7fffffffcea8) at /tmp/build/80754af9/python_1614202678154/work/Python/pythonrun.c:1084
#62 0x000055555566c362 in pyrun_simple_file (flags=0x7fffffffcea8, closeit=1, filename=0x7ffff7e14a30, fp=0x55555595b350) at /tmp/build/80754af9/python_1614202678154/work/Python/pythonrun.c:439
#63 PyRun_SimpleFileExFlags (fp=0x55555595b350, filename=<optimized out>, closeit=1, flags=0x7fffffffcea8) at /tmp/build/80754af9/python_1614202678154/work/Python/pythonrun.c:472
#64 0x000055555566ce80 in pymain_run_file (cf=0x7fffffffcea8, config=0x5555558f29a0) at /tmp/build/80754af9/python_1614202678154/work/Modules/main.c:391
#65 pymain_run_python (exitcode=0x7fffffffcea0) at /tmp/build/80754af9/python_1614202678154/work/Modules/main.c:616
#66 Py_RunMain () at /tmp/build/80754af9/python_1614202678154/work/Modules/main.c:695
#67 0x00005555557ab979 in Py_BytesMain () at /tmp/build/80754af9/python_1614202678154/work/Modules/main.c:1141
#68 0x00007ffff77e4bf7 in __libc_start_main (main=0x55555566d6f0 <main>, argc=2, argv=0x7fffffffd098, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd088) at ../csu/libc-start.c:310
#69 0x000055555573b185 in _start () at ../sysdeps/x86_64/elf/start.S:103
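
A native backtrace like this shows the C++ and interpreter frames but not the Python call site. As a complementary sketch, the standard-library faulthandler module can dump the Python-level traceback when the process receives SIGSEGV. The helper name enable_crash_traceback and the optional log path are illustrative assumptions, not part of the original report:

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Ask faulthandler to dump Python tracebacks on fatal signals such as SIGSEGV.

    Hypothetical helper for illustration. If log_path is given, the traceback
    is written to that file; the returned file object must stay open for as
    long as faulthandler is enabled. Otherwise the dump goes to sys.stderr.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling enable_crash_traceback() at the top of the reproduction script, or running the script with python -X faulthandler, should print the Python stack that was active at the moment of the crash.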

Member

rusty1s commented Jan 12, 2022

Seems to be related to an incorrect definition of sparse_sizes.
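
One way to check for such a mismatch before sampling is to compare the stored indices against the declared sizes. The sketch below is a plain-Python illustration, not code from torch_sparse or PyG; the function name find_sparse_size_problems is hypothetical, and the square-matrix check rests on the assumption that the sampler treats row and column indices as ids from the same node set:

```python
from typing import List, Sequence, Tuple


def find_sparse_size_problems(
    row: Sequence[int],
    col: Sequence[int],
    sparse_sizes: Tuple[int, int],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper: row and col are the COO indices of the adjacency
    matrix and sparse_sizes is its declared (num_rows, num_cols).
    """
    problems = []
    num_rows, num_cols = sparse_sizes

    if len(row) != len(col):
        problems.append(f"row has {len(row)} entries but col has {len(col)}")

    # Every index must fall inside the declared shape.
    if len(row) > 0 and (min(row) < 0 or max(row) >= num_rows):
        problems.append(
            f"row indices span [{min(row)}, {max(row)}] but num_rows is {num_rows}"
        )
    if len(col) > 0 and (min(col) < 0 or max(col) >= num_cols):
        problems.append(
            f"col indices span [{min(col)}, {max(col)}] but num_cols is {num_cols}"
        )

    # Assumption: neighbor sampling reuses sampled column ids as row ids in
    # the next hop, so the adjacency matrix is expected to be square.
    if num_rows != num_cols:
        problems.append(f"matrix is not square: {num_rows} x {num_cols}")

    return problems
```

With a torch_sparse SparseTensor, the inputs could be obtained as row, col, _ = adj_t.coo() and adj_t.sparse_sizes(), passing row.tolist() and col.tolist() to the helper. Any reported problem would be a reason to rebuild the tensor with corrected sparse_sizes before handing it to NeighborSampler.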

rusty1s closed this as completed Jan 12, 2022