Segmentation fault when train the net #17

Harvey-Mei · 2021-08-01T03:15:38Z

python train.py --batch_size 24 --experiment_name shapenet-ldif
--model_directory $models --model_type "ldif"
--dataset_directory $dataset
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
INFO: Making dataset...
INFO: Optimized dataset detected at ./shapenet/optimized
INFO: Mapping...
INFO: is_invalid vs lower_coords: [24, 32, 1] vs [24, 32, 3]
INFO: Post-where lower_coords: [24, 32, 3]
INFO: is_invalid vs sdf coords: [24, 32, 1] vs [24, 32, 1]
INFO: In-out image summaries have been removed.
INFO: The 0-th GPU has 22390 MB free.
INFO: TensorFlow can use up to 93.1397945511389% of the total GPU memory.
INFO: Initializing variables...
INFO: No previous checkpoint detected, training from scratch.
Fatal Python error: Segmentation fault

Thread 0x00007fd78cff9700 (most recent call first):
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/threading.py", line 302 in wait
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/queue.py", line 170 in get
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/threading.py", line 932 in _bootstrap_inner
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007fd9b5258340 (most recent call first):
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1441 in _call_tf_sessionrun
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1349 in _run_fn
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1358 in _do_run
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1179 in _run
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 955 in run
File "train.py", line 263 in main
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/absl/app.py", line 312 in run
File "train.py", line 283 in
./reproduce_shapenet_autoencoder.sh: line 50: 1295263 Segmentation fault (core dumped) python train.py --batch_size 24 --experiment_name shapenet-ldif --model_directory $models --model_type "ldif" --dataset_directory $dataset

I generated the dataset from the raw ShapeNetCore v1 models under 03001627 by converting each .obj file to .ply and then generating watertight .ply files with the GAPS tools. After that, I used the command in reproduce_shapenet_autoencoder.sh to build the dataset, and every step completed successfully. However, when I tried to train the network on this dataset, training failed with the log shown above.

For reference, my environment is Ubuntu 20.04 with an RTX 3090 and CUDA 11.1, and I run the code on TensorFlow 1.15.
Could you give me some advice on this issue?
Thank you!
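
When a native crash like this is reported, it helps to attach the exact interpreter and TensorFlow build alongside the OS, GPU, and CUDA versions listed above. The following is a minimal sketch of such a report, not part of the repository; the function name `environment_report` is hypothetical, and the script only reads version information that the standard library and TensorFlow expose (`tf.__version__`, `tf.test.is_built_with_cuda()`).

```python
import platform


def environment_report():
    """Collect version details that are useful when reporting a native crash."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        import tensorflow as tf  # may be missing or fail to load
    except Exception as exc:  # ImportError, or a loader error from a bad binary
        report["tensorflow"] = "unavailable: {}".format(exc)
        return report
    report["tensorflow"] = tf.__version__
    # True only if this TensorFlow binary was compiled with CUDA support.
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```

Running it inside the same conda environment used for train.py gives the values for that environment rather than for the system Python.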

Harvey-Mei · 2021-08-09T12:27:49Z

Also, I have successfully run build_gaps.sh, gaps_is_installed.sh, and build_kernel.sh. After some modifications to suit my environment, the scripts logged the expected output and generated all the required executables.

Harvey-Mei · 2021-08-09T13:05:42Z

Thread 100 "train.py" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff865fff700 (LWP 1307527)]
0x00007fff7ca95890 in tensorflow::data::experimental::ParallelInterleaveDatasetOp::Dataset::Iterator::EnsureWorkerThreadsStarted(tensorflow::data::IteratorContext*) ()
from /home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so

I got this message when debugging with GDB.
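
For anyone who wants to capture the same native frame, here is a minimal sketch of building a non-interactive GDB invocation. The helper name `gdb_batch_command` is hypothetical; the GDB options used (`-batch`, `-ex`, `--args`) are standard, and the training arguments are placeholders copied from the command in the first comment.

```python
import shlex


def gdb_batch_command(program_args):
    """Build an argv list that runs a program under GDB non-interactively.

    -batch exits GDB once the commands finish, each -ex runs one GDB command,
    and --args passes everything after it to the debugged program.
    """
    return [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "thread apply all bt",
        "--args",
    ] + list(program_args)


if __name__ == "__main__":
    # Placeholder arguments taken from the command in the first comment.
    cmd = gdb_batch_command(
        ["python", "train.py", "--batch_size", "24", "--model_type", "ldif"])
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed line can be pasted into a shell; after the SIGSEGV, GDB prints a backtrace for every thread and exits.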

susuhu · 2022-10-06T15:47:47Z

I have the same problem: Segmentation fault (core dumped). I ran build_gas.sh successfully, but I can't run build_kernel.sh because of the error "unsupported GNU version! gcc versions later than 6 are not supported!". Since that step is optional, it shouldn't affect training, right?
I'm using Ubuntu 20.04 with an RTX 2080 and CUDA 11.3. The environment was created from the YAML file.
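
The compiler error quoted above can be checked before running build_kernel.sh. This is a minimal sketch that assumes the version string comes from `gcc -dumpversion`; the helper names are hypothetical, and the limit of 6 is taken only from the quoted message, so it may differ for other CUDA toolkits.

```python
import re


def gcc_major(version_text):
    """Return the major version from `gcc -dumpversion` output, e.g. '9.4.0' -> 9."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError("unrecognized gcc version: {!r}".format(version_text))
    return int(match.group(1))


def host_compiler_ok(version_text, max_major=6):
    # max_major=6 comes from the quoted error message; it is an assumption
    # that the same limit applies to the CUDA toolkit in use.
    return gcc_major(version_text) <= max_major


if __name__ == "__main__":
    # Example version strings; substitute the output of `gcc -dumpversion`.
    for text in ("6.5.0", "9.4.0"):
        print(text, host_compiler_ok(text))
```

If the check fails, the usual options are to install an older gcc alongside the system one or to skip the optional kernel build.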

Harvey-Mei closed this as completed Aug 2, 2021

Harvey-Mei reopened this Aug 9, 2021
