Hi, has anyone managed to train on multiple GPUs? I'm using the following command, and it crashes while constructing the networks:
python train_3d.py --outdir=./outdir --data=shapenet_get3d/img/03790512 --camera_path shapenet_get3d/camera --gpus=8 --batch=32 --gamma=40 --data_camera_mode shapenet_motorbike --dmtet_scale 1.0 --use_shapenet_split 1 --one_3d_generator 0 --img_res=256 --kimg=200 --workers 1
Constructing networks...
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL error in: /pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:158, unhandled cuda error, NCCL version 2.7.8
ncclUnhandledCudaError: Call to CUDA function failed.
Setting up augmentation...
Distributing across 8 GPUs...
Traceback (most recent call last):
  File "train_3d.py", line 339, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "train_3d.py", line 333, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train_3d.py", line 107, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "~/miniconda3x86/envs/get3d/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "~/GET3D/train_3d.py", line 51, in subprocess_fn
    training_loop_3d.training_loop(rank=rank, **c)
  File "~/GET3D/training/training_loop_3d.py", line 159, in training_loop
    G = dnnlib.util.construct_class_by_name(**G_kwargs, **common_kwargs).train().requires_grad_(False).to(
  File "~/GET3D/dnnlib/util.py", line 306, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "~/GET3D/dnnlib/util.py", line 301, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "~/GET3D/torch_utils/persistence.py", line 105, in __init__
    super().__init__(*args, **kwargs)
  File "~/GET3D/training/networks_get3d.py", line 599, in __init__
    self.synthesis = DMTETSynthesisNetwork(
  File "~/GET3D/torch_utils/persistence.py", line 105, in __init__
    super().__init__(*args, **kwargs)
  File "~/GET3D/training/networks_get3d.py", line 81, in __init__
    self.dmtet_geometry = DMTetGeometry(
  File "~/GET3D/uni_rep/rep_3d/dmtet.py", line 423, in __init__
    all_edges_sorted = torch.sort(all_edges, dim=1)[0]
RuntimeError: CUDA error: an illegal memory access was encountered
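One way to narrow this down might be to rerun the same command with synchronous CUDA error reporting and NCCL logging turned on, and to compare against a single-GPU run. The sketch below only builds the environment and argument list for such a rerun; `debug_env` and `with_gpus` are hypothetical helper names, not part of GET3D, and it assumes the standard `CUDA_LAUNCH_BLOCKING` and `NCCL_DEBUG` environment variables are honored by the installed PyTorch and NCCL builds.

```python
import os
import shlex


def debug_env(base=None):
    """Return a copy of the environment with extra diagnostics enabled.

    CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous, so the
    reported Python line is more likely to be the one that actually failed.
    NCCL_DEBUG=INFO asks NCCL to log its setup and any errors.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    env["NCCL_DEBUG"] = "INFO"
    return env


def with_gpus(command, gpus):
    """Split a command line and replace its --gpus=N argument (hypothetical helper)."""
    args = shlex.split(command)
    return [f"--gpus={gpus}" if a.startswith("--gpus=") else a for a in args]


# The command from this post, unchanged.
COMMAND = (
    "python train_3d.py --outdir=./outdir --data=shapenet_get3d/img/03790512 "
    "--camera_path shapenet_get3d/camera --gpus=8 --batch=32 --gamma=40 "
    "--data_camera_mode shapenet_motorbike --dmtet_scale 1.0 "
    "--use_shapenet_split 1 --one_3d_generator 0 --img_res=256 --kimg=200 "
    "--workers 1"
)

if __name__ == "__main__":
    # Print the single-GPU variant; to actually launch it, pass the list and
    # debug_env() to subprocess.run(args, env=debug_env()).
    print(" ".join(with_gpus(COMMAND, 1)))
```

If the single-GPU run works and the 8-GPU run still fails in `torch.sort`, that would point toward per-process device setup rather than the data or the model configuration, though this is only a guess from the traceback.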