Single GPU training #37

Open
Ali-Abolfathi opened this issue Aug 22, 2021 · 1 comment

Comments


Ali-Abolfathi commented Aug 22, 2021

Hi, thanks for sharing your model. Is it possible to train this model on a custom dataset with a single GPU? Whenever I try, I get the following error (I'm using the tools/train.py script):
Traceback (most recent call last):
  File "CBNetV2/tools/train.py", line 188, in <module>
    main()
  File "CBNetV2/tools/train.py", line 184, in main
    meta=meta)
  File "/content/CBNetV2/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/content/CBNetV2/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/content/CBNetV2/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/content/CBNetV2/mmdet/models/detectors/two_stage.py", line 266, in forward_train
    **kwargs)
  File "/content/CBNetV2/mmdet/models/roi_heads/cascade_roi_head.py", line 248, in forward_train
    rcnn_train_cfg)
  File "/content/CBNetV2/mmdet/models/roi_heads/cascade_roi_head.py", line 146, in _bbox_forward_train
    bbox_results = self._bbox_forward(stage, x, rois)
  File "/content/CBNetV2/mmdet/models/roi_heads/cascade_roi_head.py", line 136, in _bbox_forward
    cls_score, bbox_pred = bbox_head(bbox_feats)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/CBNetV2/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py", line 155, in forward
    x = conv(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/conv_module.py", line 201, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/batchnorm.py", line 731, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 748, in get_world_size
    return _get_group_size(group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 274, in _get_group_size
    default_pg = _get_default_group()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 358, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

@saidineshpola

I removed automatic mixed precision by changing the runner to EpochBasedRunner, and then it worked fine for me.
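
A minimal sketch of what such a config change could look like, written against a plain nested dict rather than the repository's real config files. Two things here are assumptions and are not confirmed anywhere in this thread: that the original runner type is an AMP variant named `EpochBasedRunnerAmp`, and that the config selects a synchronized norm layer through `type='SyncBN'`. The second assumption is motivated by the posted traceback, whose last frames are in torch's `batchnorm.py` calling `torch.distributed.get_world_size`, which raises when no process group has been initialized. The helper names `swap_norm_type` and `to_single_gpu` are hypothetical.

```python
import copy


def swap_norm_type(node, old="SyncBN", new="BN"):
    """Recursively rename norm layer types in a nested dict/list config, in place."""
    if isinstance(node, dict):
        if node.get("type") == old:
            node["type"] = new
        for value in node.values():
            swap_norm_type(value, old, new)
    elif isinstance(node, (list, tuple)):
        for item in node:
            swap_norm_type(item, old, new)


def to_single_gpu(cfg):
    """Return a copy of cfg adjusted for non-distributed training.

    Assumption: the AMP runner is registered as 'EpochBasedRunnerAmp'.
    Assumption: synchronized batch norm is selected via type='SyncBN'.
    """
    cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    runner = cfg.get("runner", {})
    if runner.get("type") == "EpochBasedRunnerAmp":
        runner["type"] = "EpochBasedRunner"
    swap_norm_type(cfg)
    return cfg


# Hypothetical, heavily trimmed config used only for illustration.
example_cfg = {
    "runner": {"type": "EpochBasedRunnerAmp", "max_epochs": 12},
    "model": {
        "roi_head": {
            "bbox_head": [
                {
                    "type": "ConvFCBBoxHead",
                    "norm_cfg": {"type": "SyncBN", "requires_grad": True},
                },
            ]
        }
    },
}

single_gpu_cfg = to_single_gpu(example_cfg)
print(single_gpu_cfg["runner"]["type"])
print(single_gpu_cfg["model"]["roi_head"]["bbox_head"][0]["norm_cfg"]["type"])
```

On the trimmed example this prints `EpochBasedRunner` and then `BN`. Whether both changes are needed for a given config is not established by this thread, so treat each one as something to try separately.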
