RuntimeError: CUDA error: device-side assert triggered when testing trained custom data #56

Open
YuzhouPeng opened this issue May 23, 2022 · 1 comment

@YuzhouPeng

Hello, I tried to test my self-trained weights using ViT-Base, and I got the following error:

Traceback (most recent call last):
  File "test.py", line 70, in <module>
    num_query)
  File "/home/pengyuzhou/workspace/TransReID/processor/processor.py", line 162, in do_inference
    feat = model(img, cam_label=camids, view_label=target_view)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/make_model.py", line 310, in forward
    features = self.base(x, cam_label=cam_label, view_label=view_label)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 414, in forward
    x = self.forward_features(x, cam_label, view_label)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 402, in forward_features
    x = blk(x)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 190, in forward
    x = x + self.drop_path(self.mlp(self.norm2(x)))
RuntimeError: CUDA error: device-side assert triggered
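
CUDA kernels run asynchronously, so the line named in a device-side assert traceback (here the MLP block) is not necessarily where the failure happened. One frequent cause of this assert is an index that falls outside a lookup table, for example a camera or view label that is larger than the table it indexes. The sketch below is not TransReID code: `find_out_of_range` and `blocking_env` are hypothetical helpers, and the label values are made up for illustration. The first helper checks labels against a table size before they reach the GPU; the second builds an environment with `CUDA_LAUNCH_BLOCKING=1`, which makes a rerun report the error at the call that actually failed.

```python
import os


def find_out_of_range(labels, table_size):
    """Return (position, label) pairs whose label is not a valid row index
    for a table with `table_size` rows (valid indices are 0..table_size-1)."""
    return [(i, lab) for i, lab in enumerate(labels) if not 0 <= lab < table_size]


def blocking_env(base_env=None):
    """Copy an environment mapping and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Hypothetical values: a table with 8 rows and a batch of camera labels.
camids = [0, 3, 7, 8, -1]
bad = find_out_of_range(camids, table_size=8)
print(bad)  # [(3, 8), (4, -1)]
```

The environment returned by `blocking_env()` can be passed as the `env` argument of `subprocess.run` when relaunching the test script. Running the same forward pass on CPU is another option, since an out-of-range index there usually surfaces as an ordinary Python exception naming the problem.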

Here is the training configuration file:

vit_base.yml
MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: '/home/pengyuzhou/.cache/torch/jx_vit_base_p16_224-80ecf9dd.pth'
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'off'
  IF_WITH_CENTER: 'no'
  NAME: 'transformer'
  NO_MARGIN: True
  DEVICE_ID: ('1')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  PROB: 0.5 # random horizontal flip
  RE_PROB: 0.5 # random erasing
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]

DATASETS:
  NAMES: ('dukemtmc')
  ROOT_DIR: ('/home/pengyuzhou/workspace/TransReID/data')

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.008
  IMS_PER_BATCH: 256
  WARMUP_METHOD: 'linear'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 9
  LOG_PERIOD: 50
  EVAL_PERIOD: 120
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2

TEST:
  EVAL: True
  IMS_PER_BATCH: 128
  RE_RANKING: False
  WEIGHT: 'output.pt'
  NECK_FEAT: 'before'
  FEAT_NORM: 'yes'

OUTPUT_DIR: '/home/pengyuzhou/workspace/TransReID/logs'

Here is the test configuration file:

vit_transreid.yml
MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: '/home/pengyuzhou/.cache/torch/jx_vit_base_p16_224-80ecf9dd.pth'
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'off'
  IF_WITH_CENTER: 'no'
  NAME: 'transformer'
  NO_MARGIN: True
  DEVICE_ID: ('3')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]
  SIE_CAMERA: True
  SIE_COE: 3.0
  JPM: True
  RE_ARRANGE: True

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  PROB: 0.5 # random horizontal flip
  RE_PROB: 0.5 # random erasing
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]

DATASETS:
  NAMES: ('dukemtmc')
  ROOT_DIR: ('/home/pengyuzhou/workspace/TransReID/data')

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.008
  IMS_PER_BATCH: 256
  WARMUP_METHOD: 'linear'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 120
  LOG_PERIOD: 50
  EVAL_PERIOD: 120
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2

TEST:
  EVAL: True
  IMS_PER_BATCH: 1
  RE_RANKING: False
  WEIGHT: '/home/pengyuzhou/workspace/TransReID/logs/transformer_27.pth'
  NECK_FEAT: 'before'
  FEAT_NORM: 'yes'

OUTPUT_DIR: /home/pengyuzhou/workspace/TransReID/logs/duke_vit_transreid'
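
The two files differ in more than the weight path: the test config sets `SIE_CAMERA`, `SIE_COE`, `JPM` and `RE_ARRANGE`, none of which appear in the training config. Whether that mismatch causes the assert is not established in this thread, but it is cheap to check. A minimal sketch follows, assuming section headers are lines ending in a colon with `KEY: value` lines beneath them; `parse_sections` and `diff_keys` are hypothetical helpers, not part of TransReID, and the two excerpts are abbreviated copies of the configs in this post.

```python
def parse_sections(text):
    """Parse 'SECTION:' headers followed by 'KEY: value' lines into
    a flat dict keyed by 'SECTION.KEY'. Indentation is ignored, so a
    top-level key placed after a section would be attributed to that
    section; the excerpts below avoid that case."""
    section, out = "", {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if value == "":
            section = key  # a header such as 'MODEL:'
        else:
            out[f"{section}.{key}"] = value
    return out


def diff_keys(train, test):
    """Return keys only in train, keys only in test, and keys whose values differ."""
    only_train = sorted(set(train) - set(test))
    only_test = sorted(set(test) - set(train))
    changed = {k: (train[k], test[k])
               for k in sorted(set(train) & set(test)) if train[k] != test[k]}
    return only_train, only_test, changed


TRAIN_CFG = """
MODEL:
  NO_MARGIN: True
  STRIDE_SIZE: [16, 16]
TEST:
  IMS_PER_BATCH: 128
"""

TEST_CFG = """
MODEL:
  NO_MARGIN: True
  STRIDE_SIZE: [16, 16]
  SIE_CAMERA: True
  SIE_COE: 3.0
  JPM: True
  RE_ARRANGE: True
TEST:
  IMS_PER_BATCH: 1
"""

only_train, only_test, changed = diff_keys(parse_sections(TRAIN_CFG),
                                           parse_sections(TEST_CFG))
print(only_test)  # ['MODEL.JPM', 'MODEL.RE_ARRANGE', 'MODEL.SIE_CAMERA', 'MODEL.SIE_COE']
print(changed)    # {'TEST.IMS_PER_BATCH': ('128', '1')}
```

If the checkpoint was trained without these options, the model built at test time may differ from the one that produced the weights, so aligning the two `MODEL` sections is a reasonable first experiment.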

How can I solve this? Many thanks.

@onvungocminh

Hi @YuzhouPeng, did you solve the problem?
