RuntimeError: CUDA error: device-side assert triggered when testing trained custom data #56

YuzhouPeng · 2022-05-23T08:15:46Z

Hello, I tried to test my self-trained weights using ViT-Base and got the following error:

Traceback (most recent call last):
  File "test.py", line 70, in
    num_query)
  File "/home/pengyuzhou/workspace/TransReID/processor/processor.py", line 162, in do_inference
    feat = model(img, cam_label=camids, view_label=target_view)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/make_model.py", line 310, in forward
    features = self.base(x, cam_label=cam_label, view_label=view_label)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 414, in forward
    x = self.forward_features(x, cam_label, view_label)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 402, in forward_features
    x = blk(x)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 190, in forward
    x = x + self.drop_path(self.mlp(self.norm2(x)))
RuntimeError: CUDA error: device-side assert triggered
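
A note on reading a traceback like this one: CUDA kernels are launched asynchronously, so a device-side assert can be reported at a later call than the one that actually failed, and the last frame shown may not be the real culprit. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous so the traceback lines up with the failing operation. One common trigger for this assert is an index that falls outside the tensor being indexed. The sketch below is only a starting point under that assumption, which this thread does not confirm; the helper `find_out_of_range` and the example labels are hypothetical.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# operation that actually failed. This must be set before CUDA is initialized;
# setting it before `import torch` is the safe choice.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_out_of_range(indices, size):
    """Return (position, value) pairs for indices outside [0, size).

    Assumption: valid indices are 0 .. size - 1, as for a lookup table
    with `size` rows.
    """
    return [(pos, idx) for pos, idx in enumerate(indices) if not 0 <= idx < size]


# Hypothetical example: a table with 8 rows and labels taken from one batch.
camera_labels = [0, 3, 7, 8, 2]
bad = find_out_of_range(camera_labels, size=8)
print(bad)  # position and value of every label that would index past the table
```

Running the same forward pass on the CPU is another way to get a readable Python error instead of the generic device-side assert.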

Here is the training configuration file:

vit_base.yml
MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: '/home/pengyuzhou/.cache/torch/jx_vit_base_p16_224-80ecf9dd.pth'
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'off'
  IF_WITH_CENTER: 'no'
  NAME: 'transformer'
  NO_MARGIN: True
  DEVICE_ID: ('1')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  PROB: 0.5 # random horizontal flip
  RE_PROB: 0.5 # random erasing
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]

DATASETS:
  NAMES: ('dukemtmc')
  ROOT_DIR: ('/home/pengyuzhou/workspace/TransReID/data')

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.008
  IMS_PER_BATCH: 256
  WARMUP_METHOD: 'linear'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 9
  LOG_PERIOD: 50
  EVAL_PERIOD: 120
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2

TEST:
  EVAL: True
  IMS_PER_BATCH: 128
  RE_RANKING: False
  WEIGHT: 'output.pt'
  NECK_FEAT: 'before'
  FEAT_NORM: 'yes'

OUTPUT_DIR: '/home/pengyuzhou/workspace/TransReID/logs'

Here is the test configuration file:

vit_transreid.yml
MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: '/home/pengyuzhou/.cache/torch/jx_vit_base_p16_224-80ecf9dd.pth'
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'off'
  IF_WITH_CENTER: 'no'
  NAME: 'transformer'
  NO_MARGIN: True
  DEVICE_ID: ('3')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]
  SIE_CAMERA: True
  SIE_COE: 3.0
  JPM: True
  RE_ARRANGE: True

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  PROB: 0.5 # random horizontal flip
  RE_PROB: 0.5 # random erasing
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]

DATASETS:
  NAMES: ('dukemtmc')
  ROOT_DIR: ('/home/pengyuzhou/workspace/TransReID/data')

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.008
  IMS_PER_BATCH: 256
  WARMUP_METHOD: 'linear'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 120
  LOG_PERIOD: 50
  EVAL_PERIOD: 120
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2

TEST:
  EVAL: True
  IMS_PER_BATCH: 1
  RE_RANKING: False
  WEIGHT: '/home/pengyuzhou/workspace/TransReID/logs/transformer_27.pth'
  NECK_FEAT: 'before'
  FEAT_NORM: 'yes'

OUTPUT_DIR: '/home/pengyuzhou/workspace/TransReID/logs/duke_vit_transreid'
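
The two configuration files above are mostly identical but differ in a handful of keys. A minimal sketch for listing such differences follows. The helper name `diff_config` is hypothetical, and the dictionaries hold only a hand-copied subset of the MODEL, SOLVER and TEST values from the two files, not the full configs.

```python
MISSING = "<missing>"  # marker for a key that exists in only one config


def diff_config(a, b, prefix=""):
    """Yield (key_path, value_in_a, value_in_b) for every differing key."""
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        path = prefix + key
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections such as MODEL or TEST.
            yield from diff_config(va, vb, prefix=path + ".")
        elif va != vb:
            yield (path, va, vb)


# Subset of vit_base.yml (training), copied by hand from the file above.
train_cfg = {
    "MODEL": {"DEVICE_ID": "('1')", "STRIDE_SIZE": [16, 16]},
    "SOLVER": {"CHECKPOINT_PERIOD": 9},
    "TEST": {"IMS_PER_BATCH": 128, "WEIGHT": "output.pt"},
}

# Subset of vit_transreid.yml (testing), copied by hand from the file above.
test_cfg = {
    "MODEL": {
        "DEVICE_ID": "('3')",
        "STRIDE_SIZE": [16, 16],
        "SIE_CAMERA": True,
        "SIE_COE": 3.0,
        "JPM": True,
        "RE_ARRANGE": True,
    },
    "SOLVER": {"CHECKPOINT_PERIOD": 120},
    "TEST": {
        "IMS_PER_BATCH": 1,
        "WEIGHT": "/home/pengyuzhou/workspace/TransReID/logs/transformer_27.pth",
    },
}

diffs = list(diff_config(train_cfg, test_cfg))
for path, train_value, test_value in diffs:
    print(f"{path}: train={train_value!r} test={test_value!r}")
```

In this subset, `SIE_CAMERA`, `SIE_COE`, `JPM` and `RE_ARRANGE` appear only in the test file. Whether that mismatch is related to the assert is not established in this thread, but it may be worth checking that the test-time model settings match the ones used for training.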

How can I solve this? Many thanks.

onvungocminh · 2023-07-18T09:09:21Z

Hi @YuzhouPeng, did you solve the problem?
