FCOS3D train on kitti dataset #865

Closed
xiaofengWang-CCNU opened this issue Aug 12, 2021 · 19 comments

@xiaofengWang-CCNU

Sorry to bother you.
To train FCOS3D on the KITTI dataset, I did the following steps.

  1. wrote 'fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py' based on 'fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py'.

  2. wrote a 'kitti-mono3d.py' in 'configs/_base_/datasets' based on 'nus-mono3d.py'.

  3. ran python tools/train.py configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py --work-dir ./ckpt --gpu-ids 6

  4. prepared the data following create_data.py (see the command sketch below).
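
(For reference, a sketch of the standard KITTI data preparation command from the mmdetection3d docs, which generates the .pkl info files and the mono3d .coco.json annotations used below; paths assume the default data/kitti layout.)

python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti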

But I get an error:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 764, in __init__
    self._try_put_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 994, in _try_put_index
    index = self._next_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 208, in __iter__
    for idx in self.sampler:
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/samplers/group_sampler.py", line 36, in __iter__
    indices = np.concatenate(indices)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

I cannot find what caused this error. Is anyone else working on this? Please help me, thank you.

@Tai-Wang
Member

Tai-Wang commented Aug 13, 2021

Please show your config. Besides, if you are not in a big hurry, please stay tuned for our released KITTI model. It is expected to be done by the end of September.

@xiaofengWang-CCNU
Author

The configs:

1. fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py:

_base_ = [
    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/fcos3d.py',
    '../_base_/schedules/mmdet_schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    backbone=dict(
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True)))

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    lr=0.002, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 12
evaluation = dict(interval=2)

2. kitti-mono3d.py:

dataset_type = 'NuScenesMonoDataset'
# dataset_type = 'KittiMonoDataset'
data_root = 'data/kitti/'

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

# Input modality for kitti dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=train_pipeline,
        modality=input_modality,
        test_mode=False,
        box_type_3d='Camera'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'))
evaluation = dict(interval=2)

The config files are above.

And if I set dataset_type = 'KittiMonoDataset', there is another error:
KittiMonoDataset: __init__() missing 1 required positional argument: 'info_file'
But I cannot find which info_file to use.

@Tai-Wang
Member

Please use KittiMonoDataset and set info_file the same as LiDAR-based methods (use the .pkl files). You also need to adjust those dataset-specific parameters such as with_attr_label and img_scale, etc.
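
For example, a minimal sketch of the train split (the file names follow what create_data.py generates; paths assume the standard data/kitti layout):

train=dict(
    type='KittiMonoDataset',
    data_root='data/kitti/',
    # COCO-style monocular 3D annotations
    ann_file='data/kitti/kitti_infos_train_mono3d.coco.json',
    # the same info .pkl used by the LiDAR-based KITTI configs
    info_file='data/kitti/kitti_infos_train.pkl',
    img_prefix='data/kitti/'),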

@xiaofengWang-CCNU
Author

Thank you very much for your answer. I modified it as you suggested, and the following error happened:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 194, in __getitem__
    data = self.prepare_train_img(idx)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 217, in prepare_train_img
    return self.pipeline(results)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/mmdetection3d/mmdet3d/datasets/pipelines/formating.py", line 164, in __call__
    data[key] = results[key]
KeyError: 'attr_labels'

The keys = [
    'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
    'gt_labels_3d', 'centers2d', 'depths'
] in train_pipeline should be modified, but where do the keys come from?

@Tai-Wang
Member

The keys are recorded after several data preprocessing steps of the overall training pipeline. Similar to removing with_attr_label, you need to remove attr_labels from keys.
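
i.e. the Collect3D step becomes:

dict(
    type='Collect3D',
    keys=[
        'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
        'gt_labels_3d', 'centers2d', 'depths'
    ]),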

@xiaofengWang-CCNU
Author

Thank you for your answer. I have removed attr_labels. It seems that I have set a wrong data size; I have tried every possible size, but it still raises the following error:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/single_stage_mono3d.py", line 67, in forward_train
    attr_labels, gt_bboxes_ignore)
  File "/mmdetection3d/mmdet3d/models/dense_heads/base_mono3d_dense_head.py", line 71, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 184, in new_func
    return old_func(*args, **kwargs)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 309, in loss
    gt_labels_3d, centers2d, depths, attr_labels)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 801, in get_targets
    num_points_per_lvl=num_points)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/core/utils/misc.py", line 29, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 876, in _get_target_single
    self.bbox_code_size)
RuntimeError: The expanded size of the tensor (9) must match the existing size (7) at non-singleton dimension 2.  Target sizes: [9978, 4, 9].  Tensor sizes: [1, 4, 7]

The expand operation gets a wrong parameter. I am very confused about it; please help me, thank you.

@xiaofengWang-CCNU
Author

I have set self.bbox_code_size = 7 (KITTI boxes have no velocity components, so they are 7-dimensional rather than the 9 used for nuScenes), but what should img_scale be set to?

@Tai-Wang
Member

Should be (1242, 375) for KITTI images.
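
i.e. in the train pipeline:

dict(type='Resize', img_scale=(1242, 375), keep_ratio=True),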

@xiaofengWang-CCNU
Author

xiaofengWang-CCNU commented Aug 18, 2021

Thank you very much for your help. I have set img_scale=(1242, 375), and an unexpected error happened:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/single_stage_mono3d.py", line 67, in forward_train
    attr_labels, gt_bboxes_ignore)
  File "/mmdetection3d/mmdet3d/models/dense_heads/base_mono3d_dense_head.py", line 71, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 184, in new_func
    return old_func(*args, **kwargs)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 411, in loss
    avg_factor=equal_weights.sum())
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/smooth_l1_loss.py", line 97, in forward
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/parrots_jit.py", line 21, in wrapper_inner
    return func(*args, **kargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/utils.py", line 96, in wrapper
    loss = loss_func(pred, target, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/smooth_l1_loss.py", line 25, in smooth_l1_loss
    assert pred.size() == target.size() and target.numel() > 0
AssertionError

The pred.size() and target.size() values are:

torch.Size([63, 2]) torch.Size([63, 2])
torch.Size([63]) torch.Size([63])
torch.Size([63, 3]) torch.Size([63, 3])
torch.Size([63]) torch.Size([63])
torch.Size([63, 2]) torch.Size([63, 0])

I do not know what caused this error. Are there any other KITTI-specific parameters that should be adjusted?

To solve this error, I just set pred_velo=False and pred_attrs=False; I am not sure if that is right.
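
Concretely, I changed the head config like this (see my full fcos3d.py below):

bbox_head=dict(
    pred_attrs=False,  # KITTI has no attribute annotations
    pred_velo=False)   # KITTI boxes carry no velocity components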

The class_names for KITTI are the following; is this right?

class_names = [
    'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc'
]

With the setting above, there is a KeyError at eval time, so I modified class_to_name and class_to_range as follows:

class_to_name = {
    0: 'Car',
    1: 'Pedestrian',
    2: 'Cyclist',
    3: 'Van',
    4: 'Person_sitting',
    5: 'Truck',
    6: 'Misc',
    7: 'Tram',
}
class_to_range = {
    0: [0.5, 0.95, 10],
    1: [0.25, 0.7, 10],
    2: [0.25, 0.7, 10],
    3: [0.5, 0.95, 10],
    4: [0.25, 0.7, 10],
    5: [0.25, 0.7, 10],
    6: [0.5, 0.95, 10],
    7: [0.25, 0.7, 10],
}
I wonder if this is right.

@Tai-Wang
Member

The class_names should be ['Car', 'Pedestrian', 'Cyclist'] because the mainstream 3D detection setting only supports the evaluation of these classes (with enough samples).

@xiaofengWang-CCNU
Author

xiaofengWang-CCNU commented Sep 2, 2021

Thank you very much for your help. I have trained FCOS3D on the KITTI dataset; the configs are as follows:

fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py

_base_ = [
    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/fcos3d.py',
    '../_base_/schedules/mmdet_schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    backbone=dict(
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True)))

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        #with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1242,375), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    lr=0.002, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 24
evaluation = dict(interval=2)

kitti-mono3d.py

dataset_type = 'KittiMonoDataset'
data_root = 'data/kitti/'

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

# Input modality for kitti dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        #with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1242,375), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_train.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=train_pipeline,
        modality=input_modality,
        test_mode=False,
        box_type_3d='Camera'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_val.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_val.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'))
evaluation = dict(interval=2)

fcos3d.py

model = dict(
    type='FCOSMono3D',
    pretrained='open-mmlab://detectron2/resnet101_caffe',
    backbone=dict(
        type='ResNet',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSMono3DHead',
        num_classes=3,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        use_direction_classifier=True,
        diff_rad_by_sin=True,
        pred_attrs=False,
        pred_velo=False,
        dir_offset=0.7854,  # pi/4
        strides=[8, 16, 32, 64, 128],
        group_reg_dims=(2, 1, 3, 1, 2),  # offset, depth, size, rot, velo
        cls_branch=(256, ),
        reg_branch=(
            (256, ),  # offset
            (256, ),  # depth
            (256, ),  # size
            (256, ),  # rot
            ()  # velo
        ),
        dir_branch=(256, ),
        attr_branch=(256, ),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_attr=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        norm_on_bbox=True,
        centerness_on_reg=True,
        center_sampling=True,
        conv_bias=True,
        dcn_on_last_conv=True),
    train_cfg=dict(
        allowed_border=0,
        code_weight=[1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 0.05, 0.05],
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_pre=1000,
        nms_thr=0.8,
        score_thr=0.05,
        min_bbox_size=0,
        max_per_img=200))

We also need to change bbox_code_size to 7 in anchor_free_mono3d_head.py.
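
A sketch of that edit (KITTI camera boxes are (x, y, z, w, l, h, yaw), i.e. 7 values, while the default of 9 includes two extra velocity dims used by nuScenes):

# in mmdet3d/models/dense_heads/anchor_free_mono3d_head.py
bbox_code_size=7,  # default was 9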

The results are as follows (24 epochs):
[screenshot: mono3d_result]

I have run mono_det_demo.py on the nuScenes dataset; the result is as follows:
[image: n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525_pred1]

If you are working on this, please let me know; let's make this work perfectly.

@likegogogo

@xiaofengWang-CCNU have you trained FCOS3D on the Waymo dataset? The Waymo dataset can be converted to KITTI format.

@Tai-Wang
Member

Hi all, thanks for your interest!

We have an updated version of FCOS3D (FCOS3D++, i.e. PGD) supported on KITTI with #964 and #1014. You can refer to that config and implementation for more insights. Some hyperparameters of the baseline (FCOS3D) are basically fine-tuned, but I believe there is still room for better performance. Hope you can make further progress!

@Tai-Wang
Member

We are working on a more extensive study based on FCOS3D and PGD on different datasets, so I will close this issue temporarily. We will update related information on the homepage if there is any progress. Please stay tuned.

@BJLZ123

BJLZ123 commented Dec 6, 2021

@xiaofengWang-CCNU Could you share your email with me? I am also using FCOS3D on KITTI and hope to learn from you.
My e-mail is [email protected]. Thank you!

@YinengXiong

@xiaofengWang-CCNU Could you share your email with me? I am also using FCOS3D on KITTI, but I can't get results similar to yours with your config file. My email is [email protected]. I hope to learn from you, thanks a lot!

@abhi1kumar

If you are working on this, please let me know; let's make this work perfectly.

Your config does not reproduce a 2D AP close to 70. We had to train with batch size = 12 on a single GPU to get the 2D AP for Car (moderate, IoU 0.7) close to 70%:

data = dict(
    samples_per_gpu=12,
    workers_per_gpu=12
)

@abhi1kumar

We are working on a more extensive study based on FCOS3D and PGD on different datasets, so I will close this issue temporarily. We will update related information on the homepage if there is any progress. Please stay tuned.

Hi @Tai-Wang ,
Thank you for releasing your nuScenes configs of FCOS3D. Table 1 of your PGD paper also reports FCOS3D results on the KITTI dataset with the AP11 metric. Would it be possible for you to add the FCOS3D KITTI config to the mmdetection3d library?

PS - I tried the kitti_run_13.py.txt config for FCOS3D on KITTI. The KITTI results are as follows (I could not reproduce the exact FCOS3D results mentioned in Table 1 of PGD):

----------- AP11 Results ------------

Pedestrian AP11@0.50, 0.50, 0.50:
bbox AP11:48.7265, 44.4238, 40.3403
bev  AP11:3.7565, 3.1921, 2.6185
3d   AP11:3.0281, 2.1568, 2.0752
aos  AP11:35.20, 31.88, 28.86
Pedestrian AP11@0.50, 0.25, 0.25:
bbox AP11:48.7265, 44.4238, 40.3403
bev  AP11:15.2305, 13.2454, 11.8222
3d   AP11:14.6855, 12.6808, 11.2241
aos  AP11:35.20, 31.88, 28.86
Cyclist AP11@0.50, 0.50, 0.50:
bbox AP11:40.4218, 29.6994, 28.6308
bev  AP11:2.6796, 1.5958, 1.5836
3d   AP11:1.8958, 1.2950, 1.2330
aos  AP11:26.26, 19.90, 19.12
Cyclist AP11@0.50, 0.25, 0.25:
bbox AP11:40.4218, 29.6994, 28.6308
bev  AP11:13.3322, 8.0502, 7.3994
3d   AP11:12.7632, 7.0859, 7.0180
aos  AP11:26.26, 19.90, 19.12
Car AP11@0.70, 0.70, 0.70:
bbox AP11:71.5747, 65.0664, 58.6049
bev  AP11:13.6629, 9.4923, 8.6624
3d   AP11:9.6028, 6.3318, 5.8389
aos  AP11:69.96, 63.08, 56.13
Car AP11@0.70, 0.50, 0.50:
bbox AP11:71.5747, 65.0664, 58.6049
bev  AP11:32.6482, 23.5753, 22.5470
3d   AP11:28.7454, 20.1327, 19.1243
aos  AP11:69.96, 63.08, 56.13

Overall AP11@easy, moderate, hard:
bbox AP11:53.5743, 46.3966, 42.5253
bev  AP11:6.6997, 4.7601, 4.2882
3d   AP11:4.8422, 3.2612, 3.0490
aos  AP11:43.81, 38.29, 34.71

----------- AP40 Results ------------

Pedestrian AP40@0.50, 0.50, 0.50:
bbox AP40:47.3424, 42.3251, 38.3909
bev  AP40:3.0132, 2.5833, 2.1692
3d   AP40:2.2745, 1.8029, 1.5599
aos  AP40:32.24, 27.95, 25.11
Pedestrian AP40@0.50, 0.25, 0.25:
bbox AP40:47.3424, 42.3251, 38.3909
bev  AP40:13.6192, 11.8712, 10.1345
3d   AP40:13.0446, 11.2606, 9.6154
aos  AP40:32.24, 27.95, 25.11
Cyclist AP40@0.50, 0.50, 0.50:
bbox AP40:39.7180, 26.7853, 25.8877
bev  AP40:2.2422, 1.2011, 1.1086
3d   AP40:1.4964, 0.8123, 0.7267
aos  AP40:26.13, 18.61, 17.95
Cyclist AP40@0.50, 0.25, 0.25:
bbox AP40:39.7180, 26.7853, 25.8877
bev  AP40:11.7421, 6.6859, 6.1926
3d   AP40:11.2264, 6.1054, 5.7610
aos  AP40:26.13, 18.61, 17.95
Car AP40@0.70, 0.70, 0.70:
bbox AP40:72.8897, 65.7473, 58.7460
bev  AP40:11.0352, 7.9578, 7.2419
3d   AP40:6.3220, 4.2078, 3.7063
aos  AP40:71.19, 63.68, 56.11
Car AP40@0.70, 0.50, 0.50:
bbox AP40:72.8897, 65.7473, 58.7460
bev  AP40:32.1019, 23.0358, 21.6403
3d   AP40:27.9831, 19.6851, 18.4440
aos  AP40:71.19, 63.68, 56.11

Overall AP40@easy, moderate, hard:
bbox AP40:53.3167, 44.9526, 41.0082
bev  AP40:5.4302, 3.9141, 3.5066
3d   AP40:3.3643, 2.2743, 1.9977
aos  AP40:43.19, 36.75, 33.06

@DongkyuYu

Hi @Tai-Wang!
Thank you for your efforts in sharing the PGD implementation!
I am a bit confused by your config file configs/pgd/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d.py.
Why is the pred_keypoints option set to true when the nuScenes experiments and the original paper didn't predict keypoints?
Is it just to get more performance? And it seems that at test time the keypoint predictions don't affect the bbox predictions, do they?
