FCOS3D train on kitti dataset #865

Closed
xiaofengWang-CCNU opened this issue Aug 12, 2021 · 19 comments

@xiaofengWang-CCNU

Sorry to bother you.
To train FCOS3D on the KITTI dataset, I did the following steps.

  1. wrote 'fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py' based on 'fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py'.

  2. wrote a 'kitti-mono3d.py' in 'configs/_base_/datasets' based on 'nus-mono3d.py'.

  3. ran python tools/train.py configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py --work-dir ./ckpt --gpu-ids 6

  4. prepared the data following create_data.py (see the command sketch below).
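
(For reference, a sketch of the standard KITTI data preparation command from the mmdetection3d docs, which generates the .pkl info files and the mono3d .coco.json annotations used below; paths assume the default data/kitti layout.)

python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti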

But I get an error:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 764, in __init__
    self._try_put_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 994, in _try_put_index
    index = self._next_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 208, in __iter__
    for idx in self.sampler:
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/samplers/group_sampler.py", line 36, in __iter__
    indices = np.concatenate(indices)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

I cannot find what caused this error. Is anyone else working on this? Please help me, thank you.

@Tai-Wang
Member

Tai-Wang commented Aug 13, 2021

Please show your config. Besides, if you are not in a big hurry, please stay tuned for our released KITTI model. It is expected to be done by the end of September.

@xiaofengWang-CCNU
Author

The configs:

1. fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py:

_base_ = [
    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/fcos3d.py',
    '../_base_/schedules/mmdet_schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    backbone=dict(
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True)))

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    lr=0.002, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 12
evaluation = dict(interval=2)

2. kitti-mono3d.py:

dataset_type = 'NuScenesMonoDataset'
# dataset_type = 'KittiMonoDataset'
data_root = 'data/kitti/'

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

# Input modality for kitti dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=train_pipeline,
        modality=input_modality,
        test_mode=False,
        box_type_3d='Camera'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'))
evaluation = dict(interval=2)

The config files are above.

And if I set dataset_type = 'KittiMonoDataset', there is another error:
KittiMonoDataset: __init__() missing 1 required positional argument: 'info_file'
But I cannot find which info_file to use.

@Tai-Wang
Member

Please use KittiMonoDataset and set info_file the same as LiDAR-based methods (use the .pkl files). You also need to adjust those dataset-specific parameters such as with_attr_label and img_scale, etc.
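
For example, a minimal sketch of the train split (the file names follow what create_data.py generates; paths assume the standard data/kitti layout):

train=dict(
    type='KittiMonoDataset',
    data_root='data/kitti/',
    # COCO-style monocular 3D annotations
    ann_file='data/kitti/kitti_infos_train_mono3d.coco.json',
    # the same info .pkl used by the LiDAR-based KITTI configs
    info_file='data/kitti/kitti_infos_train.pkl',
    img_prefix='data/kitti/'),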

@xiaofengWang-CCNU
Author

Thank you very much for your answer. I modified it as you suggested, and the following error happened:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 194, in __getitem__
    data = self.prepare_train_img(idx)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 217, in prepare_train_img
    return self.pipeline(results)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/mmdetection3d/mmdet3d/datasets/pipelines/formating.py", line 164, in __call__
    data[key] = results[key]
KeyError: 'attr_labels'

The keys = [
    'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
    'gt_labels_3d', 'centers2d', 'depths'
] in train_pipeline should be modified, but where do the keys come from?

@Tai-Wang
Member

The keys are recorded after several data preprocessing steps of the overall training pipeline. Similar to removing with_attr_label, you need to remove attr_labels from keys.
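
i.e. the Collect3D step becomes:

dict(
    type='Collect3D',
    keys=[
        'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
        'gt_labels_3d', 'centers2d', 'depths'
    ]),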

@xiaofengWang-CCNU
Author

Thank you for your answer. I have removed attr_labels. It seems that I have set a wrong data size; I have tried every possible size, but it still raises the following error:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/single_stage_mono3d.py", line 67, in forward_train
    attr_labels, gt_bboxes_ignore)
  File "/mmdetection3d/mmdet3d/models/dense_heads/base_mono3d_dense_head.py", line 71, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 184, in new_func
    return old_func(*args, **kwargs)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 309, in loss
    gt_labels_3d, centers2d, depths, attr_labels)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 801, in get_targets
    num_points_per_lvl=num_points)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/core/utils/misc.py", line 29, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 876, in _get_target_single
    self.bbox_code_size)
RuntimeError: The expanded size of the tensor (9) must match the existing size (7) at non-singleton dimension 2.  Target sizes: [9978, 4, 9].  Tensor sizes: [1, 4, 7]

The expand operation gets a wrong parameter. I am very confused about it; please help me, thank you.

@xiaofengWang-CCNU
Author

I have set self.bbox_code_size = 7 (KITTI boxes have no velocity components, so they are 7-dimensional rather than the 9 used for nuScenes), but what should img_scale be set to?

@Tai-Wang
Member

Should be (1242, 375) for KITTI images.
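
i.e. in the train pipeline:

dict(type='Resize', img_scale=(1242, 375), keep_ratio=True),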

@xiaofengWang-CCNU
Author

xiaofengWang-CCNU commented Aug 18, 2021

Thank you very much for your help. I have set img_scale=(1242, 375), and an unexpected error happened:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/single_stage_mono3d.py", line 67, in forward_train
    attr_labels, gt_bboxes_ignore)
  File "/mmdetection3d/mmdet3d/models/dense_heads/base_mono3d_dense_head.py", line 71, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 184, in new_func
    return old_func(*args, **kwargs)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 411, in loss
    avg_factor=equal_weights.sum())
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/smooth_l1_loss.py", line 97, in forward
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/parrots_jit.py", line 21, in wrapper_inner
    return func(*args, **kargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/utils.py", line 96, in wrapper
    loss = loss_func(pred, target, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/smooth_l1_loss.py", line 25, in smooth_l1_loss
    assert pred.size() == target.size() and target.numel() > 0
AssertionError

The pred.size() and target.size() values are:

torch.Size([63, 2]) torch.Size([63, 2])
torch.Size([63]) torch.Size([63])
torch.Size([63, 3]) torch.Size([63, 3])
torch.Size([63]) torch.Size([63])
torch.Size([63, 2]) torch.Size([63, 0])

I do not know what caused this error. Are there any other KITTI-specific parameters that should be adjusted?

To solve this error, I just set pred_velo=False and pred_attrs=False; I am not sure if that is right.
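
Concretely, I changed the head config like this (see my full fcos3d.py below):

bbox_head=dict(
    pred_attrs=False,  # KITTI has no attribute annotations
    pred_velo=False)   # KITTI boxes carry no velocity components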

The class_names for KITTI are the following; is this right?

class_names = [
    'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc'
]

With the setting above, there is a KeyError at eval time, so I modified class_to_name and class_to_range as follows:

class_to_name = {
    0: 'Car',
    1: 'Pedestrian',
    2: 'Cyclist',
    3: 'Van',
    4: 'Person_sitting',
    5: 'Truck',
    6: 'Misc',
    7: 'Tram',
}
class_to_range = {
    0: [0.5, 0.95, 10],
    1: [0.25, 0.7, 10],
    2: [0.25, 0.7, 10],
    3: [0.5, 0.95, 10],
    4: [0.25, 0.7, 10],
    5: [0.25, 0.7, 10],
    6: [0.5, 0.95, 10],
    7: [0.25, 0.7, 10],
}
I wonder if this is right.

@Tai-Wang
Member

The class_names should be ['Car', 'Pedestrian', 'Cyclist'] because the mainstream 3D detection setting only supports the evaluation of these classes (with enough samples).

@xiaofengWang-CCNU
Author

xiaofengWang-CCNU commented Sep 2, 2021

Thank you very much for your help. I have trained FCOS3D on the KITTI dataset; the configs are as follows:

fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py

_base_ = [
    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/fcos3d.py',
    '../_base_/schedules/mmdet_schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    backbone=dict(
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True)))

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        #with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1242,375), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    lr=0.002, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 24
evaluation = dict(interval=2)

kitti-mono3d.py

dataset_type = 'KittiMonoDataset'
data_root = 'data/kitti/'

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

# Input modality for kitti dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        #with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1242,375), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_train.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=train_pipeline,
        modality=input_modality,
        test_mode=False,
        box_type_3d='Camera'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_val.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_val.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'))
evaluation = dict(interval=2)

fcos3d.py

model = dict(
    type='FCOSMono3D',
    pretrained='open-mmlab://detectron2/resnet101_caffe',
    backbone=dict(
        type='ResNet',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSMono3DHead',
        num_classes=3,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        use_direction_classifier=True,
        diff_rad_by_sin=True,
        pred_attrs=False,
        pred_velo=False,
        dir_offset=0.7854,  # pi/4
        strides=[8, 16, 32, 64, 128],
        group_reg_dims=(2, 1, 3, 1, 2),  # offset, depth, size, rot, velo
        cls_branch=(256, ),
        reg_branch=(
            (256, ),  # offset
            (256, ),  # depth
            (256, ),  # size
            (256, ),  # rot
            ()  # velo
        ),
        dir_branch=(256, ),
        attr_branch=(256, ),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_attr=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        norm_on_bbox=True,
        centerness_on_reg=True,
        center_sampling=True,
        conv_bias=True,
        dcn_on_last_conv=True),
    train_cfg=dict(
        allowed_border=0,
        code_weight=[1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 0.05, 0.05],
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_pre=1000,
        nms_thr=0.8,
        score_thr=0.05,
        min_bbox_size=0,
        max_per_img=200))

We also need to change bbox_code_size to 7 in anchor_free_mono3d_head.py.
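
A sketch of that edit (KITTI camera boxes are (x, y, z, w, l, h, yaw), i.e. 7 values, while the default of 9 includes two extra velocity dims used by nuScenes):

# in mmdet3d/models/dense_heads/anchor_free_mono3d_head.py
bbox_code_size=7,  # default was 9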

The results are as follows (24 epochs):
[screenshot: mono3d_result]

I have run mono_det_demo.py on the nuScenes dataset; the result is as follows:
[image: n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525_pred1]

If you are working on this, please let me know; let's make this work perfectly.

@likegogogo

@xiaofengWang-CCNU have you trained FCOS3D on the Waymo dataset? The Waymo dataset can be converted to KITTI format.

@Tai-Wang
Member

Hi all, thanks for your interest!

We have an updated version of FCOS3D (FCOS3D++, i.e. PGD) supported on KITTI with #964 and #1014. You can refer to that config and implementation for more insights. Some hyperparameters of the baseline (FCOS3D) are basically fine-tuned, but I believe there is still room for better performance. Hope you can make further progress!

@Tai-Wang
Member

We are working on a more extensive study based on FCOS3D and PGD on different datasets, so I will close this issue temporarily. We will update related information on the homepage if there is any progress. Please stay tuned.

@BJLZ123

BJLZ123 commented Dec 6, 2021

@xiaofengWang-CCNU Could you share your email with me? I am also using FCOS3D on KITTI and hope to learn from you.
My e-mail is [email protected]. Thank you!

@YinengXiong

@xiaofengWang-CCNU Could you share your email with me? I am also using FCOS3D on KITTI, but I can't get results similar to yours with your config file. My email is [email protected]. I hope to learn from you, thanks a lot!

@abhi1kumar

If you are working on this, please let me know; let's make this work perfectly.

Your config does not reproduce a 2D AP close to 70. We had to train with batch size = 12 on a single GPU to get the 2D AP for Car (moderate, IoU 0.7) close to 70%:

data = dict(
    samples_per_gpu=12,
    workers_per_gpu=12
)

@abhi1kumar

We are working on a more extensive study based on FCOS3D and PGD on different datasets, so I will close this issue temporarily. We will update related information on the homepage if there is any progress. Please stay tuned.

Hi @Tai-Wang ,
Thank you for releasing your nuScenes configs of FCOS3D. Table 1 of your PGD paper also reports FCOS3D results on the KITTI dataset with the AP11 metric. Would it be possible for you to add the FCOS3D KITTI config to the mmdetection3d library?

PS - I tried the kitti_run_13.py.txt config for FCOS3D on KITTI. The KITTI results are as follows (I could not reproduce the exact FCOS3D results mentioned in Table 1 of PGD):

----------- AP11 Results ------------

Pedestrian AP11@0.50, 0.50, 0.50:
bbox AP11:48.7265, 44.4238, 40.3403
bev  AP11:3.7565, 3.1921, 2.6185
3d   AP11:3.0281, 2.1568, 2.0752
aos  AP11:35.20, 31.88, 28.86
Pedestrian AP11@0.50, 0.25, 0.25:
bbox AP11:48.7265, 44.4238, 40.3403
bev  AP11:15.2305, 13.2454, 11.8222
3d   AP11:14.6855, 12.6808, 11.2241
aos  AP11:35.20, 31.88, 28.86
Cyclist AP11@0.50, 0.50, 0.50:
bbox AP11:40.4218, 29.6994, 28.6308
bev  AP11:2.6796, 1.5958, 1.5836
3d   AP11:1.8958, 1.2950, 1.2330
aos  AP11:26.26, 19.90, 19.12
Cyclist AP11@0.50, 0.25, 0.25:
bbox AP11:40.4218, 29.6994, 28.6308
bev  AP11:13.3322, 8.0502, 7.3994
3d   AP11:12.7632, 7.0859, 7.0180
aos  AP11:26.26, 19.90, 19.12
Car AP11@0.70, 0.70, 0.70:
bbox AP11:71.5747, 65.0664, 58.6049
bev  AP11:13.6629, 9.4923, 8.6624
3d   AP11:9.6028, 6.3318, 5.8389
aos  AP11:69.96, 63.08, 56.13
Car AP11@0.70, 0.50, 0.50:
bbox AP11:71.5747, 65.0664, 58.6049
bev  AP11:32.6482, 23.5753, 22.5470
3d   AP11:28.7454, 20.1327, 19.1243
aos  AP11:69.96, 63.08, 56.13

Overall AP11@easy, moderate, hard:
bbox AP11:53.5743, 46.3966, 42.5253
bev  AP11:6.6997, 4.7601, 4.2882
3d   AP11:4.8422, 3.2612, 3.0490
aos  AP11:43.81, 38.29, 34.71

----------- AP40 Results ------------

Pedestrian AP40@0.50, 0.50, 0.50:
bbox AP40:47.3424, 42.3251, 38.3909
bev  AP40:3.0132, 2.5833, 2.1692
3d   AP40:2.2745, 1.8029, 1.5599
aos  AP40:32.24, 27.95, 25.11
Pedestrian AP40@0.50, 0.25, 0.25:
bbox AP40:47.3424, 42.3251, 38.3909
bev  AP40:13.6192, 11.8712, 10.1345
3d   AP40:13.0446, 11.2606, 9.6154
aos  AP40:32.24, 27.95, 25.11
Cyclist AP40@0.50, 0.50, 0.50:
bbox AP40:39.7180, 26.7853, 25.8877
bev  AP40:2.2422, 1.2011, 1.1086
3d   AP40:1.4964, 0.8123, 0.7267
aos  AP40:26.13, 18.61, 17.95
Cyclist AP40@0.50, 0.25, 0.25:
bbox AP40:39.7180, 26.7853, 25.8877
bev  AP40:11.7421, 6.6859, 6.1926
3d   AP40:11.2264, 6.1054, 5.7610
aos  AP40:26.13, 18.61, 17.95
Car AP40@0.70, 0.70, 0.70:
bbox AP40:72.8897, 65.7473, 58.7460
bev  AP40:11.0352, 7.9578, 7.2419
3d   AP40:6.3220, 4.2078, 3.7063
aos  AP40:71.19, 63.68, 56.11
Car AP40@0.70, 0.50, 0.50:
bbox AP40:72.8897, 65.7473, 58.7460
bev  AP40:32.1019, 23.0358, 21.6403
3d   AP40:27.9831, 19.6851, 18.4440
aos  AP40:71.19, 63.68, 56.11

Overall AP40@easy, moderate, hard:
bbox AP40:53.3167, 44.9526, 41.0082
bev  AP40:5.4302, 3.9141, 3.5066
3d   AP40:3.3643, 2.2743, 1.9977
aos  AP40:43.19, 36.75, 33.06

@DongkyuYu

Hi @Tai-Wang!
Thank you for your efforts in sharing the PGD implementation!
I am a bit confused by your config file configs/pgd/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d.py.
Why is the pred_keypoints option set to true when the nuScenes experiments and the original paper didn't predict keypoints?
Is it just to get more performance? And it seems that at test time the keypoint predictions don't affect the bbox predictions, do they?
