How to train my own datasets (format is like coco datasets) #54

Closed
Xavier-Zeng opened this issue May 30, 2019 · 51 comments

Comments

@Xavier-Zeng

I have converted my dataset to COCO format, and I want to train FCOS on it. I followed GETTING_STARTED.md in the mmdetection repo, which includes a tutorial on training with your own datasets. In the FCOS repo, however, the file FCOS/maskrcnn_benchmark/data/datasets/coco.py differs from /mmdetection/mmdet/datasets/coco.py. Are there any suggestions?

@tianzhi0549
Owner

tianzhi0549 commented May 31, 2019

@EDG-Zola You do not need to change this code.
To train FCOS on your own dataset, you need to:

  1. Add your dataset to the dataset catalog in paths_catalog.py, next to the existing entry
    "coco_2017_train": {
    Please use _coco_style as the suffix of your dataset names.
  2. In https://github.com/tianzhi0549/FCOS/blob/master/configs/fcos/fcos_R_50_FPN_1x.yaml, change DATASETS to your own datasets.
  3. Modify MODEL.FCOS.NUM_CLASSES, which is set in defaults.py as
    _C.MODEL.FCOS.NUM_CLASSES = 81 # the number of classes including background
    if your dataset has a different number of classes.
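
Step 1 above can be sketched roughly as follows. This is a simplified stand-in for a maskrcnn-benchmark-style dataset catalog, not a copy of the repository code: the custom dataset names and paths (`my_data_train_coco_style`, `my_data/...`) are placeholders, and the `get_dataset` helper is a hypothetical reduction of the lookup logic. It assumes the catalog chooses the COCO loader by looking for `coco` in the dataset name, which would explain why the suffix matters.

```python
import os

# Simplified stand-in for the dataset catalog; all custom names and paths are placeholders.
DATA_DIR = "datasets"
DATASETS = {
    "coco_2017_train": {
        "img_dir": "coco/train2017",
        "ann_file": "coco/annotations/instances_train2017.json",
    },
    # Hypothetical custom entries, named with the _coco_style suffix.
    "my_data_train_coco_style": {
        "img_dir": "my_data/train",
        "ann_file": "my_data/annotations/train.json",
    },
    "my_data_test_coco_style": {
        "img_dir": "my_data/test",
        "ann_file": "my_data/annotations/test.json",
    },
}


def get_dataset(name):
    """Resolve a dataset name to a loader name and its constructor arguments."""
    # Assumption: the loader is selected by a substring check on the name.
    if "coco" not in name:
        raise RuntimeError("Dataset not available: {}".format(name))
    attrs = DATASETS[name]
    return {
        "factory": "COCODataset",
        "args": {
            "root": os.path.join(DATA_DIR, attrs["img_dir"]),
            "ann_file": os.path.join(DATA_DIR, attrs["ann_file"]),
        },
    }
```

The names registered this way are the ones to list under DATASETS.TRAIN and DATASETS.TEST in the YAML config (step 2).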

@sunpeng981712364

Thanks for the great work! Just a related question: if I have 29 classes, should _C.MODEL.FCOS.NUM_CLASSES be set to 30?

@tianzhi0549
Owner

@sunpeng981712364 If the 29 classes do not include the background class, NUM_CLASSES should be set to 30.
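
As a small illustration of that rule, the value can be derived from the `categories` list of a COCO-style annotation file instead of being counted by hand. This is only a sketch for the version of the code discussed in this thread, where NUM_CLASSES includes background; the function names are made up for the example, and it assumes the annotation file lists only foreground categories.

```python
import json


def num_classes_from_categories(categories):
    """Foreground categories plus one background class."""
    return len(categories) + 1


def num_classes_from_file(ann_file):
    """Read a COCO-style annotation file and derive NUM_CLASSES from it."""
    with open(ann_file) as f:
        dataset = json.load(f)
    return num_classes_from_categories(dataset["categories"])


# 29 foreground categories, as in the question above.
categories = [{"id": i, "name": "class_{}".format(i)} for i in range(1, 30)]
print(num_classes_from_categories(categories))
```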

@sunpeng981712364

sunpeng981712364 commented Jun 4, 2019

Hi, I used fcos_demo.py to visualize the results and they look right. But when I run prediction with tools/testnet.py using the COCO protocol, all the AP/AR values are close to zero. Do I need to change tools/testnet.py?
@tianzhi0549
Should the following code be added?
top_predictions = self.select_top_predictions(predictions)

@tianzhi0549
Owner

@sunpeng981712364 I am not sure what is wrong with your code. It might be helpful to debug your code line by line.

@sunpeng981712364

@tianzhi0549 Thank you for the prompt reply (#^.^#) hehe

@liuguanglyc

Can one or two 1080 Ti GPUs be used for training?

@tianzhi0549
Owner

@liuguanglyc I think you can, but maybe you need to use a smaller input size (e.g., 600px).

@heng2j

heng2j commented Jul 31, 2019

Hi @tianzhi0549 , I am trying to train with my own dataset with fcos_R_101_FPN_2x.

However, I encountered the following error:

RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN:
size mismatch for rpn.head.cls_logits.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 3, 3]).
size mismatch for rpn.head.cls_logits.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([11]).

I also removed all the previous checkpoints from ~/.torch/models/

Could you please advise on the steps to retrain a model with your own COCO-style dataset? Thank you so much!

Training Command

python -m torch.distributed.launch \
    --nproc_per_node=3 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_101_FPN_2x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_101_FPN_2x

fcos_R_101_FPN_2x.yaml

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "../FCOS/FCOS_R_101_FPN_2x.pth"
  RPN_ONLY: True
  FCOS_ON: True
  BACKBONE:
    CONV_BODY: "R-101-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5
DATASETS:
  TRAIN: ("my_data_train_coco_style", "my_data_val_coco_style")
  TEST: ("my_data_test_coco_style",)
INPUT:
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.01
  WEIGHT_DECAY: 0.0001
  STEPS: (120000, 160000)
  MAX_ITER: 180000
  IMS_PER_BATCH: 4
  WARMUP_METHOD: "constant"

@heng2j

heng2j commented Jul 31, 2019

Should I follow the retraining instructions from maskrcnn-benchmark and trim the last layers? And should I also add the dataset statement in the __init__.py file?

@tianzhi0549
Owner

@heng2j I don't think it is necessary if you have converted your datasets into the coco-style format.

@heng2j

heng2j commented Jul 31, 2019

Hi @tianzhi0549, thank you for your quick response. And what about trimming the last layers if I am retraining with the provided FCOS_R_101_FPN_2x.pth?

@tianzhi0549
Owner

@heng2j You might need to do that if you want to fine-tune from coco pre-trained models.

@heng2j

heng2j commented Jul 31, 2019

Thank you for your confirmation @tianzhi0549! One more related question: I am performing a kind of incremental learning that will require manual feature extraction. Do you have any suggestions on best practices for extracting features with FCOS? My target objects can be as small as 16x16 or less. Once again, thank you so much for your help and your great work!

@heng2j

heng2j commented Aug 1, 2019

Hi @tianzhi0549, for fine-tuning with the pre-trained model FCOS_R_101_FPN_2x.pth, as you suggested, I removed only the following 2 keys from the head.

['module.rpn.head.cls_logits.weight', 'module.rpn.head.cls_logits.bias']

However, training completed immediately after it started. Could you please advise on the proper way to retrain, so that we know how to make better use of FCOS for our own domain?

loading annotations into memory...
Done (t=0.14s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2019-07-31 21:59:50,050 maskrcnn_benchmark.trainer INFO: Start training
2019-07-31 21:59:50,285 maskrcnn_benchmark.trainer INFO: Total training time: 0:00:00.234026 (0.0000 s / it)

**Click to expand the logs:**

[FCOS]$ python -m torch.distributed.launch     --nproc_per_node=1     --master_port=$((RANDOM + 10000))     tools/train_net.py     --skip-test     --config-file configs/fcos/fcos_R_101_FPN_2x.yaml     DATALOADER.NUM_WORKERS 2     OUTPUT_DIR training_dir/fcos_R_101_FPN_2x
2019-07-31 21:59:34,252 maskrcnn_benchmark INFO: Using 1 GPUs
2019-07-31 21:59:34,252 maskrcnn_benchmark INFO: Namespace(config_file='configs/fcos/fcos_R_101_FPN_2x.yaml', distributed=False, local_rank=0, opts=['DATALOADER.NUM_WORKERS', '2', 'OUTPUT_DIR', 'training_dir/fcos_R_101_FPN_2x'], skip_test=True)
2019-07-31 21:59:34,252 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2019-07-31 21:59:43,826 maskrcnn_benchmark INFO: 
PyTorch version: 1.0.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 4.9.1
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti


Nvidia driver version: 418.56
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] Could not collect
[conda] cuda92                    1.0                           0    pytorch
[conda] pytorch                   1.0.0           py3.7_cuda9.0.176_cudnn7.4.1_1    pytorch
[conda] torchvision               0.2.1                      py_2    pytorch
        Pillow (6.1.0)
2019-07-31 21:59:43,826 maskrcnn_benchmark INFO: Loaded configuration file configs/fcos/fcos_R_101_FPN_2x.yaml
2019-07-31 21:59:43,827 maskrcnn_benchmark INFO: 
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "../FCOS/FCOS_R_101_FPN_2x.pth"
  RPN_ONLY: True
  FCOS_ON: True
  BACKBONE:
    CONV_BODY: "R-101-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5
DATASETS:
  TRAIN: ("cofga_train_cocostyle", "cofga_val_cocostyle")
  TEST: ("cofga_test_cocostyle",)
INPUT:
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.01
  WEIGHT_DECAY: 0.0001
  STEPS: (120000, 160000)
  MAX_ITER: 180000
  IMS_PER_BATCH: 1
  WARMUP_METHOD: "constant"
2019-07-31 21:59:43,828 maskrcnn_benchmark INFO: Running with config:
DATALOADER:
  ASPECT_RATIO_GROUPING: True
  NUM_WORKERS: 2
  SIZE_DIVISIBILITY: 32
DATASETS:
  TEST: ('cofga_test_cocostyle',)
  TRAIN: ('cofga_train_cocostyle', 'cofga_val_cocostyle')
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: (800,)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  TO_BGR255: True
MODEL:
  BACKBONE:
    CONV_BODY: R-101-FPN-RETINANET
    FREEZE_CONV_BODY_AT: 2
    USE_GN: False
  CLS_AGNOSTIC_BBOX_REG: False
  DEVICE: cuda
  FBNET:
    ARCH: default
    ARCH_DEF: 
    BN_TYPE: bn
    DET_HEAD_BLOCKS: []
    DET_HEAD_LAST_SCALE: 1.0
    DET_HEAD_STRIDE: 0
    DW_CONV_SKIP_BN: True
    DW_CONV_SKIP_RELU: True
    KPTS_HEAD_BLOCKS: []
    KPTS_HEAD_LAST_SCALE: 0.0
    KPTS_HEAD_STRIDE: 0
    MASK_HEAD_BLOCKS: []
    MASK_HEAD_LAST_SCALE: 0.0
    MASK_HEAD_STRIDE: 0
    RPN_BN_TYPE: 
    RPN_HEAD_BLOCKS: 0
    SCALE_FACTOR: 1.0
    WIDTH_DIVISOR: 1
  FCOS:
    FPN_STRIDES: [8, 16, 32, 64, 128]
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.6
    NUM_CLASSES: 2
    NUM_CONVS: 4
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
  FCOS_ON: True
  FPN:
    USE_GN: False
    USE_RELU: False
  GROUP_NORM:
    DIM_PER_GP: -1
    EPSILON: 1e-05
    NUM_GROUPS: 32
  KEYPOINT_ON: False
  MASK_ON: False
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    NUM_GROUPS: 1
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: True
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 64
  RETINANET:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDES: (8, 16, 32, 64, 128)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BBOX_REG_BETA: 0.11
    BBOX_REG_WEIGHT: 4.0
    BG_IOU_THRESHOLD: 0.4
    FG_IOU_THRESHOLD: 0.5
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.4
    NUM_CLASSES: 81
    NUM_CONVS: 4
    OCTAVE: 2.0
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
    SCALES_PER_OCTAVE: 3
    STRADDLE_THRESH: 0
    USE_C5: False
  RETINANET_ON: False
  ROI_BOX_HEAD:
    CONV_HEAD_DIM: 256
    DILATION: 1
    FEATURE_EXTRACTOR: ResNet50Conv5ROIFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 81
    NUM_STACKED_CONVS: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    PREDICTOR: FastRCNNPredictor
    USE_GN: False
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.05
    USE_FPN: False
  ROI_KEYPOINT_HEAD:
    CONV_LAYERS: (512, 512, 512, 512, 512, 512, 512, 512)
    FEATURE_EXTRACTOR: KeypointRCNNFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    PREDICTOR: KeypointRCNNPredictor
    RESOLUTION: 14
    SHARE_BOX_FEATURE_EXTRACTOR: True
  ROI_MASK_HEAD:
    CONV_LAYERS: (256, 256, 256, 256)
    DILATION: 1
    FEATURE_EXTRACTOR: ResNet50Conv5ROIFeatureExtractor
    MLP_HEAD_DIM: 1024
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    POSTPROCESS_MASKS: False
    POSTPROCESS_MASKS_THRESHOLD: 0.5
    PREDICTOR: MaskRCNNC4Predictor
    RESOLUTION: 14
    SHARE_BOX_FEATURE_EXTRACTOR: True
    USE_GN: False
  RPN:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (16,)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_TOP_N_TEST: 2000
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 6000
    PRE_NMS_TOP_N_TRAIN: 12000
    RPN_HEAD: SingleConvRPNHead
    STRADDLE_THRESH: 0
    USE_FPN: False
  RPN_ONLY: True
  USE_SYNCBN: False
  WEIGHT: ../FCOS/FCOS_R_101_FPN_2x.pth
OUTPUT_DIR: training_dir/fcos_R_101_FPN_2x
PATHS_CATALOG: ../FCOS/maskrcnn_benchmark/config/paths_catalog.py
SOLVER:
  BASE_LR: 0.01
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 2500
  GAMMA: 0.1
  IMS_PER_BATCH: 1
  MAX_ITER: 180000
  MOMENTUM: 0.9
  STEPS: (120000, 160000)
  WARMUP_FACTOR: 0.3333333333333333
  WARMUP_ITERS: 500
  WARMUP_METHOD: constant
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0
TEST:
  DETECTIONS_PER_IMG: 100
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 8
2019-07-31 21:59:49,205 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from ../FCOS/FCOS_R_101_FPN_2x.pth
...
loading annotations into memory...
Done (t=0.14s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2019-07-31 21:59:50,050 maskrcnn_benchmark.trainer INFO: Start training
2019-07-31 21:59:50,285 maskrcnn_benchmark.trainer INFO: Total training time: 0:00:00.234026 (0.0000 s / it)

@tianzhi0549
Owner

@heng2j You also need to remove solver states in the checkpoint.

@heng2j

heng2j commented Aug 1, 2019

Hi @tianzhi0549, would you mind pointing out how to remove the solver states in the checkpoint?

@heng2j

heng2j commented Aug 1, 2019

And which solver states should I pay attention to? @sunpeng981712364, could you please also shed some light on how you did it?

@tianzhi0549
Owner

tianzhi0549 commented Aug 1, 2019

@heng2j Do you use our provided pre-trained models? We have removed all solver states in them.

@heng2j

heng2j commented Aug 1, 2019

Hi @tianzhi0549, yes, I’m using your provided pre-trained model FCOS_R_101_FPN_2x.pth, and I encountered the issue above.

Would you mind taking a look at the full log in my previous comment, which includes all the parameters set up for the training? I’m also wondering which keys in the head I should remove from your provided checkpoints.

I only removed ['module.rpn.head.cls_logits.weight', 'module.rpn.head.cls_logits.bias'].

Would love to know how to properly train with your given model.

@tianzhi0549
Owner

@heng2j Please post your full log here.

@heng2j

heng2j commented Aug 1, 2019

Hi @tianzhi0549 ,

Here you go:

[Same log as in the earlier comment: identical command, environment report, and configuration dump, ending as follows.]
2019-07-31 21:59:50,050 maskrcnn_benchmark.trainer INFO: Start training
2019-07-31 21:59:50,285 maskrcnn_benchmark.trainer INFO: Total training time: 0:00:00.234026 (0.0000 s / it)

@tianzhi0549
Owner

@heng2j Sorry, it's our fault. We did not remove the iteration from the released checkpoints. Please remove it yourself by following this code: https://github.com/tianzhi0549/FCOS/blob/master/tools/remove_solver_states.py.
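
For readers who cannot open the script, here is a minimal sketch of the idea rather than the script itself. It assumes the checkpoint is a plain dict with a `model` entry plus solver bookkeeping; `optimizer` and `iteration` are named in this thread, while `scheduler` is an assumption about what else is stored. The helper works on an ordinary dict, so it can be tried without PyTorch; `convert` shows how it might be combined with `torch.load` and `torch.save`.

```python
def strip_solver_states(checkpoint, drop_keys=("optimizer", "scheduler", "iteration")):
    """Return a copy of the checkpoint without solver bookkeeping entries."""
    # Keys that are absent are simply ignored.
    return {k: v for k, v in checkpoint.items() if k not in drop_keys}


def convert(src_path, dst_path):
    """Load a checkpoint, strip solver states, and save the result."""
    import torch  # imported lazily so the helper above works without PyTorch

    checkpoint = torch.load(src_path, map_location="cpu")
    torch.save(strip_solver_states(checkpoint), dst_path)


# Toy checkpoint with placeholder values, just to show the effect.
toy = {"model": {"w": 1}, "optimizer": {}, "scheduler": {}, "iteration": 180000}
print(sorted(strip_solver_states(toy)))
```

With the iteration entry gone, training should start counting from zero instead of from the stored iteration.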

@heng2j

heng2j commented Aug 1, 2019

Hi @tianzhi0549, I was thinking of removing the iteration as well. Thank you for your confirmation, and thank you so much for your timely help! I will give it a try later today.

@heng2j

heng2j commented Aug 1, 2019

Hi @tianzhi0549 , thank you it works! And I am training the model now.

@tianzhi0549
Owner

@heng2j Happy to know this.

@Shahadate-Rezvy

Hi,
I got the following error in step 2 when training on a COCO dataset with the Maskrcnn_Benchmark model. Any suggestions, please?

    v: i + 1 for i, v in enumerate(self.coco.getCatIds())
  File "/home/zst19phu/anaconda3/envs/ptorch/lib/python3.7/site-packages/pycocotools-2.0-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 170, in getCatIds
    cats = self.dataset['categories']
KeyError: 'categories'

The error is in step 2: pycocotools cannot find the categories in the annotation file provided. Has anyone else faced a similar problem? If so, what is the solution?

Thank you.

@tianzhi0549
Owner

@shahdate Can you try to reinstall coco?

@Shahadate-Rezvy

@shahdate Can you try to reinstall coco?

Hi @tianzhi0549,

Thank you for your reply. I completely deleted and reinstalled the coco multiple times. But still it is not working.

@tianzhi0549
Owner

@shahdate Are you sure you are using correct annotation json files of COCO?

@Shahadate-Rezvy

Hi,
Many thanks. Problem solved. I was using json files that did not follow the COCO format. Now I use COCO-format json files and CUDA 10 instead of 9, and it works.
Thanks a lot again.
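
A quick way to catch this kind of problem before training is to check that the annotation file has the top-level sections a COCO-style detection file is expected to contain. The sketch below is only a sanity check under that assumption, not a full format validator, and the function names are made up for the example.

```python
import json

# Top-level sections expected in a COCO-style detection annotation file.
REQUIRED_KEYS = ("images", "annotations", "categories")


def missing_coco_keys(dataset):
    """Return the required top-level keys that are absent from the dataset dict."""
    return [key for key in REQUIRED_KEYS if key not in dataset]


def check_annotation_file(path):
    """Load an annotation file and fail early if a required section is missing."""
    with open(path) as f:
        dataset = json.load(f)
    missing = missing_coco_keys(dataset)
    if missing:
        raise ValueError("{} is missing top-level keys: {}".format(path, missing))
    return dataset


# A file without 'categories' would trigger the KeyError shown above.
print(missing_coco_keys({"images": [], "annotations": []}))
```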

@tianzhi0549
Owner

@shahdate Happy to know that.

@Shahadate-Rezvy

Hi
I have created a Maskrcnn_benchmark model for medical images with 3 classes (High grade, Normal grade and Low grade). My model detects only the high grade, not the other two. I used a COCO-style json file and 5000 iterations. Any help, please?
Many thanks

@hello-piger

Hello! Would you mind telling me where to add my dataset in step 1? I cannot find the right place to add my dataset in defaults.py. Thank you very much!

In order to train FCOS on your own dataset, you need to,
1.Add you dataset to https://github.com/tianzhi0549/FCOS/blob/master/maskrcnn_benchmark/config/defaults.py. Please use _coco_style as the suffix of your dataset names.
2.In https://github.com/tianzhi0549/FCOS/blob/master/configs/fcos/fcos_R_50_FPN_1x.yaml, change DATASETS to your own ones.
3.Modify MODEL.FCOS.NUM_CLASSES in

@tianzhi0549
Owner

@hello-piger I have edited it. Please check it again.

@hello-piger

thank you for your quick response.

@menggege321

Hello, the model is very good!

@dreamhighchina

@sunpeng981712364 Have you finished training? Mine trains, but inference produces no results.

@milliema

I have converted my dataset to COCO format, and I want to train FCOS on it. I followed GETTING_STARTED.md in the mmdetection repo, which includes a tutorial on training with your own datasets. In the FCOS repo, however, the file FCOS/maskrcnn_benchmark/data/datasets/coco.py differs from /mmdetection/mmdet/datasets/coco.py. Are there any suggestions?

May I ask what kind of annotations you use for training? Should we include "segmentation" in the COCO labels for training?

@milliema

@EDG-Zola You do not need to change this code.
In order to train FCOS on your own dataset, you need to,

1. Add you dataset to https://github.com/tianzhi0549/FCOS/blob/efb76e48e6490a93cc8b6b5dc93738fa1df34af5/fcos_core/config/paths_catalog.py#L10
   . Please use `_coco_style` as the suffix of your dataset names.

2. In https://github.com/tianzhi0549/FCOS/blob/master/configs/fcos/fcos_R_50_FPN_1x.yaml, change `DATASETS` to your own ones.

3. Modify `MODEL.FCOS.NUM_CLASSES` in https://github.com/tianzhi0549/FCOS/blob/ff8376bb903fe11a371df658f4bc87d3d6903125/maskrcnn_benchmark/config/defaults.py#L284
    if your dataset has a different number of classes.

Why should we use _coco_style as the suffix of our own dataset names? Is there any particular requirement?

@Finniu

Finniu commented Nov 7, 2019

@EDG-Zola You do not need to change this code.
In order to train FCOS on your own dataset, you need to,

  1. Add you dataset to
    "coco_2017_train": {

    . Please use _coco_style as the suffix of your dataset names.
  2. In https://github.com/tianzhi0549/FCOS/blob/master/configs/fcos/fcos_R_50_FPN_1x.yaml, change DATASETS to your own ones.
  3. Modify MODEL.FCOS.NUM_CLASSES in
    _C.MODEL.FCOS.NUM_CLASSES = 81 # the number of classes including background

    if your dataset has a different number of classes.

Hey, I found that the file containing

_C.MODEL.FCOS.NUM_CLASSES = 81 # the number of classes including background

is not in my cloned folder, and it is not used during training either. I checked the code, and the file actually used is /FCOS/fcos_core/config/defaults.py. Should I change the number of classes in that file?

If I change it, I get a dimension mismatch:

size mismatch for rpn.head.cls_logits.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 3, 3]).

@Finniu

Finniu commented Nov 7, 2019

Hi, I used fcos_demo.py to visualize the results and they look right. But when I run prediction with tools/testnet.py using the COCO protocol, all the AP/AR values are close to zero. Do I need to change tools/testnet.py?
@tianzhi0549
Should the following code be added?
top_predictions = self.select_top_predictions(predictions)

@sunpeng981712364 Hey, have you figured out the zero-AP problem?

@alen-mask

Well, I don't think these 3 modifications alone are enough to train on custom datasets.
thresholds_for_classes has to be changed too. Otherwise the trained model gives totally different scores for each bbox, and the output will be empty (at least, my data returns very low scores compared to the COCO setting).

@sathyamsn

Hi, I'm trying to train on a custom dataset with 4 classes, starting from the pre-trained model downloaded from this repository. I ran the remove-solver-states script on the downloaded .pth file and used the result in the .yaml. However, I keep getting the error below. Please tell me which step I'm missing. Thanks!

2020-09-17 06:17:38,972 fcos_core.utils.checkpoint INFO: Loading checkpoint from pretrained_models/FCOS_syncbn_bs32_c128_MNV2_FPN_1x_wo_solver_states.pth
Traceback (most recent call last):
  File "tools/train_net.py", line 180, in <module>
    main()
  File "tools/train_net.py", line 173, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 59, in train
    extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
  File "/home/sathya/FCOS/fcos_core/utils/checkpoint.py", line 62, in load
    self._load_model(checkpoint)
  File "/home/sathya/FCOS/fcos_core/utils/checkpoint.py", line 98, in _load_model
    load_state_dict(self.model, checkpoint.pop("model"))
  File "/home/sathya/FCOS/fcos_core/utils/model_serialization.py", line 80, in load_state_dict
    model.load_state_dict(model_state_dict)
  File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 779, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
size mismatch for module.rpn.head.cls_logits.weight: copying a param with shape torch.Size([80, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([4, 128, 3, 3]).
size mismatch for module.rpn.head.cls_logits.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([4]).

@tianzhi0549
Owner

@sathyamsn Please remove the weights module.rpn.head.cls_logits.weight from the pre-trained checkpoint. If you do not know how to remove the weights, please refer to how tools/remove_solver_states.py deletes an entry:

del model["optimizer"]
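
Instead of reading the key names off the error message, the mismatched entries can also be found by comparing shapes. The sketch below is a hypothetical helper, not part of the repository: it takes two mappings from parameter name to shape and treats a leading `module.` prefix as optional, since checkpoints saved from a distributed model may carry it. With PyTorch, the mappings could be built as `{k: v.shape for k, v in checkpoint["model"].items()}` and `{k: v.shape for k, v in model.state_dict().items()}`.

```python
def mismatched_keys(checkpoint_shapes, model_shapes, prefix="module."):
    """List checkpoint keys whose shape differs from the same-named model parameter."""

    def strip(key):
        return key[len(prefix):] if key.startswith(prefix) else key

    model = {strip(k): tuple(v) for k, v in model_shapes.items()}
    bad = []
    for key, shape in checkpoint_shapes.items():
        target = model.get(strip(key))
        if target is not None and tuple(shape) != target:
            bad.append(key)
    return bad


# Shapes of the two cls_logits entries are taken from the error message above;
# the bbox_pred entry is an illustrative example of a parameter that matches.
checkpoint_shapes = {
    "module.rpn.head.cls_logits.weight": (80, 128, 3, 3),
    "module.rpn.head.cls_logits.bias": (80,),
    "module.rpn.head.bbox_pred.weight": (4, 128, 3, 3),
}
model_shapes = {
    "rpn.head.cls_logits.weight": (4, 128, 3, 3),
    "rpn.head.cls_logits.bias": (4,),
    "rpn.head.bbox_pred.weight": (4, 128, 3, 3),
}
for key in mismatched_keys(checkpoint_shapes, model_shapes):
    print("would delete:", key)
```

The returned keys are the ones to delete from the checkpoint's model entry before fine-tuning; those layers then start from their fresh initialization.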

@sathyamsn

sathyamsn commented Sep 17, 2020

Thanks for the quick response. I added the removals below to the remove-solver-states code.

#################################################
del model["model"]["module.rpn.head.cls_logits.weight"]
del model["model"]["module.rpn.head.cls_logits.bias"]
#################################################

But this time the entire training was skipped, evaluation started directly, and mAP=0. Please help. Thanks.

2020-09-17 06:51:04,827 fcos_core.trainer INFO: Start training
Done (t=0.00s)
creating index...
index created!
1%|# | 4/734 [00:00<04:14, 2.87it/s]loading annotations into memory...
loading annotations into memory...
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
loading annotations into memory...
Done (t=0.06s)
creating index...
2020-09-17 06:51:04,965 fcos_core.trainer INFO: Total training time: 0:00:00.136299 (0.0000 s / it)
loading annotations into memory...
index created!
1%|#6
index created!
2020-09-17 06:51:05,042 fcos_core.inference INFO: Start evaluation on coco_cust_validation dataset(5867 images).
1%|##1 | 8/734 [00:00<02:23, 5.05it/s]Done (t=0.06s)

@sathyamsn

I just noticed that, unlike in TensorFlow, the step count given here should be higher than the pre-trained model's step. My pre-trained model was trained up to 90K, so when I gave 100000, training started. Thanks.

However, mAP is very low. On analysis, the detected bboxes are much smaller than the actual ground-truth bboxes. Any suggestions?
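
The step observation above is consistent with a trainer that resumes from the iteration stored in the checkpoint and stops at MAX_ITER. The arithmetic below is only an illustration of that assumed behaviour, not code from the repository; the 90K and 100000 figures come from the comment above, and `remaining_iterations` is a made-up helper.

```python
def remaining_iterations(max_iter, checkpoint_iteration=0):
    """Iterations left to run when training resumes at checkpoint_iteration."""
    return max(0, max_iter - checkpoint_iteration)


# Checkpoint stored at 90K, MAX_ITER raised to 100000: only the difference runs.
print(remaining_iterations(100000, 90000))
# MAX_ITER not above the stored iteration: nothing runs and training ends at once.
print(remaining_iterations(90000, 90000))
# Iteration removed from the checkpoint: the full schedule runs.
print(remaining_iterations(90000))
```

Under this assumption, raising MAX_ITER only runs the remaining steps, so removing the iteration from the checkpoint is what gives a full fine-tuning schedule.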

@autumnfairytale7

Can you tell me in detail how to remove the weights?

@sathyamsn

@autumnfairytale7 As mentioned in the previous comment by @tianzhi0549, run FCOS/tools/remove_solver_states.py on your pre-trained model, and remove the weights named in your error message.

@EiMaker

EiMaker commented Dec 23, 2021

@tianzhi0549 Hi, I have 5 classes including background, but all my AP values are 1.0. What factors might cause this problem? Thanks.
