add min_area and pj_crop #3435
Conversation
@alemelis Hi, Thanks!
I added param
So you can test your additional suggestions scale_min = 0.25, and if you get higher mAP in some cases by using the new parameters, then I will be happy to merge your pull request.
@AlexeyAB these are the 6 training logs; two things to notice:
The best mAP during training is shown below.
warning: format specifies type 'long' but the argument has type 'uint64_t'
This is to reply to your questions above.
I also fixed a few compilation warnings.
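For reference, this kind of printf format warning is typically silenced with the fixed-width macros from <inttypes.h>; a minimal sketch (not the exact change in this PR):

```c
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t n = 123456789ULL;
    /* "%ld" triggers the warning above on platforms where long != uint64_t;
       the PRIu64 macro expands to the correct conversion specifier. */
    printf("n = %" PRIu64 "\n", n);
    return 0;
}
```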
@alemelis Thank you very much! Yes, I think scale_min and scale_max are a good feature. Yes, I see that pj_crop=2 without min_area, and min_area with pj_crop=0/1, give some improvements.
@@ -380,7 +382,7 @@ void fill_truth_detection(const char *path, int num_boxes, float *truth, int cla
     ++sub;
     continue;
 }
-if ((w < lowest_w || h < lowest_h)) {
+if ((w < lowest_w || h < lowest_h) || (area < min_area)) {
lowest_w and lowest_h are both calculated relative to the network size.
Interested in why you've used a constant (w.r.t. network size) float, rather than defining it as the (integer) area in input pixels.
area is expressed in "yolo" units. Hence, it is independent of the network resolution and related only to the image size.
In my dataset, I mainly use 1920x1080px images, and min_area=0.0001 will ignore all the annotations smaller than roughly 14x14px. Whether these are useful for learning is arguable, but I can reintroduce them by changing min_area without touching the annotations.
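As a back-of-the-envelope sketch of that arithmetic (variable names are illustrative only, not taken from the repo):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* min_area is a fraction of the image area ("yolo" units) */
    const float min_area = 0.0001f;
    const int img_w = 1920, img_h = 1080;

    /* a box with normalized width w and height h covers w * h of the image;
       it is dropped when w * h < min_area */
    float min_px_area = min_area * img_w * img_h;   /* ~207 px^2 */
    float min_side    = sqrtf(min_px_area);         /* ~14.4 px  */

    printf("boxes below roughly %.0fx%.0f px are ignored\n", min_side, min_side);
    return 0;
}
```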
A bit more detail on my thinking...
There may be two reasons for ignoring these small annotations:
a) Because of human error annotating the data.
b) Because we assume there is not enough information to accurately detect an object with a bounding area of <min_area
if (a) is true, then more data should surely fix this (law of large numbers etc). If we really want to remove these, then this is a trivial, quick data-cleanup operation on our training images prior to learning.
if (b) is the reason, then shouldn't this filter be applied after data augmentation, based on the amount of data available during training - rather than the amount of data in the input image?
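To make that alternative concrete, a hypothetical sketch (not code from this PR) of filtering by the area a box covers after the crop/resize, i.e. relative to the network input rather than the source image:

```c
/* Hypothetical post-augmentation filter: drop a ground-truth box only if
   the area it covers in the augmented (cropped/resized) sample is too small. */
typedef struct { float x, y, w, h; } box_t;   /* normalized to the crop */

int keep_after_augmentation(box_t b, float min_area)
{
    /* b.w * b.h is the fraction of the network input covered by the box,
       so the threshold reflects how much signal survives augmentation. */
    return (b.w * b.h) >= min_area;
}
```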
I agree that it is trivial to fix case a) through some data cleaning, but I found that different architectures (v2, v3, tiny v3, etc.) at different resolutions may require a slightly different min_area threshold. I suspect this indicates that the amount of information in a single annotation may or may not be relevant depending on the net we want to train. Hence, I'd be worried about discarding some annotations a priori only because a certain net couldn't learn anything from them.
This, in turn, links to b).
shouldn't this filter be applied after data augmentation, based on the amount of data available during training - rather than the amount of data in the input image?
Probably this would be viable in the case of fixed augmentation, i.e., if the images were always scaled in the same manner. This is not darknet's case, as I'm showing in the first comment to this PR.
I hope this makes sense :)
What is this?
This PR brings back the data augmentation method implemented in pjreddie's repo.
Related to #3119
It also adds a new parameter, min_area, to be used during training to filter out detections smaller than min_area times the training image area (network size). This helps with ill-annotated datasets in which too-small objects are labelled even though no useful information can be retrieved from them.
Why?
During training the image is randomly cropped and re-scaled to simulate a zoom-in/out effect. This process is different in pjreddie's and AlexeyAB's repos, as they employ statistically different cropping behaviours (Figure 2a).
The resulting behaviour is depicted below (blue and pink for AB's and pjreddie's, respectively). AB's method ensures that the corners of the cropped image are always in the dashed regions, so that the image centre is always included in the crop. This comes at the expense of allowing the image orientation to randomly go from landscape to portrait (squeeze effect). pjreddie's method instead allows harsher zoom-in/out while retaining the original orientation (also, the centre of the image is not forced to always be in the crop).
How to use
The cropping behaviours are regulated by the following parameters:
- jitter (which is defined multiple times in the [yolo] layers) is used to identify two deltas (dw and dh) used to generate the random crop [6]. In the ice-cream plot, the jitter parameter identifies the cone base diameter.
- scale_min and scale_max (used only in PJ's random cropping strategy) are used to define the random width range (the ice-cream cone length).

This PR makes it possible to use both cropping methods via the pj_crop switch (see the sketch below):
- 0 for AlexeyAB's method
- 1 for pjreddie's
- 2 for a random choice between the two on an image-by-image basis

Change the .cfg file as in yolov3-tiny_pjcrop.cfg (provided).
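As a rough sketch of the pj_crop switch described above (illustrative only, not this PR's actual implementation):

```c
#include <stdlib.h>

/* Per-image choice of cropping strategy, following the pj_crop values
   described above:
   0 = AlexeyAB's crop, 1 = pjreddie's crop, 2 = random pick per image. */
int use_pjreddie_crop(int pj_crop)
{
    if (pj_crop == 2) return rand() % 2;  /* image-by-image coin flip */
    return pj_crop == 1;
}
```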