Experimental results #50
Hi @lshssel, Hmmm, batch_size 1 will be more difficult to fine-tune, but let's try. The main idea is that you want the improvement to saturate and then reduce the learning rate. Continuing to train after the improvement has saturated doesn't help at all (U_R just goes down and AP50 doesn't go up), but if you apply the lr_drop too soon (before AP50 starts to saturate), then K_AP50 is 'frozen' too soon and doesn't improve enough. Best,
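As a rough illustration of the lr_drop timing described above, here is a minimal PyTorch sketch of a one-step LR schedule (the repo's actual training loop may differ; the lr_drop value below is purely illustrative):

```python
# Minimal sketch: Deformable-DETR-style training loops typically drop the LR
# once, at epoch `lr_drop`, via StepLR. The point above is to choose lr_drop
# so that the drop lands roughly where K_AP50 starts to saturate.
import torch

model = torch.nn.Linear(10, 2)  # stand-in for the detector
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

lr_drop = 35  # hypothetical value; tune so the drop hits the AP50 plateau
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_drop, gamma=0.1)

for epoch in range(41):
    # train_one_epoch(model, optimizer, ...)  # placeholder for the real epoch loop
    scheduler.step()  # lr is multiplied by 0.1 once epoch lr_drop is reached
```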
Thanks for the reply, I'll try later.
Hi @lshssel,
One 2080Ti; for all experiments, batch_size=1. Looking forward to your suggestions!
Hi @lshssel, Best,
I evaluated with the t1.3 checkpoint0040 (t1.2 has been removed). For that run, obj_temp = 1.3 and obj_loss_coef = 8e-4 during training.
obj_temp = 1.1: K_AP50 = 57.1914, U_R50 = 19.2453
obj_loss_coef = 4e-4: K_AP50 = 57.9826, U_R50 = 19.2624
So t1.3 is probably the best result that a 2080Ti can show.
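In case it helps others reproduce this, here is a hypothetical sweep sketch over the two hyperparameters mentioned above; the flag names (--obj_temp, --obj_loss_coef) and the script name are assumptions taken from this thread, so check the repo's training script for the real interface:

```python
# Hypothetical sweep over the hyperparameters discussed in this thread.
# The script name and flag names are assumptions; verify against the repo.
import itertools
import subprocess

obj_temps = [1.1, 1.3]
obj_loss_coefs = [4e-4, 8e-4]

for temp, coef in itertools.product(obj_temps, obj_loss_coefs):
    subprocess.run(
        [
            "python", "main_open_world.py",   # assumed entry point
            "--obj_temp", str(temp),
            "--obj_loss_coef", str(coef),
            "--batch_size", "1",              # single 2080Ti setup from this thread
        ],
        check=True,
    )
```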
Hi @lshssel, I want to add this to the readme. Would you mind providing all the hyperparameters you changed? |
Hi,
Hi @WangPingA, When you train a model with a different batch size, your results will vary, because the gradient updates will not be the same. Variations of ±2 seem reasonable. lshssel also ran experiments with a 2080Ti and got the results reported above. If you are interested in applications, then perhaps my recent work, FOMO, will interest you; it is much less compute-heavy to train and has relatively strong open-world performance by leveraging a foundation object detection model. An easy upgrade there is to switch owl-vit to owlv2. Best,
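For the owl-vit → owlv2 swap mentioned above, here is a minimal zero-shot detection sketch using the Hugging Face transformers OWLv2 classes (generic OWLv2 usage, not FOMO's code; the image URL and text queries are just for illustration):

```python
# Minimal OWLv2 zero-shot detection example with Hugging Face transformers.
import requests
import torch
from PIL import Image
from transformers import Owlv2Processor, Owlv2ForObjectDetection

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # illustrative image
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]  # illustrative text queries

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to boxes/scores/labels in the original image coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.1
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{texts[0][label]}: {score:.2f} at {box.tolist()}")
```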
Hi,
Because our group only assigned me a 2080Ti, training took a long time: for MOWODB's task 1, it took 43 hours.
Unfortunately, wandb crashed at the 35th epoch, so its curves also stop there.
However, the program kept running without errors, the file "checkpoint0040.pth" was generated at the end, and training task 2 from this file runs smoothly.
Below are the wandb graphs and hyperparameters. The results are not very good, so I may need to tune the parameters to get as close to the original performance as possible.
K_AP50 is 52.476, U_R50 is 21.042.
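A quick way to sanity-check the checkpoint before launching task 2 (generic PyTorch; the path and stored key names below are illustrative, and the repo's own scripts normally handle resuming via their own arguments):

```python
# Minimal sketch: inspect checkpoint0040.pth before reusing it for task 2.
# The path is illustrative; the stored keys ('model', 'epoch', ...) are typical
# for DETR-style training scripts but may differ in this repo.
import torch

ckpt = torch.load("exps/checkpoint0040.pth", map_location="cpu")
print("stored keys:", list(ckpt.keys()))
if "epoch" in ckpt:
    print("saved at epoch:", ckpt["epoch"])
```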