Two questions about the experimental results in Table 1 of the paper. #16
Comments
Hi, thanks for your question.
1. Pre-trained weights:
The pre-trained weights I use are
https://drive.google.com/file/d/1JfrhN144Hdg7we213H1WxwR3lGYOlmIn/view
and their location is shown in the picture in Annex 1.
2. Shell script:
I basically did not change the shell script relative to VideoMAE; I only changed the batch size from 256 to 64. This is equivalent to using the K400 checkpoint pre-trained for 800 epochs and fine-tuning for 50 epochs on SSv2. I compared the final result with the same setting using the SSv2 checkpoint pre-trained for 1600 epochs; as the picture in Annex 2 shows, both exceed 65 top-1 accuracy. The shell script is in Annex 3.
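For concreteness, here is a rough Python sketch of the recipe above. The checkpoint filename, learning rate, and the image-ViT stand-in are my assumptions; the actual run goes through VideoMAE's shell script and training code.

```python
# Hypothetical sketch (not the actual shell script) of the recipe described above:
# VideoMAE-style full fine-tuning on SSv2 starting from the K400 checkpoint
# pre-trained for 800 epochs, with only the batch size changed from 256 to 64.
import torch
import timm

NUM_CLASSES = 174   # Something-Something V2
BATCH_SIZE = 64     # changed from VideoMAE's default 256
EPOCHS = 50         # fine-tuning epochs on SSv2

# An image ViT from timm stands in for the video backbone here; the checkpoint
# key names depend on how the pre-trained weights were saved.
model = timm.create_model("vit_base_patch16_224", num_classes=NUM_CLASSES)
ckpt = torch.load("videomae_k400_pretrain_800e.pth", map_location="cpu")  # assumed filename
state = ckpt.get("model", ckpt)
missing, unexpected = model.load_state_dict(state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

# VideoMAE fine-tunes with AdamW; the lr and weight decay here are assumptions,
# and the layer-wise lr decay used in the original recipe is omitted for brevity.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
```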
Thanks for your reply. Did you experiment with the VideoMAE codebase? I guess you experimented with strong data augmentation and a stronger optimizer (e.g., AdamW). For a fair comparison to linear probing, we experiment with the same setting as the linear probe, which uses SGD and does not include strong data augmentation. Please let me know if I missed something.
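For reference, a minimal sketch of what the linear-probe-style protocol described above could look like (frozen backbone, plain SGD, only basic augmentation); this illustrates the setting, not the paper's exact code.

```python
# Illustrative sketch of the linear-probe-style setting: SGD, no strong
# augmentation (no RandAugment / mixup / cutmix), and only the head is trained.
import torch
import torch.nn as nn
from torchvision import transforms

# Basic spatial augmentation only.
basic_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def build_linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int):
    """Freeze the backbone and train only a linear classifier with plain SGD."""
    for p in backbone.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)
    optimizer = torch.optim.SGD(head.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=0.0)
    return head, optimizer
```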
Thanks for your reply. I will run the experiment again to verify it!
@ShoufaChen @yangzhen1997 I was also experiencing the same problem. Even if you removed those augmentations and the AdamW optimizer, would your method still be able to improve the results? Based on my experiments, adding augmentations and the AdamW optimizer did not improve (and sometimes degraded) performance. This is because in full fine-tuning they are used to reduce overfitting when tuning many parameters. However, in VPT and your method, since we are only tuning a small fraction of the parameters, they do not improve model performance. Therefore, would it be fair to report the full fine-tuning results without any augmentations or sophisticated optimizers?
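To make that point concrete, a small sketch of how little is actually trainable in this regime; the keyword names below are placeholders, not the repository's real attribute names.

```python
# Hypothetical sketch: in VPT/adapter-style tuning only a tiny fraction of the
# parameters receives gradients, so the overfitting that strong augmentation
# and AdamW are meant to counter is much less of a concern.
import torch.nn as nn

def freeze_except(model: nn.Module, keywords=("prompt", "adapter", "head")):
    """Freeze every parameter whose name does not contain one of the keywords."""
    for name, p in model.named_parameters():
        p.requires_grad = any(k in name for k in keywords)

def trainable_fraction(model: nn.Module) -> float:
    """Fraction of parameters that will be updated during tuning."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total
```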
Hi, I would like to ask you two questions about the experimental results in the paper's Table 1.
![Screenshot 2022-10-10 17 56 13](https://user-images.githubusercontent.com/60163965/194841874-4e2dfc8a-9a96-401e-b4b6-0c570d0644cc.png)
![1](https://user-images.githubusercontent.com/60163965/194843124-209c1e2b-fade-4189-84ab-b56bd96ee7d2.png)
I would like to ask where the top-1 accuracy of 53.97 for full fine-tuning on SSv2 was obtained.
When I read the VideoMAE paper, I found that pre-training on SSv2 and then fine-tuning on SSv2 can reach 69.3 top-1 accuracy. I know your paper uses the K400 pre-trained parameters, but I also ran experiments and can achieve 65+ top-1 accuracy with 50 epochs of fine-tuning on SSv2: