fix bug for MSVD

OpenGVLab · Sep 24, 2023 · a0d65b4
1 parent 2c03d2a
commit a0d65b4

Showing 2 changed files with 11 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -6,11 +6,12 @@ By [Kunchang Li](https://scholar.google.com/citations?user=D4tLSbsAAAAJ), [Yali
 
 ## Update
 
-- :warning: **2023/09/20**: **Fix bug in UMT pretraining.** Add autocast for teacher forward, which can halve the pretraining time.
+- :warning: **2023/09/25**: **Bug for MSVD retrieval.** Check it [here](./multi_modality/README#warning).
+- :rocket: **2023/09/20**: **Fix bug in UMT pretraining.** Add autocast for teacher forward, which can halve the pretraining time.
 - :fire: **2023/07/19**: **All the code and models are released.**
   - [single_modality](./single_modality/): Single-modality pretraining and finetuning.
     - Action Classification: [Kinetics](https://www.deepmind.com/open-source/kinetics), [Moments in Time](http://moments.csail.mit.edu/), [Something-Something](https://developer.qualcomm.com/software/ai-datasets/something-something).
-    - Action Detecetion: [AVA](http://research.google.com/ava/).
+    - Action Detection: [AVA](http://research.google.com/ava/).
     - **The models and scripts are in [MODEL_ZOO](./single_modality/MODEL_ZOO.md). Have a try!**
   - [multi_modality](./multi_modality/): Multi-modality pretraining and finetuning.
     - Video-Text Retrieval: [MSRVTT](https://www.microsoft.com/en-us/research/publication/msr-vtt-a-large-video-description-dataset-for-bridging-video-and-language/), [DiDeMo](https://github.com/LisaAnne/TemporalLanguageRelease), [ActivityNet](http://activity-net.org/), [LSMDC](https://sites.google.com/site/describingmovies/), [MSVD](https://www.cs.utexas.edu/users/ml/clamp/videoDescription/), [Something-Something](https://github.com/jayleicn/singularity).

diff --git a/multi_modality/README.md b/multi_modality/README.md
@@ -14,6 +14,14 @@ You can find the dataset instructions in [DATASET](DATASET.md). We have provide
 
 You can find all the models and the scripts in [MODEL_ZOO](./MODEL_ZOO.md).
 
+## Warning 
+
+Thanks to some recent issues, I finally found **the bug in MSVD testing**. I had been confused by the 'extremely high result' for a long time. 😄
+
+Unlike ANet and DiDeMo, which use paragraphs for retrieval, MSVD has multiple texts for each video. Thus `is_paragraph_retrieval` should be set to `False` for MSVD retrieval.
+
+After fixing the bug, the zero-shot result for `ViT-L/16_25M` is `49.0`, not `72.2`. The results are quite normal now, but still the best zero-shot results. I will conduct the corresponding experiments and update the results in the paper later.
+
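Note on the `is_paragraph_retrieval` setting described in the Warning section: the following is a minimal sketch, not the repository's actual code, of how such a flag could change the set of text queries. The function `build_text_queries`, the annotation layout, and the sample captions are all hypothetical and exist only for illustration. It assumes that paragraph mode merges all captions of a video into one query, while the non-paragraph mode keeps each caption as its own query mapped back to its video.

```python
from typing import Dict, List, Tuple


def build_text_queries(
    annos: List[Dict], is_paragraph_retrieval: bool
) -> Tuple[List[str], List[int]]:
    """Return the text queries and, for each query, the index of its video.

    Hypothetical helper: the annotation format is an assumption.
    """
    texts: List[str] = []
    txt2vid: List[int] = []
    for vid_idx, anno in enumerate(annos):
        captions = anno["caption"]
        if isinstance(captions, str):
            captions = [captions]
        if is_paragraph_retrieval:
            # One query per video: every caption is merged into a paragraph.
            texts.append(" ".join(captions))
            txt2vid.append(vid_idx)
        else:
            # One query per caption: several queries can share one video.
            for caption in captions:
                texts.append(caption)
                txt2vid.append(vid_idx)
    return texts, txt2vid


# Made-up MSVD-style annotations: several independent captions per video.
msvd_like = [
    {"video": "vid0.avi", "caption": ["a man is cooking", "someone fries an egg"]},
    {"video": "vid1.avi", "caption": ["a dog runs", "a puppy is playing", "a dog in a park"]},
]

para_texts, para_map = build_text_queries(msvd_like, is_paragraph_retrieval=True)
sep_texts, sep_map = build_text_queries(msvd_like, is_paragraph_retrieval=False)
```

Under these assumptions, paragraph mode produces one long query per video that combines the information of all its captions, which would make retrieval easier than the standard MSVD protocol of one query per caption. That is one plausible reading of why the earlier score was inflated; the commit itself only states that the flag must be `False`.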
 ## Pre-Training
 
 We use [CLIP](https://github.com/openai/CLIP) pretrained models as the unmasked teachers by default: