Commit

fix bug for MSVD
Andy1621 committed Sep 24, 2023
1 parent 2c03d2a commit a0d65b4
Showing 2 changed files with 11 additions and 2 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -6,11 +6,12 @@ By [Kunchang Li](https://scholar.google.com/citations?user=D4tLSbsAAAAJ), [Yali

## Update

- :warning: **2023/09/20**: **Fix bug in UMT pretraining.** Add autocast for teacher forward, which can halve the pretraining time.
- :warning: **2023/09/25**: **Bug for MSVD retrieval.** Check it [here](./multi_modality/README#warning).
- :rocket: **2023/09/20**: **Fix bug in UMT pretraining.** Add autocast for teacher forward, which can halve the pretraining time.
- :fire: **2023/07/19**: **All the code and models are released.**
- [single_modality](./single_modality/): Single-modality pretraining and finetuning.
- Action Classification: [Kinetics](https://www.deepmind.com/open-source/kinetics), [Moments in Time](http://moments.csail.mit.edu/), [Something-Something](https://developer.qualcomm.com/software/ai-datasets/something-something).
- Action Detecetion: [AVA](http://research.google.com/ava/).
- Action Detection: [AVA](http://research.google.com/ava/).
- **The models and scripts are in [MODEL_ZOO](./single_modality/MODEL_ZOO.md). Have a try!**
- [multi_modality](./multi_modality/): Multi-modality pretraining and finetuning.
- Video-Text Retrieval: [MSRVTT](https://www.microsoft.com/en-us/research/publication/msr-vtt-a-large-video-description-dataset-for-bridging-video-and-language/), [DiDeMo](https://github.com/LisaAnne/TemporalLanguageRelease), [ActivityNet](http://activity-net.org/), [LSMDC](https://sites.google.com/site/describingmovies/), [MSVD](https://www.cs.utexas.edu/users/ml/clamp/videoDescription/), [Something-Something](https://github.com/jayleicn/singularity).
8 changes: 8 additions & 0 deletions multi_modality/README.md
@@ -14,6 +14,14 @@ You can find the dataset instructions in [DATASET](DATASET.md). We have provide

You can find all the models and the scripts in [MODEL_ZOO](./MODEL_ZOO.md).

## Warning

Thanks to some recent issues, I finally found **the bug in MSVD testing**. I had been confused by the 'extremely high result' for a long time. 😄

Unlike ANet and DiDeMo, which use one paragraph per video for retrieval, MSVD has multiple texts for each video. Thus `is_paragraph_retrieval` should be set to `False` for MSVD retrieval.
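
To make the difference concrete, here is a minimal sketch of how the two settings could change the set of text queries. It is an illustration under assumptions, not the evaluation code of this repo: the `annotations` layout, `build_queries`, and `recall_at_1` are hypothetical names, and only the `is_paragraph_retrieval` flag comes from the text above.

```python
def build_queries(annotations, is_paragraph_retrieval):
    """Build text queries and their ground-truth video indices.

    `annotations` is a hypothetical list of {"video": str, "captions": [str, ...]}.
    """
    texts, txt2video = [], []
    for video_idx, ann in enumerate(annotations):
        if is_paragraph_retrieval:
            # Paragraph mode: all captions of a video form ONE query.
            texts.append(" ".join(ann["captions"]))
            txt2video.append(video_idx)
        else:
            # Sentence mode: every caption is its own query for the same video.
            for caption in ann["captions"]:
                texts.append(caption)
                txt2video.append(video_idx)
    return texts, txt2video


def recall_at_1(sim, txt2video):
    """Text-to-video R@1 in percent; sim[i][j] scores text i against video j."""
    hits = 0
    for i, row in enumerate(sim):
        best_video = max(range(len(row)), key=row.__getitem__)
        hits += best_video == txt2video[i]
    return 100.0 * hits / len(sim)


# Toy MSVD-like data: the first video has two independent captions.
annotations = [
    {"video": "v0", "captions": ["a man plays guitar", "someone strums a guitar"]},
    {"video": "v1", "captions": ["a cat jumps"]},
]
sentence_texts, sentence_gt = build_queries(annotations, is_paragraph_retrieval=False)
paragraph_texts, paragraph_gt = build_queries(annotations, is_paragraph_retrieval=True)
```

In this sketch, paragraph mode turns the two captions of `v0` into a single query, so a dataset with many captions per video would be scored on far fewer queries, each carrying more information than one caption. In sentence mode every caption has to retrieve its video on its own.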

After fixing the bug, the zero-shot result for `ViT-L/16_25M` is `49.0`, not `72.2`. The result is now in a normal range, but it is still the best zero-shot result. I will conduct the corresponding experiments and update the results in the paper later.

## Pre-Training

We use [CLIP](https://github.com/openai/CLIP) pretrained models as the unmasked teachers by default:
