Each CSV file has four columns:
- audiocap_id: The unique ID of an audio clip and its corresponding caption.
- youtube_id: The ID of the YouTube video that the audio belongs to. You can use it to obtain the VGGish embedding from AudioSet.
- start_time: The start time of the clip within the YouTube video, in seconds.
- caption: The audio caption.
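
As a minimal sketch of how these four columns might be read, the snippet below parses a CSV with Python's standard `csv` module and builds a YouTube link that starts at the clip offset. The sample row is a made-up placeholder (not real data), and treating `start_time` as whole seconds is an assumption; `CaptionRow`, `read_rows`, and `clip_url` are hypothetical helper names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CaptionRow:
    audiocap_id: str
    youtube_id: str
    start_time: int  # assumed to be whole seconds into the video
    caption: str


def read_rows(handle):
    """Parse an open CSV handle with the four columns described above."""
    reader = csv.DictReader(handle)
    return [
        CaptionRow(
            audiocap_id=r["audiocap_id"],
            youtube_id=r["youtube_id"],
            start_time=int(r["start_time"]),
            caption=r["caption"],
        )
        for r in reader
    ]


def clip_url(row):
    """Build a YouTube URL that starts playback at the clip's start time."""
    return f"https://www.youtube.com/watch?v={row.youtube_id}&t={row.start_time}s"


# Placeholder content for illustration only; the IDs and caption are invented.
SAMPLE = (
    "audiocap_id,youtube_id,start_time,caption\n"
    '1,abc123XYZ_0,30,"A dog barks, then a door closes"\n'
)

rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row.audiocap_id, clip_url(row), row.caption)
```

To read a real split, pass an open file instead of the `StringIO` object, for example `open("train.csv", newline="", encoding="utf-8")`, where the file name is again an assumption.
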

| Split | AudioCaps | Downloaded |
|---|---:|---:|
| Train | 49,838 | 45,528 |
| Validation | 495 | 449 |
| Test | 975 | 893 |
| Total | 51,308 | 46,870 |

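
The share of each split that was downloaded can be computed directly from the table. The sketch below assumes that "AudioCaps" is the number of clips listed in the dataset and "Downloaded" is the number actually retrieved; the variable and function names are illustrative.

```python
# Counts copied from the table: split -> (AudioCaps, Downloaded).
COUNTS = {
    "Train": (49_838, 45_528),
    "Validation": (495, 449),
    "Test": (975, 893),
}


def coverage_percent(listed, downloaded):
    """Return the downloaded share of a split as a percentage."""
    return 100.0 * downloaded / listed


# Totals are derived from the per-split counts rather than typed in.
total_listed = sum(listed for listed, _ in COUNTS.values())
total_downloaded = sum(downloaded for _, downloaded in COUNTS.values())

coverage = {
    split: coverage_percent(listed, downloaded)
    for split, (listed, downloaded) in COUNTS.items()
}
coverage["Total"] = coverage_percent(total_listed, total_downloaded)

for split, pct in coverage.items():
    print(f"{split}: {pct:.2f}% downloaded")
```
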
Last edit: Jan 30, 2023