Hello, I read the paper, and it says that you used the FHO subset of the Ego4D dataset.
However, there appear to be only about 1,700 video clips in the FHO subset, while your training batch size is 512.
In addition, some video clips do not contain the audio modality.
As a result, the code repeatedly searches for videos that contain audio just to obtain a single sample,
so assembling a single batch takes a very long time.
Is there something wrong with my understanding, or do you simply rely on a high num_workers?
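For reference, one way to avoid the repeated per-sample retries described above is to filter out audio-less clips once when the dataset is constructed, so every index is guaranteed to have audio. This is only a minimal sketch, not the repo's actual code: the `clips` metadata format and the `has_audio` probe are hypothetical stand-ins (the real check might inspect the container's streams via ffprobe or torchaudio).

```python
def has_audio(clip):
    # Hypothetical probe: a real implementation would inspect the
    # video container's audio streams instead of a metadata flag.
    return clip.get("audio", False)

class AudioVideoDataset:
    """Map-style dataset that pre-filters clips missing the audio modality."""

    def __init__(self, clips):
        # Single O(N) pass at init time, instead of an unbounded
        # retry loop inside __getitem__ on every sample fetch.
        self.clips = [c for c in clips if has_audio(c)]

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        # Every clip here is known to have audio, so no retries are needed.
        return self.clips[idx]

# Toy example with fake metadata:
clips = [
    {"id": "a", "audio": True},
    {"id": "b", "audio": False},
    {"id": "c", "audio": True},
]
ds = AudioVideoDataset(clips)
print(len(ds))  # 2
```

With this structure, a DataLoader only ever samples valid indices, so batch assembly time no longer depends on how many audio-less clips happen to be drawn.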