Total duration of segments after filtering bad segments is less than result in paper #13

ngocson1804 · 2022-06-08T09:06:04Z

Hi,
I ran steps 4 and 5 using the file /data/ja/202103.csv you provided and got more than 10M files with a total duration of over 10,000 hours across all segments. But after filtering bad segments with min_confidence_score=-0.3, only about 480,000 good segments remain, with a total duration of 351 hours. So the yield is roughly 3.5% by duration, and the total is much less than the 1,300 hours reported in the paper. Do you know the possible reasons?
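For anyone reproducing these numbers, here is a minimal sketch of the filtering and yield calculation described above. It assumes each segment is a hypothetical `Segment` record (video ID, start/end in seconds, confidence score) and that a segment is kept when its score is at least `min_confidence_score`; the actual output format of steps 4 and 5 may differ.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    # Hypothetical record for one aligned segment (not the repo's actual format)
    video_id: str
    start: float  # seconds
    end: float    # seconds
    score: float  # confidence score; higher means a better alignment


def filter_stats(segments: List[Segment],
                 min_confidence_score: float) -> Tuple[int, float, float, float]:
    """Return (kept segment count, kept hours, total hours, yield by duration)."""
    total_sec = sum(s.end - s.start for s in segments)
    # Assumption: keep segments whose score is at or above the threshold
    kept = [s for s in segments if s.score >= min_confidence_score]
    kept_sec = sum(s.end - s.start for s in kept)
    duration_yield = kept_sec / total_sec if total_sec > 0 else 0.0
    return len(kept), kept_sec / 3600, total_sec / 3600, duration_yield
```

With the figures in this thread, 351 kept hours out of roughly 10,000 total hours gives a duration yield of about 0.035, which is the 3.5% quoted above.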

vebmaylrie · 2022-06-08T11:26:40Z

Please lower the threshold. We used -3.0 to obtain more than 1,300 hours of data.

ngocson1804 · 2022-06-08T14:51:02Z

Thank you for the suggestion! I tried the threshold -3.0 and got 5.7 million segments with a total duration of 6,046 hours, which is far more than 1,300 hours. So I checked your paper more carefully, and it seems that you applied the -3.0 threshold only to the top 15k videos and the single-speaker subset to get 1,376 hours. Meanwhile, I have over 100,000 videos in total. Is there any reason to use only the top 15k videos? Should I use all 100k videos to get 6,046 hours of segments with a confidence score over -3.0?
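To compare the paper's subset with the full set, one option is to restrict filtering to a chosen set of videos before summing durations. This is a sketch under stated assumptions: the thread does not say how the "top 15k videos" were ranked, so `top_n_videos` ranks by a hypothetical per-video mean confidence score, and the single-speaker filtering from the paper is not modeled. `Segment` is the same hypothetical record type (seconds and confidence score).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set


class Segment(NamedTuple):
    # Hypothetical record for one aligned segment (not the repo's actual format)
    video_id: str
    start: float  # seconds
    end: float    # seconds
    score: float  # confidence score


def top_n_videos(segments: List[Segment], n: int) -> Set[str]:
    """Hypothetical ranking: top n videos by mean segment confidence score."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for s in segments:
        scores[s.video_id].append(s.score)
    ranked = sorted(scores, key=lambda v: sum(scores[v]) / len(scores[v]), reverse=True)
    return set(ranked[:n])


def hours_above(segments: List[Segment], min_confidence_score: float,
                video_ids: Optional[Set[str]] = None) -> float:
    """Total hours of segments at or above the threshold, optionally limited to video_ids."""
    total_sec = 0.0
    for s in segments:
        if s.score < min_confidence_score:
            continue
        if video_ids is not None and s.video_id not in video_ids:
            continue
        total_sec += s.end - s.start
    return total_sec / 3600
```

Calling `hours_above(segments, -3.0)` on all videos and `hours_above(segments, -3.0, top_n_videos(segments, 15000))` on the subset would show how much of the gap between 6,046 and roughly 1,300 hours comes from video selection alone.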

Also, is there any chance you could share the dev_easy_jun21, eval_easy_jun21, dev_normal_jun21 and eval_normal_jun21 sets?

vebmaylrie · 2022-06-09T07:08:18Z

> I got a total of over 100,000 videos. So, is there any reason to use only the top 15k videos?
There is no special reason. Our experiments using 15k videos were pilot studies.