
[main branch] Request for Test Datasets for All Downstream Tasks #44

Open
ggiggit opened this issue Nov 11, 2024 · 2 comments

Comments

@ggiggit

ggiggit commented Nov 11, 2024

Hi,

I am currently working with the Codec-SUPERB dataset from the main branch and would like to obtain the test datasets, with labels, for all downstream tasks. Specifically, I am looking for the test sets of the following tasks:

ASR Task:

  • Dataset: LibriSpeech's test-clean and test-other

ASV Task:

  • Dataset: Not specified (Is it also using VoxCeleb1's test set as mentioned in the SLT paper?)

ER Task:

  • Dataset: IEMOCAP's 4-class balanced subset

ASE Task:

  • Dataset: AudioSet's validation set

I tried using the data with the original prefix for downstream task testing, but I found that it lacks labels, making it impossible to evaluate the downstream tasks. To ensure the accuracy and reliability of my experiments, I would like to directly obtain the test datasets and their corresponding labels for these tasks.
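For reference, this is roughly how I checked the available fields. The repository id below is only a placeholder for wherever the released audio pairs are hosted, and I'm assuming they load as a Hugging Face dataset:

```python
# Sketch of how I inspected the released data; the repository id is a
# placeholder, and loading via the `datasets` library is an assumption.
from datasets import load_dataset

ds = load_dataset("voidful/Codec-SUPERB-subset", split="test")

# Only audio-related fields show up here; there are no transcription,
# speaker, emotion, or sound-event labels to score against.
print(ds.features)
print(ds[0].keys())
```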

If possible, could you please provide the download links or the method to obtain these datasets? Thank you very much for your help and support.

@voidful
Owner

voidful commented Nov 12, 2024

Thank you for reaching out regarding the Codec-SUPERB dataset and your work on downstream task evaluations. I understand that you're seeking test datasets and their corresponding labels for ASR, ASV, ER, and ASE tasks, and I'd be happy to provide further clarification.

For downstream evaluations in our current setup, we utilize specific pre-trained models rather than retraining new models from scratch for each task. These pre-trained models are applied across various tasks to ensure consistency in evaluation and comparability across metrics.
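As a rough illustration of the idea (not the exact models or tools we use; the model name, audio path, and reference text below are only assumptions for demonstration), a downstream metric such as WER can be computed by running a publicly available ASR model on the re-synthesized audio and scoring it against the original transcript:

```python
# Illustration only: a generic pre-trained ASR model scored with WER.
# "openai/whisper-small" and the file path are placeholders, not the
# specific models used in the Codec-SUPERB evaluation.
from transformers import pipeline
from jiwer import wer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = asr("resynthesized_sample.wav")["text"].lower().strip()

print("WER:", wer(reference, hypothesis))
```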

I’ll prepare and share more details on the models we’re using in each task shortly. In the meantime, please feel free to reach out if you need any specific dataset access or further assistance with setting up your evaluations.
Thank you for your patience, and please let me know if there’s anything else I can help with.

@ggiggit
Author

ggiggit commented Nov 14, 2024

Hi,

Thank you for your response.

I am trying to reproduce the results from the Codec-SUPERB leaderboard, but I do not have the datasets needed for the downstream task evaluations. Could you please provide the datasets so that I can reproduce the results?

Your assistance in this matter would be greatly appreciated.

Thank you for your help and support.
