MusicCaps train-test splitting details #4

Open
jpgard opened this issue Sep 14, 2023 · 2 comments

jpgard commented Sep 14, 2023

Congrats on the great work! This is a really useful model and the demo is super handy as well.

I am wondering how you performed the train-test split on the MusicCaps dataset. Specifically, which parts of MusicCaps were used for LP-MusicCaps training, and which were used for evaluation? Was the audio from the eval set used during training (with generated captions)? Is there code in the repo that could be used to replicate your splitting process?

The paper says "we present the captioning result for MusicCaps [12] evaluation set", but it is not clear whether the audio and tags from that evaluation set were used during model training. MusicCaps also contains a few fields ("is_balanced_subset", "is_audioset_eval") that look like they could be used for test-set partitioning, so it would be great to know how you divided the dataset and which audio/tags/captions were used at the various stages of the experiments.
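
For reference, here is a minimal sketch of inspecting those two flags, assuming the Hugging Face google/MusicCaps mirror of the official metadata (this is not code from this repo):

```python
# Sketch only: inspect the MusicCaps split-related flags.
# Assumes the Hugging Face "google/MusicCaps" mirror of the official metadata CSV;
# the field names "is_balanced_subset" and "is_audioset_eval" come from the dataset card.
from datasets import load_dataset

mc = load_dataset("google/MusicCaps", split="train")  # metadata only, no audio

print(mc.column_names)
print(sum(mc["is_audioset_eval"]), "rows flagged is_audioset_eval")
print(sum(mc["is_balanced_subset"]), "rows flagged is_balanced_subset")
```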

Thank you for the clarification!


jpgard commented Sep 15, 2023

I can see track_split.json with the contents below, but it isn't clear from that file what the validation/test sets are, nor where this split is used in the experiments.

{
    "train_track": [
        "[rOOBAGxxjBk]-[10-20]",
        "[OmjfHQB_lcs]-[30-40]",
        "[KxVbdGPAfjE]-[30-40]",
        "[WyGJdstaxK4]-[30-40]",
        "[qEGNzCWQdqo]-[30-40]",
        "[Zbmm_hXcrA0]-[160-170]",
        "[AHmcuClSTL4]-[100-110]",
        "[OMcoFfaCaGM]-[30-40]",
        "[pIwn0udLJXI]-[120-130]",
        "[60OIHit4Q-M]-[30-40]",
        "[kh6rmFg3U4k]-[480-490]",
        "[24cmo2fEQo8]-[60-70]",
        "[-kpR93atgd8]-[30-40]",
        "[4zZiWBp0b08]-[30-40]",
        "[yreWOyWr6Uk]-[330-340]",
        "[aKhM6zyL--k]-[330-340]",
        "[XEIP1OUXU8E]-[140-150]",
        "[WTVC7ZI9WtY]-[30-40]",
        "[yRWndZvIAHc]-[30-40]",
        "[sOJSjVp6UTc]-[30-40]"
    ],
    "valid_track": [],
    "test_track": []
}
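
In case it is useful, here is a small sketch of parsing those entries, assuming the "[ytid]-[start-end]" convention shown above (the parsing helper is hypothetical, not something from the repo):

```python
# Sketch: parse "[ytid]-[start_sec-end_sec]" entries from track_split.json
# into (YouTube ID, start, end) tuples. The filename and ID format come from
# the excerpt above; the helper itself is hypothetical.
import json
import re

TRACK_RE = re.compile(r"\[(.+)\]-\[(\d+)-(\d+)\]")

def parse_track(track_id):
    ytid, start, end = TRACK_RE.fullmatch(track_id).groups()
    return ytid, int(start), int(end)

with open("track_split.json") as f:
    split = json.load(f)

train_tracks = [parse_track(t) for t in split["train_track"]]
print(len(train_tracks), train_tracks[0])  # ('rOOBAGxxjBk', 10, 20) for the excerpt above
```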


seungheondoh commented Sep 18, 2023

@jpgard Thank you for reporting this issue! First, I apologize for any confusion caused by the paper.

  1. MusicCaps Evaluation Split: We used the 2.86k items flagged by "is_audioset_eval" as the evaluation split (see the sketch after this list). Please use the link below for reference. Additionally, we used only "caption_ground_truth" as the evaluation captions.
  2. Pseudo captions from the MusicCaps dataset are used only in Section 3, "EVALUATION OF PSEUDO CAPTIONS," while pseudo captions based on the MSD dataset are used in Section 5, "AUTOMATIC MUSIC CAPTIONING."
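
A minimal sketch of selecting that evaluation split from the MusicCaps metadata, assuming the Hugging Face google/MusicCaps mirror (the exact filtering code used for the paper may differ):

```python
# Sketch only: reproduce the MusicCaps evaluation split by filtering on the
# "is_audioset_eval" flag, as described above. Assumes the Hugging Face
# "google/MusicCaps" mirror of the official metadata CSV.
from datasets import load_dataset

mc = load_dataset("google/MusicCaps", split="train")

eval_set = mc.filter(lambda row: row["is_audioset_eval"])
train_set = mc.filter(lambda row: not row["is_audioset_eval"])

print(len(eval_set), "evaluation items")   # about 2.86k, per the answer above
print(len(train_set), "remaining items")
```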

If you have any further questions or concerns, please feel free to reply!
