A dataset for automatic sample identification. Created in 2011 for the research described in [1] and [2].
This dataset contains 105 sample relations (ids starting with S, in samples.csv) between 76 songs that make use of one or more samples and 68 songs that were sampled (ids starting with T, in tracks.csv).
The dataset contains only metadata: track titles and a few additional annotations. Contact me if you would like to use the audio or specific features.
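The metadata can be read with standard CSV tooling. A minimal loading sketch in Python (pandas); the file names are those distributed with the dataset, but no particular column layout is assumed here:

```python
# Minimal sketch: load the two metadata files and inspect what they contain.
import pandas as pd

samples = pd.read_csv("samples.csv")   # 105 sample relations, ids starting with S
tracks = pd.read_csv("tracks.csv")     # track metadata, ids starting with T

print(f"{len(samples)} sample relations, {len(tracks)} tracks")
print(tracks.columns.tolist())         # see which annotations are available
```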
The dataset is intended for evaluation following a standard retrieval paradigm with query and candidate files. The 76 tracks that contain samples are used as queries; the 68 sampled songs are used as candidates, optionally together with additional 'noise' files.
In [1] and [2], 320 'noise' files similar to the candidates in genre and length were added to challenge the system.
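As a sketch of how such a retrieval setup could be scored, the snippet below computes mean average precision over per-query ranked candidate lists. The function and variable names are illustrative only and not part of the dataset; the ground-truth mapping from query ids to relevant candidate ids has to be built from samples.csv first.

```python
# Mean average precision for a query -> ranked-candidates retrieval setup.
from typing import Dict, List, Set

def mean_average_precision(rankings: Dict[str, List[str]],
                           ground_truth: Dict[str, Set[str]]) -> float:
    """Average precision per query, averaged over queries with relevant candidates."""
    ap_values = []
    for query, ranked_candidates in rankings.items():
        relevant = ground_truth.get(query, set())
        if not relevant:
            continue
        hits = 0
        precision_sum = 0.0
        for rank, candidate in enumerate(ranked_candidates, start=1):
            if candidate in relevant:
                hits += 1
                precision_sum += hits / rank
        ap_values.append(precision_sum / len(relevant))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0

# Illustrative example using the dataset's T-id scheme:
rankings = {"T178": ["T177", "T042", "T003"]}   # hypothetical system output
ground_truth = {"T178": {"T177"}}               # T178 samples T177
print(mean_average_precision(rankings, ground_truth))  # 1.0
```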
Only samples used in hip hop music were considered. Regarding sample origins, there were no genre restrictions.
For representativeness, the ground truth was chosen to include both short and long samples, tonal and percussive samples, and isolated samples (the only layer in the mix) as well as background samples. So-called ‘interpolations’, i.e. samples that have been re-recorded in the studio, were avoided, as were non-musical samples (e.g. film dialogue).
The dataset was compiled using valuable information from WhoSampled and Hip Hop is Read.
Example identifiers and file names:
- S102 (T177 sampled by T178)
- T027.wav
- T078.wav
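For completeness, a sketch of turning the sample relations into the query-to-candidates mapping used in the evaluation sketch above. The column names below are assumptions for illustration only; check the actual header of samples.csv before use.

```python
# Build ground_truth[query_track] = {sampled tracks} from samples.csv.
import csv
from collections import defaultdict

ground_truth = defaultdict(set)
with open("samples.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Assumed columns: 'sampling_track' (the T-id of the song that uses the
        # sample) and 'original_track' (the T-id of the sampled song).
        ground_truth[row["sampling_track"]].add(row["original_track"])
```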
Please cite one of the following when using this dataset.
[1] Van Balen, J. (2011). Automatic Recognition of Samples in Musical Audio. Master's thesis, Universitat Pompeu Fabra, Barcelona, Spain.
[2] Van Balen, J., Serrà, J., & Haro, M. (2012). Automatic Identification of Samples in Hip Hop Music. In International Symposium on Computer Music Modeling and Retrieval (CMMR). London, United Kingdom.