Time Machine Track (TMTrack) Dataset Generator

Dataset generator for the Billboard Hot 100 chart. It uses Wikipedia as the source for the tracks that charted in a given year.

Generated Metadata

The generator uses Wikipedia to retrieve the Hot 100 songs for the years 1946 to 2020. Once the song and artist are known, a YouTube link to an instrumental version of the song is fetched and appended to each record. The resulting CSV is written as billboard_top_100.csv to the directory specified in the config (OUTPUT_DIR).
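The README does not say how the instrumental link is fetched. As a minimal sketch, assuming the generator searches YouTube for "<artist> <song> instrumental", the search-results URL for one record could be built as below. The query format and the helper name build_instrumental_search_url are hypothetical; turning a results page into a watch?v= link like those in the sample would need an extra lookup step that is not shown.

```python
from urllib.parse import urlencode


def build_instrumental_search_url(artist: str, song: str) -> str:
    """Build a YouTube search-results URL for an instrumental version of a song.

    Hypothetical helper: the "<artist> <song> instrumental" query format is an
    assumption, not necessarily what the generator actually sends.
    """
    query = f"{artist} {song} instrumental"
    # urlencode percent-encodes commas and ampersands (e.g. "G&B" -> "G%26B")
    # and turns spaces into "+", so artist names with punctuation stay intact.
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


if __name__ == "__main__":
    print(build_instrumental_search_url("Faith Hill", "Breathe"))
```

Encoding the whole query through urlencode, rather than concatenating strings, keeps names such as "Santana, The Product G&B" from breaking the URL.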

Dataset Schema

The output dataset consists of 4 columns: year, artist, song, and youtube_search_url.

year (numeric): the year in which the song made the Hot 100.
artist (string): the artist of the song.
song (string): the title of the song.
youtube_search_url (string): the YouTube URL of an instrumental version of the song. If no URL is found, the field defaults to "" (an empty string).
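The schema can be mirrored by a small record type. This is a minimal sketch, not code from the repository: TrackRecord and write_records are hypothetical names, and the only facts taken from the README are the column order and the empty-string default for a missing URL.

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import IO, Iterable

# Column order as documented in the schema.
FIELDNAMES = ["year", "artist", "song", "youtube_search_url"]


@dataclass
class TrackRecord:
    """One row of billboard_top_100.csv (hypothetical helper)."""
    year: int
    artist: str
    song: str
    youtube_search_url: str = ""  # empty string when no URL was found

    def to_row(self) -> dict:
        # asdict keeps the field order, which matches FIELDNAMES.
        return asdict(self)


def write_records(records: Iterable[TrackRecord], fileobj: IO[str]) -> None:
    """Write a header plus one CSV row per record; commas in names get quoted."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(record.to_row() for record in records)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_records([TrackRecord(2000, "Santana, Rob Thomas", "Smooth")], buffer)
    print(buffer.getvalue())
```

When writing to a real file, open it with newline="" so the csv module controls line endings.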

Sample

year,artist,song,youtube_search_url
2000,Faith Hill,Breathe,https://www.youtube.com/watch?v=DDfcnBpQDNY
2000,"Santana, Rob Thomas",Smooth,https://www.youtube.com/watch?v=TDjDIhiIXQs
2000,"Santana, The Product G&B",Maria Maria,https://www.youtube.com/watch?v=DFDAWasYOfo
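Because artist names can contain commas (the Santana rows above are quoted), the CSV should be parsed with a real CSV reader rather than by splitting lines on commas. A minimal sketch using Python's standard csv module, with the sample embedded as a string; load_rows is a hypothetical helper name:

```python
import csv
import io
from typing import IO, List

SAMPLE = """year,artist,song,youtube_search_url
2000,Faith Hill,Breathe,https://www.youtube.com/watch?v=DDfcnBpQDNY
2000,"Santana, Rob Thomas",Smooth,https://www.youtube.com/watch?v=TDjDIhiIXQs
2000,"Santana, The Product G&B",Maria Maria,https://www.youtube.com/watch?v=DFDAWasYOfo
"""


def load_rows(fileobj: IO[str]) -> List[dict]:
    """Read dataset rows as dicts, converting the year column to int."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["year"] = int(row["year"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["year"], row["artist"], "-", row["song"])
```

For the generated file, pass open(path, newline="", encoding="utf-8") instead of the StringIO; the UTF-8 encoding is an assumption, since the README does not state one.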

Use Cases

The intended use case for the dataset is data science based on DSP (digital signal processing) analysis of the songs, carried out by post-processing the output CSV.
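As one example of post-processing, tracks without an instrumental URL can be dropped, and URL coverage can be measured per year before any audio analysis. This is a minimal sketch that assumes rows are dictionaries keyed by the schema's column names (for example, as produced by csv.DictReader); rows_with_url and url_coverage_by_year are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def rows_with_url(rows: Iterable[dict]) -> List[dict]:
    """Keep only tracks whose youtube_search_url is non-empty."""
    return [row for row in rows if row["youtube_search_url"]]


def url_coverage_by_year(rows: Iterable[dict]) -> Dict[int, float]:
    """Fraction of tracks per year that have a non-empty youtube_search_url."""
    total: Counter = Counter()
    found: Counter = Counter()
    for row in rows:
        year = int(row["year"])  # works for "2000" (raw CSV) or 2000
        total[year] += 1
        if row["youtube_search_url"]:
            found[year] += 1
    return {year: found[year] / total[year] for year in sorted(total)}


if __name__ == "__main__":
    demo = [
        {"year": "2000", "youtube_search_url": "https://www.youtube.com/watch?v=DDfcnBpQDNY"},
        {"year": "2000", "youtube_search_url": ""},
    ]
    print(url_coverage_by_year(demo))
```

Checking coverage first shows which years have too few resolvable tracks to support a meaningful DSP comparison.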

Config

The following options in config.py can be changed before generating the dataset:

OUTPUT_DIR: the directory in which to save the output dataset CSV.
START_YEAR: the first year for which to get the Hot 100. Supported range: 1946 to 2020 (inclusive).
END_YEAR: the last year for which to get the Hot 100. Supported range: 1946 to 2020 (inclusive).
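A config.py along these lines might look like the sketch below. The three option names and the 1946 to 2020 range come from this README. The values shown are illustrative only, and the validate_years and output_csv_path helpers are hypothetical additions; the repository may not validate its config at all.

```python
import os

# Illustrative values; the repository's actual defaults are not documented here.
OUTPUT_DIR = "output"
START_YEAR = 1990
END_YEAR = 2000

# Supported range stated in the README (inclusive).
MIN_SUPPORTED_YEAR = 1946
MAX_SUPPORTED_YEAR = 2020


def validate_years(start_year: int = START_YEAR, end_year: int = END_YEAR) -> None:
    """Raise ValueError if the configured years are unusable (hypothetical helper)."""
    for name, value in (("START_YEAR", start_year), ("END_YEAR", end_year)):
        if not MIN_SUPPORTED_YEAR <= value <= MAX_SUPPORTED_YEAR:
            raise ValueError(
                f"{name}={value} is outside [{MIN_SUPPORTED_YEAR}, {MAX_SUPPORTED_YEAR}]"
            )
    # Assumption: the README does not say whether START_YEAR may exceed END_YEAR.
    if start_year > end_year:
        raise ValueError("START_YEAR must not be later than END_YEAR")


def output_csv_path(output_dir: str = OUTPUT_DIR) -> str:
    """Path at which billboard_top_100.csv would be written."""
    return os.path.join(output_dir, "billboard_top_100.csv")
```

Validating the years up front fails fast, rather than after hours of scraping.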

Setup & Running

To generate the dataset, follow these steps:

Install Python 3.7+.
Install the dependencies with pip3 install -r requirements.txt.
Set up config.py as described above.
Run the script with python3 run.py. Note that this step can take a few hours.
