Sparkify-data-modelling

This is project 1 of 5 in Udacity's Data Engineering Nanodegree. In this project, a database for storing music and artist records is created. Data is then extracted from the source, transformed using Pandas DataFrames, and loaded into the database. Two sets of data are used in the ETL process: song data and log data. Song data provides song and artist information, while log data is more extensive, covering song, artist, and some metadata about each song.
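The extract, transform, and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in etl.py: the sample record, its field names, and the helper names `extract`, `transform`, and `load` are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical one-line JSON record. The field names are assumptions made
# for illustration and are not taken from the project's data files.
SAMPLE_SONG_JSON = (
    '{"song_id": "S1", "title": "Example Song", "artist_id": "A1", '
    '"artist_name": "Example Artist", "year": 2000, "duration": 215.5}'
)


def extract(json_text):
    """Extract: read line-delimited JSON text into a DataFrame."""
    return pd.read_json(io.StringIO(json_text), lines=True)


def transform(df):
    """Transform: pick the columns for each table and turn them into row lists."""
    song_rows = df[["song_id", "title", "artist_id", "year", "duration"]].values.tolist()
    artist_rows = df[["artist_id", "artist_name"]].drop_duplicates().values.tolist()
    return song_rows, artist_rows


def load(cursor, insert_sql, rows):
    """Load: run one parameterized INSERT per row through a DB-API cursor."""
    for row in rows:
        cursor.execute(insert_sql, row)


song_rows, artist_rows = transform(extract(SAMPLE_SONG_JSON))
```

In a real run, `load` would receive a cursor from an open PostgreSQL connection and an INSERT statement such as those one would expect to find in sql_queries.py.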

Prerequisites for running the program

The project requires Python 3.x and PostgreSQL, with a default database named "studentdb" available.
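As a rough pre-flight check, the sketch below reports which Python packages are missing before the scripts are run. The helper name `missing_packages` is hypothetical, and the assumption that the project uses psycopg2 as its PostgreSQL driver is not stated in this README. The check does not verify that a PostgreSQL server is running or that "studentdb" exists.

```python
import importlib.util


def missing_packages(modules=("pandas", "psycopg2")):
    """Return the names of the given top-level packages that cannot be found.

    The default list is an assumption: pandas is named in this README,
    while psycopg2 is only a guess at the PostgreSQL driver in use.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if missing_packages():
    print("Missing Python packages:", ", ".join(missing_packages()))
```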

Starting the program

Execute "create_tables.py" first. This creates a fresh instance of sparkifydb with empty tables.
Then execute "etl.py". This loads the data into the tables.
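The two steps above can also be chained from a small helper script. This is a sketch under assumptions: the helper name `run_pipeline` is hypothetical, and it assumes both scripts sit in the current working directory and are run with the same Python interpreter.

```python
import subprocess
import sys

# Order matters: the tables must exist before the ETL step loads data.
SCRIPTS = ["create_tables.py", "etl.py"]


def run_pipeline(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script in order, stopping at the first one that fails."""
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so etl.py is never started if create_tables.py fails.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root runs both scripts in sequence.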
