Project: Data Modeling with Cassandra

In order to understand user preferences for Sparkify, collected data from their streaming app were modeled with Apache Cassandra. It was created 3 Tables:

song_library The PRIMARY KEY in the table consists of two fields sessionId (Partition key) and itemInSession (Clustering key). Partitioning is done by sessionId. Withing that partition the rows are ordered by the itemInSession.
song_playlist_session The PRIMARY KEY in the table consists of composite partition keys (userId and sessionId), and Clustering key itemInSession. Partitioning is done by userId and sessionId. Withing the partition rows are ordered by itemInSession.
songs_by_users The PRIMARY KEY in the table consists of two fields song (Partition key) and userId (Clustering key). Partitioning is done by song. Withing that partition the rows are ordered by the userId

In order to answer the following questions:

Give me the artist, song title and song's length in the music app history that was heard during sessionId = 338, and itemInSession = 4
Give me only the following: name of artist, song (sorted by itemInSession) and user (first and last name) for userid = 10, sessionid = 182
Give me every username (first and last) in my music app history who listened to the song 'All Hands Against His Own'

The corresponding queries:

query = "select artist, song, length from song_library WHERE sessionId=338 AND itemInSession=4"

query = "select userId, sessionId, firstName, lastName, itemInSession, artist, song from song_user_history WHERE userId=10 AND sessionId=182"

query = "select song, firstName, lastName, artist from songs_by_users WHERE song='All Hands Against His Own' "

How to run the ETL

Run all cells in the Project_1B_ETL.ipynb file. It is assumed that Apache Cassandra is already installed and the $ nodetool status shows Active status.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
Project_1B_ETL.ipynb		Project_1B_ETL.ipynb
Readme.md		Readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project: Data Modeling with Cassandra

How to run the ETL

About

Releases

Packages

Languages

NatLun137/Data_modeling_with_Cassandra

Folders and files

Latest commit

History

Repository files navigation

Project: Data Modeling with Cassandra

How to run the ETL

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages