Project Summary

Analyze data on songs and user activity on a music streaming app to understand what music users are listening to on the service. Design a database schema to facilitate the analysis. User activity and metadata on songs are recorded in separate JSON files which are kept in separate directories. Create an ETL pipeline to populate the new Postgres database using data from these files in the two directories. The database uses a star schema with a fact table and several dimension tables.

Schema

Ideally, to comply with a star schema, artist_id should be dropped from the songs table.

Execution

Clone the project and run the following commands in order:

To connect to the database and create the tables, run python sparkifydb_pg/create_tables.py
To populate the tables, run python sparkifydb_pg/etl.py
Query the database

Files

sql_queries.py: Contains all relevant queries that are used for creating tables and inserting data.

create_tables.py: Creates the database and the constituent tables.

etl.py: Parses the data files, transforms the data and populates the tables in the database.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
images		images
sparkifydb_pg		sparkifydb_pg
tests		tests
.gitignore		.gitignore
README.md		README.md
README.rst		README.rst
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project Summary

Schema

Execution

Files

About

Releases

Packages

Languages

sunnykan/sparkifydb-pg

Folders and files

Latest commit

History

Repository files navigation

Project Summary

Schema

Execution

Files

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages