Sparkify - Churn Prediction for music streaming app with PySpark

This repository is part of the final project submitted to Udacity for the Data Science Nanodegree. The objective is to predict user churn for a simulated music streaming app, using historical data from user interactions.

A blog post with a detailed analysis is available at https://medium.com/@ttozatto.ds/churn-prediction-for-music-streaming-app-sparkify-d6e26d1ac80f

Dependencies

pyspark
matplotlib

Files

utils.py -> functions to load and clean data, and to create, train, and evaluate ML models
main.py -> script to run the full process, from loading the dataset to showing results
medium-sparkify-event-data.json -> dataset with user interactions in the app. Available at: https://video.udacity-data.com/topher/2018/December/5c1d6681_medium-sparkify-event-data/medium-sparkify-event-data.json
Sparkify.ipynb -> initial exploratory analysis. Final modeling and tuning were done in the two scripts listed above.
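
As a rough illustration of what "predicting churn from user interactions" starts from, here is a minimal plain-Python sketch of deriving one churn label per user from raw events. It is not the PySpark code in utils.py. It assumes that the dataset holds one JSON object per line, that each event has "userId" and "page" fields, and that a visit to a "Cancellation Confirmation" page marks a churned user. None of these details are stated in this README, and the names label_churn and read_events are hypothetical.

```python
import json
from collections import defaultdict

# Assumption: a visit to this page marks a churned user (not stated in the README).
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events):
    """Return {userId: 1 if the user ever reached CHURN_PAGE, else 0}."""
    labels = defaultdict(int)
    for event in events:
        user = event.get("userId")
        if not user:  # skip events with a missing or empty user id
            continue
        is_churn_event = int(event.get("page") == CHURN_PAGE)
        labels[user] = max(labels[user], is_churn_event)
    return dict(labels)


def read_events(path):
    """Yield one event per non-empty line, assuming a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Tiny in-memory sample standing in for the real dataset.
sample = [
    {"userId": "1", "page": "NextSong"},
    {"userId": "1", "page": "Cancellation Confirmation"},
    {"userId": "2", "page": "NextSong"},
    {"userId": "", "page": "Home"},
]
labels = label_churn(sample)
print(labels)
```

Under the same assumptions, label_churn(read_events("medium-sparkify-event-data.json")) would produce the labels for the full dataset, although at that size a PySpark aggregation is the more natural fit.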

Summary of Results

Test Scores

Parameters for best models

Feature importance

Acknowledgements:

I would like to thank:

Udacity, which proposed this project as part of the Data Science Nanodegree.
The Spark team and community, who provide a powerful open-source tool to everyone.
