An interactive way for users to observe trends in major tech hub cities
Assuming you have the basics set up, install the following into your local or virtual environment:

```shell
pip install flask pymongo pandas python-dotenv dnspython scikit-learn requests
```
NOTE: Our env file is not included as it is related to our individual Mongo database
Versions for these prerequisites:

```
dnspython==2.0.0
Flask==1.1.2
pandas==1.1.5
pymongo==3.11.2
python-dotenv==0.15.0
requests==2.24.0
scikit-learn==0.23.2
```
After completing the above, run the app with:

```shell
python app.py
```
Our group set out to develop a machine learning model that can predict whether a zip code is a tech hub or not.
- Census report API (Age, education, ethnic group, median salary)
- Zillow API (Real estate data)
Our objective was to find usable data from the data sources listed above and make it readable in JSON format to work with our JavaScript visualization libraries. Our approach starts with identifying a level of detail for location (city, neighborhood, zip code, etc.) that is consistent across our data sources. Web APIs are then used to pull data for NYC regions to feed into an unsupervised learning model.
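The Census API returns data as a JSON list of lists, where the first row is the header and each following row holds values for one geography. A minimal sketch of converting that shape into a Pandas DataFrame — the variable codes and sample values below are hypothetical stand-ins, not our actual request:

```python
import pandas as pd

# Hypothetical sample of the list-of-lists JSON shape the Census API returns:
# first row is the header, subsequent rows are values per zip code.
sample_json = [
    ["B19013_001E", "zip code tabulation area"],  # e.g. median household income
    ["85000", "10001"],
    ["72000", "10002"],
]

# Use the first row as column headers and the rest as data rows.
census_df = pd.DataFrame(sample_json[1:], columns=sample_json[0])
census_df = census_df.rename(columns={"zip code tabulation area": "zipcode"})
print(census_df["zipcode"].tolist())  # ['10001', '10002']
```

The same header-plus-rows conversion works for any variable list requested from the API.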
We used Pandas for ETL: cleaned the data, gathered the specific features we wanted, and merged the census and Zillow dataframes using zip code as our key.
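The merge step can be sketched as follows; the column names and values here are hypothetical, since the real frames come from the Census and Zillow APIs:

```python
import pandas as pd

# Hypothetical cleaned frames standing in for the real API outputs.
census_df = pd.DataFrame({"zipcode": ["10001", "10002"], "median_income": [85000, 72000]})
zillow_df = pd.DataFrame({"zipcode": ["10001", "10002"], "median_rent": [3900, 3400]})

# Inner merge on zip code keeps only zip codes present in both sources.
merged = census_df.merge(zillow_df, on="zipcode", how="inner")
print(merged.shape)  # (2, 3)
```

An inner join here is a deliberate choice: zip codes missing from either source would have incomplete features and are dropped.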
Unsupervised k-means machine learning:
- Created five clusters, using the elbow method, to define the parameters of a tech hub. This served as our training set.
- Analyzed each cluster to determine which one we would use to judge tech hub viability.
- Created a new column flagging each zip code as a tech hub or not.
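The steps above can be sketched with scikit-learn; the feature matrix and the choice of which cluster counts as the tech-hub cluster are hypothetical here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix (rows = zip codes); real features come from the merged frame.
rng = np.random.default_rng(42)
X_scaled = StandardScaler().fit_transform(rng.normal(size=(50, 4)))

# Elbow method: compute inertia for a range of k and look for the "bend".
inertias = [
    KMeans(n_clusters=k, random_state=42, n_init=10).fit(X_scaled).inertia_
    for k in range(1, 9)
]

# Cluster with k=5 (the value chosen in the project).
kmeans = KMeans(n_clusters=5, random_state=42, n_init=10).fit(X_scaled)

# One cluster is picked as the tech-hub class after inspecting cluster summaries;
# the index below is a hypothetical placeholder for that choice.
tech_hub_cluster = 2
is_tech_hub = (kmeans.labels_ == tech_hub_cluster).astype(int)
```

The `is_tech_hub` column becomes the label for the supervised step that follows.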
Supervised logistic regression machine learning:
- Split the data into training and testing sets.
- Trained a logistic regression model on the labels defined by our unsupervised model.
- Used this model to predict which locations across the US are tech hubs.
- Exported the trained logistic regression model with pickle so it can run inside our Flask application.
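A minimal sketch of the train-and-export flow, with stand-in features and labels in place of the real k-means output:

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical features and tech-hub labels standing in for the k-means output.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split, train, and evaluate on the held-out set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = LogisticRegression().fit(X_train, y_train)
score = model.score(X_test, y_test)

# Serialize the trained model so the Flask app can load it later
# (dumps/loads shown here; the app would write the bytes to a .pkl file).
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

The restored model makes the same predictions as the original, which is what lets the Flask app serve it without retraining.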
From here, all the data was loaded into an AWS S3 bucket. Storing the data remotely lets anybody run our model without needing to download all the data locally.
Our Flask app then pulls this data through the S3 API at runtime.
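A sketch of how a Flask route can serve predictions from a pickled model; the route name, query parameters, and the inline stand-in model are hypothetical (the real app loads the exported model and S3 data instead):

```python
import pickle
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Stand-in model trained on toy data; the real app unpickles the exported model.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])
model = pickle.loads(pickle.dumps(LogisticRegression().fit(X, y)))

@app.route("/predict")
def predict():
    # Hypothetical endpoint: two numeric features arrive as query parameters.
    features = [[float(request.args.get("f1", 0)), float(request.args.get("f2", 0))]]
    return jsonify({"tech_hub": int(model.predict(features)[0])})

# Exercise the route with Flask's test client, without starting a server.
with app.test_client() as client:
    resp = client.get("/predict?f1=3&f2=3")
    print(resp.get_json())  # {'tech_hub': 1}
```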
👤 Deep Patel
- Website: www.mrdeeppatel.com
- Github: @Frozte
- LinkedIn: @Deep Patel
👤 Joshua Coronel
- Github: @joshuajonme
- LinkedIn: @Joshua Coronel
👤 Keana Mabilog
- Github: @keana-m
- LinkedIn: @Keana Mabilog
👤 Stephano Castro
- Github: @castrostephano
- LinkedIn: @Stephano Castro
Give a ⭐️ if this project helped you!
Copyright © 2020 Deep Patel, Joshua Coronel, Keana Mabilog & Stephano Castro.
This project is MIT licensed.
This README was generated with readme-md-generator