DeepIntoData/machine-learning-tech-hubs

 
 


nextTech - Find Your Next Start Up City Here

License: MIT

An interactive way for users to observe trends in major tech hub cities



✔️ Prerequisites

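Assuming you have the basics set up (Python and pip), install the following packages into your local or virtual environment:

pip install flask pymongo pandas python-dotenv dnspython scikit-learn requests

NOTE: Our .env file is not included because it is tied to our own MongoDB database.

The versions we used for these prerequisites are listed below. Note that sklearn==0.0 is only the deprecated PyPI alias that pulls in scikit-learn, so the install command above names scikit-learn directly.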

dnspython==2.0.0
Flask==1.1.2
pandas==1.1.5
pymongo==3.11.2
python-dotenv==0.15.0
scikit-learn==0.23.2
sklearn==0.0
requests==2.24.0

🖥️ Usage

After completing the steps above, run the app with:

python app.py

🚧 Project Outline

Our group set out to develop a machine learning model that can predict whether a zip code is a tech hub or not.

Data Sources

Gathering data

Our objective was to find usable data from our data sources and make it readable in JSON format so that it works with our JavaScript visualization libraries. Our approach started with identifying the level of location detail (city, neighborhood, zip code, etc.) that is consistent across our data sources. We then used web APIs to pull data for NYC regions to feed into an unsupervised learning model.
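As a minimal sketch of the "readable in JSON" step, the snippet below serializes a pandas DataFrame into an array of row objects, which is the shape most JavaScript charting libraries expect. The column names and values are illustrative placeholders, not data or field names from this project.

```python
import json

import pandas as pd

# Illustrative placeholder rows, not real data from the project.
# Zip codes are kept as strings so leading zeros are preserved.
raw = pd.DataFrame({
    "zip_code": ["10001", "10002"],
    "median_income": [85000, 62000],
})


def to_records_json(df: pd.DataFrame) -> str:
    """Serialize a DataFrame as a JSON array with one object per row."""
    return df.to_json(orient="records")


payload = to_records_json(raw)

# Parse it back to confirm the structure a JavaScript client would receive.
records = json.loads(payload)
print(records)
```

Each element of `records` is one zip code, so a front end can iterate over the array directly.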

Data Wrangling

We used pandas for ETL: we cleaned the data, selected the specific features we wanted, and merged the census and Zillow DataFrames using zip code as the key.
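The merge on zip code could look roughly like the sketch below. The column names and values are hypothetical stand-ins, and the inner join is an assumption; the project may have used a different join type.

```python
import pandas as pd

# Hypothetical column names and placeholder values, for illustration only.
census = pd.DataFrame({
    "zip_code": ["10001", "10002", "10003"],
    "population": [21000, 81000, 56000],
})
zillow = pd.DataFrame({
    "zip_code": ["10001", "10003", "10004"],
    "median_home_value": [900000, 1200000, 1100000],
})

# Inner join keeps only zip codes present in both sources;
# validate raises if a zip code is duplicated on either side.
merged = census.merge(zillow, on="zip_code", how="inner", validate="one_to_one")
print(merged)
```

Keeping zip codes as strings in both frames avoids silent mismatches from dropped leading zeros.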

Machine Learning

Unsupervised k-means machine learning:

  1. Used the elbow method to settle on five clusters, which define the parameters of a tech hub. This served as our training set.

  2. Analyzed each cluster to decide which one to use as the indicator of tech hub viability.

  3. Created a new column flagging each zip code as a tech hub or not.
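The three steps above can be sketched with scikit-learn as follows. This is a sketch under stated assumptions, not the project's notebook: the features are synthetic, the scaling step is our addition, and which cluster counts as the "tech hub" cluster is a hypothetical choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the merged zip-code features (not project data).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))

# Assumption: scale features so no single column dominates the distances.
scaled = StandardScaler().fit_transform(features)

# Elbow method: record inertia for a range of k and look for the bend.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertias[k] = model.inertia_

# The README reports five clusters, so refit with k=5 and label each row.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scaled)
clusters = kmeans.labels_

# Hypothetical choice: treat one cluster as the "tech hub" cluster
# and derive a 0/1 column from it.
TECH_HUB_CLUSTER = 0
is_tech_hub = (clusters == TECH_HUB_CLUSTER).astype(int)
```

Plotting `inertias` against `k` gives the elbow curve, and `is_tech_hub` plays the role of the new column used as the label for the supervised step.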

Supervised logistic regression machine learning:

  1. Split data into training and testing sets.

  2. Trained a logistic regression model on the labels produced by our unsupervised machine learning model.

  3. Used this model to predict which locations across the US are tech hubs.

  4. Exported the trained logistic regression model with pickle so that it can be run from our Flask application.
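A rough sketch of the supervised steps above is shown below, assuming synthetic features and labels in place of the clustered zip codes. The split ratio, `max_iter`, and the in-memory pickle round trip are our assumptions for illustration.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic features and labels standing in for the clustered zip codes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder for the tech-hub column

# Hold out a quarter of the rows for testing, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Round-trip through pickle to mimic exporting and reloading the model.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)
```

In a real export, `pickle.dump` would write the model to a file that the Flask app opens at startup. Only unpickle files you trust, since loading a pickle can execute arbitrary code.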

Data Loading

From here, all the data was loaded into AWS by creating an S3 bucket. Storing the data remotely lets anybody run our model without downloading all the data locally.

The data is then accessed through a provided API, which we use in our Flask app.
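The Flask side is not shown in this README. As a rough idea of how a trained model could be exposed from a Flask app, here is a minimal sketch. The `/predict` route, the `features` field, and the `AlwaysHub` stand-in model are all hypothetical and are not taken from app.py.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around any object exposing predict(rows)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        payload = request.get_json(silent=True)
        rows = payload.get("features") if isinstance(payload, dict) else None
        if not isinstance(rows, list) or not rows:
            return jsonify(error="expected a non-empty 'features' list"), 400
        # Cast to plain ints so NumPy values serialize cleanly.
        predictions = [int(p) for p in model.predict(rows)]
        return jsonify(predictions=predictions)

    return app


class AlwaysHub:
    """Stand-in model so the sketch runs without a pickle file."""

    def predict(self, rows):
        return [1 for _ in rows]


app = create_app(AlwaysHub())
```

Passing the model into `create_app` keeps the route independent of where the model comes from, so an unpickled scikit-learn classifier can replace `AlwaysHub` without changing the handler. Calling `app.run()` would start the development server.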


📖 Authors

👤 Deep Patel

👤 Joshua Coronel

👤 Keana Mabilog

👤 Stephano Castro


👌 Show your support

Give a ⭐️ if this project helped you!


📝 License

Copyright © 2020 Deep Patel, Joshua Coronel, Keana Mabilog & Stephano Castro.
This project is MIT licensed.


This README was generated with readme-md-generator

