Machine-Learning

This repository contains implementations of various supervised and unsupervised machine learning algorithms using the scikit-learn library. The folder hierarchy is as follows:

Clustering: Implementation of k-means clustering using Python's scikit-learn library
Linear Regression: Implementation of a linear regression model, a logistic regression model, and regularised models (Elastic Net and Lasso), with hyperparameter tuning using GridSearchCV and an ROC curve and AUC for the logistic model
Machine Learning Pipelines: Implements an ML pipeline that includes filling in missing values, scaling the columns, and fitting an ML model
Random Forest: Implements Random Forest for both classification and regression. Implements ensemble learning using Random Forest and linear regression
SVM: Implements an SVM classifier using Python libraries

Machine Learning Projects:

=> Clustering&PCA: Performs unsupervised k-means clustering to predict a class/category from a dataset based on specific features of that class. Not all of the features are strong enough, however; in other words, some features show little variance/uniqueness across the classes, so a clustering model needs to be designed. The following steps were performed:

Loading the dataset into a pandas dataframe
Scaling of the dataset
Separating features from labels
Implement k-means clustering
Plot an elbow plot to find the actual number of classes in the dataset
Write a function to calculate the purity score of the clusters
Implement k-means clustering for different distance metrics using the pyclustering library and find the purity score
Find the best metric based on the purity score
Use selection criteria (ANOVA, chi-squared) to select the best three features and use them for k-means clustering
Implement PCA on the dataset
Plot captured variance with respect to increasing latent dimensionality
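
The purity score step above can be sketched in plain Python. This is a minimal illustration, not the project's own function: the name `purity_score` and the toy labels are assumptions, and purity is taken here as the fraction of samples that carry the majority true label of their assigned cluster.

```python
from collections import Counter

def purity_score(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    if not true_labels:
        raise ValueError("inputs must not be empty")
    # Group the true labels by the cluster each sample was assigned to.
    clusters = {}
    for label, cluster in zip(true_labels, cluster_ids):
        clusters.setdefault(cluster, []).append(label)
    # For each cluster, count the samples that carry its most common label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in clusters.values()
    )
    return majority_total / len(true_labels)

# Hypothetical toy data: six samples assigned to three clusters.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = [0, 0, 0, 1, 2, 2]
print(purity_score(y_true, y_pred))
```

A score of 1.0 means every cluster contains a single class, so comparing the score across distance metrics gives a simple way to rank them.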

=>Book Recommendation Based On Text Classification: The purpose of this project was to recommend a new book to a reader based on the content of the book they read previously. The whole content of each book was converted into tokens. Stop words were removed from the tokens and stemming was then applied to them. A tf-idf (term frequency-inverse document frequency) model was then used to define the importance of each word depending on how frequent it is in a given text and how infrequent it is in all the other documents. As a result, a high tf-idf score for a word indicates that the word is specific to that text. Cosine similarity was then computed to find the pairwise similarity between each book and every other book. Lastly, the result was visualized using a dendrogram to show the recommended books after reading a specific book. The data set used for this project was a list of all books written by Charles Dickens available at Project Gutenberg.
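
The tf-idf and cosine similarity steps can be sketched as follows. This is a dependency-free illustration under stated assumptions: the token lists and the names `tfidf_vectors`, `cosine_similarity`, and `most_similar` are hypothetical, and the plain `log(N / df)` weighting used here is only one of several tf-idf variants (library implementations typically apply smoothing).

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: tf-idf weight} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times an unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical token lists, assumed to be already stop-word filtered and stemmed.
books = {
    "book_a": ["ghost", "christmas", "miser", "ghost"],
    "book_b": ["ghost", "christmas", "spirit"],
    "book_c": ["orphan", "london", "thief"],
}
titles = list(books)
vectors = tfidf_vectors([books[t] for t in titles])

def most_similar(title):
    """Return the other title whose tf-idf vector is closest to the given one."""
    i = titles.index(title)
    scores = {
        other: cosine_similarity(vectors[i], vectors[j])
        for j, other in enumerate(titles) if j != i
    }
    return max(scores, key=scores.get)

print(most_similar("book_a"))
```

A term that occurs in every document gets a weight of zero under this weighting, which is why words shared by all books do not contribute to the similarity.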

=>Forecasting temperature based on yearly weather data: The environment and its changes form a highly complex system. The dataset contains 10 features in total. Each row contains an hourly record of weather status, and the data was recorded between 2006 and 2016. This project implements linear regression to predict temperature and logistic regression to predict the class label for precipitation type. The following steps were performed:

Loading the dataset into a pandas dataframe
Draw a heat map to find insignificant features for predicting temperature
Remove insignificant features
Separating features and labels
Divide the data into train and test sets in a 70/30 ratio
Implement linear regression and calculate accuracy
Create a regularised regression model by implementing Ridge regression, GridSearchCV, and k-fold cross-validation, and calculate accuracy
Implement logistic regression to predict the class label of the Precipitation Type column
Discuss the test performance using precision, recall and confusion matrix
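
The last step relies on precision, recall, and the confusion matrix. The sketch below shows how these relate, using plain Python rather than the project's code; the labels `"rain"` and `"snow"`, the toy predictions, and the function names are assumptions made for illustration.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count each (actual, predicted) pair; this is the confusion matrix in sparse form."""
    return Counter(zip(y_true, y_pred))

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    counts = confusion_counts(y_true, y_pred)
    tp = counts[(positive, positive)]
    # False positives: predicted as positive but actually another class.
    fp = sum(n for (actual, predicted), n in counts.items()
             if predicted == positive and actual != positive)
    # False negatives: actually positive but predicted as another class.
    fn = sum(n for (actual, predicted), n in counts.items()
             if actual == positive and predicted != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical test-set labels and model predictions.
y_true = ["rain", "rain", "rain", "snow", "snow", "rain"]
y_pred = ["rain", "snow", "rain", "snow", "rain", "rain"]

for label in ("rain", "snow"):
    print(label, precision_recall(y_true, y_pred, label))
```

Reporting both values per class is useful when the classes are imbalanced, since overall accuracy can hide poor performance on the rarer class.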

=>Classification model to predict the approval of a credit card application: The aim of the project was to replace the manual analysis of credit card applications with an automatic credit card approval predictor built using ML techniques. The data set used for this project was the Credit Card Approval dataset from the UCI Machine Learning Repository. The dataset was anonymized for confidentiality purposes. This project involved loading the data and filling the missing values, using the mean for numerical columns and the most frequent value for categorical columns. The categorical columns were then converted to numerical ones using a one-hot encoder, and the numerical columns were scaled to lie between 0 and 1. The data set was then split into training and test sets and a logistic regression model was fitted. To improve the accuracy of the model, a regularized logistic model was then created using hyperparameter tuning and 5-fold cross-validation. The classification accuracy of the model was recorded as 85%. A classification report and confusion matrix were also created to better understand the results.
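
The preprocessing described above (mean and most-frequent imputation, scaling to the 0 to 1 range, one-hot encoding) can be sketched in plain Python. This is an illustration only: the column names, values, and helper functions are hypothetical and are not taken from the project or the dataset.

```python
from collections import Counter
from statistics import mean

def impute_mean(values):
    """Replace None in a numerical column with the mean of the known values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def impute_most_frequent(values):
    """Replace None in a categorical column with the most frequent known value."""
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale a numerical column linearly so it lies between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode a categorical column as one 0/1 indicator per sorted category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical columns with missing entries marked as None.
income = [20.0, None, 40.0, 60.0]
status = ["y", "n", None, "y"]

income_scaled = min_max_scale(impute_mean(income))
status_encoded = one_hot(impute_most_frequent(status))
print(income_scaled)
print(status_encoded)
```

In a train/test workflow the fill values and the minimum and maximum are normally computed on the training split only and then reused on the test split, so that no information from the test set leaks into preprocessing.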

=>Categorization of invoices into categories (Salary, Food, Traveling) using Natural Language Processing: Most companies spend large amounts of money in many areas, such as employee salaries, raw materials, and transport, which makes it difficult to manually categorise each invoice into a relevant category. Categorisation helps later because the company can see where it spent the most money. This project involves loading a dataset of text invoices using Python and performing text processing by removing stop words and applying stemming and lemmatization. Features are then separated from the class labels (Salary, Food, raw material, etc.) and the dataset is divided into train and test sets in a 70/30 ratio. Then a CNN, a sequential model with an LSTM layer, and an RNN model are implemented to perform text classification, and the accuracy is noted for each model. All three models use loss='categorical_crossentropy', optimizer='adam', and accuracy as the metric. Each model is fit for a fixed number of epochs, and graphs of accuracy and loss are plotted. The accuracy of each model is calculated on the test set. The RNN model records the highest text classification accuracy of 89.45%.
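
To make the `categorical_crossentropy` loss named above concrete, here is a small plain-Python sketch of what that loss computes for one-hot class labels. It is an illustration, not the project's training code: the predicted probabilities are made up, and the function names and the `eps` clipping value are assumptions.

```python
import math

def one_hot(label, classes):
    """Encode a class label as a one-hot vector over the given class order."""
    return [1.0 if label == c else 0.0 for c in classes]

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum(target * log(predicted probability))."""
    total = 0.0
    for target, probs in zip(y_true, y_prob):
        # Clip probabilities away from zero so that log() stays finite.
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)

# Class names follow the categories in the text; the probabilities are hypothetical.
classes = ["Salary", "Food", "Traveling"]
targets = [one_hot("Salary", classes), one_hot("Food", classes)]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]

loss = categorical_crossentropy(targets, predictions)
print(round(loss, 4))
```

Because the targets are one-hot, only the probability assigned to the correct class contributes to the loss, so the loss falls toward zero as the model becomes more confident in the right category.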
