This project develops a predictive model that assesses student performance using machine learning regression techniques. Aimed at educational institutions, the model predicts academic outcomes from students' demographic and academic data. The repository includes scripts for data preprocessing, model training, prediction, and evaluation. The project is also configured for AWS deployment with CI/CD integration, which makes it easier to scale and update. By providing insight into student performance, it supports decision-making and strategic planning in education.
The model uses the following independent features:
- gender: Student's gender (male or female)
- race_ethnicity: Group classification (e.g., group D)
- parental_level_of_education: Highest education level of parents
- lunch: Type of lunch received (standard or free/reduced)
- test_preparation_course: Completion status of a preparatory course
- physics_score: Score in physics subject
- chemistry_score: Score in chemistry subject
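To make the input shape concrete, here is a minimal sketch of one student record carrying the seven features listed above. The class name `StudentFeatures` and the method `to_row` are hypothetical and not taken from this repository. The allowed values for gender and lunch come from the list above; the values shown for parental_level_of_education and test_preparation_course are illustrative assumptions.

```python
from dataclasses import dataclass, asdict

# Allowed values taken from the feature list; other fields are left unconstrained.
ALLOWED_GENDER = {"male", "female"}
ALLOWED_LUNCH = {"standard", "free/reduced"}


@dataclass
class StudentFeatures:  # hypothetical name, not from the repository
    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    physics_score: float
    chemistry_score: float

    def __post_init__(self):
        # Reject categorical values that the feature list does not mention.
        if self.gender not in ALLOWED_GENDER:
            raise ValueError(f"unexpected gender: {self.gender!r}")
        if self.lunch not in ALLOWED_LUNCH:
            raise ValueError(f"unexpected lunch type: {self.lunch!r}")

    def to_row(self) -> dict:
        """Return the features as a plain dict, one key per model input column."""
        return asdict(self)


student = StudentFeatures(
    gender="female",
    race_ethnicity="group D",
    parental_level_of_education="bachelor's degree",  # illustrative value
    lunch="standard",
    test_preparation_course="completed",  # illustrative value
    physics_score=72,
    chemistry_score=68,
)
row = student.to_row()
```

A dict like `row` can be turned into a one-row table for the prediction step, for example with a DataFrame library if the project uses one.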
The model is trained with each of the machine learning algorithms below, and the algorithm with the highest accuracy is chosen:
- Linear Regression: Predicts outcomes using linear relationships.
- Decision Tree: Splits data based on value conditions.
- Random Forest: Ensemble of decision trees, reduces overfitting.
- Gradient Boosting: Improves weak models with error-driven updates.
- AdaBoost: Boosts weak learners into strong ones iteratively.
- CatBoost: Gradient boosting with categorical data optimization.
- XGBoost: Optimized gradient boosting with scalability and speed.
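The selection step can be sketched as follows: fit each candidate, score it on held-out data, and keep the highest scorer. This sketch assumes R² as the score, since the text only says "highest accuracy". The names `select_best_model`, `MeanBaseline`, and `OneFeatureLinear` are hypothetical and not code from this repository. The two toy models stand in for the real algorithms so the example runs with the standard library alone; any object with `fit` and `predict` methods, which scikit-learn regressors provide, could be passed instead.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_best_model(models, X_train, y_train, X_test, y_test):
    """Fit every candidate and return (best_name, scores_by_name)."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, scores


class MeanBaseline:
    """Toy model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class OneFeatureLinear:
    """Toy model: least-squares line fitted on the first feature only."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(y) / len(y)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


# Made-up numbers for illustration only, not project data.
X_train, y_train = [[50], [60], [70], [80]], [55, 63, 74, 82]
X_test, y_test = [[65], [90]], [68, 93]

candidates = {"baseline": MeanBaseline(), "linear": OneFeatureLinear()}
best_name, scores = select_best_model(candidates, X_train, y_train, X_test, y_test)
print(best_name, scores)
```

Scoring on data the models did not see during fitting keeps an overfitted model from winning on training error alone.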
- Comprehensive data preprocessing and feature engineering
- Multiple regression algorithms including Linear Regression, Decision Tree, Random Forest, and Boosting Algorithms.
- Deployment setup for AWS using CI/CD pipelines.
- Data ingestion is a part of the components module; it reads the dataset from databases or file locations.
- Ingested data is transformed in data_transformation.py.
- Model training happens in model_trainer.py.
- Model prediction happens in predict_pipeline.py.
- The logger keeps all the log files.
- Exception handling is taken care of by exception.py; exc_info() gives the file name and line number where the exception occurred.
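As a rough illustration of how exc_info() can yield the file name and line number, here is a minimal sketch built on the standard library's `sys.exc_info()` and `logging`. The names `describe_error`, `CustomException`, and the logger name are hypothetical; the actual contents of exception.py and the logger may differ. The sketch logs to the console; passing `filename=` to `logging.basicConfig` would write to a log file instead.

```python
import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("student_performance")  # hypothetical logger name


def describe_error(error):
    """Build a message with the file name and line number of the active exception."""
    _, _, tb = sys.exc_info()
    if tb is None:  # not called while an exception is being handled
        return str(error)
    # Walk to the innermost frame, where the exception was actually raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    file_name = tb.tb_frame.f_code.co_filename
    return f"Error in [{file_name}] at line [{tb.tb_lineno}]: {error}"


class CustomException(Exception):  # hypothetical name
    """Wraps another exception and records where it was raised."""

    def __init__(self, error):
        super().__init__(describe_error(error))


def divide(a, b):
    return a / b


try:
    try:
        divide(1, 0)
    except Exception as exc:
        # Must be raised inside the except block so sys.exc_info() is populated.
        raise CustomException(exc) from exc
except CustomException as wrapped:
    message = str(wrapped)
    logger.error(message)
```

The wrapper has to be constructed while the original exception is still being handled; outside an `except` block `sys.exc_info()` returns `(None, None, None)` and the sketch falls back to the plain error text.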
- Utilization of AWS services like IAM, ECR, and EC2.
- Docker setup on AWS EC2 for container management.
- GitHub Actions for CI/CD.
- Create a user in AWS IAM.
- Create a repository in AWS Elastic Container Registry.
- Create an instance in AWS EC2.
- Setup Docker in AWS EC2 instance.
- Create a runner in GitHub, then run its commands step by step to download, configure, and use the self-hosted runner.
sudo apt-get update -y
sudo apt-get upgrade
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
After the runner is created in GitHub, run the commands it displays to download, configure, and start the self-hosted runner.
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION = us-east-1
AWS_ECR_LOGIN_URI = 566373416292.dkr.ecr.ap-south-1.amazonaws.com (demo value)
ECR_REPOSITORY_NAME = simple-app
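Before running the pipeline, the settings above can be sanity-checked with a small script. This is a sketch under assumptions: `check_deploy_settings` and `image_uri` are hypothetical helpers, the host pattern covers the usual `<account>.dkr.ecr.<region>.amazonaws.com` form only, and the key values are placeholders. It also compares the region embedded in the registry host with AWS_REGION, since an ECR registry host contains its region.

```python
import re

REQUIRED = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_ECR_LOGIN_URI",
    "ECR_REPOSITORY_NAME",
)

# 12-digit account ID, then the region, in the standard ECR registry host form.
ECR_HOST = re.compile(r"^\d{12}\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$")


def check_deploy_settings(env):
    """Return a list of problems found in the settings (empty if none)."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    uri = env.get("AWS_ECR_LOGIN_URI", "")
    match = ECR_HOST.match(uri)
    if uri and not match:
        problems.append("AWS_ECR_LOGIN_URI does not look like an ECR registry host")
    elif match and env.get("AWS_REGION") and match["region"] != env["AWS_REGION"]:
        problems.append(
            f"registry region {match['region']} differs from AWS_REGION {env['AWS_REGION']}"
        )
    return problems


def image_uri(env, tag="latest"):
    """Full image reference: <registry host>/<repository>:<tag>."""
    return f"{env['AWS_ECR_LOGIN_URI']}/{env['ECR_REPOSITORY_NAME']}:{tag}"


# Sample values copied from the list above; the two keys are placeholders.
sample = {
    "AWS_ACCESS_KEY_ID": "placeholder-key-id",
    "AWS_SECRET_ACCESS_KEY": "placeholder-secret",
    "AWS_REGION": "us-east-1",
    "AWS_ECR_LOGIN_URI": "566373416292.dkr.ecr.ap-south-1.amazonaws.com",
    "ECR_REPOSITORY_NAME": "simple-app",
}
problems = check_deploy_settings(sample)
full_image = image_uri(sample)
print(problems)
print(full_image)
```

With the sample values the check reports one problem, because the demo registry host is in ap-south-1 while AWS_REGION is us-east-1. In a real run, `os.environ` could be passed in place of `sample`.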
Contributions to this project are welcome. Please fork the repository and submit a pull request.