This project develops a machine learning model that identifies phishing websites. A neural network classifies each website as 'phishing' or 'non-phishing' based on features extracted from its URL.
Phishing is a prevalent cybersecurity threat in which attackers lure individuals into disclosing sensitive data. This project aims to contribute to digital security by accurately identifying and flagging potential phishing websites.
The dataset used in this project includes various features extracted from website URLs. Each entry in the dataset is labeled as 'phishing' or 'non-phishing'.
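As an illustration of what "features extracted from website URLs" can look like, here is a minimal sketch using only the Python standard library. The feature names and the `extract_url_features` helper are assumptions made for this example; they are not necessarily the features used in the project's dataset.

```python
import re
from urllib.parse import urlparse

# Hypothetical feature set, chosen for illustration only.
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Turn a URL string into a small dictionary of numeric features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                             # total characters in the URL
        "num_dots": host.count("."),                        # dots in the host name
        "num_hyphens": host.count("-"),                     # hyphens in the host name
        "has_at_symbol": int("@" in url),                   # '@' anywhere in the URL
        "uses_https": int(parsed.scheme == "https"),        # 1 if the scheme is https
        "host_is_ip": int(bool(IPV4_PATTERN.match(host))),  # host looks like an IPv4 address
    }


features = extract_url_features("http://192.168.0.1/login@secure-update")
```

Each URL becomes one row of numbers, which is the form a neural network expects as input.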
The project involves the following steps:
- Data Preprocessing: Cleansing and transforming raw data into a suitable format for the model.
- Feature Engineering: Selecting and engineering features from URLs that are indicative of phishing activities.
- Model Training: Training a neural network model to classify websites.
- Evaluation: Assessing the model's performance using metrics like accuracy, precision, recall, and F1-score.
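The preprocessing step above might look roughly like the following sketch. It is written in plain Python so it runs without dependencies; scikit-learn offers equivalents such as `MinMaxScaler` and `train_test_split`. The sample rows, the label mapping, and the helper names are assumptions for illustration, not the project's actual data or code.

```python
import random

# Assumed mapping from the dataset's text labels to integers.
LABELS = {"non-phishing": 0, "phishing": 1}


def train_val_split(rows, labels, val_fraction=0.2, seed=42):
    """Shuffle the indices reproducibly and hold out a validation subset."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(rows) * val_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(val_idx)


def fit_min_max(rows):
    """Compute per-column minimum and maximum from the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Scale columns with the given statistics; constant columns map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up rows: [url_length, num_dots]
raw_rows = [[38, 3], [19, 1], [54, 4], [22, 1], [71, 5]]
raw_labels = ["phishing", "non-phishing", "phishing", "non-phishing", "phishing"]

y = [LABELS[name] for name in raw_labels]
(train_X, train_y), (val_X, val_y) = train_val_split(raw_rows, y)

lows, highs = fit_min_max(train_X)
train_X = apply_min_max(train_X, lows, highs)
val_X = apply_min_max(val_X, lows, highs)  # may fall slightly outside [0, 1]
```

The scaling statistics are fitted on the training rows only and then reused for the validation rows, so no information from the validation set leaks into training.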
The project uses the following tools:
- Python: Primary programming language
- TensorFlow and Keras: For building and training the neural network model
- Scikit-learn: For data preprocessing and model evaluation
- Pandas and NumPy: For data manipulation and numerical computations
- Matplotlib and Seaborn: For data visualization
The model achieved approximately 92.86% accuracy on the validation set, which suggests it distinguishes phishing from non-phishing websites well on this dataset.
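For reference, accuracy and the other evaluation metrics named earlier (precision, recall, F1-score) can all be derived from the counts of correct and incorrect predictions. The sketch below computes them by hand to show the definitions; the labels are made up and are not the project's results, and `classification_metrics` is a hypothetical helper. Scikit-learn provides equivalent functions in `sklearn.metrics`.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing, missed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up validation labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Precision answers "of the sites flagged as phishing, how many really were?", while recall answers "of the real phishing sites, how many were caught?".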
To run the project:
- Clone the repository.
- Install the required dependencies.
- Run the Jupyter notebooks to train the model and make predictions.
Future improvements might include experimenting with different machine learning algorithms, enhancing feature engineering, and using a larger dataset for training.