This project solves a three-class classification problem on the Wine dataset. The pipeline covers data preprocessing, model building, and performance evaluation.
- Source: UCI Machine Learning Repository
- Problem: Classification with 3 classes.
- Preprocessing: Min-Max scaling and replacing NaN values with the per-class mean of the corresponding feature.
Data Preprocessing:
- Min-Max scaling: Rescale each feature to the range [0, 1].
- Replace NaN values: Replace each missing value with the mean of that feature over the samples of the same class.
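The two preprocessing steps above can be sketched with NumPy as follows. This is a minimal illustration, not the project's actual code: the function names `impute_class_mean` and `min_max_scale` are hypothetical, and the sketch assumes the features are held in a 2-D numeric array with one label per row.

```python
import numpy as np

def impute_class_mean(X, y):
    """Fill each NaN with the mean of its column over samples of the same class."""
    X = np.array(X, dtype=float)  # work on a copy
    y = np.asarray(y)
    for label in np.unique(y):
        rows = np.where(y == label)[0]
        block = X[rows]                           # copy of this class's rows
        class_means = np.nanmean(block, axis=0)   # per-feature mean, ignoring NaN
        nan_rows, nan_cols = np.where(np.isnan(block))
        block[nan_rows, nan_cols] = class_means[nan_cols]
        X[rows] = block
    return X

def min_max_scale(X):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span

X_raw = [[1.0, np.nan], [3.0, 4.0], [10.0, 20.0], [np.nan, 40.0]]
labels = [1, 1, 2, 2]
X_clean = min_max_scale(impute_class_mean(X_raw, labels))
```

The sketch computes the class means and the min/max over all rows it is given. The description does not say whether these statistics were computed before or after splitting; computing them on the training portion only is a common way to avoid leaking test information.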
Data Splitting:
- Split the data into 90% training and 10% test sets using stratified sampling, so that both sets keep the class proportions of the full dataset.
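A stratified 90/10 split could look like the sketch below. It assumes scikit-learn is available and, for convenience, loads the copy of the Wine data bundled with scikit-learn via `load_wine` (labels there are 0, 1, 2 rather than 1, 2, 3). The `random_state` value is an arbitrary choice made here for reproducibility.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Bundled copy of the UCI Wine data: 178 samples, 13 features, 3 classes.
X, y = load_wine(return_X_y=True)

# stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42
)
```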
One-Hot Encoding:
- Encode each class using one-hot encoding:
  - Class 1: [1, 0, 0]
  - Class 2: [0, 1, 0]
  - Class 3: [0, 0, 1]
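As a small illustration of the mapping above, the following sketch builds the one-hot vectors with NumPy. The helper name `one_hot` is hypothetical, and the sketch assumes the labels are the integers 1, 2, 3 as listed.

```python
import numpy as np

def one_hot(labels, n_classes=3):
    """Map integer labels 1..n_classes to one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k + 1.
    return np.eye(n_classes, dtype=int)[labels - 1]

encoded = one_hot([1, 2, 3, 2])
```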
Model Building:
- Build a multi-layer perceptron (MLP) network using TensorFlow and Keras.
- Use a softmax activation in the output layer so that the network produces one probability per class, matching the one-hot encoded targets.
- Explore the hyperparameters using stratified 10-fold cross-validation:
  - Number of layers: [1, 2, 3]
  - Neurons per layer: [32, 64, 128]
  - Learning rate: [0.001, 0.01, 0.1]
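The model and search grid described above might be sketched as follows. Several details are assumptions not stated in the description: "number of layers" is read as the number of hidden layers, hidden layers use ReLU, the optimizer is Adam, and the loss is categorical cross-entropy. The names `build_mlp` and `param_grid` are hypothetical.

```python
from itertools import product

# Every combination of the three listed hyperparameters: 3 * 3 * 3 = 27.
param_grid = [
    {"n_layers": n_layers, "n_neurons": n_neurons, "learning_rate": lr}
    for n_layers, n_neurons, lr in product([1, 2, 3], [32, 64, 128], [0.001, 0.01, 0.1])
]

def build_mlp(n_features, n_layers, n_neurons, learning_rate, n_classes=3):
    """Build and compile an MLP with a softmax output layer."""
    # Imported here so the grid can be enumerated without loading TensorFlow.
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for _ in range(n_layers):
        # ReLU hidden activation is an assumption of this sketch.
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    # One output unit per class; softmax yields class probabilities.
    model.add(keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),  # assumed optimizer
        loss="categorical_crossentropy",  # pairs with one-hot targets
        metrics=["accuracy"],
    )
    return model
```

With the 13 Wine features, a call such as `build_mlp(13, **param_grid[0])` would create the smallest configuration in the grid.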
Performance Evaluation:
- Use the F1 score as the performance metric.
- Select the best parameters for the dataset with cross-validation, which reduces the bias that a single train/test split would introduce.
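Because the network outputs softmax probabilities and the targets are one-hot vectors, both have to be converted back to class indices before the F1 score is computed. The sketch below shows one way to do this with scikit-learn's `f1_score`. The description does not state which averaging was used; macro averaging is an assumption here, and `macro_f1_from_one_hot` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import f1_score

def macro_f1_from_one_hot(y_true_one_hot, y_prob):
    """Macro-averaged F1 from one-hot targets and predicted probabilities."""
    y_true = np.argmax(y_true_one_hot, axis=1)  # one-hot -> class index
    y_pred = np.argmax(y_prob, axis=1)          # most probable class
    return f1_score(y_true, y_pred, average="macro")

y_true_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob_demo = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.1, 0.3, 0.6]])
score = macro_f1_from_one_hot(y_true_demo, y_prob_demo)
```

In this toy example the last sample is misclassified, so the per-class F1 values are 1, 2/3 and 2/3, giving a macro average of 7/9.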
Pipeline Implementation:
- Create a custom pipeline by passing the output of one library function to the next.
- Add control flow to organize the tenfold cross-validation experiments.
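The control flow for the cross-validation experiments could be organized roughly as in the sketch below. It is an illustration under stated assumptions, not the project's implementation: `cross_validate_grid` is a hypothetical name, `fit_predict` is a hypothetical callback that trains a model on the training fold and returns predicted class labels for the validation fold (for example by wrapping a Keras model), and macro-averaged F1 is assumed as the score.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def cross_validate_grid(X, y, param_grid, fit_predict, n_splits=10, seed=0):
    """Return the best parameter set and the mean F1 of every set."""
    X, y = np.asarray(X), np.asarray(y)
    # A fixed integer seed makes every parameter set see the same folds.
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for params in param_grid:                      # outer loop: parameter sets
        scores = []
        for train_idx, val_idx in folds.split(X, y):  # inner loop: folds
            y_pred = fit_predict(X[train_idx], y[train_idx], X[val_idx], **params)
            scores.append(f1_score(y[val_idx], y_pred, average="macro"))
        # Dicts are not hashable, so use a sorted tuple of items as the key.
        results[tuple(sorted(params.items()))] = float(np.mean(scores))
    best = max(results, key=results.get)
    return dict(best), results
```

Passing the training step in as a callback keeps the loop independent of the model library, so the same control flow can be exercised with a cheap stand-in classifier before running the full Keras experiments.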