- Introduction
- Installation
- Data Collection
- Data Conversion
- Model Generation
- Analysis and Findings
- Classification Report
In this report, we will discuss the process of converting the data and generating a model for handwriting sensor recognition. The goal of this project was to develop a model that can accurately identify different classes of handwriting based on sensor data.
- Please install `python` and `mongodb` on your device.
- Install pipenv: `pip install pipenv`
- `cd` to the project folder and install the dependencies: `pipenv install`
- Download the data: `curl -L -o data.zip "https://drive.google.com/uc?export=download&id=1J0u1RfI5838teqmBPWm_GIldPL5gWPwX"`
- Extract the data: `unzip data.zip`
- In `config.yml`, enter your MongoDB URI.
- Don't forget to select the correct kernel in both `ipynb` files.
- Run all cells in `aiot_dataset_creation.ipynb`.
- Run all cells in `aiot_project_2023.ipynb`.
A report on our data collection can be found here
- Read data from the database and store it in a DataFrame.
- Apply a sliding window technique to capture temporal information.
- Use a low pass filter to remove high-frequency noise and smooth the data.
- Flatten the 2D instances into a 1D vector and replace None values with NaN.
- Split the data into training and testing sets, and scale the features.
- Impute NaN values using custom weights to handle missing data.
- Apply PCA to reduce the dimensionality of the data.
These steps were taken to convert the sensor data into a suitable format for further processing and model training.
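The windowing, filtering, and flattening steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the function names, the window size, the step, and the synthetic signal are all hypothetical, and a plain moving average stands in for whichever low-pass filter the project uses.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(signal, window_size=20, step=10):
    """Cut a (time, channels) array into overlapping windows.

    Returns an array of shape (n_windows, channels, window_size).
    """
    return sliding_window_view(signal, window_size, axis=0)[::step]


def moving_average_lowpass(windows, kernel_size=5):
    """Smooth every channel of every window along the time axis.

    A moving average is one simple low-pass filter (assumed here;
    the project's real filter may differ).
    """
    kernel = np.ones(kernel_size) / kernel_size
    # mode="same" keeps each smoothed series as long as its input
    return np.apply_along_axis(
        lambda series: np.convolve(series, kernel, mode="same"), -1, windows
    )


# Synthetic stand-in for one recording: 100 time steps, 3 sensor channels.
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3))

windows = make_windows(signal)               # (n_windows, channels, window_size)
smoothed = moving_average_lowpass(windows)   # same shape, less high-frequency noise
features = smoothed.reshape(smoothed.shape[0], -1)  # one 1D vector per window

# Casting to float turns None entries into NaN.
with_gap = np.array([0.1, None, 0.3], dtype=float)
```

Each row of `features` is one flattened window, which is the shape that scaling, imputation, and PCA expect as input.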
In the model generation phase, we experimented with different machine learning algorithms, including SVM, Random Forest, and CNN. For SVM and Random Forest, we utilized the data transformed by PCA, while CNN used the raw data without PCA.
We performed grid search to find the best hyperparameters for SVM and Random Forest models. Grid search involved testing various combinations of hyperparameters to determine the optimal configuration that yielded the highest accuracy.
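A grid search of this kind might look like the sketch below, which chains scaling, imputation, PCA, and an SVM in a scikit-learn `Pipeline`. The data is synthetic, the grid values and `n_components=10` are illustrative assumptions, and `SimpleImputer` is only a stand-in for the custom-weight imputation described earlier. The attributes `best_params_`, `best_estimator_`, and `best_score_` presumably correspond to the columns of the results table; `best_score_` is the mean cross-validated score of the best configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the flattened window features (not the project's data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=10, n_classes=4, random_state=0
)
X[::25, 0] = np.nan  # simulate a few missing readings

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                 # skips NaN when fitting, keeps it when transforming
    ("impute", SimpleImputer(strategy="mean")),  # assumed stand-in for the custom-weight imputation
    ("pca", PCA(n_components=10)),               # illustrative number of components
    ("svm", SVC()),
])

# Hypothetical grid; the "svm__" prefix routes each value to the SVC step.
param_grid = {
    "svm__C": [1, 10, 100],
    "svm__gamma": [0.001, 0.01],
    "svm__kernel": ["rbf"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean CV accuracy:", search.best_score_)
print("held-out accuracy:", search.score(X_test, y_test))
```

Keeping the preprocessing inside the pipeline means each cross-validation fold fits the scaler, imputer, and PCA on its own training portion only, which avoids leaking information from the validation fold.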
In the case of CNN, we focused on exploring different architectures by varying the number of neurons in the network. This allowed us to evaluate the impact of network complexity on the model's accuracy.
By trying different algorithms and tuning their hyperparameters, we aimed to find the best model that could accurately classify the handwriting sensor data.
Model | Best Parameters | Best estimator | Best score | Accuracy |
---|---|---|---|---|
SVM | {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'} | SVC(C=10, gamma=0.001) | 0.8552 | 0.8281 |
Random Forest | {'max_depth': 30, 'n_estimators': 200} | RandomForestClassifier(max_depth=30, n_estimators=200) | 0.8316 | 0.8906 |
CNN | {'optimizer': 'adam', 'learning_rate': 0.001} | CNNModel(hidden_layer=[60]) | N/A | 0.8769 |
Overall, the Random Forest model achieved the highest accuracy at 89.06%, closely followed by the CNN model at 87.69%. The SVM model also performed reasonably well, with an accuracy of 82.81%. These results suggest that both ensemble methods (Random Forest) and deep learning (CNN) can be effective for the classification task, but further analysis and evaluation of the CNN model are needed to obtain a comprehensive understanding of its performance.
The following classification report, taken from the Random Forest model, gives detailed performance metrics for each letter class in the dataset:
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
α | 0.71 | 0.62 | 0.67 | 8 |
β | 0.90 | 0.75 | 0.82 | 12 |
γ | 0.89 | 1.00 | 0.94 | 8 |
δ | 0.80 | 1.00 | 0.89 | 4 |
ε | 0.83 | 0.91 | 0.87 | 11 |
ζ | 1.00 | 0.89 | 0.94 | 9 |
η | 0.83 | 1.00 | 0.91 | 5 |
θ | 1.00 | 1.00 | 1.00 | 7 |
Observations:
- Overall, the model achieved high precision, recall, and F1-scores for most letter classes, indicating accurate predictions.
- Class 'α' had the lowest precision (0.71) and recall (0.62) of all classes, giving it the lowest F1-score (0.67).
- Class 'β' had a lower recall (0.75) than precision (0.90), meaning the model missed a quarter of the actual 'β' instances.
- Classes 'γ', 'δ', 'ζ', 'η', and 'θ' all achieved F1-scores of 0.89 or higher, with 'θ' classified perfectly.
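As a quick cross-check of the table above, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded table values and derives macro and support-weighted averages. The variable names are illustrative, and because the inputs are rounded to two decimals, the results are approximate.

```python
# (precision, recall, f1, support) copied from the classification report table
report = {
    "α": (0.71, 0.62, 0.67, 8),
    "β": (0.90, 0.75, 0.82, 12),
    "γ": (0.89, 1.00, 0.94, 8),
    "δ": (0.80, 1.00, 0.89, 4),
    "ε": (0.83, 0.91, 0.87, 11),
    "ζ": (1.00, 0.89, 0.94, 9),
    "η": (0.83, 1.00, 0.91, 5),
    "θ": (1.00, 1.00, 1.00, 7),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest gap between the reported F1 and the one recomputed from P and R.
max_gap = max(abs(f1_score(p, r) - f1) for p, r, f1, _ in report.values())

total_support = sum(s for _, _, _, s in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for _, _, f1, _ in report.values()) / len(report)

# Weighted average: classes count in proportion to their support.
weighted_f1 = sum(f1 * s for _, _, f1, s in report.values()) / total_support

print(f"samples: {total_support}")
print(f"macro F1: {macro_f1:.3f}, weighted F1: {weighted_f1:.3f}")
print(f"largest rounding gap: {max_gap:.3f}")
```

With supports between 4 and 12 samples per class, a single misclassification shifts a class's recall by several percentage points, so the per-class figures are best read as rough indicators.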