
A fullstack gesture recognition project that uses Mediapipe and LSTM neural networks to recognise gestures in real time from the device camera via OpenCV, and paves the way for IoT device control using those gestures in future iterations.


muditgarg48/gesture_based_device_control


GESTURE RECOGNITION-BASED DEVICE CONTROL

INTRODUCTION

My final-year project is gesture-recognition-based control of smart devices, such as the IoT devices we use in our daily lives. Hand gestures are the key medium of non-verbal communication for communities challenged in terms of speech or hearing, and act as their medium of interaction with the world. In an ever-evolving technology landscape, making technology inclusive and accessible to these diverse user groups remains paramount. This project aims to contribute to that goal by developing a system designed to recognise gestures and their corresponding commands for controlling the IoT devices people use in their day-to-day life, bridging the gap between people who rely on hand gestures as their primary mode of communication and the technology they use.

The project gives the user the ability to enter their own set of gestures that they want the system (the trained neural network, in this case) to recognise, along with their corresponding commands. The user is then given the opportunity to use their device's camera feed to collect training data for these gestures. This data is processed to extract the useful part, the movement of the key points in the hands, which is converted to a NumPy array and finally provided to a Long Short-Term Memory (LSTM) neural network to train on. The user then establishes a connection with the desired IoT device and uses the camera feed together with the trained network to recognise the gestures (more specifically, their commands), which are sent to the IoT device over the local network.
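As a rough illustration of the data shape involved, the sketch below (not the project's actual code; the frame count and zero-padding scheme are assumptions) shows how per-frame hand key points can be flattened into fixed-length feature vectors and stacked into a sequence suitable for an LSTM. Mediapipe Hands reports 21 landmarks per hand, each with x, y and z coordinates.

```python
# Illustrative sketch of turning hand keypoints into an LSTM training sample.
# Mediapipe Hands yields 21 landmarks per hand, each with x, y, z coordinates.

FRAMES_PER_VIDEO = 30          # assumed sequence length per recorded gesture
LANDMARKS_PER_HAND = 21        # fixed by Mediapipe Hands
COORDS = 3                     # x, y, z
FEATURES = 2 * LANDMARKS_PER_HAND * COORDS  # both hands flattened -> 126

def flatten_frame(left_hand, right_hand):
    """Flatten two lists of (x, y, z) landmarks into one feature vector,
    padding with zeros when a hand is not detected."""
    def flat(hand):
        if hand is None:
            return [0.0] * (LANDMARKS_PER_HAND * COORDS)
        return [coord for landmark in hand for coord in landmark]
    return flat(left_hand) + flat(right_hand)

# One dummy frame with both hands "detected" at the origin:
frame = flatten_frame([(0.0, 0.0, 0.0)] * 21, [(0.0, 0.0, 0.0)] * 21)
video = [frame] * FRAMES_PER_VIDEO   # one training sample: 30 frames x 126 features
```

In the real pipeline such samples would be stored as NumPy arrays and batched for training; the fixed sequence length is what lets the LSTM consume them uniformly.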

Note: The project was developed with Python v3.11.5 and Pip version 24.0.

INDEX

PROJECT SETUP

FILES AND FOLDERS DESCRIPTION

data/

This folder contains all the necessary data required by the project to run

global_variables/

This folder includes all the global variables used by various scripts within the project

my-project-env/

This folder denotes the virtual environment of the project. This is usually generated for the user by the project_setup.py script when the user runs it for the first time after cloning the repository.

scripts/

This folder contains all the scripts required to do various tasks. Particular care has been taken to name them to describe their purpose. A more detailed explanation is also present in this README.

Note:

Remember to run the scripts from within the folder as they are configured to find the relative path that way. Running them from another location might affect the scripts' ability to search for necessary files and folders.
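One way to make such scripts independent of the directory they are launched from, sketched here with hypothetical names (the project's scripts do not necessarily do this), is to resolve paths relative to the script file itself:

```python
# Sketch: build paths relative to the script's own location rather than the
# current working directory, so the script works no matter where it is run from.
from pathlib import Path

# Fall back to the working directory when __file__ is unavailable (e.g. a REPL).
SCRIPT_DIR = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
DATA_DIR = SCRIPT_DIR.parent / "data"   # assumed layout: scripts/ next to data/

def data_path(*parts):
    """Build a path under data/ regardless of the current working directory."""
    return DATA_DIR.joinpath(*parts)
```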

tensorboard-logs/

This folder contains all the files necessary for Tensorboard.

PROJECT SETUP

This project involves a lot of moving parts, but fear not: it also includes several Python scripts so the user doesn't have to set anything up manually, provided those scripts are run before starting the project.

Follow these steps to get the project ready to run:

  1. Download Python from python.org and install it.

    • Ensure that it is added to the environment variables.
    • The Python setup has an option to check if you want the setup to directly add Python to the environment variables.
    • Mediapipe, a Google-developed library used in this project, requires:
      • Python version 3.8 to 3.11
      • Pip version 20.3+
  2. Run the project_setup.py script with either one of these commands in the terminal:

    python project_setup.py
    

    (or)

    python3 project_setup.py
    
  3. Activate the virtual environment using this command in the terminal. Note that the quotation marks (" ") are necessary because the command runs the virtual environment's activation script in another folder directly, without first traversing to it.

    For Windows:

    "{name of virtual env}/Scripts/activate.bat"
    

    For other OS:

    source "{name of virtual env}/bin/activate"
    
  4. Run the project_integrity_check.py script to check that the project workspace is healthy and ready to run the project, with either one of these commands in the terminal:

    python project_integrity_check.py
    

    (or)

    python3 project_integrity_check.py
    
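The version gates behind these steps can be sketched as follows (the exact bounds used by the project's scripts may differ; these mirror the Mediapipe requirements quoted in this README):

```python
# Sketch of the version checks implied by the setup steps: Mediapipe needs
# Python 3.8-3.11 and Pip 20.3+ according to this README.
import sys

def python_ok(major=sys.version_info.major, minor=sys.version_info.minor):
    """True when the interpreter falls in the Mediapipe-supported range."""
    return (3, 8) <= (major, minor) <= (3, 11)

def pip_ok(version_str):
    """Compare a pip version string like '24.0' against the 20.3 minimum."""
    parts = tuple(int(p) for p in version_str.split(".")[:2])
    return parts >= (20, 3)
```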

FILES AND FOLDERS DESCRIPTIONS

project_setup.py

The project_setup.py script handles all the setup steps required before the project can run successfully. Run it first after cloning this repository to ensure all the dependencies are satisfied. The steps included in this file are:

  1. Checks for Python >=3.9 and <=3.11.

  2. Checks for the availability of the requirements.txt file, which lists all of this project's dependencies.

  3. Checks for a Pip installation, and downloads and installs Pip if it is not found.

  4. Checks for the availability of the venv Python package for creating and using a virtual environment for this project. If not found, it is installed globally using:

    pip install venv
    
  5. Checks for the presence of the project's virtual environment. The default name of this project's virtual environment is "my-project-env" and the name is stored in the global_variables.py for every script to access.

    • If the script doesn't find the virtual environment it was looking for, it creates one using:
      python -m venv {name of virtual env}
      
  6. Checks whether the virtual environment is active. If it is, the script proceeds to the next step; otherwise, it activates the virtual environment using:

    For Windows:

    "{name of virtual env}/Scripts/activate.bat"
    

    For other OS:

    source "{name of virtual env}/bin/activate"
    
Note: The quotation marks (" ") are necessary because the command runs the virtual environment's activation script in another folder directly, without first traversing to it.
  7. Once the virtual environment is created and activated, the script installs all the project's dependencies using Pip and requirements.txt via:

    pip install -r requirements.txt
    
  8. Once all the steps are completed, the script adds the project's virtual environment to the list of compatible kernels for the main Jupyter Notebook to use.

Note:
  • The virtual environment is only activated for the duration of the script, to install the dependencies. Please reactivate it using the aforementioned command (also printed at the end of this script's execution) if you intend to run the project through the terminal.
  • Remember to change the kernel for the Jupyter Notebook to the project's virtual environment. In Visual Studio Code, the option is available at the top-right corner once you open the notebook.
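Detecting the presence and activation state of a virtual environment, as project_setup.py is described as doing, can be sketched like this (venv_exists and in_venv are illustrative helpers, not the script's actual functions):

```python
# Sketch: in a venv, sys.prefix differs from sys.base_prefix, and a created
# environment folder contains a pyvenv.cfg file.
import sys
from pathlib import Path

VENV_NAME = "my-project-env"   # default name, per this README

def venv_exists(root="."):
    """Check whether the project's virtual environment folder is present."""
    return (Path(root) / VENV_NAME / "pyvenv.cfg").is_file()

def in_venv():
    """True when the interpreter was launched from inside a virtual environment."""
    return sys.prefix != sys.base_prefix
```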

project_integrity_check.py

The project_integrity_check.py script checks that the project workspace is ready before the project is run. Run it after running project_setup.py and activating the virtual environment, to ensure the project will not hit a setup issue during execution. The checks included in this file are:

  1. Check that the Python version is >=3.9 and <=3.11.
  2. Check the availability of the virtual environment and if it is activated.
  3. Check if all the necessary packages are accessible by the project.
  4. Check if the camera feed is accessible by the OpenCV library for the project.
Note:

This script does not perform any setup tasks; it is only responsible for running checks. If any check fails, run the project_setup.py script again to redo the setup process.
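A minimal sketch of the package-availability check (the package list here is an assumption based on the libraries this README mentions, not the script's actual list):

```python
# Sketch: verify each required package can be found by the import machinery
# without fully importing it, which keeps the check fast and side-effect free.
from importlib.util import find_spec

REQUIRED = ["numpy", "cv2", "mediapipe", "tensorflow"]  # assumed list

def missing_packages(required=REQUIRED):
    """Return the subset of required packages that cannot be found."""
    return [name for name in required if find_spec(name) is None]
```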

requirements.txt

The requirements.txt file lists all the dependencies required for this project to run successfully. It is auto-generated using Pip with the following command:

pip freeze > requirements.txt

.gitignore

The .gitignore file has been custom-created for this project to add all the files and folders that do not need to be tracked by Git. These include:

  • The training data folder
  • The virtual environment folder
  • The Python cache
  • The Pip installation Python script which is downloaded if the user doesn't have Pip

data/models/

The models folder is intended to be the one-stop folder to save all the models that will be trained during the development of the project.

data/training-action-data/

The training-action-data folder is intended to store all the videos, broken down into frames stored as NumPy arrays, that will be used to train the model to recognise the gestures stored in available_gestures.npy.
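The on-disk layout implied above can be sketched as a path-building helper (the folder-per-gesture, folder-per-video, file-per-frame naming is an assumption, not taken from the project's code):

```python
# Sketch of an assumed training-data layout:
#   data/training-action-data/<gesture>/<video index>/<frame index>.npy
from pathlib import Path

def frame_path(root, gesture, video_idx, frame_idx):
    """Build the path of one stored frame for one recorded gesture video."""
    return (Path(root) / "training-action-data" / gesture
            / str(video_idx) / f"{frame_idx}.npy")

p = frame_path("data", "toggle_lights", 0, 12)
# p.as_posix() -> "data/training-action-data/toggle_lights/0/12.npy"
```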

data/available_gestures.npy

The available_gestures.npy file is the stored version of a NumPy array holding all the gestures compatible with the project.

global_variables/

The global_variables folder contains, in one place, all the global variables necessary for the project to run. There are two types of global variables in this folder:

  • fixed.py, variables that are fixed and should not be touched
  • user_specific.py, variables the user is free to change according to their needs

Note: Be very careful while modifying the contents of the user_specific.py file. Altering values away from the defaults in ways the project does not expect might break it.

/my-project-env

The default virtual environment folder, which stores all the packages required by the project locally, without installing them globally on your system and slowing it down. To activate it, run Scripts\activate.bat (on Windows) or source bin/activate (on other OSes) from inside the folder.

scripts/camera_feed_testing.py

The camera_feed_testing.py Python script tests whether the project can open the camera on the user's system and access the camera feed.

scripts/commandline_functions.py

The commandline_functions.py Python file contains helper functions used by other scripts during their execution in the command line.

scripts/dataset_functions.py

The dataset_functions.py Python script contains all the functions required for various actions related to the datasets stored in this project for training the models. The script runs in an infinite loop, asking via a switch case which function you want to perform, until you stop it. These include:

  • Print the dataset details which include:

    • The gestures included
    • The number of video folders present in the dataset available to train
  • Add onto the dataset already present, creating one if it is not present

  • Completely clean and erase the training_action_data folder inside the data folder

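The menu-in-a-loop pattern these scripts are described as using can be sketched as a dictionary dispatch (option numbers and handlers here are illustrative, not the script's actual ones):

```python
# Sketch of a switch-case menu loop: map the user's choice to a handler and
# repeat until the quit option is chosen.

def print_details():
    return "dataset details"

def add_data():
    return "added data"

MENU = {"1": print_details, "2": add_data}

def dispatch(choice):
    """Run one menu option; return None for the quit choice 'q'."""
    if choice == "q":
        return None
    handler = MENU.get(choice)
    return handler() if handler else "unknown option"

# The real script would wrap this in a loop reading input() until "q" is entered.
```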
scripts/gesture_functions.py

The gesture_functions.py Python script contains all the functions required for various actions related to the list of gestures this project can recognise. The script runs in an infinite loop, asking via a switch case which function you want to perform, until you stop it. These include:

  • Reset the compatible gesture list to the defaults:

    • Toggle Lights
    • Increase their brightness
    • Decrease their brightness
  • Load gestures from available_gestures.npy

  • Save the modified gesture list back to available_gestures.npy

  • Show available gestures stored in available_gestures.npy

  • Add a gesture to the available_gestures.npy list.
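The gesture-list operations above can be sketched as follows. The project persists the list to available_gestures.npy via NumPy; JSON stands in here so the sketch stays dependency-free, and the default gesture names are paraphrased from the reset list above:

```python
# Sketch of load / save / add operations on the gesture list. The real script
# uses NumPy's .npy format; json is a stand-in for illustration only.
import json
from pathlib import Path

DEFAULTS = ["toggle_lights", "increase_brightness", "decrease_brightness"]

def save_gestures(gestures, path):
    """Persist the gesture list to disk."""
    Path(path).write_text(json.dumps(gestures))

def load_gestures(path):
    """Load the gesture list, falling back to the defaults when absent."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else list(DEFAULTS)

def add_gesture(gestures, name):
    """Append a gesture unless it is already in the list."""
    if name not in gestures:
        gestures.append(name)
    return gestures
```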

scripts/mediapipe_functions.py

The mediapipe_functions.py Python script contains all the functions, developed using Google's Mediapipe, for recognising key points on both hands and drawing them on the incoming camera feed.

scripts/model_functions.py

The model_functions.py Python script contains all the functions required for various actions related to the models created within this project for gesture recognition. The script runs in an infinite loop, asking via a switch case which function you want to perform, until you stop it. These include:

  • Print the list of all the pretrained models present in the storage
  • Train a new model on the dataset present in data/training_action_data/
  • Test an existing model on the dataset in data/training_action_data/
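Listing the pretrained models, the first option above, might be sketched as a simple directory scan (the .h5 extension is an assumption based on Keras defaults; the project's actual model file format is not stated in this README):

```python
# Sketch: list saved model files under data/models/, returning an empty list
# when the folder does not exist yet.
from pathlib import Path

def list_models(models_dir="data/models"):
    """Return the sorted names of saved model files, if any."""
    d = Path(models_dir)
    return sorted(p.name for p in d.glob("*.h5")) if d.is_dir() else []
```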

scripts/tensorboard_training_monitor.py

The tensorboard_training_monitor.py Python script is responsible for launching TensorBoard, an interactive and useful dashboard that shows live statistics while the model trains.

Note: This script should be run before starting the training of the model.

tensorboard_logs/

The tensorboard_logs folder contains files autogenerated by TensorBoard to monitor the training of any model built with Tensorflow. These files are used by the tensorboard_training_monitor.py Python script in the scripts/ folder to launch an interactive web application for monitoring the training of the model.
