
Speech Drives Templates 3D

This repository allows you to train a model from one or more videos and to predict a person's gestures from an audio track.
It is based on SpeechDrivesTemplates, which works with 2 coordinates (x, y); we have expanded it to 3 coordinates (x, y, z).

Organization

The main directory is split into two subdirectories:

  • Preprocessing3D: extends the SpeechDrivesTemplates preprocessing to handle a third coordinate (z), in order to create a 3D dataset.
  • SpeechDrivesTemplates: the core of the project; it contains all the files needed to train and test a 3D model.

The files changed in SpeechDrivesTemplates are:

  • core/networks/poses_recostruction/autoencoder.py
  • core/networks/keypoints_generation/generator.py
  • core/datasets/gesture_dataset.py
  • core/utils/keypoint_visualization.py
  • core/datasets/speaker_stat.py
  • core/pipelines/voice2pose.py
  • data_process/4_1_calculate_mean_std.py
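
The common thread across these files is the keypoint dimensionality: arrays the upstream code handles as (2, num_keypoints), one row for x and one for y, gain a third row for z. Below is a minimal sketch of the idea, assuming NumPy arrays; the identifiers and keypoint count are illustrative, not the repo's actual names.

    import numpy as np

    NUM_KEYPOINTS = 137  # hypothetical; the real count depends on the pose extractor

    # Upstream (2D): one row for x, one for y.
    pose_2d = np.zeros((2, NUM_KEYPOINTS), dtype=np.float32)

    # This fork (3D): a z row is added, so every consumer of the array
    # (autoencoder, generator, dataset loader, visualization, mean/std
    # computation) moves from shape (2, K) to (3, K).
    pose_3d = np.zeros((3, NUM_KEYPOINTS), dtype=np.float32)
    pose_3d[:2] = pose_2d  # x and y carried over from the 2D pipeline
    # pose_3d[2] is filled with the z coordinate by Preprocessing3D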

Dataset

We created a small 3D dataset based on a university teacher, but building a larger dataset would yield a better model; for example, you can use the speakers from Speech2Gesture.
You can download our preprocessed dataset from this link.
You can find the results at this link.
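
To sanity-check a preprocessed clip, you can inspect one of the NPZ files produced by the preprocessing scripts described below. The file and key names here are assumptions for illustration; list the real keys with data.files:

    import numpy as np

    data = np.load("clip_0001.npz")  # hypothetical file name
    print(data.files)                # the actual keys written by 3_1_generate_clips.py

    # 'keypoints' is an assumed key; replace it with one of the names printed above.
    kp = data["keypoints"]
    print(kp.shape)  # one axis should have size 3: the (x, y, z) coordinates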

Execute Preprocessing3D

To build a 3D dataset, we provide scripts in Preprocessing3D.
The scripts must be run in the following order (a complete example run follows the list):

  • 1_1_change_fps.py: takes 2 arguments, the videos directory and the target directory where the 15-FPS videos will be saved.
  • 1_2_video2frames.py: takes 2 arguments, the 15-FPS videos directory and the directory where the frames will be saved (we suggest using the same path specified in the code).
  • preprocessing.py: takes 2 arguments, the frames path and the output path.
  • fixing.py: takes one argument, the output path.
  • 3_1_generate_clips.py: generates the NPZ clip files.
  • 3_2_split_train_val_test.py: creates a CSV file splitting the clips into training, validation, and test sets.
  • 4_1_calculate_mean_std.py: calculates the mean and std for each keypoint.
  • 4_2_parse_mean_std_npz.py: reshapes the mean and std.

After that, insert the mean and std values into speaker_stat.py.
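
As a concrete example, a full preprocessing run could look like the following. The paths are placeholders, and the scripts whose arguments are not documented above are shown without arguments, so check each script's actual usage before running:

    cd Preprocessing3D
    python 1_1_change_fps.py ./raw_videos ./videos_15fps
    python 1_2_video2frames.py ./videos_15fps ./frames
    python preprocessing.py ./frames ./output
    python fixing.py ./output
    python 3_1_generate_clips.py
    python 3_2_split_train_val_test.py
    python 4_1_calculate_mean_std.py
    python 4_2_parse_mean_std_npz.py

In the upstream project, speaker_stat.py stores per-speaker statistics in a Python dict; the structure in this fork may differ, so treat the snippet below as a hypothetical illustration of where the values from 4_2_parse_mean_std_npz.py end up:

    SPEAKERS_STAT = {
        'speaker_name': {
            'mean': [...],  # per-keypoint means from 4_2_parse_mean_std_npz.py
            'std': [...],   # per-keypoint stds, same shape as 'mean'
        },
    }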

Execute Model

To run the model, we suggest using the Execute.ipynb notebook in Google Colab.
We suggest running preprocessing.py and fixing.py locally, as you may run into Python-version problems in Google Colab.
To run the code on your local machine, you need to install CUDA on your device.
You also need to create the dataset and output directories and set them in the configuration files (voice2pose_sdt_bp.yaml, default.py); see the sketch below.
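
The exact configuration keys depend on this fork, but the entries to look for are the dataset root and the output directory. The key names below are hypothetical, so match them against what default.py actually declares:

    # voice2pose_sdt_bp.yaml -- DATASET.ROOT_DIR and SYS.OUTPUT_DIR are
    # hypothetical key names; check default.py for the real ones
    DATASET:
      ROOT_DIR: /path/to/your/dataset
    SYS:
      OUTPUT_DIR: /path/to/your/output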

Training

    python main.py --config_file configs/voice2pose_sdt_bp.yaml \
        --tag speaker_name \
        DATASET.SPEAKER speaker_name \
        SYS.NUM_WORKERS 32
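
The trailing KEY VALUE pairs (DATASET.SPEAKER, SYS.NUM_WORKERS) appear to be yacs-style command-line overrides of the values in default.py and the YAML file, so you can adjust them per run without editing the configs. For example, on a machine with fewer CPU cores you might lower the worker count:

    python main.py --config_file configs/voice2pose_sdt_bp.yaml \
        --tag speaker_name \
        DATASET.SPEAKER speaker_name \
        SYS.NUM_WORKERS 4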

Authors

  • Lorenzo Cassano (mat.718331)
  • Jacopo D'Abramo (mat. 716484)
