Papagaio

Implementation of a learning model using NLP and music theory techniques to generate music through MIDI files.

About

Music resembles language as a temporal sequence of articulated sounds. Both say something, often something human.

There are, however, crucial differences between language and music. Still, in its simplest form, we can describe music as a sequence of symbols: translating something complex into something simpler, yet usable by computational models.

Thus, the objective of this project is to establish communication between the human, who understands music through the rich way the brain interprets information, and the machine.

We'll create a model that can generate music based on input information, i.e., generate a sequence of sounds that are related in some way to the sounds passed as input.

We'll use Natural Language Processing (NLP) methods, treating music as if it were a language and abstracting it so that the machine can recognize and process similar data.

As a first step, we'll use text-generation techniques based on Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. Once training proves reasonably effective, we'll repeat the implementation with more specialized models such as Transformers.
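As a rough illustration of this approach, a minimal LSTM sequence model could look like the sketch below. This is not the project's actual architecture; the layer sizes and the use of TensorFlow/Keras are assumptions made for the example.

```python
# Minimal sketch of an LSTM sequence model (illustrative only; the actual
# Papagaio architecture may differ). Assumes TensorFlow/Keras is installed.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 32   # frames fed to the model at each step
NOTES = 88     # size of each multi-hot frame (one slot per piano key)

model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, NOTES)),
    layers.LSTM(256),
    # Sigmoid output: each of the 88 notes gets an independent probability,
    # since several notes can sound at once (multi-hot target).
    layers.Dense(NOTES, activation="sigmoid"),
])

# Multi-label target, hence binary cross-entropy rather than categorical.
model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy data just to show the expected tensor shapes.
x = np.random.randint(0, 2, size=(128, SEQ_LEN, NOTES)).astype("float32")
y = np.random.randint(0, 2, size=(128, NOTES)).astype("float32")
model.fit(x, y, epochs=1, batch_size=32)
```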

Dataset

The dataset is a collection of several songs in MIDI format. The .mid files are organized by artist and we have, in total, XXXX files.

The dataset can be found on Kaggle here and on the official website. We used the Clean MIDI subset.

Data preprocessing

Starting from an input file with songs in MIDI format, we preprocess the data in order to encode it using multi-hot encoding.

This type of encoding captures an essential dimension of music: time. It makes the problem different from a plain text-generation problem, due to the addition of one more dimension.


Each bar is split into 32 frames, where each frame is an 88-position multi-hot vector in which each position represents a key of a standard keyboard. Notes being played at the exact instant of the frame receive the value '1' at the corresponding position of the vector, while notes that are off receive the value '0'.

The resulting representation is exemplified below.

[Figure: example of the multi-hot vector representation]
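As a sketch of this encoding (not the project's actual preprocessing code), the snippet below builds the 32 × 88 multi-hot matrix for a single bar. The use of the pretty_midi library, as well as the helper name `encode_bar`, are assumptions for illustration.

```python
# Sketch of the multi-hot encoding described above: each bar becomes a
# 32 x 88 matrix of 0/1 values. Library choice and helper names are
# assumptions, not the project's actual code.
import numpy as np
import pretty_midi

FRAMES_PER_BAR = 32
NUM_KEYS = 88        # standard piano range
LOWEST_PITCH = 21    # MIDI note number of A0, the lowest piano key

def encode_bar(midi: pretty_midi.PrettyMIDI, bar_start: float, bar_end: float) -> np.ndarray:
    """Return a (32, 88) multi-hot matrix for the notes sounding in one bar."""
    frames = np.zeros((FRAMES_PER_BAR, NUM_KEYS), dtype=np.uint8)
    frame_times = np.linspace(bar_start, bar_end, FRAMES_PER_BAR, endpoint=False)
    for instrument in midi.instruments:
        if instrument.is_drum:
            continue
        for note in instrument.notes:
            key = note.pitch - LOWEST_PITCH
            if not 0 <= key < NUM_KEYS:
                continue
            # Mark every frame whose instant falls inside the note's duration.
            active = (frame_times >= note.start) & (frame_times < note.end)
            frames[active, key] = 1
    return frames
```

Stacking the matrices for consecutive bars then yields a three-dimensional tensor of shape (bars, 32, 88), which is the extra time dimension mentioned above.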

LSTM Model

Structure

Training and validation

Tests and music generation

Improvements and optimizations
