Intelligent music generation is a branch of generative AI that has recently gained traction thanks to advances in related fields, such as image generation with DALL-E and text generation with ChatGPT. Much of the existing work in music generation relies on traditional deep learning methods. One prominent example is OpenAI's MuseNet, a transformer network based on GPT-2 that can generate "good-sounding" snippets of songs resembling existing pieces of music.
However, one key idea that remains open for exploration in deep-learning-based music generation is the use of existing musical principles to guide generation. Many techniques have been explored for incorporating domain knowledge into deep neural networks, but no comparable advance has yet been made in music generation.
The goal of this project is to explore the effects of guiding music generation by defining high-level primitives that the network can parameterize, with the aim of producing songs that listeners judge, by general consensus, to be "better" than those produced by traditional methods. (High-level primitives are conventional building blocks used across all ranges of music, such as common chords and progressions.) To validate this approach, we can conduct user studies gauging whether people prefer songs generated in the traditional manner or songs generated by this novel method.
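As a minimal sketch of what a parameterized high-level primitive might look like, consider a chord primitive: instead of emitting raw notes, the model outputs a small set of parameters (root, quality, duration) that the primitive expands into concrete pitches. The `ChordPrimitive` class, its fields, and the interval table below are hypothetical illustrations, not Gimbop's actual design.

from dataclasses import dataclass

# Hypothetical interval patterns for a few common chord qualities,
# expressed in semitones above the root.
QUALITIES = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "dom7":  (0, 4, 7, 10),
}

@dataclass
class ChordPrimitive:
    root: int        # MIDI pitch of the chord root, e.g. 60 = middle C
    quality: str     # one of the keys in QUALITIES
    duration: int    # length in MIDI ticks

    def to_pitches(self) -> list[int]:
        """Expand the parameterized chord into concrete MIDI pitches."""
        return [self.root + interval for interval in QUALITIES[self.quality]]

# A I-IV-V progression in C major, expressed as three primitives
# rather than twelve individual notes:
progression = [
    ChordPrimitive(60, "major", 960),  # C major
    ChordPrimitive(65, "major", 960),  # F major
    ChordPrimitive(67, "major", 960),  # G major
]
for chord in progression:
    print(chord.to_pitches())

The design intent is that the network learns to choose and parameterize such building blocks, constraining its output space to musically sensible structures.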
Gimbop uses the midicsv toolkit, whose csvmidi program generates MIDI files from a human-readable CSV format (the midicsv program itself performs the reverse conversion, MIDI to CSV).
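The sketch below shows the CSV format in question and how csvmidi turns it into a MIDI file: each record is "Track, Time, Type, parameters...", with times in MIDI ticks. This assumes csvmidi is installed and on the PATH; the file names are illustrative.

import subprocess

# A minimal two-track song (a tempo track and one middle-C note)
# in the midicsv CSV format.
CSV_SONG = """\
0, 0, Header, 1, 2, 480
1, 0, Start_track
1, 0, Tempo, 500000
1, 0, End_track
2, 0, Start_track
2, 0, Note_on_c, 0, 60, 90
2, 480, Note_off_c, 0, 60, 0
2, 480, End_track
0, 0, End_of_file
"""

with open("song.csv", "w") as f:
    f.write(CSV_SONG)

# csvmidi reads the CSV and writes a standard MIDI file.
subprocess.run(["csvmidi", "song.csv", "song.mid"], check=True)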