This model is trained on a music audio dataset consisting mostly of Chinese pop songs. We transcribe the audio into the symbolic domain using MIR models (beat tracking, chord detection, structure analysis, 5-stem transcription, and music tagging). The outputs of the MIR models are then transformed into a token sequence used to train a transformer-based language model.
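As a rough illustration of the pipeline's last step, here is a minimal sketch of flattening per-bar MIR annotations into a single token sequence. The token vocabulary and field names below are hypothetical; the actual format used for training is not specified here.

```python
# Hypothetical serialization of MIR outputs into a flat token stream.
# Field names ('section', 'chord', 'notes') and token spellings are
# illustrative assumptions, not the model's real vocabulary.

def to_tokens(bars):
    """Serialize per-bar annotations into one flat token list.

    `bars` is a list of dicts with a section label, a chord symbol, and
    notes given as (stem, midi_pitch, onset_step) tuples.
    """
    tokens = []
    for i, bar in enumerate(bars):
        tokens.append(f"<bar_{i}>")
        tokens.append(f"<section_{bar['section']}>")
        tokens.append(f"<chord_{bar['chord']}>")
        # Emit notes in onset order so the sequence reads left to right in time.
        for stem, pitch, onset in sorted(bar["notes"], key=lambda n: n[2]):
            tokens += [f"<stem_{stem}>", f"<pos_{onset}>", f"<pitch_{pitch}>"]
    return tokens

bars = [
    {"section": "verse", "chord": "Cmaj",
     "notes": [("bass", 36, 0), ("piano", 60, 0), ("piano", 64, 8)]},
]
print(to_tokens(bars))
```

A scheme like this keeps beat, chord, structure, and stem information in one stream, which is what lets a standard language model be trained on it directly.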
Some characteristics we think are interesting:
Our model can also take control inputs, such as a chord progression and a song structure. Below is a piece generated from an uncommon chord progression: looping major chords A-B-C-D-E-F-G-A.
Although the input chord progression is quite uncommon, the model still finds a balance between following the input and producing a reasonable melody, which surprised us.
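For concreteness, the looping control input above could be expanded into a per-bar chord-token prefix like this. The token spelling and the one-chord-per-bar assumption are ours for illustration; the model's real conditioning format may differ.

```python
# Hypothetical expansion of a looping chord progression into control tokens,
# one chord token per bar, repeated for the desired number of loops.

def chord_control(progression, loops):
    """Return a flat list of chord control tokens for `loops` repetitions."""
    return [f"<chord_{root}maj>" for _ in range(loops) for root in progression]

# The uncommon progression from the demo: all seven major chords, then back to A.
prog = ["A", "B", "C", "D", "E", "F", "G", "A"]
control = chord_control(prog, loops=2)
print(control[:3])
print(len(control))
```

Feeding such a prefix to the language model and letting it continue is one standard way to condition generation on a chord progression.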