This project provides NexTune, a diffusion-based song generation model. The model conditions on a song's history (the audio heard so far) and generates a plausible continuation for the next few seconds of the song.
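The idea of conditioning on song history can be illustrated by splitting a waveform into a conditioning window and a target window. The following is only a minimal sketch: the function name `split_history_future`, its parameters, and the window lengths are hypothetical and are not taken from the NexTune code, which may prepare its data differently.

```python
from typing import Sequence, Tuple


def split_history_future(
    samples: Sequence[float],
    sample_rate: int,
    history_seconds: float,
    future_seconds: float,
) -> Tuple[Sequence[float], Sequence[float]]:
    """Split a mono waveform into (history, future) windows.

    Hypothetical helper: `history` is what a model would condition on,
    `future` is the continuation it would be trained to generate.
    """
    n_history = int(round(history_seconds * sample_rate))
    n_future = int(round(future_seconds * sample_rate))
    if n_history <= 0 or n_future <= 0:
        raise ValueError("window lengths must be positive")
    if len(samples) < n_history + n_future:
        raise ValueError("waveform is shorter than history + future")
    history = samples[:n_history]
    future = samples[n_history:n_history + n_future]
    return history, future


# Toy example: 4 "seconds" of audio at 2 samples per second.
waveform = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
history, future = split_history_future(
    waveform, sample_rate=2, history_seconds=3, future_seconds=1
)
print(len(history), len(future))
```

The helper works on any sliceable sequence, so the same call would also accept a NumPy array or a 1-D tensor.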
The repository is organized as follows:
nextune-main/
├── configs/ # Configuration files for training/sampling
│ ├── config_process.yaml # Audio pre/post-processing configurations
│ ├── config_sample.yaml # Sampling-specific configurations
│ └── config_train.yaml # Training-specific configurations
├── diffusion/ # Core diffusion modules
├── docs/ # Documentation
│ ├── assets/ # Documentation assets
│ └── README.md # Documentation overview and usage guide
├── models/ # Model and architecture components
│ ├── backbones/ # Backbone networks for NexTune
│ │ ├── model_nextune.py # NexTune backbone model
│ │ └── layers.py # Custom layers and utility functions
│ └── test_model_nextune.py # Efficiency test for the NexTune backbone
├── scripts/ # Scripts for running tasks
│ ├── preprocess.py # Audio pre-processing script
│ ├── postprocess.py # Audio post-processing script
│ ├── sample.py # Sampling script
│ └── train.py # Training script
├── utils/ # Utility functions and helper scripts
│ ├── metrics.py # Evaluation metrics functions
│ └── __init__.py # Makes utils a package
├── main.py # Main file to run train and sample
├── requirements.txt # Project dependencies
└── README.md # Project overview, setup, and usage instructions
First, download and set up the repo:
git clone https://github.com/TunAI-Lab/nextune.git
cd nextune
Then, create a Python 3.10 conda environment and install the requirements:
# Install NexTune
conda create --name venv python=3.10
conda activate venv
pip install -r requirements.txt
To launch data pre-processing:
python -m scripts.preprocess --config configs/config_process.yaml
To launch data post-processing:
python -m scripts.postprocess --config configs/config_process.yaml
We provide a training script for the NexTune model in scripts/train.py.
To launch NexTune training with N GPUs on one node:
accelerate launch -m scripts.train --config configs/config_train.yaml
To launch NexTune training with 1 GPU (id=1):
accelerate launch --num_processes=1 --gpu_ids 1 -m scripts.train --config configs/config_train.yaml
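When training is started from another Python program (for example a job scheduler), the launch commands above can be assembled programmatically. This is a sketch under assumptions: `build_train_command` is a hypothetical helper that is not part of the repository, and it only assembles the command without running it. It uses the `--num_processes` and `--gpu_ids` options of `accelerate launch`, the latter taking a comma-separated list of GPU ids.

```python
import shlex
from typing import List, Optional, Sequence


def build_train_command(
    config_path: str = "configs/config_train.yaml",
    num_processes: Optional[int] = None,
    gpu_ids: Optional[Sequence[int]] = None,
) -> List[str]:
    """Assemble an `accelerate launch` command for scripts.train.

    Hypothetical helper; options left as None are omitted so that
    accelerate falls back to its own configuration.
    """
    cmd = ["accelerate", "launch"]
    if num_processes is not None:
        cmd.append(f"--num_processes={num_processes}")
    if gpu_ids:
        # accelerate expects GPU ids as a comma-separated list
        cmd += ["--gpu_ids", ",".join(str(i) for i in gpu_ids)]
    cmd += ["-m", "scripts.train", "--config", config_path]
    return cmd


# Single-GPU run on GPU 1, matching the command shown above.
single_gpu = build_train_command(num_processes=1, gpu_ids=[1])
print(shlex.join(single_gpu))
```

The resulting list can be passed to `subprocess.run(single_gpu, check=True)` from the repository root.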
To sample from a pretrained NexTune model, run:
python -m scripts.sample --config configs/config_sample.yaml
Sampling results are saved automatically in the samples subfolder of the model's results directory.
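To inspect the generated files after sampling, a small helper can list the contents of the samples subfolder. This is a sketch only: `list_samples` is a hypothetical function, and the location of the results directory is an assumption that depends on your configuration, so it has to be passed in explicitly.

```python
from pathlib import Path
from typing import List


def list_samples(results_dir: str) -> List[Path]:
    """Return the files in `<results_dir>/samples`, sorted by name.

    Hypothetical helper: `results_dir` is whatever results directory
    your sampling run wrote to; an empty list is returned if the
    samples subfolder does not exist.
    """
    samples_dir = Path(results_dir) / "samples"
    if not samples_dir.is_dir():
        return []
    return sorted(p for p in samples_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # "results" is a placeholder path, not a path defined by NexTune.
    for sample_path in list_samples("results"):
        print(sample_path)
```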