DiffMusic: A Zero-shot Diffusion-Based Framework for Music Inverse Problem

This repository contains the implementation of the final project for the CommE5070 Deep Learning for Music Analysis and Generation course (Fall 2024) at National Taiwan University. For a detailed report, please refer to these slides.

Setup

To set up the virtual environment and install the required packages, use the following commands:

virtualenv --python=python3.10 diffmusic
source diffmusic/bin/activate
pip install -r requirements.txt

Download CLAP Pretrained Weights

mkdir CLAP_weights
cd CLAP_weights

wget https://huggingface.co/microsoft/msclap/resolve/main/CLAP_weights_2022.pth

wget https://huggingface.co/microsoft/msclap/resolve/main/CLAP_weights_2023.pth

cd ..
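
Before running the pipeline, it can help to confirm that both checkpoint files ended up in CLAP_weights. The following is a minimal sketch in Python and is not part of the repository: the helper name missing_clap_weights is hypothetical, and the file names are taken from the wget URLs above.

```python
from pathlib import Path

# File names taken from the download URLs above.
EXPECTED_WEIGHTS = ("CLAP_weights_2022.pth", "CLAP_weights_2023.pth")


def missing_clap_weights(weights_dir="CLAP_weights"):
    """Return the expected checkpoint files that are absent or empty.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    root = Path(weights_dir)
    missing = []
    for name in EXPECTED_WEIGHTS:
        path = root / name
        # A zero-byte file may indicate an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_clap_weights()
    if missing:
        print("Missing or empty CLAP weights:", ", ".join(missing))
    else:
        print("All CLAP weights found.")
```

Run it from the repository root so that the relative CLAP_weights path resolves to the directory created above.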

Data Preparation

To download the dataset, run the following script:

bash scripts/download_data.sh

Generating Music for Inverse Problems

To solve an inverse problem, run the following command:

python run.py \
    --task <Inverse Problem Task: {music_generation, music_inpainting, phase_retrieval, super_resolution, dereverberation, style_guidance}> \
    --scheduler <Sampling Scheduler: {ddim, dps, mpgd, dsg, diffmusic}> \
    --config_path <Path to Model Configuration> \
    --prompt ""

Available Inverse Problem Tasks

The following tasks can be specified with the --task option:

music_generation
music_inpainting
phase_retrieval
super_resolution
dereverberation
style_guidance

Available Schedulers

The following schedulers can be specified with the --scheduler option:

ddim
dps
mpgd
dsg
diffmusic

Available Model Configurations

Specify the model configuration file with the --config_path option:

configs/audioldm2.yaml
configs/musicldm.yaml

Example Commands

To perform music inpainting with the MusicLDM configuration:

python run.py \
    --task "music_inpainting" \
    --config_path "configs/musicldm.yaml" \
    --prompt ""

To perform style guidance with the AudioLDM 2 configuration and a text prompt:

python run.py \
    --task "style_guidance" \
    --config_path "configs/audioldm2.yaml" \
    --prompt "A female reporter is singing"
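
When launching several runs, the commands above can be assembled programmatically so that a mistyped task or scheduler name is caught before the model loads. This is a minimal sketch under stated assumptions, not part of the repository: build_command is a hypothetical wrapper, it uses only the flags shown in this README, and it leaves out --scheduler when none is given, mirroring the example commands above.

```python
import sys

# Option values copied from the lists in this README.
TASKS = (
    "music_generation",
    "music_inpainting",
    "phase_retrieval",
    "super_resolution",
    "dereverberation",
    "style_guidance",
)
SCHEDULERS = ("ddim", "dps", "mpgd", "dsg", "diffmusic")


def build_command(task, config_path, prompt="", scheduler=None):
    """Build the argument list for run.py, rejecting unknown option values.

    Hypothetical wrapper for illustration; only flags documented in this
    README are emitted.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if scheduler is not None and scheduler not in SCHEDULERS:
        raise ValueError(f"unknown scheduler: {scheduler!r}")

    cmd = [sys.executable, "run.py", "--task", task]
    if scheduler is not None:
        cmd += ["--scheduler", scheduler]
    cmd += ["--config_path", config_path, "--prompt", prompt]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "music_inpainting",
        "configs/musicldm.yaml",
        scheduler="diffmusic",
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) from the repository root would launch the run.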

Environment

The code was developed on Ubuntu 22.04.1 with a 12th Gen Intel(R) Core(TM) i7-12700 CPU and a single NVIDIA GeForce RTX 4090 GPU with 24 GB of memory.

Citation

If you use this code, please cite the following:

@misc{liao2024_diffmusic,
    title  = {DiffMusic: A Unified Diffusion-Based Framework for Music Inverse Problem},
    author = {Jia-Wei Liao and Pin-Chi Pan and Sheng-Ping Yang},
    url    = {https://github.com/jwliao1209/DiffMusic},
    year   = {2024}
}
