Here we track the latest audio AI agents, covering speech, music, sound effects, and more.
Date (DD.MM.2023) | Project | Description | Paper | Code | Trained Model |
---|---|---|---|---|---|
06.12 | JAMMIN-GPT | JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live | arXiv | GitHub | - |
19.11 | M2UGen | M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models | arXiv | - | - |
14.11 | Qwen-Audio | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | arXiv | GitHub | - |
02.11 | FLAP | FLAP: Fast Language-Audio Pre-training | arXiv | - | - |
29.10 | JEN-1 Composer | JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation | arXiv | - | - |
20.10 | SALMONN | SALMONN: Towards Generic Hearing Abilities for Large Language Models | arXiv | GitHub | Hugging Face |
19.10 | Loop Copilot | Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing | arXiv | - | - |
18.10 | MusicAgent | MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | arXiv | GitHub | - |
11.10 | LLark | LLark: A Multimodal Foundation Model for Music | arXiv | GitHub | - |
01.10 | UniAudio | UniAudio: An Audio Foundation Model Toward Universal Audio Generation | arXiv | GitHub | - |
18.09 | Dynamic-SUPERB | Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech | arXiv | GitHub | - |