---
license: cc-by-nc-4.0
language:
- en
- zh
pipeline_tag: audio-classification
tags:
- music
---

# MuQ & MuQ-MuLan

<div>
<a href='#'><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.8%2B-blue?logo=python&logoColor=white"></a>
<a href='https://arxiv.org/abs/2501.01108'><img alt="Static Badge" src="https://img.shields.io/badge/arXiv-2501.01108-%23b31b1b?logo=arxiv&link=https%3A%2F%2Farxiv.org%2F"></a>
<a href='https://huggingface.co/OpenMuQ'><img alt="Static Badge" src="https://img.shields.io/badge/huggingface-OpenMuQ-%23FFD21E?logo=huggingface&link=https%3A%2F%2Fhuggingface.co%2FOpenMuQ"></a>
<a href='https://pytorch.org/'><img alt="Static Badge" src="https://img.shields.io/badge/framework-PyTorch-%23EE4C2C?logo=pytorch"></a>
<a href='https://pypi.org/project/muq'><img alt="Static Badge" src="https://img.shields.io/badge/pip%20install-muq-green?logo=PyPI&logoColor=white&link=https%3A%2F%2Fpypi.org%2Fproject%2Fmuq"></a>
</div>

This is the official repository for the paper *"**MuQ**: Self-Supervised **Mu**sic Representation Learning with Mel Residual Vector **Q**uantization"*. For more details, see the [GitHub repository](https://github.com/tencent-ailab/MuQ) and the [paper](https://arxiv.org/abs/2501.01108).

This repo releases the following models:

- **MuQ** (see [this link](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter)): A large music foundation model pre-trained via self-supervised learning (SSL), which the paper reports as achieving state-of-the-art (SOTA) results on a range of music information retrieval (MIR) tasks.
- **MuQ-MuLan** (see [this link](https://huggingface.co/OpenMuQ/MuQ-MuLan-large)): A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese texts.

## Usage

First, install the official `muq` library with pip. Python 3.8 or later is required:
```bash
pip3 install muq
```
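
Before installing, it can help to confirm that the interpreter meets the stated minimum. The following is a minimal sketch of such a check; the helper name `meets_minimum` is hypothetical and not part of `muq`.

```python
import sys

def meets_minimum(version=None, minimum=(3, 8)):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version[:2]) >= tuple(minimum)

if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for muq (needs 3.8+)"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```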

To extract music audio features using **MuQ**:
```python
import torch, librosa
from muq import MuQ

# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the audio as mono, resampled to 24 kHz
wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)  # add a batch dimension

# This will automatically fetch the checkpoint from huggingface
muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
muq = muq.to(device).eval()

with torch.no_grad():
    output = muq(wavs, output_hidden_states=True)

print('Total number of layers: ', len(output.hidden_states))
print('Feature shape: ', output.last_hidden_state.shape)
```
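
Long tracks can be costly to process in a single forward pass. One common workaround, which this README does not prescribe, is to split the waveform into fixed-length windows and run each one through the model separately. The sketch below computes window boundaries in samples for 24 kHz audio; the helper `chunk_bounds` and its 30-second and 1-second defaults are illustrative assumptions, not part of `muq`.

```python
def chunk_bounds(num_samples, sr=24000, chunk_seconds=30.0, min_seconds=1.0):
    """Return (start, end) sample indices of consecutive, non-overlapping windows.

    A trailing window shorter than `min_seconds` is dropped.
    """
    chunk_len = int(round(chunk_seconds * sr))
    min_len = int(round(min_seconds * sr))
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    for start in range(0, num_samples, chunk_len):
        end = min(start + chunk_len, num_samples)
        if end - start >= min_len:
            bounds.append((start, end))
    return bounds

# A 70-second clip at 24 kHz yields two full windows and one 10-second remainder
for start, end in chunk_bounds(70 * 24000):
    print(start, end)
```

Given a waveform tensor of shape `(1, num_samples)`, each window could then be passed to the model as `wavs[:, start:end]`. How to combine the per-window features (for instance, by averaging) depends on the downstream task.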

To extract music and text embeddings with **MuQ-MuLan** and calculate their similarity:
```python
import torch, librosa
from muq import MuQMuLan

# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# This will automatically fetch checkpoints from huggingface
mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
mulan = mulan.to(device).eval()

# Extract music embeddings
wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)
with torch.no_grad():
    audio_embeds = mulan(wavs = wavs)

# Extract text embeddings (texts can be in English or Chinese)
# The second text is Chinese for: "a violin piece suited to seaside scenery, with a lively rhythm"
texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲,节奏欢快"]
with torch.no_grad():
    text_embeds = mulan(texts = texts)

# Calculate dot product similarity
sim = mulan.calc_similarity(audio_embeds, text_embeds)
print(sim)
```
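
To make the similarity step concrete, the sketch below computes a dot-product similarity matrix in plain Python and picks the best-matching text for each audio clip. It illustrates the idea only and is not the implementation of `calc_similarity`; the helper names and the toy 3-dimensional vectors are made up for this example, and real embeddings would come from the model.

```python
def dot_similarity(audio_embeds, text_embeds):
    """Return a matrix sim[i][j] = dot(audio_embeds[i], text_embeds[j])."""
    return [
        [sum(a * t for a, t in zip(audio, text)) for text in text_embeds]
        for audio in audio_embeds
    ]

def best_match(sim_row, labels):
    """Return the (label, score) pair with the highest score in one row."""
    idx = max(range(len(sim_row)), key=lambda j: sim_row[j])
    return labels[idx], sim_row[idx]

# Toy, hand-written vectors standing in for model embeddings (hypothetical values)
audio = [[1.0, 0.0, 0.0]]
labels = ["piano", "violin"]
text_vecs = [[0.8, 0.6, 0.0], [0.0, 1.0, 0.0]]

sim = dot_similarity(audio, text_vecs)
print(best_match(sim[0], labels))
```

Each row of the matrix corresponds to one audio clip and each column to one text, so the same ranking can be applied to a list of candidate tags or captions.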

## Model Checkpoints

| Model Name | Parameters | Data | HuggingFace🤗 |
| ----------- | --- | --- | ----------- |
| MuQ | ~300M | MSD dataset | [OpenMuQ/MuQ-large-msd-iter](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter) |
| MuQ-MuLan | ~700M | music-text pairs | [OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large) |

**Note**: The open-sourced MuQ was trained on the Million Song Dataset (MSD). Because this dataset differs in size from the one used in the paper, the open-sourced model may not reach the performance reported there.
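
As a rough guide to hardware needs, the approximate parameter counts in the table can be converted into weight memory. This is a back-of-the-envelope estimate that assumes 4 bytes per parameter for float32 and 2 bytes for float16, and it ignores activations and framework overhead; the helper name is hypothetical, and the precision in which the checkpoints are actually stored is not stated here.

```python
def weight_memory_gb(num_params, bytes_per_param=4):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Approximate parameter counts taken from the checkpoint table
models = {"MuQ": 300e6, "MuQ-MuLan": 700e6}

for name, n in models.items():
    fp32 = weight_memory_gb(n, 4)
    fp16 = weight_memory_gb(n, 2)
    print(f"{name}: ~{fp32:.1f} GB (float32), ~{fp16:.1f} GB (float16)")
```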

## License

The code is released under the MIT license.

The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.

## Citation

```
@article{zhu2025muq,
  title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization},
  author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
  journal={arXiv preprint arXiv:2501.01108},
  year={2025}
}
```