---
license: cc-by-nc-4.0
language:
- en
- zh
pipeline_tag: audio-classification
tags:
- music
---

# MuQ & MuQ-MuLan

<div>
<a href='#'><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.8%2B-blue?logo=python&logoColor=white"></a>
<a href='https://arxiv.org/abs/2501.01108'><img alt="Static Badge" src="https://img.shields.io/badge/arXiv-2501.01108-%23b31b1b?logo=arxiv&link=https%3A%2F%2Farxiv.org%2F"></a>
<a href='https://huggingface.co/OpenMuQ'><img alt="Static Badge" src="https://img.shields.io/badge/huggingface-OpenMuQ-%23FFD21E?logo=huggingface&link=https%3A%2F%2Fhuggingface.co%2FOpenMuQ"></a>
<a href='https://pytorch.org/'><img alt="Static Badge" src="https://img.shields.io/badge/framework-PyTorch-%23EE4C2C?logo=pytorch"></a>
<a href='https://pypi.org/project/muq'><img alt="Static Badge" src="https://img.shields.io/badge/pip%20install-muq-green?logo=PyPI&logoColor=white&link=https%3A%2F%2Fpypi.org%2Fproject%2Fmuq"></a>
</div>

This is the official repository for the paper *"**MuQ**: Self-Supervised **Mu**sic Representation Learning with Mel Residual Vector **Q**uantization"*. For more detailed information, we strongly recommend referring to the [GitHub repository](https://github.com/tencent-ailab/MuQ) and the [paper](https://arxiv.org/abs/2501.01108).

This repo releases the following models:

- **MuQ** (see [this link](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter)): a large music foundation model pre-trained via Self-Supervised Learning (SSL), achieving SOTA results on various MIR tasks.
- **MuQ-MuLan** (see [this link](https://huggingface.co/OpenMuQ/MuQ-MuLan-large)): a music-text joint embedding model trained via contrastive learning, supporting both English and Chinese text.
## Usage

First, install the official `muq` package from PyPI (requires Python >= 3.8):

```bash
pip3 install muq
```
To extract music audio features using **MuQ**:

```python
import torch
import librosa
from muq import MuQ

device = 'cuda'

# Load the audio at 24 kHz, the sample rate MuQ expects
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)

# This will automatically fetch the checkpoint from Hugging Face
muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
muq = muq.to(device).eval()

with torch.no_grad():
    output = muq(wavs, output_hidden_states=True)

print('Total number of layers:', len(output.hidden_states))
print('Feature shape:', output.last_hidden_state.shape)
```
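The frame-level features in `output.last_hidden_state` have shape `(batch, frames, dim)`. For clip-level tasks such as tagging or genre classification, a common recipe is to mean-pool over the time axis. A minimal sketch, using a random tensor as a stand-in for MuQ output (the frame count and 1024-dim feature size here are illustrative assumptions, not the model's guaranteed shapes):

```python
import torch

# Dummy stand-in for MuQ's frame-level output: (batch, frames, dim).
# The real shape is printed by the snippet above.
frame_features = torch.randn(1, 75, 1024)

# Mean-pool over the time axis to get one clip-level vector per input,
# a common recipe for downstream probing or tagging heads.
clip_embedding = frame_features.mean(dim=1)

print(clip_embedding.shape)  # torch.Size([1, 1024])
```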
Using **MuQ-MuLan** to extract music and text embeddings and calculate their similarity:

```python
import torch
import librosa
from muq import MuQMuLan

# This will automatically fetch the checkpoint from Hugging Face
device = 'cuda'
mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
mulan = mulan.to(device).eval()

# Extract music embeddings
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)
with torch.no_grad():
    audio_embeds = mulan(wavs=wavs)

# Extract text embeddings (texts can be in English or Chinese)
texts = [
    "classical genres, hopeful mood, piano.",
    "一首适合海边风景的小提琴曲,节奏欢快",  # "An upbeat violin piece that suits a seaside scene"
]
with torch.no_grad():
    text_embeds = mulan(texts=texts)

# Calculate dot-product similarity
sim = mulan.calc_similarity(audio_embeds, text_embeds)
print(sim)
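One common use of such similarity scores is zero-shot tagging: score one clip against several candidate descriptions and pick the best match. A minimal sketch of the ranking step, with random unit vectors standing in for real MuQ-MuLan embeddings (the 512-dim embedding size is an assumption for illustration):

```python
import torch
import torch.nn.functional as F

# Random unit vectors standing in for MuQ-MuLan embeddings
# (real embeddings come from mulan(wavs=...) / mulan(texts=...)).
audio_embeds = F.normalize(torch.randn(1, 512), dim=-1)  # 1 clip
text_embeds = F.normalize(torch.randn(4, 512), dim=-1)   # 4 candidate tags

# Dot products of L2-normalized vectors are cosine similarities.
sim = audio_embeds @ text_embeds.T  # shape: (1, 4)

# A softmax over the candidates turns scores into a ranking distribution.
probs = sim.softmax(dim=-1)
best = probs.argmax(dim=-1)
print(probs, best)  # best-matching text index for the clip
```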
## Model Checkpoints

| Model Name | Parameters | Data | HuggingFace🤗 |
| ----------- | --- | --- | ----------- |
| MuQ | ~300M | MSD dataset | [OpenMuQ/MuQ-large-msd-iter](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter) |
| MuQ-MuLan | ~700M | music-text pairs | [OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large) |

**Note**: The open-sourced MuQ was trained on the Million Song Dataset. Due to differences in dataset size, the open-sourced model may not reach the level of performance reported in the paper.
## License

The code is released under the MIT license.

The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.

## Citation

```bibtex
@article{zhu2025muq,
  title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization},
  author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
  journal={arXiv preprint arXiv:2501.01108},
  year={2025}
}
```