---
license: cc-by-nc-4.0
language:
- en
- zh
pipeline_tag: audio-classification
tags:
- music
---

# MuQ & MuQ-MuLan

<div>
<a href='#'><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.8%2B-blue?logo=python&logoColor=white"></a>
<a href='https://arxiv.org/abs/2501.01108'><img alt="Static Badge" src="https://img.shields.io/badge/arXiv-2501.01108-%23b31b1b?logo=arxiv&link=https%3A%2F%2Farxiv.org%2F"></a>
<a href='https://huggingface.co/OpenMuQ'><img alt="Static Badge" src="https://img.shields.io/badge/huggingface-OpenMuQ-%23FFD21E?logo=huggingface&link=https%3A%2F%2Fhuggingface.co%2FOpenMuQ"></a>
<a href='https://pytorch.org/'><img alt="Static Badge" src="https://img.shields.io/badge/framework-PyTorch-%23EE4C2C?logo=pytorch"></a>
<a href='https://pypi.org/project/muq'><img alt="Static Badge" src="https://img.shields.io/badge/pip%20install-muq-green?logo=PyPI&logoColor=white&link=https%3A%2F%2Fpypi.org%2Fproject%2Fmuq"></a>
</div>

This is the official repository for the paper *"MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization"*. For more details, see https://github.com/tencent-ailab/MuQ and the [paper](https://arxiv.org/abs/2501.01108).

This repo releases the following models:

- MuQ (see [this link](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter)): A large music foundation model pre-trained via Self-Supervised Learning (SSL), achieving state-of-the-art (SOTA) results on various music information retrieval (MIR) tasks.
- MuQ-MuLan (see [this link](https://huggingface.co/OpenMuQ/MuQ-MuLan-large)): A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese texts.

## Usage

First, install the official `muq` library with pip, and make sure you are running Python 3.8 or later:
```bash
pip3 install muq
```
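To confirm the environment before loading any model, a small check like the one below may help. It is only a sketch: `check_environment` is a hypothetical helper written for this example, not part of the `muq` package, and it only tests the interpreter version and whether a package named `muq` can be found.

```python
import sys
import importlib.util


def check_environment(min_version=(3, 8), package="muq"):
    """Hypothetical helper: report whether Python is new enough and the package is installed."""
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level package cannot be located
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


python_ok, package_found = check_environment()
print("Python >= 3.8:", python_ok)
print("muq installed:", package_found)
```

If the second line prints `False`, rerun the `pip3 install muq` command with the same interpreter you use to run your scripts.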

To extract music audio features using MuQ:
```python
import torch, librosa
from muq import MuQ

device = 'cuda'  # requires a CUDA-capable GPU
# Load the audio resampled to 24 kHz and add a batch dimension
wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)

# This will automatically fetch the checkpoint from huggingface
muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
muq = muq.to(device).eval()

with torch.no_grad():
    output = muq(wavs, output_hidden_states=True)

print('Total number of layers: ', len(output.hidden_states))
print('Feature shape: ', output.last_hidden_state.shape)
```
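Downstream tasks often need a single vector per clip rather than frame-level features. The sketch below mean-pools over time, assuming `output.last_hidden_state` has the shape `(batch, frames, dim)`; that layout is an assumption, so check it against the shape printed above. `pool_features` is a hypothetical helper, and a random tensor with illustrative sizes stands in for the real model output so the snippet runs without downloading a checkpoint.

```python
import torch


def pool_features(hidden: torch.Tensor) -> torch.Tensor:
    """Mean-pool frame-level features over time: (batch, frames, dim) -> (batch, dim)."""
    if hidden.dim() != 3:
        raise ValueError(f"expected (batch, frames, dim), got {tuple(hidden.shape)}")
    return hidden.mean(dim=1)


# Stand-in for output.last_hidden_state; the sizes here are illustrative only
fake_hidden = torch.randn(1, 250, 1024)
clip_embedding = pool_features(fake_hidden)
print(clip_embedding.shape)
```

With a real model you would call `pool_features(output.last_hidden_state)` instead. Mean pooling is one common choice, not necessarily the one used in the paper's evaluations.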

To extract music and text embeddings with MuQ-MuLan and calculate their similarity:
```python
import torch, librosa
from muq import MuQMuLan

# This will automatically fetch checkpoints from huggingface
device = 'cuda'  # requires a CUDA-capable GPU
mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
mulan = mulan.to(device).eval()

# Extract music embeddings
wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)
with torch.no_grad():
    audio_embeds = mulan(wavs = wavs)

# Extract text embeddings (texts can be in English or Chinese)
# The second text means: "A violin piece suited to seaside scenery, with a cheerful rhythm"
texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲，节奏欢快"]
with torch.no_grad():
    text_embeds = mulan(texts = texts)

# Calculate dot product similarity
sim = mulan.calc_similarity(audio_embeds, text_embeds)
print(sim)
```
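To turn the similarity scores into a ranking of the candidate texts, something like the following could work. It assumes `sim` holds one score per text for a single audio clip, for example with shape `(1, num_texts)`; that shape is an assumption, so check the printed `sim` first. `rank_texts` is a hypothetical helper, and a hand-written tensor stands in for the real output of `mulan.calc_similarity`.

```python
import torch


def rank_texts(sim: torch.Tensor, texts: list) -> list:
    """Return (text, score) pairs for one audio clip, best match first."""
    scores = sim.reshape(-1)
    if scores.numel() != len(texts):
        raise ValueError("expected exactly one score per text")
    order = torch.argsort(scores, descending=True)
    return [(texts[i], scores[i].item()) for i in order.tolist()]


# Stand-in values for illustration; real scores come from mulan.calc_similarity
texts = ["classical genres, hopeful mood, piano.", "upbeat violin piece for a seaside view"]
fake_sim = torch.tensor([[0.12, 0.31]])
ranked = rank_texts(fake_sim, texts)
for text, score in ranked:
    print(f"{score:.2f}  {text}")
```

The same helper applies to real scores via `rank_texts(sim.cpu(), texts)` when a single clip is compared against several texts.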

## Model Checkpoints

| Model Name | Parameters | Data | HuggingFace🤗 |
| ----------- | --- | --- | ----------- |
| MuQ | ~300M | MSD dataset | [OpenMuQ/MuQ-large-msd-iter](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter) |
| MuQ-MuLan | ~700M | music-text pairs | [OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large) |

Note: The open-sourced MuQ was trained on the Million Song Dataset (MSD). Due to differences in dataset size, it may not achieve the same level of performance as reported in the paper.

## License

The code is released under the MIT license.

The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.

## Citation

```
@article{zhu2025muq,
  title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization},
  author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
  journal={arXiv preprint arXiv:2501.01108},
  year={2025}
}
```