---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
---

# Qwen3-Coder-30B-A3B-Instruct
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>

## Highlights

Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:

- Strong performance among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks.
- Long-context capabilities with native support for 256K tokens, extendable up to 1M tokens using YaRN, optimized for repository-scale understanding.
- Agentic coding support for most platforms, such as Qwen Code and CLINE, featuring a specially designed function call format.

![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-30a3-main.jpg)

## Model Overview

Qwen3-Coder-30B-A3B-Instruct has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: 262,144 tokens natively

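The figures above imply a few ratios that can help when estimating memory and compute. The following is a small arithmetic sketch that uses only the numbers listed in this section; the variable names are illustrative and are not taken from the model's configuration file.

```python
# Figures copied from the Model Overview list above.
total_params_b = 30.5      # total parameters, in billions
active_params_b = 3.3      # parameters activated per token, in billions
num_q_heads = 32           # query heads
num_kv_heads = 4           # key/value heads (GQA)
num_experts = 128          # experts
num_active_experts = 8     # experts activated per token

# With GQA, several query heads share one key/value head.
q_heads_per_kv_head = num_q_heads // num_kv_heads

# Fraction of the experts that are activated for each token.
expert_fraction = num_active_experts / num_experts

# Fraction of all parameters that take part in processing one token.
active_fraction = active_params_b / total_params_b

print(f"query heads per KV head: {q_heads_per_kv_head}")
print(f"experts active per token: {expert_fraction:.2%}")
print(f"parameters active per token: {active_fraction:.1%}")
```

This works out to 8 query heads per key/value head, 6.25% of experts active per token, and roughly 10.8% of parameters active per token.
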
NOTE: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output. Specifying `enable_thinking=False` is no longer required.

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).

## Quickstart

We advise you to use the latest version of `transformers`.

With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3_moe'
```

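To catch this before loading the model, the installed version can be checked up front. The following is a minimal sketch that uses only the standard library; `release_tuple` and `supports_qwen3_moe` are hypothetical helper names, and the parsing is deliberately simple (it looks at the leading numeric parts only, so a pre-release such as `4.51.0rc1` is treated like `4.51.0`).

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.51.0"  # minimum version implied by the note above


def release_tuple(version_string: str) -> tuple:
    """Turn '4.51.0' or '4.52.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '4.51' compares equal to '4.51.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen3_moe(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True when the installed version is at least the minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = version("transformers")
    if not supports_qwen3_moe(installed):
        print(f"transformers {installed} is too old; upgrade to >= {MIN_TRANSFORMERS}")
except PackageNotFoundError:
    print("transformers is not installed")
```
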
The following code snippet illustrates how to use the model to generate content based on given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```

Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.

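One way to apply this note is to cap the number of new tokens so that the prompt plus the generated output fits within the chosen context length. The sketch below is an illustration only: `capped_max_new_tokens` is a hypothetical helper, the default of 32,768 comes from the note above, and 65,536 is the output length used in the Quickstart.

```python
def capped_max_new_tokens(prompt_tokens: int,
                          context_limit: int = 32_768,
                          requested: int = 65_536) -> int:
    """Return a max_new_tokens value so prompt + output fits in context_limit."""
    if prompt_tokens >= context_limit:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, which leaves no room "
            f"within a context of {context_limit}; shorten the prompt"
        )
    return min(requested, context_limit - prompt_tokens)


# Example: a 1,200-token prompt under a 32,768-token context.
print(capped_max_new_tokens(1_200))  # 31568
```

With the Quickstart objects in scope, `prompt_tokens` could be taken from `model_inputs.input_ids.shape[1]`, and the result passed to `model.generate` as `max_new_tokens`.
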
For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.

## Agentic Coding

Qwen3-Coder excels in tool calling capabilities.

You can define or use any tools, as in the following example.
```python
from openai import OpenAI

# Your tool implementation
def square_the_number(input_num: float) -> float:
    return input_num ** 2

# Define Tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "square_the_number",
            "description": "Output the square of the number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    "input_num": {
                        "type": "number",
                        "description": "input_num is a number that will be squared"
                    }
                },
            }
        }
    }
]

# Define LLM
client = OpenAI(
    # Use a custom endpoint compatible with OpenAI API
    base_url="http://localhost:8000/v1",  # api_base
    api_key="EMPTY"
)

messages = [{"role": "user", "content": "square the number 1024"}]

completion = client.chat.completions.create(
    messages=messages,
    model="Qwen3-Coder-30B-A3B-Instruct",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
```

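When the model decides to call a tool, an OpenAI-compatible response carries the request in `message.tool_calls`, with the arguments encoded as a JSON string. The following is a minimal sketch of running those calls locally and building the `tool` messages to send back. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, and the stand-in object at the bottom only mimics the response shape so the snippet runs without a server.

```python
import json
from types import SimpleNamespace


def square_the_number(input_num: float) -> float:
    return input_num ** 2


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"square_the_number": square_the_number}


def run_tool_calls(message) -> list:
    """Execute each tool call on an assistant message and return tool-role messages."""
    results = []
    for call in message.tool_calls or []:
        func = TOOL_REGISTRY[call.function.name]
        # Arguments arrive as a JSON-encoded string.
        kwargs = json.loads(call.function.arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(func(**kwargs)),
        })
    return results


# Stand-in for completion.choices[0].message, so the sketch runs offline.
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(
        id="call_0",
        function=SimpleNamespace(
            name="square_the_number",
            arguments='{"input_num": 1024}',
        ),
    )
])

print(run_tool_calls(fake_message))
```

In a real loop, the assistant message and these tool messages would be appended to `messages`, followed by another call to `client.chat.completions.create` so the model can produce its final answer.
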
## Best Practices

To achieve optimal performance, we recommend the following settings:

1. Sampling Parameters:
    - We suggest using `temperature=0.7`, `top_p=0.8`, `top_k=20`, `repetition_penalty=1.05`.

2. Adequate Output Length: We recommend using an output length of 65,536 tokens for most queries, which is adequate for instruct models.

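The settings above can be collected into a single set of keyword arguments for `generate` in `transformers`. This is a sketch rather than an official configuration: `SAMPLING_KWARGS` is a hypothetical name, and `do_sample=True` is added as an assumption because `generate` ignores the sampling values when sampling is off (the model's bundled generation config may already enable it).

```python
# Recommended values copied from the list above.
SAMPLING_KWARGS = {
    "do_sample": True,            # assumption: needed for the sampling values to apply
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "max_new_tokens": 65536,      # recommended output length
}

# With the Quickstart objects in scope, generation would look like:
#   generated_ids = model.generate(**model_inputs, **SAMPLING_KWARGS)
for name, value in SAMPLING_KWARGS.items():
    print(f"{name}={value}")
```
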
### Citation

If you find our work helpful, feel free to cite us.

```
@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}
```