---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
---

# Qwen3-Coder-30B-A3B-Instruct
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>

## Highlights

**Qwen3-Coder** is available in multiple sizes. Today, we're excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:

- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks.
- **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding.
- **Agentic Coding** support for most platforms, such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.
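
The 1M-token figure corresponds to a YaRN scaling factor of 4 over the native window (262,144 × 4 = 1,048,576). As a minimal sketch, assuming the `rope_scaling` key names that `transformers` uses for YaRN (`rope_type`, `factor`, `original_max_position_embeddings`), the dictionary for a given target length could be built as below. The helper `yarn_rope_scaling` is hypothetical; check the official documentation for the values recommended for this model.

```python
NATIVE_CONTEXT = 262_144  # native context length from the model card


def yarn_rope_scaling(target_context: int, native_context: int = NATIVE_CONTEXT) -> dict:
    """Build a YaRN-style rope_scaling dict (hypothetical helper).

    Assumes the key names used by Hugging Face transformers for YaRN;
    verify them against the version you have installed.
    """
    if target_context <= native_context:
        raise ValueError("YaRN is only needed beyond the native context length")
    factor = target_context / native_context  # e.g. 1,048,576 / 262,144 = 4.0
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_context,
    }


scaling = yarn_rope_scaling(1_048_576)
print(scaling)  # {'rope_type': 'yarn', 'factor': 4.0, 'original_max_position_embeddings': 262144}
```

A dict like this would typically be assigned to `config.rope_scaling` on a config loaded with `AutoConfig.from_pretrained` before loading the model. Because the factor stays fixed regardless of input length, enabling it only when you actually need more than 262,144 tokens may be the safer choice.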

## Model Overview

**Qwen3-Coder-30B-A3B-Instruct** has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: **262,144 natively**.
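
The gap between 30.5B total and 3.3B activated parameters comes from the mixture-of-experts layers: for each token, a router scores all 128 experts and only 8 of them run. The following is a generic top-k routing sketch in plain Python, not Qwen's actual implementation; the softmax-then-renormalize step is an assumption modeled on common MoE designs, and `route_token` is a hypothetical name.

```python
import math
import random

NUM_EXPERTS = 128  # from the model overview
TOP_K = 8          # activated experts per token


def route_token(router_logits: list[float], top_k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the top-k experts for one token and return (expert_id, weight) pairs.

    Generic sketch: softmax over all experts, keep the k largest
    probabilities, then renormalize so the kept weights sum to 1.
    """
    # Numerically stable softmax over all expert logits
    max_logit = max(router_logits)
    exps = [math.exp(x - max_logit) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k most probable experts, highest first
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Renormalize the selected weights
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(len(selected), round(sum(w for _, w in selected), 6))  # 8 1.0
```

Only the selected experts' feed-forward blocks are evaluated for that token, which is why per-token compute tracks the activated parameter count rather than the total.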

**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Accordingly, specifying `enable_thinking=False` is no longer required.**

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).


## Quickstart

We advise you to use the latest version of `transformers`.

With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3_moe'
```
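
To fail early with a clearer message than `KeyError: 'qwen3_moe'`, you can check the installed version before loading the model. This is a minimal stdlib-only sketch; `parse_release` and `supports_qwen3_moe` are hypothetical helpers, and for full PEP 440 handling you may prefer `packaging.version`.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 51, 0)  # versions below this raise KeyError: 'qwen3_moe'


def parse_release(ver: str) -> tuple[int, ...]:
    """Parse the leading numeric release segments, e.g. '4.51.0.dev0' -> (4, 51, 0)."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad to three components so '4.51' compares like '4.51.0'
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen3_moe(ver: str) -> bool:
    return parse_release(ver) >= MIN_TRANSFORMERS


try:
    installed = version("transformers")
    print(installed, "ok" if supports_qwen3_moe(installed) else "too old: upgrade transformers")
except PackageNotFoundError:
    print("transformers is not installed")
```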

The following code snippet illustrates how to use the model to generate content based on given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```
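
The snippet above calls `generate` without sampling arguments, so it falls back to whatever defaults the model's generation config provides. To apply the sampling values recommended under Best Practices explicitly, one option is to collect them in a dict and unpack it into `generate`. The helper `recommended_generate_kwargs` is hypothetical; `do_sample`, `temperature`, `top_p`, `top_k`, `repetition_penalty`, and `max_new_tokens` are standard `generate` keyword arguments in `transformers`.

```python
def recommended_generate_kwargs(max_new_tokens: int = 65536) -> dict:
    """Sampling settings from the Best Practices section (hypothetical helper)."""
    return {
        "do_sample": True,  # sampling must be on for temperature/top_p/top_k to apply
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "repetition_penalty": 1.05,
        "max_new_tokens": max_new_tokens,
    }


# Usage with the quickstart variables:
# generated_ids = model.generate(**model_inputs, **recommended_generate_kwargs())
print(recommended_generate_kwargs(max_new_tokens=1024))
```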

**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
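
A large part of the memory that grows with context length is the key/value cache. A rough estimate follows from the architecture numbers in the Model Overview: 48 layers and 4 KV heads (GQA). The head dimension of 128 and the 2-byte (bf16) cache entries are assumptions not stated in this card, and the estimate ignores weights, activations, and framework overhead.

```python
NUM_LAYERS = 48      # from the model overview
NUM_KV_HEADS = 4     # GQA: 4 KV heads shared by 32 query heads
HEAD_DIM = 128       # assumption: not stated in this card
BYTES_PER_VALUE = 2  # assumption: bf16/fp16 cache entries


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """Approximate KV-cache size: K and V tensors for every layer and KV head."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens * batch_size


for ctx in (262_144, 32_768):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx} tokens -> {gib:.1f} GiB of KV cache")
# 262144 tokens -> 24.0 GiB of KV cache
# 32768 tokens -> 3.0 GiB of KV cache
```

Under these assumptions, cutting the context from 262,144 to 32,768 tokens shrinks the cache eightfold; with 32 KV heads instead of 4 (no GQA), it would be eight times larger again.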

For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.

## Agentic Coding

Qwen3-Coder excels at tool calling.

You can define or use any tools, as in the following example.
```python
from openai import OpenAI

# Your tool implementation (parameter name matches the schema below)
def square_the_number(input_num: float) -> float:
    return input_num ** 2

# Define Tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "square_the_number",
            "description": "output the square of the number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    "input_num": {
                        "type": "number",
                        "description": "input_num is a number that will be squared"
                    }
                },
            }
        }
    }
]

# Define LLM
client = OpenAI(
    # Use a custom endpoint compatible with OpenAI API
    base_url="http://localhost:8000/v1",  # api_base
    api_key="EMPTY"
)

messages = [{"role": "user", "content": "square the number 1024"}]

completion = client.chat.completions.create(
    messages=messages,
    model="Qwen3-Coder-30B-A3B-Instruct",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
```
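
The response above only contains the model's request to call the tool; your code still has to run it and send the result back. A minimal sketch of that step, assuming the assistant message has been converted to a plain dict (for example via `model_dump()` on the OpenAI SDK message object) with `tool_calls` in the OpenAI chat format, where `arguments` is a JSON string. `run_tool_calls`, `TOOL_REGISTRY`, and the sample message are hypothetical; the tool is redefined here with `input_num` so that its parameter name matches the `tools` schema.

```python
import json


def square_the_number(input_num: float) -> float:
    return input_num ** 2


# Map tool names (as declared in `tools`) to Python callables
TOOL_REGISTRY = {"square_the_number": square_the_number}


def run_tool_calls(assistant_message: dict, registry: dict) -> list[dict]:
    """Execute each requested tool call and build the "tool" messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        output = fn(**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results


# Hypothetical assistant message, shaped like the OpenAI chat format
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "square_the_number", "arguments": '{"input_num": 1024}'},
    }],
}
tool_messages = run_tool_calls(assistant_message, TOOL_REGISTRY)
print(tool_messages)  # [{'role': 'tool', 'tool_call_id': 'call_0', 'content': '1048576'}]
```

To get the final answer, append `assistant_message` and `tool_messages` to `messages` and call `client.chat.completions.create(...)` again with the same `tools`.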

## Best Practices

To achieve optimal performance, we recommend the following settings:

1. **Sampling Parameters**:
   - We suggest using `temperature=0.7`, `top_p=0.8`, `top_k=20`, and `repetition_penalty=1.05`.

2. **Adequate Output Length**: We recommend using an output length of 65,536 tokens for most queries, which is adequate for instruct models.
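
To see what the recommended sampling parameters do in a single decoding step, here is a simplified, pure-Python sketch that applies them to next-token logits in the order repetition penalty, temperature, top-k, top-p. It follows the general approach of the logits processors in `transformers` but is not their exact implementation; `candidate_distribution` and the toy logits are made up for illustration.

```python
import math


def candidate_distribution(logits, prev_tokens, temperature=0.7, top_k=20,
                           top_p=0.8, repetition_penalty=1.05):
    """Return {token_id: probability} for the tokens left after filtering."""
    scores = list(logits)

    # Repetition penalty: make already-seen tokens less likely
    for t in set(prev_tokens):
        if scores[t] > 0:
            scores[t] /= repetition_penalty
        else:
            scores[t] *= repetition_penalty

    # Temperature below 1 sharpens the distribution
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring tokens, best first
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Softmax over the surviving tokens
    m = max(scores[i] for i in ranked)
    exps = {i: math.exp(scores[i] - m) for i in ranked}
    z = sum(exps.values())
    probs = {i: e / z for i, e in exps.items()}

    # Top-p (nucleus): smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = {}, 0.0
    for i in ranked:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize; sampling would then draw the next token from this distribution
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical 4-token vocabulary
dist = candidate_distribution(toy_logits, prev_tokens=[0])
print({t: round(p, 3) for t, p in dist.items()})
```

In this toy case, `top_k=20` keeps all four tokens, but `top_p=0.8` leaves only tokens 0 and 1, and the repetition penalty on token 0 shifts some probability toward token 1.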

### Citation

If you find our work helpful, feel free to cite us.

```
@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}
```
| 167 | |