---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- vllm
---

<p align="center">
  <img alt="gpt-oss-120b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-120b.svg">
</p>

<p align="center">
  <a href="https://gpt-oss.com"><strong>Try gpt-oss</strong></a> ·
  <a href="https://cookbook.openai.com/topic/gpt-oss"><strong>Guides</strong></a> ·
  <a href="https://arxiv.org/abs/2508.10925"><strong>Model card</strong></a> ·
  <a href="https://openai.com/index/introducing-gpt-oss/"><strong>OpenAI blog</strong></a>
</p>

<br>

Welcome to the gpt-oss series, [OpenAI’s open-weight models](https://openai.com/open-models) designed for powerful reasoning, agentic tasks, and versatile developer use cases.

We’re releasing two flavors of these open models:
- `gpt-oss-120b`: for production, general-purpose, high-reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
- `gpt-oss-20b`: for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)

Both models were trained on our [harmony response format](https://github.com/openai/harmony) and should only be used with the harmony format, as they will not work correctly otherwise.


> [!NOTE]
> This model card is dedicated to the larger `gpt-oss-120b` model. Check out [`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) for the smaller model.

# Highlights

* Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk, ideal for experimentation, customization, and commercial deployment.
* Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
* Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
* Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
* Agentic capabilities: Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.
* MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, which lets `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and `gpt-oss-20b` run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
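
As a rough sanity check on the memory figures above, the sketch below estimates weight storage under MXFP4. It assumes the OCP Microscaling layout (4-bit elements in blocks of 32 that share one 8-bit scale, so about 4.25 bits per weight). As a simplification it treats every parameter as MXFP4, even though only the MoE weights are quantized. The function name is hypothetical, and the estimate ignores the KV cache, activations, and the higher-precision non-MoE weights, so real memory use will be higher.

```python
# Rough estimate only: assumes ALL weights are stored as MXFP4 (a simplification).
BITS_PER_ELEMENT = 4   # one FP4 element
SCALE_BITS = 8         # one shared scale per block
BLOCK_SIZE = 32        # elements per block


def mxfp4_weight_bytes(num_params: float) -> float:
    """Approximate bytes needed to store num_params weights in MXFP4."""
    bits_per_param = BITS_PER_ELEMENT + SCALE_BITS / BLOCK_SIZE  # 4.25 bits
    return num_params * bits_per_param / 8


for name, params in [("gpt-oss-120b", 117e9), ("gpt-oss-20b", 21e9)]:
    print(f"{name}: ~{mxfp4_weight_bytes(params) / 1e9:.1f} GB of weights")
```

Under these assumptions the weights come to roughly 62 GB and 11 GB, which is consistent with the 80GB and 16GB targets quoted above.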

---

# Inference examples

## Transformers

You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our [openai-harmony](https://github.com/openai/harmony) package.

To get started, install the dependencies needed to set up your environment:

```
pip install -U transformers kernels torch
```

Once set up, you can run the model with the snippet below:

```py
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-120b"

# Load the model; dtype and device placement are chosen automatically.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
# The last entry of the returned conversation is the model's reply.
print(outputs[0]["generated_text"][-1])
```
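
If you call `model.generate` directly instead of using the pipeline, the tokenizer's chat template can apply the harmony format for you. The following is a minimal sketch rather than a definitive implementation: `generate_reply` is a hypothetical helper, and it assumes the tokenizer for the chosen model ships a chat template and that enough GPU memory is available to load the weights.

```python
def generate_reply(messages, model_id="openai/gpt-oss-120b", max_new_tokens=256):
    """Hypothetical helper: format messages with the chat template, then generate."""
    # Imported lazily so that defining the helper does not load the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="auto",
    )

    # The chat template turns the message list into harmony-formatted token IDs.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, skipping the prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(generated[0][prompt_length:])


if __name__ == "__main__":
    print(generate_reply([{"role": "user", "content": "Explain MXFP4 in one sentence."}]))
```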

Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:

```
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
```

[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)

## vLLM

vLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following commands install vLLM, then automatically download the model and start the server.

```bash
uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

vllm serve openai/gpt-oss-120b
```

[Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)
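
Once an OpenAI-compatible server is running, any HTTP client can query it. The sketch below uses only the Python standard library. It assumes the server listens on `localhost:8000` and exposes the usual `/v1/chat/completions` route; adjust `BASE_URL` if your setup differs. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: server address and route; change these to match your deployment.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt, model="openai/gpt-oss-120b", base_url=BASE_URL, max_tokens=256):
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(send(build_chat_request("Explain quantum mechanics clearly and concisely.")))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.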

## PyTorch / Triton

To learn about how to use this model with PyTorch and Triton, check out our [reference implementations in the gpt-oss repository](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).

## Ollama

If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after [installing Ollama](https://ollama.com/download).

```bash
# gpt-oss-120b
ollama pull gpt-oss:120b
ollama run gpt-oss:120b
```

[Learn more about how to use gpt-oss with Ollama.](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)

## LM Studio

If you are using [LM Studio](https://lmstudio.ai/), you can download the model with the following command:

```bash
# gpt-oss-120b
lms get openai/gpt-oss-120b
```

Check out our [awesome list](https://github.com/openai/gpt-oss/blob/main/awesome-gpt-oss.md) for a broader collection of gpt-oss resources and inference partners.

---

# Download the model

You can download the model weights from the [Hugging Face Hub](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) using the Hugging Face CLI. The commands below also install the `gpt-oss` package and start a chat session against the downloaded checkpoint:

```shell
# gpt-oss-120b
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
pip install gpt-oss
# Point the chat module at the directory the weights were downloaded into.
python -m gpt_oss.chat gpt-oss-120b/original/
```

# Reasoning levels

You can choose among three reasoning levels to suit your task:

* Low: Fast responses for general dialogue.
* Medium: Balanced speed and detail.
* High: Deep and detailed analysis.

The reasoning level can be set in the system prompt, e.g., "Reasoning: high".
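
As a minimal sketch of how this might look in practice, the hypothetical helper below prepends a system message carrying the reasoning level to a chat message list. The helper name and its validation are illustrative assumptions; only the `Reasoning: <level>` convention comes from this model card.

```python
VALID_LEVELS = ("low", "medium", "high")


def with_reasoning(messages, level="medium"):
    """Return a new message list with a system prompt that sets the reasoning level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    system_message = {"role": "system", "content": f"Reasoning: {level}"}
    # Build a new list so the caller's messages are left untouched.
    return [system_message, *messages]


messages = with_reasoning(
    [{"role": "user", "content": "Explain quantum mechanics clearly and concisely."}],
    level="high",
)
print(messages[0]["content"])
```

The resulting list can be passed wherever a chat message list is expected, for example as the `messages` argument in the Transformers pipeline snippet.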

# Tool use

The gpt-oss models are excellent for:
* Web browsing (using built-in browsing tools)
* Function calling with defined schemas
* Agentic operations like browser tasks
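
To illustrate function calling with a defined schema, here is a small sketch. It assumes the OpenAI-style tool description format (a JSON Schema under `function.parameters`) and a model response that names a tool and supplies its arguments as a JSON string. The `get_weather` tool, the registry, and `run_tool_call` are all hypothetical, and how tool calls are surfaced depends on the serving stack you use.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: a real implementation would call a weather service."""
    return f"Sunny in {city}"


# Assumed OpenAI-style schema describing the tool to the model.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

# Maps tool names the model may request to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a tool call requested by the model and return its result."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**arguments)


print(run_tool_call("get_weather", '{"city": "Paris"}'))
```

The tool result would then be appended to the conversation so the model can use it in its final answer.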

# Fine-tuning

Both gpt-oss models can be fine-tuned for a variety of specialized use cases.

The larger `gpt-oss-120b` model can be fine-tuned on a single H100 node, whereas the smaller [`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) can even be fine-tuned on consumer hardware.

# Citation

```bibtex
@misc{openai2025gptoss120bgptoss20bmodel,
      title={gpt-oss-120b & gpt-oss-20b Model Card},
      author={OpenAI},
      year={2025},
      eprint={2508.10925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10925},
}
```