---
library_name: transformers
tags:
- text-generation
- qwen2.5-coder
- rl-swarm
- genrl-swarm
- grpo
- gensyn
- trl
- code-generation
- programming
- continuous-training
- reinforcement-learning
- safetensors
- gguf
- math
- logic
- conversational
- text-generation-inference
- I am tall_tame_panther
- python
- agent
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
---

<h1 align="center">Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)</h1>

<h2 align="center">Gensyn RL-Swarm: Training & GGUF Quantized LLMs for Inference</h2>

<p align="center">
<a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue" alt="Model"></a>
<a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main"><img src="https://img.shields.io/badge/GGUF-Available-8A2BE2" alt="GGUF"></a>
<img src="https://img.shields.io/badge/llama.cpp-Compatible-orange" alt="llama.cpp">
<a href="https://gensyn.ai"><img src="https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink" alt="Gensyn"></a>
<a href="https://github.com/gensyn-ai/rl-swarm/releases"><img src="https://img.shields.io/github/v/release/gensyn-ai/rl-swarm?label=Version&color=FF0069" alt="version"></a>
<a href="https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT"><img src="https://img.shields.io/badge/License-MIT-green" alt="License"></a>
</p>

<div align="center">

[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-pink?style=for-the-badge)](https://gensyn.ai)

</div>

---
## Model Overview

This is an **experimental (advanced) mode** model: a continuously trained `Qwen2.5-Coder-0.5B-Instruct`, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** and published in **GGUF (llama.cpp)** formats for enhanced code generation. **Note: current training focuses on programming challenges with adaptive weighted sampling.**

- **Agent ID:** `tall_tame_panther`
- **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
- **Auto-Sync GGUF Pipeline Status:** 🟢 LIVE - Commits update automatically every hour
- **Current Progress:** Round 13,533+ / 100,000 (13.53%)
- **Framework Version:** Gensyn RL-Swarm v0.7.0
- **Contract:** SwarmCoordinator v0.4.2

## Key Features

- **Real-time Training**: Continuous learning with distributed RL across the Gensyn swarm network
- **Adaptive System**: Dynamic quality bonuses and dataset weighting for optimal learning
- **Multi-domain Coding**: Trained on the MBPP and CodeContests datasets with adaptive sampling
- **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K)
- **llama.cpp Compatible**: Ready for edge deployment and local inference
- **BF16 Precision**: Trained with bfloat16 for optimal performance
- **TGI Compatible**: Supports Text Generation Inference for production deployment
- **Chat Format Support**: Inherits the Qwen2.5 chat template for conversational use

## Training Data

The model is trained on a composite dataset with an adaptive weighted sampling strategy:

| Dataset | Initial Weight | Adaptive Range | Focus Area |
|---------|----------------|----------------|------------|
| MBPP | 5 | 4-6 | Basic Python programming problems with test cases |
| CodeContests | 5 | 4-6 | Competitive programming challenges |

- **Total Dataset Size:** Streaming datasets with infinite iteration
- **Training Samples per Round:** 2
- **Evaluation:** Real-time via Swarm Coordination with an Ollama-based evaluator (Judge API as fallback)

## Adaptive Sampling Strategy

> "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero blog

```diff
The implementation features an adaptive sampling system that adjusts dataset weights based on performance.
The system monitors performance metrics every 5 rounds and adjusts the dataset weights to maintain an optimal learning balance:
- Update dataset weights based on recent performance
- Calculate the recent average performance for each dataset
- Use adaptive weighted sampling based on the performance difference between datasets
- If performance is better on MBPP (Mostly Basic Python Problems), shift weight toward CodeContests
- If performance is better on CodeContests, shift weight toward MBPP
- Re-balance dataset weights every 5 rounds to keep them balanced
```
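As a hedged illustration (class, method names, and step sizes here are hypothetical, not the actual RL-Swarm code), the weight-update logic above can be sketched as:

```python
# Hypothetical sketch of adaptive dataset weighting (not the shipped
# RL-Swarm implementation): weights stay inside the 4-6 adaptive range
# and shift toward whichever dataset has the weaker recent rewards.
from collections import deque

class AdaptiveSampler:
    def __init__(self, window=20, lo=4.0, hi=6.0):
        self.weights = {"mbpp": 5.0, "code_contests": 5.0}  # initial weights
        self.history = {k: deque(maxlen=window) for k in self.weights}
        self.lo, self.hi = lo, hi

    def record(self, dataset, reward):
        self.history[dataset].append(reward)

    def update_weights(self, step=0.5):
        # Called every 5 rounds: compute recent average reward per dataset.
        avgs = {k: sum(v) / len(v) if v else 0.0
                for k, v in self.history.items()}
        strong = max(avgs, key=avgs.get)
        weak = min(avgs, key=avgs.get)
        if strong != weak and avgs[strong] > avgs[weak]:
            # Spend more samples where performance lags.
            self.weights[strong] = max(self.lo, self.weights[strong] - step)
            self.weights[weak] = min(self.hi, self.weights[weak] + step)
        return self.weights
```

The clamp to the 4-6 range mirrors the adaptive range in the training-data table, so neither dataset is ever starved.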

## Adaptive Reward System

### Quality Bonus Implementation

> "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero blog

```diff
The reward system includes a quality bonus mechanism that evaluates code structure and documentation:
- Calculate a quality bonus for well-structured code
- Documentation bonus
- Structure bonus
- Algorithmic efficiency (simple heuristic)
- Scale with the base reward to avoid inflation
```
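A minimal sketch of such a quality bonus (assumed checks and weights; the real reward code combines more signals) might look like:

```python
# Hypothetical quality-bonus sketch (assumed checks and weights, not the
# actual CodeGenerationRewards code): small additive bonuses for structure
# and documentation, scaled by the base reward to avoid inflation.
import ast

def quality_bonus(code: str, base_reward: float) -> float:
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0  # invalid code earns no bonus
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    bonus = 0.0
    if funcs:
        bonus += 0.1  # structure bonus: code organized into functions
        if all(ast.get_docstring(f) for f in funcs):
            bonus += 0.1  # documentation bonus: every function has a docstring
    # Scale with the (non-negative) base reward so the bonus never dominates.
    return bonus * max(base_reward, 0.0)
```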

### Adaptive Threshold System

```diff
The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
- Adapt the threshold based on recent performance
- Raise it when quality scores are consistently high
```
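One plausible shape for this mechanism (hypothetical function name, bounds, and step size) is:

```python
# Hypothetical adaptive-threshold sketch: raise the quality bar when
# recent scores are consistently high, lower it when they drop.
def adaptive_threshold(recent_scores, base=0.5, step=0.05,
                       lo=0.3, hi=0.8, margin=0.2):
    if not recent_scores:
        return base
    avg = sum(recent_scores) / len(recent_scores)
    if avg > base + margin:   # consistently high quality: harder bar
        return min(hi, base + step)
    if avg < base - margin:   # struggling: easier bar
        return max(lo, base - step)
    return base
```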

## Quick Performance Simulation

### Reward Comparison

Based on our simulation with 1000 samples, the adaptive reward system shows a significant improvement:

| System | MBPP Avg Reward | CodeContests Avg Reward | Overall Avg Reward | Improvement |
|----------|-----------------|-------------------------|--------------------|-------------|
| Original | 0.234 | -0.156 | 0.039 | - |
| Adaptive | 0.312 | -0.098 | 0.107 | ~174% |
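The improvement column follows from the overall averages: the absolute gain is small, but relative to the low baseline it is large.

```python
# Relative improvement of the adaptive system over the original baseline,
# using the overall average rewards from the table above.
original, adaptive = 0.039, 0.107
gain = adaptive - original          # absolute gain in average reward
pct = gain / original * 100         # relative increase over the baseline
print(f"absolute gain: {gain:.3f}, relative: ~{pct:.0f}%")
```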

### Training Progress

Based on the logs below, the model shows consistent progress.

Training metrics (e.g. train/loss) visualized via Weights & Biases (WandB):
- Soon LIVE!
```
[2025-11-14 04:22:50,632][genrl.logging_utils.global_defs][INFO] - __ Joining round: 13053
[2025-11-14 04:23:50,633][genrl.logging_utils.global_defs][INFO] - Starting round: 13053/100000.
Map: 100%|______________________________________| 1/1 [00:00<00:00, 158.65 examples/s]
Map: 100%|______________________________________| 1/1 [00:00<00:00, 191.92 examples/s]
[2025-11-14 04:25:12,646][genrl.logging_utils.global_defs][INFO] - pushing model to huggingface
Processing Files (1 / 1) : 100%|___| 988MB / 988MB, 94.3MB/s
New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
.....kpb5lid/model.safetensors: 100%|___| 988MB / 988MB, 94.3MB/s
[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
```

## Quick Start: Inference

### Standard Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

prompt = "Write a function to calculate the factorial of a number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256, do_sample=True, temperature=0.7, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Chat Format (Conversational)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0]))
```

### Text Generation Inference (TGI)

```bash
docker run -d --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id 0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther \
  --max-input-length 4096 \
  --max-total-tokens 8192
```

### GGUF with llama.cpp

```bash
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Q4_K_M.gguf

# Run inference
./llama-cli -m Qwen2.5-Coder-0.5B-Q4_K_M.gguf \
  -p "Write a function to implement binary search in Python." \
  --temp 0.7 --top-p 0.8
```

### Ollama

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/Qwen2.5-Coder-0.5B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
SYSTEM "You are an expert Python programmer who writes clean, documented code."
EOF

# Create and run
ollama create qwen2.5-coder-swarm -f Modelfile
ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
```

## Available GGUF Quantizations

| Format | Size | Precision | Use Case | Download |
|--------|------|-----------|----------|----------|
| Safetensors (BF16) | 988 MB | BF16 | Full precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 994 MB | FP16 | High quality inference | `Qwen2.5-Coder-0.5B-F16.gguf` |
| GGUF Q6_K | 506 MB | 6-bit | High quality compression | `Qwen2.5-Coder-0.5B-Q6_K.gguf` |
| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Q5_K_M.gguf` |
| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Q4_K_M.gguf` |
| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Q3_K_M.gguf` |

> All GGUF formats are **llama.cpp-compatible**, ready for **chat inference**, and refreshed automatically by the conversion pipeline.

## Chat Format & Conversational

This model inherits **Qwen2.5's chat template** for structured conversations.

### Format Structure

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_response}<|im_end|>
```
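As a rough illustration of how this layout is assembled (a hand-rolled sketch of the ChatML-style structure only; in practice let `tokenizer.apply_chat_template` build the prompt):

```python
# Sketch of the ChatML-style layout above; the real prompt should come
# from tokenizer.apply_chat_template, which also handles defaults.
def render_chatml(messages, add_generation_prompt=True):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a palindrome checker."},
])
```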

### Chat Template Features

- **System Instructions**: Guide model behavior with system messages
- **Multi-turn Dialogue**: Maintains conversation context
- **Tool Calling**: Supports function calling (if enabled in training)
- **Code Generation**: Optimized for generating Python code

**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.

### Gensyn RL-Swarm Quick Architecture

```diff
Training Framework:
- Method: GRPO (Group Relative Policy Optimization)
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Training Regime: bfloat16 mixed precision
- Max Rounds: 100000
- Update Frequency: Every 5-10 minutes
- Generations per Round: 2
- Batch size: Combined
- Tree-based Model: 2 trees
- Seed: 42
Blockchain Integration:
- Network: Gensyn Testnet
- Chain ID: 685685
- Contract: SwarmCoordinator v0.4.2
Swarm Communication:
- Framework: Hivemind P2P Backend
- Initial Peers: 3 bootnodes
- Beam Size: 10
Reward System:
- Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
- Reward Function: Adaptive with quality bonuses
- Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
- Judge API: https://codezero-judge.gensyn.ai
```

## Model Capabilities

This model excels at:

1. **Basic Python Programming**: Functions, loops, conditionals, data structures
2. **Algorithm Implementation**: Sorting, searching, graph algorithms
3. **String Manipulation**: Pattern matching, parsing, formatting
4. **Mathematical Functions**: Calculations, conversions, formulas
5. **Code Documentation**: Writing clear, commented functions
6. **Problem Solving**: Breaking down complex problems into manageable steps

## Limitations

- **Specialized Domain**: Optimized for programming challenges; may underperform on creative writing
- **Training in Progress**: Weights update every 5-10 minutes; performance varies
- **Scale**: 0.5B parameters - suitable for edge but not SOTA for complex programming
- **Experimental**: Decentralized RL training; behavior less predictable than supervised models
- **Context**: Best performance within 4K tokens (full 32K supported)

## Update Schedule

| Format | Frequency | Trigger |
|--------|-----------|---------|
| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |

**Auto-Conversion Pipeline:**

1. Monitors the repo for new training commits
2. Downloads the latest `model.safetensors`
3. Converts to an F16 GGUF base
4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M, Q6_K
5. Uploads the standard GGUF formats

Check commit history for exact timestamps.

### Architecture Components

1. **Game Manager**: Orchestrates training rounds and swarm coordination
2. **Trainer**: GRPO implementation for policy optimization
3. **Data Manager**: Dataset loading with adaptive weighted sampling
4. **Reward Manager**: Computes rewards via the Ollama evaluator with quality bonuses
5. **Coordinator**: Blockchain integration for swarm state
6. **P2P Backend**: Hivemind DHT for model sharing

### Training Process

```
1. Agent joins swarm via P2P network
2. Coordinator assigns round via smart contract
3. Agent samples data from adaptive weighted datasets
4. Model generates 2 responses
5. Ollama evaluator assesses responses and assigns rewards with quality bonuses
6. GRPO updates policy based on rewards
7. Updated model shared via DHT
8. Best checkpoint saved to HuggingFace
9. Repeat
```
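Step 6 relies on GRPO's core idea: each completion's reward is normalized against the statistics of its own generation group (here, the 2 responses for the same prompt), with no separate value network. A minimal sketch of that advantage computation:

```python
# Minimal sketch of GRPO's group-relative advantage: normalize each
# completion's reward by its group's mean and standard deviation.
def group_relative_advantages(rewards, eps=1e-8):
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Above-average completions in a group get positive advantages and are reinforced; below-average ones are discouraged.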

### Decentralization Benefits

- **Fault Tolerance**: Multiple agents; no single point of failure
- **Diverse Exploration**: Different agents explore different strategies
- **Collective Intelligence**: Agents learn from each other
- **Transparent**: All rounds verified on-chain

### Software Stack

- **Framework**: Gensyn RL-Swarm v0.7.0
- **Library**: transformers v4.57.1
- **P2P**: hivemind
- **Blockchain**: Gensyn testnet
- **Config**: Hydra + OmegaConf
- **Logging**: WandB integration

### Hardware Requirements

**Training (GPU):**
- GPU: NVIDIA RTX 4090 24GB+ (BF16 training)
- RAM: 16GB+
- Cores: 10+
- Storage: 50GB SSD
- Network: High bandwidth for P2P

**Training (CPU-optimized):**
- CPU: Intel or AMD
- Cores: 10+
- RAM: 16GB+
- Storage: 50GB SSD
- Network: High bandwidth for P2P

**Inference:**
- Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only)

### Training Progress Metrics

| Metric | Value | Target |
|--------|-------|--------|
| Completed Rounds | 13,533+ | 100,000 |
| Training Progress | 13.53% | 100% |
| Update Frequency | 5-10 min | Continuous |

**Note**: **average\@k** is the average performance across `k` attempts, measuring consistency; **pass\@k** is the probability of at least one correct solution in `k` attempts, measuring capability. Current metrics track training rounds completed in the decentralized swarm.
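For reference, pass@k is usually computed with the standard unbiased estimator; the helpers below are illustrative (they are not part of the swarm's current metrics):

```python
# Standard unbiased pass@k estimator: with n samples and c correct,
# the chance that at least one of k random draws is correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)

def average_at_k(rewards, k):
    # average@k: mean score over the first k attempts (consistency)
    return sum(rewards[:k]) / k
```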

### Adaptive Reward Performance

Our adaptive reward system has shown an approximately 174% improvement in reward scores compared to the baseline system:

```
Original:
  Overall Avg Reward:       0.039
  MBPP Avg Reward:          0.234
  CodeContests Avg Reward: -0.156
Adaptive:
  Overall Avg Reward:       0.107
  MBPP Avg Reward:          0.312
  CodeContests Avg Reward: -0.098
Improvement: 0.068 absolute (~174% relative increase)
```

## Citation

```bibtex
@misc{qwen2.5-coder-gensyn-swarm-2025,
  author       = {0xgrey},
  title        = {Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm: Continuous RL Training on Distributed Swarm with Adaptive Rewards},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
  note         = {Agent ID: tall\_tame\_panther}
}

@misc{gensyn-rl-swarm-2025,
  title  = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year   = {2025},
  url    = {https://gensyn.ai}
}

@misc{codezero-2025,
  title  = {CodeZero: A Collaborative Coding Environment for Distributed RL},
  author = {Gensyn AI},
  year   = {2025},
  url    = {https://docs.gensyn.ai/testnet/rl-swarm/how-it-works/codezero}
}
```

## References

- **Gensyn Documentation**: https://docs.gensyn.ai/
- **Gensyn GitHub**: https://github.com/gensyn-ai
- **RL-Swarm Contracts**: https://github.com/gensyn-ai/rl-swarm-contracts
- **Qwen2.5-Coder Model Card**: https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct
- **MBPP Dataset**: https://huggingface.co/datasets/google-research-datasets/mbpp
- **CodeContests Dataset**: https://huggingface.co/datasets/deepmind/code_contests
- **arXiv:1910.09700**: ML Carbon Emissions methodology

## Contact

- **Developer**: 0xgrey
- **Agent ID**: tall_tame_panther
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)

**⚠️ Important**: This is a continuously trained model. For reproducibility, pin a specific commit hash:

```bash
git clone https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
cd Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>
```

---

<div align="center">

**Trained with 🩷 using Gensyn RL-Swarm**

[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-pink?style=for-the-badge)](https://gensyn.ai)

</div>