---
library_name: transformers
tags:
- text-generation
- qwen2.5-coder
- rl-swarm
- genrl-swarm
- grpo
- gensyn
- trl
- code-generation
- programming
- continuous-training
- reinforcement-learning
- safetensors
- gguf
- math
- logic
- conversational
- text-generation-inference
- I am tall_tame_panther
- python
- agent
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
---

<h1 align="center">Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)</h1>

<h2 align="center">Gensyn RL-Swarm: Training & GGUF Quantized LLMs for Inference</h2>

<p align="center">
  <a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue" alt="Model"></a>
  <a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main"><img src="https://img.shields.io/badge/GGUF-Available-8A2BE2" alt="GGUF"></a>
  <img src="https://img.shields.io/badge/llama.cpp-Compatible-orange" alt="llama.cpp">
  <a href="https://gensyn.ai"><img src="https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink" alt="Gensyn"></a>
  <a href="https://github.com/gensyn-ai/rl-swarm/releases"><img src="https://img.shields.io/github/v/release/gensyn-ai/rl-swarm?label=Version&color=FF0069" alt="version"></a>
  <a href="https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT"><img src="https://img.shields.io/badge/License-MIT-green" alt="License"></a>
</p>

<div align="center">

[](https://gensyn.ai)

</div>

---

## Model Overview

This is an **experimental (advanced) mode** model: a continuously trained `Qwen2.5-Coder-0.5B-Instruct`, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** and published in **GGUF (llama.cpp)** formats for enhanced code-generation capabilities. **Note: current training focuses on programming challenges with adaptive weighted sampling.**

- **Agent ID:** `tall_tame_panther`
- **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
- **Auto-Sync GGUF Pipeline Status:** 🟢 LIVE - Commits update automatically every hour
- **Current Progress:** Round 13,533+ / 100,000 (13.53%)
- **Framework Version:** Gensyn RL-Swarm v0.7.0
- **Contract:** SwarmCoordinator v0.4.2

## Key Features

- **Real-time Training**: Continuous learning with distributed RL across the Gensyn swarm network
- **Adaptive System**: Dynamic quality bonuses and dataset weighting for optimal learning
- **Multi-domain Coding**: Trained on MBPP and CodeContests datasets with adaptive sampling
- **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K)
- **llama.cpp Compatible**: Ready for edge deployment and local inference
- **BF16 Precision**: Trained with bfloat16 for optimal performance
- **TGI Compatible**: Supports Text Generation Inference for production deployment
- **Chat Format Support**: Inherits the Qwen2.5 chat template for conversational use

## Training Data

The model is trained on a composite dataset with an adaptive weighted sampling strategy:

| Dataset | Initial Weight | Adaptive Range | Focus Area |
|---------|----------------|----------------|------------|
| MBPP | 5 | 4-6 | Basic Python programming problems with test cases |
| CodeContests | 5 | 4-6 | Competitive programming challenges |

**Total Dataset Size:** Streaming datasets with infinite iteration
**Training Samples per Round:** 2
**Evaluation:** Real-time via swarm coordination, using an Ollama-based evaluator with the Judge API as fallback

## Adaptive Sampling Strategy

> "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero blog

```diff
The implementation features an adaptive sampling system that adjusts dataset weights based on performance.
The system monitors performance metrics every 5 rounds and rebalances dataset weights to maintain an optimal learning balance:
- Update dataset weights based on recent performance
- Calculate the recent average performance for each dataset
- Use weighted sampling (when adaptive) based on the performance difference
- React when performance is better on MBPP (Mostly Basic Python Problems)
- React when performance is better on CodeContests
- Re-check dataset weights every 5 rounds and keep them balanced
```

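The weight-update loop above can be sketched as follows. This is an illustrative sketch only, not the actual RL-Swarm code; the function names, the ±1 step, and the [4, 6] clamp are assumptions based on the table in the Training Data section.

```python
# Hypothetical sketch of adaptive dataset weighting (not the actual RL-Swarm code).
DATASETS = ["mbpp", "codecontests"]

def update_weights(weights, recent_rewards, lo=4, hi=6, step=1):
    """Shift weight toward the dataset the model currently handles worse,
    keeping each weight clamped to the adaptive range [lo, hi]."""
    avg = {d: sum(recent_rewards[d]) / len(recent_rewards[d]) for d in DATASETS}
    better, worse = sorted(DATASETS, key=lambda d: avg[d], reverse=True)
    new = dict(weights)
    new[better] = max(lo, new[better] - step)  # model does well here: sample less
    new[worse] = min(hi, new[worse] + step)    # model struggles here: sample more
    return new

def sampling_probs(weights):
    """Normalize integer weights into sampling probabilities."""
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}
```

Starting from the initial weights `{mbpp: 5, codecontests: 5}`, a stretch of higher MBPP rewards shifts sampling toward CodeContests.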
## Adaptive Reward System
### Quality Bonus Implementation

> "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero blog

```diff
The reward system includes a quality-bonus mechanism that evaluates code structure and documentation:
- Calculate a quality bonus for well-structured code
- Documentation bonus
- Structure bonus
- Algorithmic efficiency (simple heuristic)
- Scale with the base reward to avoid inflation
```

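A minimal sketch of how lightweight checks could combine into such a score. The heuristics and bonus values here are hypothetical, not the actual RL-Swarm reward code:

```python
# Hypothetical quality bonus on top of a base reward (illustrative heuristics only).
def quality_bonus(code: str) -> float:
    bonus = 0.0
    if '"""' in code or code.strip().startswith("#"):
        bonus += 0.1   # documentation: docstring or leading comment
    if "def " in code:
        bonus += 0.1   # structure: code organized into functions
    if "for " in code and "for " not in code.split("for ", 1)[1]:
        bonus += 0.05  # efficiency heuristic: loop present but not nested
    return bonus

def adaptive_reward(base_reward: float, code: str) -> float:
    # Scale the bonus by the base reward so it cannot inflate bad solutions.
    return base_reward * (1.0 + quality_bonus(code))
```

Because the bonus multiplies the base reward, a solution that fails correctness checks gains nothing from pretty formatting.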
### Adaptive Threshold System

```diff
The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
- The threshold is a function of recent performance
- It rises when quality scores are consistently high
```

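A sketch of what such a threshold function might look like. The formula (a clamped shift around a base value) is an assumption; the real mechanism is internal to RL-Swarm:

```python
# Hypothetical adaptive quality threshold (illustrative only).
def adaptive_threshold(recent_scores, base=0.5, max_shift=0.2):
    """Raise the bar when recent quality scores are consistently high,
    lower it when they are consistently low."""
    if not recent_scores:
        return base
    avg = sum(recent_scores) / len(recent_scores)
    shift = max(-max_shift, min(max_shift, avg - base))
    return base + shift
```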
## Quick Performance Simulation
### Reward Comparison

In a simulation with 1,000 samples, the adaptive reward system shows a significant improvement:

| System | MBPP Avg Reward | CodeContests Avg Reward | Overall Avg Reward | Improvement |
|----------|-----------------|-------------------------|--------------------|-------------|
| Original | 0.234 | -0.156 | 0.039 | - |
| Adaptive | 0.312 | -0.098 | 0.107 | ~174% |

### Training Progress

Training logs show consistent progress. Train/loss metrics visualized with Weights & Biases (WandB):
- Coming soon!

```
[2025-11-14 04:22:50,632][genrl.logging_utils.global_defs][INFO] - __ Joining round: 13053
[2025-11-14 04:23:50,633][genrl.logging_utils.global_defs][INFO] - Starting round: 13053/100000.
Map: 100%|______________________________________| 1/1 [00:00<00:00, 158.65 examples/s]
Map: 100%|______________________________________| 1/1 [00:00<00:00, 191.92 examples/s]
[2025-11-14 04:25:12,646][genrl.logging_utils.global_defs][INFO] - pushing model to huggingface
Processing Files (1 / 1) : 100%|___| 988MB / 988MB, 94.3MB/s
New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
.....kpb5lid/model.safetensors: 100%|___| 988MB / 988MB, 94.3MB/s
[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
```

## Quick Start: Inference

### Standard Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

prompt = "Write a function to calculate the factorial of a number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Chat Format (Conversational)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Text Generation Inference (TGI)

```bash
docker run -d --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id 0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther \
  --max-input-length 4096 \
  --max-total-tokens 8192
```

### GGUF with llama.cpp

```bash
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Q4_K_M.gguf

# Run inference
./llama-cli -m Qwen2.5-Coder-0.5B-Q4_K_M.gguf \
  -p "Write a function to implement binary search in Python." \
  --temp 0.7 --top-p 0.8
```

### Ollama

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/Qwen2.5-Coder-0.5B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
SYSTEM "You are an expert Python programmer who writes clean, documented code."
EOF

# Create and run
ollama create qwen2.5-coder-swarm -f Modelfile
ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
```

## Available GGUF Quantizations

| Format | Size | Precision | Use Case | Download |
|--------|------|-----------|----------|----------|
| Safetensors (BF16) | 988 MB | BF16 | Full-precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 994 MB | FP16 | High-quality inference | `Qwen2.5-Coder-0.5B-F16.gguf` |
| GGUF Q6_K | 506 MB | 6-bit | High-quality compression | `Qwen2.5-Coder-0.5B-Q6_K.gguf` |
| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Q5_K_M.gguf` |
| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Q4_K_M.gguf` |
| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Q3_K_M.gguf` |

> All GGUF formats are **llama.cpp-compatible**, ready for **chat inference**, and **auto-updated hourly**.

## Chat Format & Conversational Use

This model inherits **Qwen2.5's chat template** for structured conversations.

### Format Structure

```
<|im_start|>system
{system_message}
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
{assistant_response}
<|im_end|>
```

### Chat Template Features

- **System Instructions**: Guide model behavior with system messages
- **Multi-turn Dialogue**: Maintains conversation context
- **Tool Calling**: Supports function calling (if enabled in training)
- **Code Generation**: Optimized for generating Python code

**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.

### Gensyn RL-Swarm Architecture at a Glance

```diff
Training Framework:
- Method: GRPO (Group Relative Policy Optimization)
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Training Regime: bfloat16 mixed precision
- Max Rounds: 100000
- Update Frequency: Every 5-10 minutes
- Generations per Round: 2
- Batch size: Combine
- Tree-based Model: 2 trees
- Seed: 42
Blockchain Integration:
- Network: Gensyn Testnet
- Chain ID: 685685
- Contract: SwarmCoordinator v0.4.2
Swarm Communication:
- Framework: Hivemind P2P Backend
- Initial Peers: 3 bootnodes
- Beam Size: 10
Reward System:
- Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
- Reward Function: Adaptive with quality bonuses
- Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
- Judge API: https://codezero-judge.gensyn.ai
```

## Model Capabilities

This model excels at:

1. **Basic Python Programming**: Functions, loops, conditionals, data structures
2. **Algorithm Implementation**: Sorting, searching, graph algorithms
3. **String Manipulation**: Pattern matching, parsing, formatting
4. **Mathematical Functions**: Calculations, conversions, formulas
5. **Code Documentation**: Writing clear, commented functions
6. **Problem Solving**: Breaking down complex problems into manageable steps

## Limitations

- **Specialized Domain**: Optimized for programming challenges; may underperform on creative writing
- **Training in Progress**: Weights update every 5-10 minutes; performance varies
- **Scale**: 0.5B parameters - suitable for edge deployment but not SOTA for complex programming
- **Experimental**: Decentralized RL training; behavior is less predictable than supervised models
- **Context**: Best performance within 4K tokens (full 32K supported)

## Update Schedule

| Format | Frequency | Trigger |
|--------|-----------|---------|
| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |

**Auto-Conversion Pipeline:**

1. Monitors the repo for new training commits
2. Downloads the latest `model.safetensors`
3. Converts to an F16 GGUF base
4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M, Q6_K
5. Uploads the standard formats

Check the commit history for exact timestamps.

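Step 4 maps each target format onto llama.cpp's `llama-quantize` tool. A dry-run sketch that only builds the command strings (file names follow the quantization table above; the actual pipeline scripts are not published):

```python
# Build llama-quantize commands for each target format (dry run; paths assumed).
QUANTS = ["Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K"]

def quantize_commands(base_gguf="Qwen2.5-Coder-0.5B-F16.gguf"):
    stem = base_gguf.replace("-F16.gguf", "")
    return [f"./llama-quantize {base_gguf} {stem}-{q}.gguf {q}" for q in QUANTS]

for cmd in quantize_commands():
    print(cmd)
```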
### Architecture Components

1. **Game Manager**: Orchestrates training rounds and swarm coordination
2. **Trainer**: GRPO implementation for policy optimization
3. **Data Manager**: Dataset loading with adaptive weighted sampling
4. **Reward Manager**: Computes rewards via the Ollama evaluator with quality bonuses
5. **Coordinator**: Blockchain integration for swarm state
6. **P2P Backend**: Hivemind DHT for model sharing

### Training Process

```
1. Agent joins swarm via P2P network
2. Coordinator assigns round via smart contract
3. Agent samples data from adaptively weighted datasets
4. Model generates 2 responses
5. Ollama evaluator assesses them and assigns rewards with quality bonuses
6. GRPO updates policy based on rewards
7. Updated model shared via DHT
8. Best checkpoint saved to HuggingFace
9. Repeat
```

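Step 6 uses GRPO's defining trick: instead of a learned critic, each response's reward is normalized against the other responses generated in the same group. A minimal sketch of the advantage computation (the production trainer lives in the genrl/TRL stack):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: z-score each reward within its generation
    group, so no separate value (critic) network is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With 2 generations per round, rewards `[0.8, 0.2]` give advantages of roughly `[1.0, -1.0]`: the better response is reinforced, the worse one suppressed.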
### Decentralization Benefits

- **Fault Tolerance**: Multiple agents; no single point of failure
- **Diverse Exploration**: Different agents explore different strategies
- **Collective Intelligence**: Agents learn from each other
- **Transparent**: All rounds verified on-chain

### Software Stack

- **Framework**: Gensyn RL-Swarm v0.7.0
- **Library**: transformers v4.57.1
- **P2P**: hivemind
- **Blockchain**: Gensyn testnet
- **Config**: Hydra + OmegaConf
- **Logging**: WandB integration

### Hardware Requirements

**Training (GPU):**
- GPU: NVIDIA RTX 4090 24GB+ (BF16 training)
- RAM: 16GB+
- Cores: 10+
- Storage: 50GB SSD
- Network: High bandwidth for P2P

**Training (CPU-optimized):**
- CPU: Intel or AMD
- Cores: 10+
- RAM: 16GB+
- Storage: 50GB SSD
- Network: High bandwidth for P2P

**Inference:**
- Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only)

### Training Progress Metrics

| Metric | Value | Target |
|--------|-------|--------|
| Completed Rounds | 13,533+ | 100,000 |
| Training Progress | 13.53% | 100% |
| Update Frequency | 5-10 min | Continuous |

**Note**: **average\@k** is the average performance across `k` attempts, measuring consistency; **pass\@k** is the probability of at least one correct solution in `k` attempts, measuring capability. Current metrics track training rounds completed in the decentralized swarm.

### Adaptive Reward Performance

The adaptive reward system has shown an approximately 174% improvement in reward scores compared to the baseline system:

```
Original:
  Overall Avg Reward: 0.039
  MBPP Avg Reward: 0.234
  CodeContests Avg Reward: -0.156
Adaptive:
  Overall Avg Reward: 0.107
  MBPP Avg Reward: 0.312
  CodeContests Avg Reward: -0.098
Improvement: +0.068 (~174% relative increase)
```

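The headline number is a relative increase of the overall average reward, which checks out against the figures above:

```python
# Verify the ~174% figure: relative increase of the overall average reward.
original, adaptive = 0.039, 0.107
relative_increase = (adaptive - original) / original
print(f"{relative_increase:.0%}")  # → 174%
```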
## Citation

```
@misc{qwen2.5-coder-gensyn-swarm-2025,
  author = {0xgrey},
  title = {Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm: Continuous RL Training on Distributed Swarm with Adaptive Rewards},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
}
@misc{gensyn-rl-swarm-2025,
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
  url = {https://gensyn.ai}
}
@misc{codezero-2025,
  title = {CodeZero: A Collaborative Coding Environment for Distributed RL},
  author = {Gensyn AI},
  year = {2025},
  url = {https://docs.gensyn.ai/testnet/rl-swarm/how-it-works/codezero}
}
```

## References

- **Gensyn Documentation**: https://docs.gensyn.ai/
- **Gensyn GitHub**: https://github.com/gensyn-ai
- **RL-Swarm Contracts**: https://github.com/gensyn-ai/rl-swarm-contracts
- **Qwen2.5-Coder Model Card**: https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct
- **MBPP Dataset**: https://huggingface.co/datasets/google-research-datasets/mbpp
- **CodeContests Dataset**: https://huggingface.co/datasets/deepmind/code_contests
- **arXiv:1910.09700**: ML Carbon Emissions methodology

## Contact

- **Developer**: 0xgrey
- **Agent ID**: tall_tame_panther
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)

**⚠️ Important**: This is a continuously trained model. For reproducibility, pin a specific commit hash (or pass `revision="<commit-hash>"` to `from_pretrained`):

```bash
git clone https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
cd Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>
```

---

<div align="center">

**Trained with 🩷 using Gensyn RL-Swarm**

[](https://gensyn.ai)

</div>