---
library_name: transformers
tags:
- text-generation
- qwen2.5-coder
- rl-swarm
- genrl-swarm
- grpo
- gensyn
- trl
- code-generation
- programming
- continuous-training
- reinforcement-learning
- safetensors
- gguf
- math
- logic
- conversational
- text-generation-inference
- I am tall_tame_panther
- python
- agent
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
---

<h1 align="center">Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)</h1>

<h2 align="center">Gensyn RL-Swarm: Training & GGUF Quantized LLMs for Inference</h2>

<p align="center">
<a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue" alt="Model"></a>
<a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main"><img src="https://img.shields.io/badge/GGUF-Available-8A2BE2" alt="GGUF"></a>
<img src="https://img.shields.io/badge/llama.cpp-Compatible-orange" alt="llama.cpp">
<a href="https://gensyn.ai"><img src="https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink" alt="Gensyn"></a>
<a href="https://github.com/gensyn-ai/rl-swarm/releases"><img src="https://img.shields.io/github/v/release/gensyn-ai/rl-swarm?label=Version&color=FF0069" alt="version"></a>
<a href="https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT"><img src="https://img.shields.io/badge/License-MIT-green" alt="License"></a>
</p>

<div align="center">

[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-pink?style=for-the-badge)](https://gensyn.ai)

</div>

---
## Model Overview

This is an **experimental (advanced) mode** model: a continuously trained `Qwen2.5-Coder-0.5B-Instruct`, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** and published in **GGUF (llama.cpp)** formats for enhanced code generation. **Note: current training focuses on programming challenges with adaptive weighted sampling.**

- **Agent ID:** `tall_tame_panther`
- **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
- **Auto-Sync GGUF Pipeline Status:** 🟢 LIVE - Commits update automatically every hour
- **Current Progress:** Round 13,533+ / 100,000 (13.53%)
- **Framework Version:** Gensyn RL-Swarm v0.7.0
- **Contract:** SwarmCoordinator v0.4.2

## Key Features

- **Real-time Training**: Continuous learning with distributed RL across the Gensyn swarm network
- **Adaptive System**: Dynamic quality bonuses and dataset weighting for optimal learning
- **Multi-domain Coding**: Trained on the MBPP and CodeContests datasets with adaptive sampling
- **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K)
- **llama.cpp Compatible**: Ready for edge deployment and local inference
- **BF16 Precision**: Trained with bfloat16 for optimal performance
- **TGI Compatible**: Supports Text Generation Inference for production deployment
- **Chat Format Support**: Inherits the Qwen2.5 chat template for conversational use

## Training Data

The model is trained on a composite dataset with an adaptive weighted sampling strategy:

| Dataset | Initial Weight | Adaptive Range | Focus Area |
|---------|----------------|----------------|------------|
| MBPP | 5 | 4-6 | Basic Python programming problems with test cases |
| CodeContests | 5 | 4-6 | Competitive programming challenges |

- **Total Dataset Size:** Streaming datasets with infinite iteration
- **Training Samples per Round:** 2
- **Evaluation:** Real-time via Swarm Coordination with an Ollama-based evaluator (Judge API as fallback)

## Adaptive Sampling Strategy

> "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero blog

```diff
The implementation features an adaptive sampling system that adjusts dataset weights based on performance.
The system monitors performance metrics every 5 rounds and adjusts the dataset weights to maintain an optimal learning balance:
- Update dataset weights based on recent performance
- Calculate the recent average performance for each dataset
- Use adaptive weighted sampling based on the performance difference between datasets
- If performance is better on MBPP (Mostly Basic Python Problems), shift weight toward CodeContests
- If performance is better on CodeContests, shift weight toward MBPP
- Re-balance dataset weights every 5 rounds to keep them balanced
```
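As a hedged illustration (class, method names, and step sizes here are hypothetical, not the actual RL-Swarm code), the weight-update logic above can be sketched as:

```python
# Hypothetical sketch of adaptive dataset weighting (not the shipped
# RL-Swarm implementation): weights stay inside the 4-6 adaptive range
# and shift toward whichever dataset has the weaker recent rewards.
from collections import deque

class AdaptiveSampler:
    def __init__(self, window=20, lo=4.0, hi=6.0):
        self.weights = {"mbpp": 5.0, "code_contests": 5.0}  # initial weights
        self.history = {k: deque(maxlen=window) for k in self.weights}
        self.lo, self.hi = lo, hi

    def record(self, dataset, reward):
        self.history[dataset].append(reward)

    def update_weights(self, step=0.5):
        # Called every 5 rounds: compute recent average reward per dataset.
        avgs = {k: sum(v) / len(v) if v else 0.0
                for k, v in self.history.items()}
        strong = max(avgs, key=avgs.get)
        weak = min(avgs, key=avgs.get)
        if strong != weak and avgs[strong] > avgs[weak]:
            # Spend more samples where performance lags.
            self.weights[strong] = max(self.lo, self.weights[strong] - step)
            self.weights[weak] = min(self.hi, self.weights[weak] + step)
        return self.weights
```

The clamp to the 4-6 range mirrors the adaptive range in the training-data table, so neither dataset is ever starved.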

## Adaptive Reward System

### Quality Bonus Implementation

> "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero blog

```diff
The reward system includes a quality bonus mechanism that evaluates code structure and documentation:
- Calculate a quality bonus for well-structured code
- Documentation bonus
- Structure bonus
- Algorithmic efficiency (simple heuristic)
- Scale with the base reward to avoid inflation
```
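A minimal sketch of such a quality bonus (assumed checks and weights; the real reward code combines more signals) might look like:

```python
# Hypothetical quality-bonus sketch (assumed checks and weights, not the
# actual CodeGenerationRewards code): small additive bonuses for structure
# and documentation, scaled by the base reward to avoid inflation.
import ast

def quality_bonus(code: str, base_reward: float) -> float:
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0  # invalid code earns no bonus
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    bonus = 0.0
    if funcs:
        bonus += 0.1  # structure bonus: code organized into functions
        if all(ast.get_docstring(f) for f in funcs):
            bonus += 0.1  # documentation bonus: every function has a docstring
    # Scale with the (non-negative) base reward so the bonus never dominates.
    return bonus * max(base_reward, 0.0)
```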

### Adaptive Threshold System

```diff
The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
- Adapt the threshold based on recent performance
- Raise it when quality scores are consistently high
```
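One plausible shape for this mechanism (hypothetical function name, bounds, and step size) is:

```python
# Hypothetical adaptive-threshold sketch: raise the quality bar when
# recent scores are consistently high, lower it when they drop.
def adaptive_threshold(recent_scores, base=0.5, step=0.05,
                       lo=0.3, hi=0.8, margin=0.2):
    if not recent_scores:
        return base
    avg = sum(recent_scores) / len(recent_scores)
    if avg > base + margin:   # consistently high quality: harder bar
        return min(hi, base + step)
    if avg < base - margin:   # struggling: easier bar
        return max(lo, base - step)
    return base
```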

## Quick Performance Simulation

### Reward Comparison

Based on our simulation with 1000 samples, the adaptive reward system shows a significant improvement:

| System | MBPP Avg Reward | CodeContests Avg Reward | Overall Avg Reward | Improvement |
|----------|-----------------|-------------------------|--------------------|-------------|
| Original | 0.234 | -0.156 | 0.039 | - |
| Adaptive | 0.312 | -0.098 | 0.107 | ~174% |
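The improvement column follows from the overall averages: the absolute gain is small, but relative to the low baseline it is large.

```python
# Relative improvement of the adaptive system over the original baseline,
# using the overall average rewards from the table above.
original, adaptive = 0.039, 0.107
gain = adaptive - original          # absolute gain in average reward
pct = gain / original * 100         # relative increase over the baseline
print(f"absolute gain: {gain:.3f}, relative: ~{pct:.0f}%")
```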

### Training Progress

Based on the logs below, the model shows consistent progress.

Training metrics (e.g. train/loss) visualized via Weights & Biases (WandB):
- Soon LIVE!
```
[2025-11-14 04:22:50,632][genrl.logging_utils.global_defs][INFO] - __ Joining round: 13053
[2025-11-14 04:23:50,633][genrl.logging_utils.global_defs][INFO] - Starting round: 13053/100000.
Map: 100%|______________________________________| 1/1 [00:00<00:00, 158.65 examples/s]
Map: 100%|______________________________________| 1/1 [00:00<00:00, 191.92 examples/s]
[2025-11-14 04:25:12,646][genrl.logging_utils.global_defs][INFO] - pushing model to huggingface
Processing Files (1 / 1) : 100%|___| 988MB / 988MB, 94.3MB/s
New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
.....kpb5lid/model.safetensors: 100%|___| 988MB / 988MB, 94.3MB/s
[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
```

## Quick Start: Inference

### Standard Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

prompt = "Write a function to calculate the factorial of a number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256, do_sample=True, temperature=0.7, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Chat Format (Conversational)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0]))
```

### Text Generation Inference (TGI)

```bash
docker run -d --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id 0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther \
  --max-input-length 4096 \
  --max-total-tokens 8192
```

### GGUF with llama.cpp

```bash
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Q4_K_M.gguf

# Run inference
./llama-cli -m Qwen2.5-Coder-0.5B-Q4_K_M.gguf \
  -p "Write a function to implement binary search in Python." \
  --temp 0.7 --top-p 0.8
```

### Ollama

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/Qwen2.5-Coder-0.5B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
SYSTEM "You are an expert Python programmer who writes clean, documented code."
EOF

# Create and run
ollama create qwen2.5-coder-swarm -f Modelfile
ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
```

## Available GGUF Quantizations

| Format | Size | Precision | Use Case | Download |
|--------|------|-----------|----------|----------|
| Safetensors (BF16) | 988 MB | BF16 | Full precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 994 MB | FP16 | High quality inference | `Qwen2.5-Coder-0.5B-F16.gguf` |
| GGUF Q6_K | 506 MB | 6-bit | High quality compression | `Qwen2.5-Coder-0.5B-Q6_K.gguf` |
| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Q5_K_M.gguf` |
| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Q4_K_M.gguf` |
| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Q3_K_M.gguf` |

> All GGUF formats are **llama.cpp-compatible**, ready for **chat inference**, and refreshed automatically by the conversion pipeline.

## Chat Format & Conversational

This model inherits **Qwen2.5's chat template** for structured conversations.

### Format Structure

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_response}<|im_end|>
```
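As a rough illustration of how this layout is assembled (a hand-rolled sketch of the ChatML-style structure only; in practice let `tokenizer.apply_chat_template` build the prompt):

```python
# Sketch of the ChatML-style layout above; the real prompt should come
# from tokenizer.apply_chat_template, which also handles defaults.
def render_chatml(messages, add_generation_prompt=True):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a palindrome checker."},
])
```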

### Chat Template Features

- **System Instructions**: Guide model behavior with system messages
- **Multi-turn Dialogue**: Maintains conversation context
- **Tool Calling**: Supports function calling (if enabled in training)
- **Code Generation**: Optimized for generating Python code

**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.

### Gensyn RL-Swarm Quick Architecture

```diff
Training Framework:
- Method: GRPO (Group Relative Policy Optimization)
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Training Regime: bfloat16 mixed precision
- Max Rounds: 100000
- Update Frequency: Every 5-10 minutes
- Generations per Round: 2
- Batch size: Combined
- Tree-based Model: 2 trees
- Seed: 42
Blockchain Integration:
- Network: Gensyn Testnet
- Chain ID: 685685
- Contract: SwarmCoordinator v0.4.2
Swarm Communication:
- Framework: Hivemind P2P Backend
- Initial Peers: 3 bootnodes
- Beam Size: 10
Reward System:
- Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
- Reward Function: Adaptive with quality bonuses
- Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
- Judge API: https://codezero-judge.gensyn.ai
```

## Model Capabilities

This model excels at:

1. **Basic Python Programming**: Functions, loops, conditionals, data structures
2. **Algorithm Implementation**: Sorting, searching, graph algorithms
3. **String Manipulation**: Pattern matching, parsing, formatting
4. **Mathematical Functions**: Calculations, conversions, formulas
5. **Code Documentation**: Writing clear, commented functions
6. **Problem Solving**: Breaking down complex problems into manageable steps

## Limitations

- **Specialized Domain**: Optimized for programming challenges; may underperform on creative writing
- **Training in Progress**: Weights update every 5-10 minutes; performance varies
- **Scale**: 0.5B parameters - suitable for edge but not SOTA for complex programming
- **Experimental**: Decentralized RL training; behavior less predictable than supervised models
- **Context**: Best performance within 4K tokens (full 32K supported)

## Update Schedule

| Format | Frequency | Trigger |
|--------|-----------|---------|
| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |

**Auto-Conversion Pipeline:**

1. Monitors the repo for new training commits
2. Downloads the latest `model.safetensors`
3. Converts to an F16 GGUF base
4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M, Q6_K
5. Uploads the standard GGUF formats

Check commit history for exact timestamps.

### Architecture Components

1. **Game Manager**: Orchestrates training rounds and swarm coordination
2. **Trainer**: GRPO implementation for policy optimization
3. **Data Manager**: Dataset loading with adaptive weighted sampling
4. **Reward Manager**: Computes rewards via the Ollama evaluator with quality bonuses
5. **Coordinator**: Blockchain integration for swarm state
6. **P2P Backend**: Hivemind DHT for model sharing

### Training Process

```
1. Agent joins swarm via P2P network
2. Coordinator assigns round via smart contract
3. Agent samples data from adaptive weighted datasets
4. Model generates 2 responses
5. Ollama evaluator assesses responses and assigns rewards with quality bonuses
6. GRPO updates policy based on rewards
7. Updated model shared via DHT
8. Best checkpoint saved to HuggingFace
9. Repeat
```
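Step 6 relies on GRPO's core idea: each completion's reward is normalized against the statistics of its own generation group (here, the 2 responses for the same prompt), with no separate value network. A minimal sketch of that advantage computation:

```python
# Minimal sketch of GRPO's group-relative advantage: normalize each
# completion's reward by its group's mean and standard deviation.
def group_relative_advantages(rewards, eps=1e-8):
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Above-average completions in a group get positive advantages and are reinforced; below-average ones are discouraged.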

### Decentralization Benefits

- **Fault Tolerance**: Multiple agents; no single point of failure
- **Diverse Exploration**: Different agents explore different strategies
- **Collective Intelligence**: Agents learn from each other
- **Transparent**: All rounds verified on-chain

### Software Stack

- **Framework**: Gensyn RL-Swarm v0.7.0
- **Library**: transformers v4.57.1
- **P2P**: hivemind
- **Blockchain**: Gensyn testnet
- **Config**: Hydra + OmegaConf
- **Logging**: WandB integration

### Hardware Requirements

**Training (GPU):**
- GPU: NVIDIA RTX 4090 24GB+ (BF16 training)
- RAM: 16GB+
- Cores: 10+
- Storage: 50GB SSD
- Network: High bandwidth for P2P

**Training (CPU-optimized):**
- CPU: Intel or AMD
- Cores: 10+
- RAM: 16GB+
- Storage: 50GB SSD
- Network: High bandwidth for P2P

**Inference:**
- Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only)

### Training Progress Metrics

| Metric | Value | Target |
|--------|-------|--------|
| Completed Rounds | 13,533+ | 100,000 |
| Training Progress | 13.53% | 100% |
| Update Frequency | 5-10 min | Continuous |

**Note**: **average\@k** is the average performance across `k` attempts, measuring consistency; **pass\@k** is the probability of at least one correct solution in `k` attempts, measuring capability. Current metrics track training rounds completed in the decentralized swarm.
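For reference, pass@k is usually computed with the standard unbiased estimator; the helpers below are illustrative (they are not part of the swarm's current metrics):

```python
# Standard unbiased pass@k estimator: with n samples and c correct,
# the chance that at least one of k random draws is correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)

def average_at_k(rewards, k):
    # average@k: mean score over the first k attempts (consistency)
    return sum(rewards[:k]) / k
```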

### Adaptive Reward Performance

Our adaptive reward system has shown an approximately 174% improvement in reward scores compared to the baseline system:

```
Original:
  Overall Avg Reward:       0.039
  MBPP Avg Reward:          0.234
  CodeContests Avg Reward: -0.156
Adaptive:
  Overall Avg Reward:       0.107
  MBPP Avg Reward:          0.312
  CodeContests Avg Reward: -0.098
Improvement: 0.068 absolute (~174% relative increase)
```

## Citation

```bibtex
@misc{qwen2.5-coder-gensyn-swarm-2025,
  author       = {0xgrey},
  title        = {Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm: Continuous RL Training on Distributed Swarm with Adaptive Rewards},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
  note         = {Agent ID: tall\_tame\_panther}
}

@misc{gensyn-rl-swarm-2025,
  title  = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year   = {2025},
  url    = {https://gensyn.ai}
}

@misc{codezero-2025,
  title  = {CodeZero: A Collaborative Coding Environment for Distributed RL},
  author = {Gensyn AI},
  year   = {2025},
  url    = {https://docs.gensyn.ai/testnet/rl-swarm/how-it-works/codezero}
}
```

## References

- **Gensyn Documentation**: https://docs.gensyn.ai/
- **Gensyn GitHub**: https://github.com/gensyn-ai
- **RL-Swarm Contracts**: https://github.com/gensyn-ai/rl-swarm-contracts
- **Qwen2.5-Coder Model Card**: https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct
- **MBPP Dataset**: https://huggingface.co/datasets/google-research-datasets/mbpp
- **CodeContests Dataset**: https://huggingface.co/datasets/deepmind/code_contests
- **arXiv:1910.09700**: ML Carbon Emissions methodology

## Contact

- **Developer**: 0xgrey
- **Agent ID**: tall_tame_panther
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)

**⚠️ Important**: This is a continuously trained model. For reproducibility, pin a specific commit hash:

```bash
git clone https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
cd Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>
```

---

<div align="center">

**Trained with 🩷 using Gensyn RL-Swarm**

[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-pink?style=for-the-badge)](https://gensyn.ai)

</div>