Update README.md

grey
2025-11-14 11:07:43 +00:00
committed by system
parent b7f0190de1
commit 613011f18b

README.md

@@ -28,20 +28,24 @@ base_model:
- Qwen/Qwen2.5-Coder-0.5B
---
<h1 align="center">Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)</h1>
<h2 align="center">Gensyn RL-Swarm: Training & GGUF Inference for Quantized LLMs</h2>
<p align="center">
<a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue" alt="Model"></a>
<a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main"><img src="https://img.shields.io/badge/GGUF-Available-8A2BE2" alt="GGUF"></a>
<img src="https://img.shields.io/badge/LLama.cpp-Compatible-orange" alt="llama.cpp">
<a href="https://gensyn.ai"><img src="https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink" alt="Gensyn"></a>
<a href="https://github.com/gensyn-ai/rl-swarm/releases"><img src="https://img.shields.io/github/v/release/gensyn-ai/rl-swarm?label=Version&color=FF0069" alt="version"></a>
<a href="https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT"><img src="https://img.shields.io/badge/License-MIT-green" alt="License"></a>
</p>
---
## Model Overview
This model is a continuously trained **Qwen2.5-Coder-0.5B-Instruct**. It runs in our experimental (advanced) mode and is fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)**. It is also published in **GGUF (llama.cpp)** format, with a focus on enhanced code generation. **Note: Current training focuses on programming challenges with adaptive weighted sampling.**
- **Agent ID:** `tall_tame_panther`
- **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
@@ -53,7 +57,7 @@ Our pick an experimental (advanced) mode this model a continuously trained **Qwe
## Key Features
- **Real-time Training**: Continuous learning with distributed RL across the Gensyn swarm network
- **Adaptive System**: Dynamic quality enhancement and dataset weighting for balanced learning
- **Multi-domain Coding**: Trained on MBPP and CodeContests datasets with adaptive sampling
- **GGUF Support**: Multiple GGUF formats available (F16, plus Q3_K_M, Q4_K_M and Q5_K_M quantizations)
- **llama.cpp Compatible**: Ready for edge deployment and local inference
@@ -72,34 +76,31 @@ The model is trained on a composite dataset with adaptive weighted sampling stra
**Total Dataset Size:** Streaming datasets with infinite iteration
**Training Samples per Round:** 2
**Evaluation:** Real-time via swarm coordination, using an Ollama-based evaluator, otherwise the Judge
## Adaptive Sampling Strategy
The implementation features an adaptive sampling system that adjusts dataset weights based on performance:
> "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero-blog
The system monitors performance metrics every 5 rounds and adjusts the dataset weights to maintain optimal learning balance:
```diff
- Update dataset weights based on recent performance
- Calculate recent average performance for each dataset
- Adjust/use weighted sampling if adaptive, based on the performance difference
- Performance better on MBPP (Mostly Basic Python Problems)
- Performance better on CodeContests
- Update dataset weights every 5 rounds & keep them balanced
```
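Below is a minimal Python sketch of this weighting loop. It assumes per-dataset reward histories. It also assumes a rule that moves a small, fixed amount of sampling weight toward the dataset with the lower recent average, which keeps challenging the solver where it is weaker. That direction is an assumption: the source only says weights follow the performance difference. `AdaptiveSampler`, `window`, `step` and `min_weight` are hypothetical names and values, not the RL-Swarm API.

```python
import random
from collections import deque


class AdaptiveSampler:
    """Hypothetical adaptive sampler over several datasets (e.g. MBPP, CodeContests).

    Every `update_every` rounds it compares recent average rewards and moves
    `step` of sampling weight from the best-performing dataset to the worst one.
    """

    def __init__(self, names, update_every=5, window=20, step=0.05, min_weight=0.2):
        self.weights = {name: 1.0 / len(names) for name in names}
        self.history = {name: deque(maxlen=window) for name in names}
        self.update_every = update_every
        self.step = step
        self.min_weight = min_weight  # floor so no dataset is ever dropped
        self.round = 0

    def sample(self, rng=random):
        # Pick the dataset for the next training sample, proportional to its weight.
        names = list(self.weights)
        return rng.choices(names, weights=[self.weights[n] for n in names])[0]

    def record(self, name, reward):
        self.history[name].append(reward)

    def end_round(self):
        self.round += 1
        if self.round % self.update_every == 0:
            self._update_weights()

    def _update_weights(self):
        # Recent average performance per dataset (datasets without data are skipped).
        averages = {n: sum(h) / len(h) for n, h in self.history.items() if h}
        if len(averages) < 2:
            return
        best = max(averages, key=averages.get)
        worst = min(averages, key=averages.get)
        if averages[best] == averages[worst]:
            return  # already balanced
        # Shift weight toward the weaker dataset without pushing any below the floor.
        delta = max(0.0, min(self.step, self.weights[best] - self.min_weight))
        self.weights[best] -= delta
        self.weights[worst] += delta
```

The weights always sum to 1. The `min_weight` floor keeps the easier dataset in the mix, which matches the "keep them balanced" step above.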
## Adaptive Reward System
### Quality Enhanced Implementation
The reward system includes a quality-enhancement mechanism that evaluates code structure and documentation:
> "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero-blog
```diff
- Calculate quality enhancement for well-structured code
- Documentation enhancement
- Structure enhancement
@@ -109,15 +110,14 @@ The reward system includes a quality data enhanced mechanism that evaluates code
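The quality-enhancement idea can be illustrated with Python's standard `ast` module. This is a hypothetical sketch, not the RL-Swarm `CodeGenerationRewards` code. It assumes a base reward has already been computed, then adds a small bonus for:

- documented functions (documentation);
- functions with an explicit `return` (a stand-in for structure).

The bonus sizes are arbitrary.

```python
import ast


def quality_bonus(code: str, doc_bonus: float = 0.1, structure_bonus: float = 0.1) -> float:
    """Hypothetical quality bonus; returns 0.0 for code that does not parse."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0  # invalid code earns no quality bonus
    funcs = [n for n in ast.walk(tree) if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if not funcs:
        return 0.0
    # Documentation: fraction of functions that carry a docstring.
    documented = sum(1 for f in funcs if ast.get_docstring(f))
    # Structure: fraction of functions that contain an explicit return statement.
    returning = sum(1 for f in funcs if any(isinstance(n, ast.Return) for n in ast.walk(f)))
    return doc_bonus * documented / len(funcs) + structure_bonus * returning / len(funcs)


def enhanced_reward(base_reward: float, code: str) -> float:
    """Combine an existing base reward with the quality bonus."""
    return base_reward + quality_bonus(code)
```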
### Adaptive Threshold System
The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
```diff
- Adaptive threshold function based on recent performance
- Performance quality data is consistently high
```
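Here is a sketch of such a threshold. It assumes "consistently high" means that every score in a fixed window clears a high-water mark; when that happens, the threshold rises by one step up to a ceiling. `AdaptiveThreshold` and its defaults (`base`, `ceiling`, `step`, `window`, `high_mark`) are hypothetical.

```python
from collections import deque


class AdaptiveThreshold:
    """Hypothetical pass threshold that rises when recent quality is consistently high."""

    def __init__(self, base=0.5, ceiling=0.9, step=0.05, window=10, high_mark=0.8):
        self.value = base
        self.ceiling = ceiling
        self.step = step
        self.high_mark = high_mark
        self.recent = deque(maxlen=window)

    def update(self, score):
        self.recent.append(score)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and min(self.recent) >= self.high_mark:
            self.value = min(self.ceiling, self.value + self.step)
            self.recent.clear()  # require a fresh full window before raising again
        return self.value
```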
## Quick Performance Simulation
### Reward Comparison
Based on our simulation with 1000 samples, the adaptive reward system shows a significant improvement:
@@ -132,7 +132,6 @@ Based on our simulation with 1000 samples, the adaptive reward system shows sign
Based on the logs provided, the model shows consistent progress:
Metric data: train/loss visualized with Weights & Biases (W&B)
- Soon LIVE!
```
@@ -147,39 +146,34 @@ New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
```
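The round counter in this log line is enough to compute training progress. The sketch below assumes the log format stays exactly as shown. It uses the `Max Rounds: 100000` value from the training configuration. `parse_round` and `progress_percent` are illustrative helpers, not part of `genrl`.

```python
import re

# Matches the "Already finished round" message shown in the log above.
ROUND_RE = re.compile(r"Already finished round: (\d+)\. Next check in ([\d.]+)s")


def parse_round(line):
    """Return (finished_round, next_check_seconds), or None if the line doesn't match."""
    match = ROUND_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def progress_percent(finished_round, max_rounds=100_000):
    """Percentage of the configured maximum rounds completed."""
    return 100.0 * finished_round / max_rounds
```

For round 13053 this gives about 13.05%, the figure reported under Training Progress Metrics.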
## Quick Start Inference
### Standard Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
prompt = "Write a function to calculate the factorial of a number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True makes temperature/top_p take effect; max_new_tokens excludes the prompt length
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Chat Format (Conversational)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
@@ -188,7 +182,7 @@ print(tokenizer.decode(outputs[0]))
### Text Generation Inference (TGI)
```bash
docker run -d --gpus all \
-p 8080:80 \
-v $PWD/data:/data \
@@ -200,46 +194,44 @@ docker run -d --gpus all \
### GGUF with llama.cpp
```bash
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Q4_K_M.gguf
# Run inference on the downloaded file
./llama-cli -m Qwen2.5-Coder-0.5B-Q4_K_M.gguf \
  -p "Write a function to implement binary search in Python." \
  --temp 0.7 --top-p 0.8
```
### Ollama
```bash
# Create Modelfile (expects the Q4_K_M file downloaded in the llama.cpp step, in the current directory)
cat > Modelfile << 'EOF'
FROM ./Qwen2.5-Coder-0.5B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
SYSTEM "You are an expert Python programmer who writes clean, documented code."
EOF
# Create and run
ollama create qwen2.5-coder-swarm -f Modelfile
ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
```
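Once the model is created, you can also query it over Ollama's local REST API (`POST /api/generate` on the default port 11434) instead of the CLI. The sketch below uses only the standard library. The model name matches the `ollama create` step above; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt, model="qwen2.5-coder-swarm", url=OLLAMA_URL):
    """Build a non-streaming /api/generate request (nothing is sent yet)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen2.5-coder-swarm", timeout=120):
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With `ollama serve` running, `generate("Write a function to calculate the factorial of a number.")` returns the completion text. Setting `"stream": False` asks Ollama for a single JSON object rather than a stream of chunks.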
## Available Quantization Formats
| Format | Size | Precision | Use Case | Download |
|--------|------|-----------|----------|----------|
| Safetensors (BF16) | 988 MB | BF16 | Full precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 994 MB | FP16 | High quality inference | `Qwen2.5-Coder-0.5B-F16.gguf` |
| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Q5_K_M.gguf` |
| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Q4_K_M.gguf` |
| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Q3_K_M.gguf` |
All GGUF formats are **llama.cpp compatible** and auto-updated hourly.
## Chat Format & Conversational Use
This model inherits **Qwen2.5's chat template** for structured conversations. This model inherits **Qwen2.5's chat template** for structured conversations.
@@ -266,36 +258,32 @@ This model inherits **Qwen2.5's chat template** for structured conversations.
**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.
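Qwen2.5 chat templates use the ChatML layout, so the prompt that `apply_chat_template` builds looks like the string produced below. This simplified sketch is only meant for tools that take a raw prompt string. It assumes a system message is supplied and ignores tool calls; the real template also inserts a default system prompt when none is given. With Transformers, prefer `tokenizer.apply_chat_template`.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render role/content messages in the ChatML layout (simplified, hypothetical helper)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```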
## Training Configuration
### Gensyn RL-Swarm Quick-Architecture
```diff
Training Framework:
- Method: GRPO (Group Relative Policy Optimization)
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Training Regime: bfloat16 mixed precision
- Max Rounds: 100000
- Update Frequency: Every 5-10 minutes
- Generations per Round: 2
- Batch size: Combine
- Tree-based Model: 2 trees
- Seed: 42
Blockchain Integration:
- Network: Gensyn Testnet
- Chain ID: 685685
- Contract: SwarmCoordinator v0.4.2
Swarm Communication:
- Framework: Hivemind P2P Backend
- Initial Peers: 3 bootnodes
- Beam Size: 10
Reward System:
- Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
- Reward Function: Adaptive with quality enhancement
- Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
- Judge API: https://codezero-judge.gensyn.ai
```
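GRPO needs no learned value model. Each completion's advantage is its reward normalized against the other completions sampled for the same prompt. The sketch below shows only that normalization; the clipping and KL terms of the full objective are omitted. It uses the population standard deviation, a detail on which implementations differ.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantages: (r_i - mean(group)) / (std(group) + eps)."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std of this prompt's group of rewards
    return [(r - mean) / (std + eps) for r in rewards]
```

With `Generations per Round: 2`, any two distinct rewards map to roughly +1 and -1. A tie yields zero advantage, so that group contributes no learning signal.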
## Model Capabilities
@@ -325,6 +313,7 @@ This model excels at:
| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |
**Auto-Conversion Pipeline:**
1. Monitors repo for new training commits
2. Downloads latest `model.safetensors`
3. Converts to F16 GGUF base
@@ -333,8 +322,6 @@ This model excels at:
Check commit history for exact timestamps.
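Steps 1-3 above, followed by quantization into the formats in the table, map onto two llama.cpp tools: `convert_hf_to_gguf.py` and `llama-quantize`. The sketch below only builds the command lines. It assumes a local llama.cpp checkout and build (tool paths are parameters) and the output file names from the format table. It is not the project's actual pipeline script.

```python
import subprocess
import sys
from pathlib import Path

QUANT_TYPES = ("Q5_K_M", "Q4_K_M", "Q3_K_M")


def conversion_commands(model_dir, out_dir, convert_script="convert_hf_to_gguf.py",
                        quantize_bin="llama-quantize", prefix="Qwen2.5-Coder-0.5B"):
    """Build the commands: one F16 GGUF base, then one quantized file per type."""
    out = Path(out_dir)
    f16 = out / f"{prefix}-F16.gguf"
    commands = [[sys.executable, str(convert_script), str(model_dir),
                 "--outfile", str(f16), "--outtype", "f16"]]
    for qtype in QUANT_TYPES:
        # llama-quantize <input.gguf> <output.gguf> <type>
        commands.append([str(quantize_bin), str(f16), str(out / f"{prefix}-{qtype}.gguf"), qtype])
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```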
## CodeZero Technical Details
### Architecture Components
1. **Game Manager**: Orchestrates training rounds and swarm coordination
@@ -395,8 +382,6 @@ Check commit history for exact timestamps.
- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only)
## Evaluation
### Training Progress Metrics
| Metric | Value | Target |
@@ -405,23 +390,21 @@ Check commit history for exact timestamps.
| Training Progress | 13.05% | 100% |
| Update Frequency | 5-10 min | Continuous |
**Note**: Current metrics track training rounds completed in the decentralized swarm.
* **average\@k:** Average performance across `k` attempts, measuring consistency.
* **pass\@k:** Probability of at least one correct solution in `k` attempts, measuring capability.
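Both metrics can be computed from per-problem results. For pass@k, a common choice is the unbiased estimator from the HumanEval/Codex evaluation. It assumes n ≥ k sampled attempts per problem, c of which pass. The sketch below is not the swarm's own evaluation code.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


def average_at_k(scores):
    """Mean score over k attempts (e.g. 1.0 for pass, 0.0 for fail)."""
    return sum(scores) / len(scores)
```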
### Adaptive Reward Performance
Our adaptive reward system has shown an approximately 174% improvement in overall average reward compared to the baseline system:
``` ```
Original:
Overall Avg Reward: 0.039
MBPP Avg Reward: 0.234
CodeContests Avg Reward: -0.156
Adaptive:
Overall Avg Reward: 0.107
MBPP Avg Reward: 0.312
CodeContests Avg Reward: -0.098
Improvement: 0.068 (~174% increase)
```
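The ~174% figure is the change in overall average reward relative to the original baseline. Below is a quick check of the arithmetic. The helper is illustrative; it measures relative change against |baseline|, so gains on negative rewards also read as positive.

```python
def relative_improvement(baseline, new):
    """Return (absolute change, change relative to |baseline|)."""
    delta = new - baseline
    return delta, delta / abs(baseline)


delta, rel = relative_improvement(0.039, 0.107)
print(f"{delta:.3f} ({rel:.0%})")  # → 0.068 (174%)
```

The same calculation gives about 33% for MBPP (0.234 → 0.312) and about 37% for CodeContests (-0.156 → -0.098). The large overall percentage therefore largely reflects the small overall baseline (0.039).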
@@ -436,14 +419,12 @@ Improvement: 0.068 (~174% increase)
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
}
@misc{gensyn-rl-swarm-2025,
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
  url = {https://gensyn.ai}
}
@misc{codezero-2025,
  title = {CodeZero: A Collaborative Coding Environment for Distributed RL},
  author = {Gensyn AI},
@@ -469,7 +450,6 @@ Improvement: 0.068 (~174% increase)
- **Agent ID**: tall_tame_panther
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)
---
**⚠️ Important**: This is a continuously trained model. For reproducibility, specify a commit hash:
@@ -485,6 +465,6 @@ git checkout <commit-hash>
**🤖 Trained with ❤️ using Gensyn RL-Swarm**
[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-pink?style=for-the-badge)](https://gensyn.ai)
</div>