base_model:
- Qwen/Qwen2.5-Coder-0.5B
---
<h1 align="center">Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)</h1>

<h2 align="center">Gensyn RL-Swarm: Training & GGUF Inference for Quantized LLMs</h2>

<p align="center">
  <a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue" alt="Model"></a>
  <a href="https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main"><img src="https://img.shields.io/badge/GGUF-Available-8A2BE2" alt="GGUF"></a>
  <img src="https://img.shields.io/badge/LLama.cpp-Compatible-orange" alt="llama.cpp">
  <a href="https://gensyn.ai"><img src="https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink" alt="Gensyn"></a>
  <a href="https://github.com/gensyn-ai/rl-swarm/releases"><img src="https://img.shields.io/github/v/release/gensyn-ai/rl-swarm?label=Version&color=FF0069" alt="version"></a>
  <a href="https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT"><img src="https://img.shields.io/badge/License-MIT-green" alt="License"></a>
</p>

---

## Model Overview

This is an experimental (advanced-mode) release: a continuously trained **Qwen2.5-Coder-0.5B-Instruct**, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** and published in **GGUF (llama.cpp)** formats, for enhanced code generation. **Note: Current training focuses on programming challenges with adaptive weighted sampling**.

- **Agent ID:** `tall_tame_panther`
- **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
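As a rough illustration of what GRPO optimizes (a simplified sketch, not the RL-Swarm implementation): each completion in a sampled group is scored relative to the group's own mean reward, instead of against a learned value baseline.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its group's statistics.

    GRPO replaces a learned value baseline with the group mean: completions
    sampled for the same prompt are compared only with each other.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mu) / sigma for r in rewards]

# Four sampled solutions to one coding prompt, scored by the reward checks
advantages = group_relative_advantages([0.9, 0.2, 0.5, 0.4])
```

The best completion in the group gets a positive advantage and the worst a negative one, so the policy update needs no separate critic model.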
## Key Features

- **Real-time Training**: Continuous learning with distributed RL across the Gensyn swarm network
- **Adaptive Reward System**: Dynamic quality enhancement and dataset weighting for optimal learning
- **Multi-domain Coding**: Trained on MBPP and CodeContests datasets with adaptive sampling
- **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M)
- **llama.cpp Compatible**: Ready for edge deployment and local inference
**Total Dataset Size:** Streaming datasets with infinite iteration
**Training Samples per Round:** 2
**Evaluation:** Real-time via swarm coordination with an Ollama-based evaluator, falling back to the Judge

## Adaptive Sampling Strategy

The implementation features an adaptive sampling system that adjusts dataset weights based on performance:

> "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero blog

The system monitors performance metrics every 5 rounds and adjusts the dataset weights to maintain optimal learning balance:
```diff
- Update dataset weights based on recent performance
- Calculate the recent average performance for each dataset
- Adjust weighted sampling adaptively, based on the performance difference
- Performance better on MBPP (Mostly Basic Python Problems)
- Performance better on CodeContests
- Update dataset weights every 5 rounds & keep them balanced
```
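A minimal sketch of how such a weighting loop could look (hypothetical names, step size, and weight floor; the actual RL-Swarm code may differ). Every update, weight shifts toward the dataset with the weaker recent average, so sampling keeps challenging the model:

```python
from collections import deque

WINDOW = 5  # rounds between weight updates, per the description above

class AdaptiveSampler:
    """Shift sampling weight toward the dataset the model finds harder."""

    def __init__(self):
        self.weights = {"mbpp": 0.5, "codecontests": 0.5}
        self.history = {k: deque(maxlen=WINDOW) for k in self.weights}

    def record(self, dataset: str, reward: float) -> None:
        self.history[dataset].append(reward)

    def update_weights(self, step: float = 0.1) -> dict:
        # Recent average performance for each dataset
        avgs = {k: (sum(v) / len(v) if v else 0.0) for k, v in self.history.items()}
        easier = max(avgs, key=avgs.get)   # performing better -> sample less
        harder = min(avgs, key=avgs.get)   # performing worse  -> sample more
        if easier != harder:
            delta = min(step, self.weights[easier] - 0.1)  # keep a 0.1 floor
            self.weights[easier] -= delta
            self.weights[harder] += delta
        return self.weights

sampler = AdaptiveSampler()
for r in [0.8] * 5:
    sampler.record("mbpp", r)
for r in [0.1] * 5:
    sampler.record("codecontests", r)
weights = sampler.update_weights()
```

With MBPP rewards consistently higher, the update moves sampling weight toward CodeContests while the weights stay normalized.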
## Adaptive Reward System

### Quality Enhancement Implementation

The reward system includes a quality-enhancement mechanism that evaluates code structure and documentation:

> "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero blog

```diff
- Calculate a quality-enhancement bonus for well-structured code
- Documentation bonus
- Structure bonus
```
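Such lightweight checks could be combined along these lines (an illustrative sketch with made-up bonus weights, not the actual CodeGenerationRewards implementation):

```python
import ast

def quality_bonus(code: str) -> float:
    """Score lightweight structure/documentation checks on generated code."""
    bonus = 0.0
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return -0.5  # invalid code is penalized outright
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if funcs:
        bonus += 0.2                                  # structure: uses functions
    if any(ast.get_docstring(f) for f in funcs):
        bonus += 0.2                                  # documentation: docstrings
    if any(isinstance(n, ast.Return) for n in ast.walk(tree)):
        bonus += 0.1                                  # returns a value
    return bonus

good = '''def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''
```

Each check contributes a small, interpretable term, so the final bonus remains easy to audit.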
### Adaptive Threshold System

The system also includes an adaptive threshold mechanism that adjusts based on recent performance:

```diff
- Adapt the threshold as a function of recent performance
- Raise it when performance quality is consistently high
```
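One possible shape for such a mechanism (hypothetical base threshold, step size, and "consistently high" bar):

```python
def adaptive_threshold(recent_scores: list[float],
                       base: float = 0.5,
                       step: float = 0.1,
                       high_bar: float = 0.8) -> float:
    """Raise the acceptance threshold when recent quality is consistently high."""
    if not recent_scores:
        return base
    avg = sum(recent_scores) / len(recent_scores)
    if min(recent_scores) >= high_bar:   # every recent score is high -> raise the bar
        return min(1.0, base + step)
    if avg < base:                       # struggling -> relax slightly
        return max(0.0, base - step)
    return base
```

Using the minimum (rather than the mean) for the upward adjustment is what makes it react only to *consistently* high quality, not a single lucky round.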
## Quick Performance Simulation

### Reward Comparison

Based on our simulation with 1000 samples, the adaptive reward system shows significant improvement.
Based on the logs provided, the model shows consistent progress:

Metric data: train/loss visualized with Weights & Biases (WandB)

- Soon LIVE!

```
New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
```
## Quick Start Inference

### Standard Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

prompt = "Write a function to calculate the factorial of a number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256, do_sample=True, temperature=0.7, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Chat Format (Conversational)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0]))
```
### Text Generation Inference (TGI)

```bash
docker run -d --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
```
### GGUF with llama.cpp

```bash
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Q4_K_M.gguf

# Run inference
./llama-cli -m Qwen2.5-Coder-0.5B-Q4_K_M.gguf \
  -p "Write a function to implement binary search in Python." \
  --temp 0.7 --top-p 0.8
```
### Ollama

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen2.5-Coder-0.5B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
SYSTEM "You are an expert Python programmer who writes clean, documented code."
EOF

# Create and run
ollama create qwen2.5-coder-swarm -f Modelfile
ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
```
## Available Quantization Formats

| Format | Size | Precision | Use Case | Download |
|--------|------|-----------|----------|----------|
| Safetensors (BF16) | 988 MB | BF16 | Full-precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 994 MB | FP16 | High-quality inference | `Qwen2.5-Coder-0.5B-F16.gguf` |
| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Q5_K_M.gguf` |
| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Q4_K_M.gguf` |
| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Q3_K_M.gguf` |

All GGUF formats are **llama.cpp compatible** and auto-updated hourly.

## Chat Format & Conversational Use

This model inherits **Qwen2.5's chat template** for structured conversations.

**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.
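As a rough sketch of the ChatML-style layout that Qwen2.5's template produces (assumed format shown for orientation; `tokenizer.apply_chat_template` is the authoritative source):

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Approximate the ChatML layout used by Qwen2.5-style chat templates."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."},
])
```

Each turn is wrapped in `<|im_start|>role ... <|im_end|>` markers, which is why plain-text prompts and chat prompts can elicit different behavior from the same weights.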
## Training Configuration

### Gensyn RL-Swarm Quick-Architecture

```diff
Training Framework:
- Method: GRPO (Group Relative Policy Optimization)
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Training Regime: bfloat16 mixed precision
- Max Rounds: 100000
- Update Frequency: Every 5-10 minutes
- Generations per Round: 2
- Batch size: Combine
- Tree-based Model: 2 trees
- Seed: 42

Blockchain Integration:
- Network: Gensyn Testnet
- Chain ID: 685685
- Contract: SwarmCoordinator v0.4.2

Swarm Communication:
- Framework: Hivemind P2P Backend
- Initial Peers: 3 bootnodes
- Beam Size: 10

Reward System:
- Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
- Reward Function: Adaptive with quality enhancement
- Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
- Judge API: https://codezero-judge.gensyn.ai
```
## Model Capabilities

| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |

**Auto-Conversion Pipeline:**

1. Monitors the repo for new training commits
2. Downloads the latest `model.safetensors`
3. Converts to an F16 GGUF base

Check the commit history for exact timestamps.
## CodeZero Technical Details

### Architecture Components

1. **Game Manager**: Orchestrates training rounds and swarm coordination
- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only)

## Evaluation
### Training Progress Metrics

| Metric | Value | Target |
|--------|-------|--------|
| Training Progress | 13.05% | 100% |
| Update Frequency | 5-10 min | Continuous |

**Note**: **average\@k:** Average performance across `k` attempts, measuring consistency. **pass\@k:** Probability of at least one correct solution in `k` attempts, measuring capability. Current metrics track training rounds completed in the decentralized swarm.
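For reference, both metrics can be computed from per-problem attempt results; the pass\@k form below is the standard unbiased estimator over `n` generated attempts with `c` correct:

```python
from math import comb

def average_at_k(correct: list[bool], k: int) -> float:
    """Mean success rate over the first k attempts (consistency)."""
    attempts = correct[:k]
    return sum(attempts) / len(attempts)

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k attempts,
    drawn from n generated with c correct, solves the problem."""
    if n - c < k:
        return 1.0  # a correct attempt is guaranteed in any k-subset
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 attempts, 3 correct: capability (pass@5) vs consistency (average@5)
results = [True, False, False, True, False, False, True, False, False, False]
```

The same attempt data thus yields a high pass\@k (capability) alongside a much lower average\@k (consistency), which is why both are reported.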
### Adaptive Reward Performance

Our adaptive reward system has shown approximately 174% improvement in reward scores compared to the baseline system:

```
Original:
Overall Avg Reward:      0.039
MBPP Avg Reward:         0.234
CodeContests Avg Reward: -0.156

Adaptive:
Overall Avg Reward:      0.107
MBPP Avg Reward:         0.312
CodeContests Avg Reward: -0.098

Improvement: 0.068 (~174% increase)
```
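The quoted figures are internally consistent: the absolute gain is 0.068, and relative to the 0.039 baseline that is roughly a 174% increase:

```python
baseline = 0.039  # original overall average reward
adaptive = 0.107  # adaptive-system overall average reward

absolute_gain = adaptive - baseline             # 0.068
relative_gain = absolute_gain / baseline * 100  # ~174%
```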
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
}

@misc{gensyn-rl-swarm-2025,
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
  url = {https://gensyn.ai}
}

@misc{codezero-2025,
  title = {CodeZero: A Collaborative Coding Environment for Distributed RL},
  author = {Gensyn AI},
- **Agent ID**: tall_tame_panther
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)

---

**⚠️ Important**: This is a continuously trained model. For reproducibility, specify a commit hash:
**🤖 Trained with ❤️ using Gensyn RL-Swarm**

</div>