---
library_name: transformers
tags:
- text-generation
- qwen2.5-coder
- rl-swarm
- genrl-swarm
- grpo
- gensyn
- trl
- code-generation
- programming
- continuous-training
- reinforcement-learning
- safetensors
- gguf
- math
- logic
- conversational
- text-generation-inference
- I am tall_tame_panther
- python
- agent
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
---
# Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)
## Gensyn RL-Swarm: Training & GGUF Inference for Quantized LLMs
[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther)
[![GGUF](https://img.shields.io/badge/GGUF-Available-8A2BE2)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main)
<img src="https://img.shields.io/badge/LLama.cpp-Compatible-orange" alt="llama.cpp">
[![Gensyn](https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink)](https://gensyn.ai)
<a href="https://github.com/gensyn-ai/rl-swarm/releases"><img src="https://img.shields.io/github/v/release/gensyn-ai/rl-swarm?label=Version&color=FF0069" alt="version"></a>
[![License](https://img.shields.io/badge/License-MIT-green)](https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT)
## Model Overview
Running in experimental (advanced) mode, this model is a continuously trained **Qwen2.5-Coder-0.5B-Instruct**, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** and published in **GGUF (llama.cpp)** formats for enhanced code generation. **Note: current training focuses on programming challenges with adaptive weighted sampling.**
### Model Description
- **Agent ID:** `tall_tame_panther`
- **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
- **Auto-Sync GGUF Pipeline Status:** 🟢 LIVE - Commits update automatically every hour
- **Current Progress:** Round 13,054+ / 100,000 (13.05%)
- **Framework Version:** Gensyn RL-Swarm v0.7.0
- **Contract:** SwarmCoordinator v0.4.2
## Key Features
- **Real-time Training**: Continuous learning with distributed RL across Gensyn swarm network
- **Adaptive Reward System**: Dynamic quality bonuses and dataset weighting for optimal learning
- **Multi-domain Coding**: Trained on MBPP and CodeContests datasets with adaptive sampling
- **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M)
- **llama.cpp Compatible**: Ready for edge deployment and local inference
- **BF16 Precision**: Trained with bfloat16 for optimal performance
- **TGI Compatible**: Supports Text Generation Inference for production deployment
- **Chat Format Support**: Inherits Qwen2.5 chat template for conversational use
## Training Data
The model is trained on a composite dataset with an adaptive weighted sampling strategy:
| Dataset | Initial Weight | Adaptive Range | Focus Area |
|---------|----------------|----------------|------------|
| MBPP | 5 | 4-6 | Basic Python programming problems with test cases |
| CodeContests | 5 | 4-6 | Competitive programming challenges |
**Total Dataset Size:** Streaming datasets with infinite iteration
**Training Samples per Round:** 2
**Evaluation:** Real-time via swarm coordination, using an Ollama-based evaluator with a remote judge fallback
### Adaptive Sampling Strategy
The implementation features an adaptive sampling system that adjusts dataset weights based on performance:
> "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero-blog
The system monitors performance metrics every 5 rounds and adjusts the dataset weights to maintain optimal learning balance:
```diff
- Update dataset weights based on recent performance
- Calculate the recent average performance for each dataset
- If adaptive, adjust the weighted sampling based on the performance difference
-   Case: performance is better on MBPP
-   Case: performance is better on CodeContests
- Update dataset weights every 5 rounds and keep them balanced
```
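As an illustration only, here is a minimal sketch of such an adaptive weighting loop in Python. The names (`DATASETS`, `record_reward`, `update_weights`) and the 4-6 clamping are assumptions based on the table above, not the actual RL-Swarm code:

```python
import random
from collections import defaultdict, deque

# Assumed initial weights and the 4-6 adaptive range from the table above.
DATASETS = {"mbpp": 5.0, "codecontests": 5.0}
WEIGHT_MIN, WEIGHT_MAX = 4.0, 6.0
recent_rewards = defaultdict(lambda: deque(maxlen=5))  # last 5 rounds per dataset

def record_reward(dataset: str, reward: float) -> None:
    recent_rewards[dataset].append(reward)

def update_weights() -> None:
    """Every 5 rounds: shift sampling weight toward the weaker dataset."""
    averages = {d: sum(r) / len(r) for d, r in recent_rewards.items() if r}
    if len(averages) < 2:
        return
    # Positive gap = solver is stronger on MBPP, so sample CodeContests more.
    gap = averages["mbpp"] - averages["codecontests"]
    DATASETS["mbpp"] = min(max(5.0 - gap, WEIGHT_MIN), WEIGHT_MAX)
    DATASETS["codecontests"] = min(max(5.0 + gap, WEIGHT_MIN), WEIGHT_MAX)

def sample_dataset() -> str:
    names, weights = zip(*DATASETS.items())
    return random.choices(names, weights=weights, k=1)[0]
```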
## Adaptive Reward System
### Quality Bonus Implementation
The reward system includes a quality bonus mechanism that evaluates code structure and documentation.
> "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero-blog
```diff
- Compute a quality bonus for well-structured code
- Documentation bonus
- Structure bonus
- Algorithmic efficiency (simple heuristic)
- Scale the bonus with the base reward to avoid inflation
```
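A minimal sketch of what such a bonus could look like, using simple heuristics; the checks and coefficients here are illustrative assumptions, not the deployed reward code:

```python
def quality_bonus(code: str, base_reward: float) -> float:
    """Hypothetical quality bonus for well-structured, documented code."""
    bonus = 0.0
    if '"""' in code or "#" in code:
        bonus += 0.05  # documentation bonus: docstrings or comments present
    if "def " in code and "return" in code:
        bonus += 0.05  # structure bonus: a proper function with a return value
    if len(code.splitlines()) < 40:
        bonus += 0.02  # crude efficiency heuristic: concise solution
    # Scale with the base reward so the bonus cannot inflate weak solutions.
    return base_reward + bonus * max(base_reward, 0.0)
```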
### Adaptive Threshold System
The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
```diff
- Compute an adaptive threshold from recent performance
- Raise the threshold when recent quality scores are consistently high
```
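Illustratively (the window size and bounds are assumptions), a threshold that tightens when recent quality stays high might look like:

```python
from collections import deque

class AdaptiveThreshold:
    """Hypothetical sketch: raise the bar as recent quality scores improve."""

    def __init__(self, base: float = 0.5, window: int = 20):
        self.base = base
        self.history = deque(maxlen=window)

    def update(self, score: float) -> float:
        self.history.append(score)
        avg = sum(self.history) / len(self.history)
        # Move the threshold halfway toward the recent average, within bounds.
        return min(max(self.base + 0.5 * (avg - self.base), 0.1), 0.9)
```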
## Performance Simulation
### Reward Comparison
Based on our simulation with 1000 samples, the adaptive reward system shows a significant improvement:
| System | MBPP Avg Reward | CodeContests Avg Reward | Overall Avg Reward | Improvement |
|---------|----------------|------------------------|-------------------|-------------|
| Original | 0.234 | -0.156 | 0.039 | - |
| Adaptive | 0.312 | -0.098 | 0.107 | ~174% |
### Training Progress
Based on the logs provided, the model shows consistent progress:
Training metrics (e.g., `train/loss`) are visualized with Weights & Biases (W&B):

- Dashboard: coming soon!
```
[2025-11-14 04:22:50,632][genrl.logging_utils.global_defs][INFO] - __ Joining round: 13053
[2025-11-14 04:23:50,633][genrl.logging_utils.global_defs][INFO] - Starting round: 13053/100000.
Map: 100%|______________________________________| 1/1 [00:00<00:00, 158.65 examples/s]
Map: 100%|______________________________________| 1/1 [00:00<00:00, 191.92 examples/s]
[2025-11-14 04:25:12,646][genrl.logging_utils.global_defs][INFO] - pushing model to huggingface
Processing Files (1 / 1) : 100%|___| 988MB / 988MB, 94.3MB/s
New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
.....kpb5lid/model.safetensors: 100%|___| 988MB / 988MB, 94.3MB/s
[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
```
## Quick Start
### Standard Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

prompt = "Write a function to calculate the factorial of a number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Chat Format (Conversational)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")

messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Text Generation Inference (TGI)
```bash
docker run -d --gpus all \
-p 8080:80 \
-v $PWD/data:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id 0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther \
--max-input-length 4096 \
--max-total-tokens 8192
```
### GGUF with llama.cpp
```bash
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf
# Run inference
./llama-cli -m Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf \
-p "Write a function to implement binary search in Python." \
--temp 0.6 --top-p 0.95
```
### Ollama
```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
SYSTEM "You are an expert Python programmer who writes clean, documented code."
EOF
# Create and run
ollama create qwen2.5-coder-swarm -f Modelfile
ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
```
## Available Formats
| Format | Size | Precision | Use Case | Download |
|--------|------|-----------|----------|----------|
| Safetensors (BF16) | 988 MB | BF16 | Full precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 994 MB | FP16 | High quality inference | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-F16.gguf` |
| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q5_K_M.gguf` |
| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf` |
| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q3_K_M.gguf` |
All GGUF formats are **llama.cpp compatible** and auto-updated hourly.
## Chat Format & Conversational Use
This model inherits **Qwen2.5's chat template** for structured conversations.
### Format Structure
```
<|im_start|>system
{system_message}
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
{assistant_response}
<|im_end|>
```
### Chat Template Features
- **System Instructions**: Guide model behavior with system messages
- **Multi-turn Dialogue**: Maintains conversation context
- **Tool Calling**: Supports function calling (if enabled in training)
- **Code Generation**: Optimized for generating Python code
**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.
## Training Configuration
### Gensyn RL-Swarm Quick-Architecture
```yaml
Training Framework:
Method: GRPO (Group Relative Policy Optimization)
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Training Regime: bfloat16 mixed precision
Max Rounds: 100000
Update Frequency: Every 5-10 minutes
Generations per Round: 2
Tree-based Model: Default
Seed: 42
Blockchain Integration:
Network: Gensyn Testnet
Chain ID: 685685
Contract: SwarmCoordinator v0.4.2
Swarm Communication:
Framework: Hivemind P2P Backend
Initial Peers: 3 bootnodes
Beam Size: 10
Reward System:
Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
  Reward Function: Adaptive with quality bonuses
Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
Judge API: https://codezero-judge.gensyn.ai
```
## Model Capabilities
This model excels at:
1. **Basic Python Programming**: Functions, loops, conditionals, data structures
2. **Algorithm Implementation**: Sorting, searching, graph algorithms
3. **String Manipulation**: Pattern matching, parsing, formatting
4. **Mathematical Functions**: Calculations, conversions, formulas
5. **Code Documentation**: Writing clear, commented functions
6. **Problem Solving**: Breaking down complex problems into manageable steps
## Limitations
- **Specialized Domain**: Optimized for programming challenges; may underperform on creative writing
- **Training in Progress**: Weights update every 5-10 minutes; performance varies
- **Scale**: 0.5B parameters - suitable for edge but not SOTA for complex programming
- **Experimental**: Decentralized RL training; behavior less predictable than supervised models
- **Context**: Best performance within 4K tokens (full 32K supported)
## Update Schedule
| Format | Frequency | Trigger |
|--------|-----------|---------|
| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |
**Auto-Conversion Pipeline:**
1. Monitors repo for new training commits
2. Downloads latest `model.safetensors`
3. Converts to F16 GGUF base
4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M, Q6_K
5. Publishes the standard formats to this repository (see the sketch below)
Check commit history for exact timestamps.
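The pipeline code itself is not published in this repo; a minimal sketch of what steps 2-4 could look like with llama.cpp's stock tooling (file names and paths are illustrative):

```python
import subprocess
from huggingface_hub import snapshot_download

REPO = "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"

# Steps 1-2: fetch the latest checkpoint and tokenizer files.
local_dir = snapshot_download(REPO, allow_patterns=["*.safetensors", "*.json", "tokenizer*"])

# Step 3: convert to an F16 GGUF base (convert_hf_to_gguf.py ships with llama.cpp).
subprocess.run(
    ["python", "convert_hf_to_gguf.py", local_dir, "--outfile", "model-F16.gguf", "--outtype", "f16"],
    check=True,
)

# Step 4: quantize the F16 base into the distributed formats.
for quant in ["Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K"]:
    subprocess.run(["./llama-quantize", "model-F16.gguf", f"model-{quant}.gguf", quant], check=True)
```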
## CodeZero Technical Details
### Architecture Components
1. **Game Manager**: Orchestrates training rounds and swarm coordination
2. **Trainer**: GRPO implementation for policy optimization
3. **Data Manager**: Dataset loading with adaptive weighted sampling
4. **Reward Manager**: Computes rewards via Ollama evaluator with quality enhanced
5. **Coordinator**: Blockchain integration for swarm state
6. **P2P Backend**: Hivemind DHT for model sharing
### Training Process
```
1. Agent joins swarm via P2P network
2. Coordinator assigns round via smart contract
3. Agent samples data from adaptive weighted datasets
4. Model generates 2 responses
5. Ollama evaluator assesses responses and assigns rewards with quality bonuses
6. GRPO updates policy based on rewards
7. Updated model shared via DHT
8. Best checkpoint saved to HuggingFace
9. Repeat
```
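For intuition on step 6: GRPO scores each response against the other responses generated in the same round, rather than against a learned value function. A minimal sketch of the group-relative advantage (the actual trainer's normalization details may differ):

```python
def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each reward within its own group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# With 2 generations per round, as configured above:
print(grpo_advantages([0.8, 0.2]))  # -> [~1.0, ~-1.0]
```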
### Decentralization Benefits
- **Fault Tolerance**: Multiple agents; no single point of failure
- **Diverse Exploration**: Different agents explore different strategies
- **Collective Intelligence**: Agents learn from each other
- **Transparent**: All rounds verified on-chain
### Software Stack
- **Framework**: Gensyn RL-Swarm v0.7.0
- **Library**: transformers v4.57.1
- **P2P**: hivemind
- **Blockchain**: Gensyn testnet
- **Config**: Hydra + OmegaConf
- **Logging**: WandB integration
### Hardware Requirements
**Training GPU:**
- GPU: NVIDIA RTX 4090, 24 GB+ VRAM (BF16 training)
- RAM: 16GB+
- Cores: 10+
- Storage: 50GB SSD
- Network: High bandwidth for P2P
**Training (CPU-optimized):**
- CPU: Intel or AMD (x86-64)
- Cores: 10+
- RAM: 16GB+
- Storage: 50GB SSD
- Network: High bandwidth for P2P
**Inference:**
- Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only)
## Evaluation
### Training Progress Metrics
| Metric | Value | Target |
|--------|-------|--------|
| Completed Rounds | 13,054+ | 100,000 |
| Training Progress | 13.05% | 100% |
| Update Frequency | 5-10 min | Continuous |
**Note**:
- **average@k:** average performance across `k` attempts, measuring consistency.
- **pass@k:** probability of at least one correct solution in `k` attempts, measuring capability.

Current metrics track training rounds completed in the decentralized swarm.
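For reference, pass@k is typically computed with the standard unbiased estimator; a small sketch, assuming `n` generated samples of which `c` pass the tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples drawn
    (without replacement) from n generations with c correct is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=2))  # ~0.53
```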
### Adaptive Reward Performance
Our adaptive reward system has shown a ~174% improvement in overall reward scores compared to the baseline system:
```text
Original System:
  Overall Avg Reward:       0.039
  MBPP Avg Reward:          0.234
  CodeContests Avg Reward: -0.156

Adaptive System:
  Overall Avg Reward:       0.107
  MBPP Avg Reward:          0.312
  CodeContests Avg Reward: -0.098

Improvement: +0.068 absolute (~174% relative increase)
```
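The ~174% figure is the relative change of the overall average reward; a quick check:

```python
original, adaptive = 0.039, 0.107
print(f"absolute gain: {adaptive - original:.3f}")              # 0.068
print(f"relative gain: {(adaptive - original) / original:.0%}") # 174%
```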
## Citation
```bibtex
@misc{qwen2.5-coder-gensyn-swarm-2025,
author = {0xgrey},
title = {Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm: Continuous RL Training on Distributed Swarm with Adaptive Rewards},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
note = {Agent ID: tall\_tame\_panther}
}
@misc{gensyn-rl-swarm-2025,
title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
author = {Gensyn AI},
year = {2025},
url = {https://gensyn.ai}
}
@misc{codezero-2025,
title = {CodeZero: A Collaborative Coding Environment for Distributed RL},
author = {Gensyn AI},
year = {2025},
url = {https://docs.gensyn.ai/testnet/rl-swarm/how-it-works/codezero}
}
```
## References
- **Gensyn Documentation**: https://docs.gensyn.ai/
- **Gensyn GitHub**: https://github.com/gensyn-ai
- **RL-Swarm Contracts**: https://github.com/gensyn-ai/rl-swarm-contracts
- **Qwen2.5-Coder Model Card**: https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct
- **MBPP Dataset**: https://huggingface.co/datasets/google-research-datasets/mbpp
- **CodeContests Dataset**: https://huggingface.co/datasets/deepmind/code_contests
- **arXiv:1910.09700**: ML Carbon Emissions methodology
## Contact
- **Developer**: 0xgrey
- **Agent ID**: tall_tame_panther
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)
---
**⚠️ Important**: This is a continuously trained model. For reproducibility, specify commit hash:
```bash
git clone https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
cd Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>
```
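Alternatively, a specific snapshot can be pinned directly when loading with transformers, using the `revision` argument:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    revision="<commit-hash>",  # pin an exact commit for reproducibility
)
```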
---
<div align="center">
**🤖 Trained with ❤️ using Gensyn RL-Swarm**
[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-orange?style=for-the-badge)](https://gensyn.ai)
</div>