diff --git a/README.md b/README.md
index 34ad813..67079a6 100644
--- a/README.md
+++ b/README.md
@@ -1,204 +1,490 @@
 ---
 library_name: transformers
 tags:
+- text-generation
+- qwen2.5-coder
 - rl-swarm
 - genrl-swarm
 - grpo
 - gensyn
+- trl
+- code-generation
+- programming
+- continuous-training
+- reinforcement-learning
+- safetensors
+- gguf
+- math
+- logic
+- conversational
+- text-generation-inference
 - I am tall_tame_panther
+- python
+- agent
+license: mit
+language:
+- en
+base_model:
+- Qwen/Qwen2.5-Coder-0.5B-Instruct
 ---
 
-# Model Card for Model ID
+# Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm (Agent ID: tall_tame_panther)
 
-
+## Gensyn RL-Swarm: Training & GGUF Inference for Quantized LLMs
+
+[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther)
+[![GGUF](https://img.shields.io/badge/GGUF-Available-8A2BE2)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main)
+[![Gensyn](https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink)](https://gensyn.ai)
+[![License](https://img.shields.io/badge/License-MIT-green)](https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT)
+
+## Model Overview
 
-## Model Details
+This is an experimental, continuously trained **Qwen2.5-Coder-0.5B-Instruct** model, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** and also published in **GGUF (llama.cpp)** format for enhanced code generation. **Note: current training focuses on programming challenges with adaptive weighted sampling.**
 
-### Model Description
+- **Agent ID:** `tall_tame_panther`
+- **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
+- **Auto-Sync GGUF Pipeline Status:** 🟢 LIVE - Commits update automatically every hour
+- **Current Progress:** Round 13,054+ / 100,000 (13.05%)
+- **Framework Version:** Gensyn RL-Swarm v0.7.0
+- **Contract:** SwarmCoordinator v0.4.2
 
-
+## Key Features
+
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Real-time Training**: Continuous learning with distributed RL across the Gensyn swarm network
+- **Adaptive Reward System**: Dynamic quality bonuses and dataset weighting for optimal learning
+- **Multi-domain Coding**: Trained on the MBPP and CodeContests datasets with adaptive sampling
+- **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M)
+- **llama.cpp Compatible**: Ready for edge deployment and local inference
+- **BF16 Precision**: Trained with bfloat16 for optimal performance
+- **TGI Compatible**: Supports Text Generation Inference for production deployment
+- **Chat Format Support**: Inherits the Qwen2.5 chat template for conversational use
+
+## Training Data
 
-### Model Sources [optional]
+The model is trained on a composite dataset with an adaptive weighted sampling strategy:
 
-
+| Dataset | Initial Weight | Adaptive Range | Focus Area |
+|---------|----------------|----------------|------------|
+| MBPP | 5 | 4-6 | Basic Python programming problems with test cases |
+| CodeContests | 5 | 4-6 | Competitive programming challenges |
 
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
+**Total Dataset Size:** Streaming datasets with infinite iteration
+**Training Samples per Round:** 2
+**Evaluation:** Real-time via swarm coordination, using an Ollama-based evaluator with the judge API as fallback
 
-## Uses
+### Adaptive Sampling Strategy
 
-
+The implementation features an adaptive sampling system that adjusts dataset weights based on performance:
 
-### Direct Use
+> "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." (CodeZero blog)
 
-
+The system monitors performance metrics every 5 rounds and adjusts the dataset weights to maintain an optimal learning balance:
 
-[More Information Needed]
+```diff
+- Update dataset weights based on recent performance
+- Calculate the recent average performance for each dataset
+- Adjust the weighted sampling based on the performance difference
+- Shift weight when performance is better on MBPP
+- Shift weight when performance is better on CodeContests
+- Update dataset weights every 5 rounds & keep them balanced
+```
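+
+As a rough illustration of this scheme, here is a minimal sketch. It is not the actual RL-Swarm code: the class name `AdaptiveDataSampler`, all constants, and the choice to shift weight toward the lower-performing dataset are assumptions for illustration.
+
+```python
+import random
+
+class AdaptiveDataSampler:
+    """Illustrative sketch of adaptive weighted sampling over two datasets."""
+
+    def __init__(self, datasets=("mbpp", "code_contests"), init_weight=5.0,
+                 w_min=4.0, w_max=6.0, update_every=5):
+        self.weights = {name: init_weight for name in datasets}
+        self.history = {name: [] for name in datasets}  # recent rewards per dataset
+        self.w_min, self.w_max = w_min, w_max
+        self.update_every = update_every
+        self.round = 0
+
+    def record(self, dataset, reward):
+        self.history[dataset].append(reward)
+
+    def update_weights(self):
+        # Recent average performance for each dataset (last 5 entries).
+        avgs = {d: sum(h[-5:]) / max(len(h[-5:]), 1) for d, h in self.history.items()}
+        weaker = min(avgs, key=avgs.get)
+        stronger = max(avgs, key=avgs.get)
+        # Shift weight toward the weaker dataset so it is sampled more often,
+        # clamped to the adaptive range from the table above.
+        if weaker != stronger:
+            self.weights[weaker] = min(self.weights[weaker] + 0.5, self.w_max)
+            self.weights[stronger] = max(self.weights[stronger] - 0.5, self.w_min)
+
+    def sample(self):
+        self.round += 1
+        if self.round % self.update_every == 0:
+            self.update_weights()
+        names = list(self.weights)
+        return random.choices(names, weights=[self.weights[n] for n in names], k=1)[0]
+```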
+
-### Downstream Use [optional]
+## Adaptive Reward System
 
-
+### Quality Enhancement Implementation
 
-[More Information Needed]
+The reward system includes a quality-bonus mechanism that evaluates code structure and documentation:
 
-### Out-of-Scope Use
+> "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." (CodeZero blog)
 
-
+```diff
+- Calculate a quality bonus for well-structured code
+- Documentation bonus
+- Structure bonus
+- Algorithmic efficiency (simple heuristic)
+- Scale with the base reward to avoid inflation
+```
 
-[More Information Needed]
+### Adaptive Threshold System
 
-## Bias, Risks, and Limitations
+The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
 
-
+```diff
+- Adapt the reward threshold based on recent performance
+- Raise the threshold when quality is consistently high
+```
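+
+A compact sketch of how such a quality bonus and adaptive threshold could compose, again illustrative only and not the production `RewardManager` (all names, heuristics, and constants here are hypothetical):
+
+```python
+def quality_bonus(code: str, base_reward: float) -> float:
+    """Toy quality bonus: documentation and structure heuristics,
+    scaled by the base reward to avoid inflation."""
+    bonus = 0.0
+    if '"""' in code or code.lstrip().startswith("#"):
+        bonus += 0.10   # documentation bonus
+    if "def " in code:
+        bonus += 0.10   # structure bonus
+    if "while True" not in code:
+        bonus += 0.05   # crude efficiency heuristic
+    return bonus * max(base_reward, 0.0)  # scale with the base reward
+
+
+class AdaptiveThreshold:
+    """Raise the acceptance threshold when recent quality is consistently high."""
+
+    def __init__(self, start=0.5, step=0.05, window=5):
+        self.value, self.step, self.window = start, step, window
+        self.recent = []
+
+    def update(self, reward):
+        self.recent = (self.recent + [reward])[-self.window:]
+        if len(self.recent) == self.window and min(self.recent) > self.value:
+            self.value += self.step  # consistently above threshold -> raise the bar
+        return self.value
+```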
+
-[More Information Needed]
+## Performance Simulation
 
-### Recommendations
+### Reward Comparison
 
-
+In a simulation with 1,000 samples, the adaptive reward system shows a significant improvement:
+
+| System | MBPP Avg Reward | CodeContests Avg Reward | Overall Avg Reward | Improvement |
+|---------|----------------|------------------------|-------------------|-------------|
+| Original | 0.234 | -0.156 | 0.039 | - |
+| Adaptive | 0.312 | -0.098 | 0.107 | ~174% |
 
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+### Training Progress
 
-## How to Get Started with the Model
+The training logs show consistent progress:
 
-Use the code below to get started with the model.
+Train/loss metrics are visualized with Weights & Biases (WandB):
 
-[More Information Needed]
+- Dashboard: Soon LIVE!
 
-## Training Details
+```text
+[2025-11-14 04:22:50,632][genrl.logging_utils.global_defs][INFO] - __ Joining round: 13053
+[2025-11-14 04:23:50,633][genrl.logging_utils.global_defs][INFO] - Starting round: 13053/100000.
+Map: 100%|______________________________________| 1/1 [00:00<00:00, 158.65 examples/s]
+Map: 100%|______________________________________| 1/1 [00:00<00:00, 191.92 examples/s]
+[2025-11-14 04:25:12,646][genrl.logging_utils.global_defs][INFO] - pushing model to huggingface
+Processing Files (1 / 1) : 100%|___| 988MB / 988MB, 94.3MB/s
+New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
+.....kpb5lid/model.safetensors: 100%|___| 988MB / 988MB, 94.3MB/s
+[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
+```
 
-### Training Data
+## Quick Start
 
-
+### Standard Transformers
 
-[More Information Needed]
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
-### Training Procedure
+model = AutoModelForCausalLM.from_pretrained(
+    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
+
-#### Preprocessing [optional]
+prompt = "Write a function to calculate the factorial of a number."
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# do_sample=True is required for temperature/top_p to take effect
+outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.95)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
 
-[More Information Needed]
+### Chat Format (Conversational)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
-#### Training Hyperparameters
+model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
+tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
+
-- **Training regime:** [More Information Needed]
+messages = [
+    {"role": "system", "content": "You are an expert Python programmer."},
+    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
+]
+
-#### Speeds, Sizes, Times [optional]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=512)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
 
-
+### Text Generation Inference (TGI)
 
-[More Information Needed]
+```bash
+docker run -d --gpus all \
+  -p 8080:80 \
+  -v $PWD/data:/data \
+  ghcr.io/huggingface/text-generation-inference:latest \
+  --model-id 0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther \
+  --max-input-length 4096 \
+  --max-total-tokens 8192
+```
+
+### GGUF with llama.cpp
+
+```bash
+# Download the quantized model (recommended: Q4_K_M)
+wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf
+
+# Run inference
+./llama-cli -m Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf \
+  -p "Write a function to implement binary search in Python." \
+  --temp 0.6 --top-p 0.95
+```
+
+### Ollama
+
+```bash
+# Create a Modelfile
+cat > Modelfile << 'EOF'
+FROM ./Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf
+PARAMETER temperature 0.6
+PARAMETER top_p 0.95
+PARAMETER top_k 20
+SYSTEM "You are an expert Python programmer who writes clean, documented code."
+EOF
+
+# Create and run
+ollama create qwen2.5-coder-swarm -f Modelfile
+ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
+```
+
+## Available Formats
+
+| Format | Size | Precision | Use Case | Download |
+|--------|------|-----------|----------|----------|
+| Safetensors (BF16) | 988 MB | BF16 | Full precision training/fine-tuning | `model.safetensors` |
+| GGUF F16 | 994 MB | FP16 | High quality inference | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-F16.gguf` |
+| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q5_K_M.gguf` |
+| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf` |
+| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q3_K_M.gguf` |
+
+All GGUF formats are **llama.cpp compatible** and auto-updated hourly.
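+
+For quick local tests from Python, the GGUF files should also work with the `llama-cpp-python` bindings. A minimal sketch, assuming `pip install llama-cpp-python` and the Q4_K_M file from the table above:
+
+```python
+from llama_cpp import Llama
+
+llm = Llama(
+    model_path="Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf",
+    n_ctx=4096,  # the card recommends staying within ~4K tokens
+)
+
+out = llm.create_chat_completion(
+    messages=[
+        {"role": "system", "content": "You are an expert Python programmer."},
+        {"role": "user", "content": "Write a function to reverse a linked list."},
+    ],
+    temperature=0.6,
+    top_p=0.95,
+    max_tokens=256,
+)
+print(out["choices"][0]["message"]["content"])
+```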
+
+## Chat Format & Conversational Use
+
+This model inherits **Qwen2.5's chat template** for structured conversations.
+
+### Format Structure
+
+```text
+<|im_start|>system
+{system_message}
+<|im_end|>
+<|im_start|>user
+{user_message}
+<|im_end|>
+<|im_start|>assistant
+{assistant_response}
+<|im_end|>
+```
+
+### Chat Template Features
+
+- **System Instructions**: Guide model behavior with system messages
+- **Multi-turn Dialogue**: Maintains conversation context
+- **Tool Calling**: Supports function calling (if enabled in training)
+- **Code Generation**: Optimized for generating Python code
+
+**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.
+
+## Training Configuration
+
+### Gensyn RL-Swarm Quick Architecture
+
+```yaml
+Training Framework:
+  Method: GRPO (Group Relative Policy Optimization)
+  Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
+  Training Regime: bfloat16 mixed precision
+  Max Rounds: 100000
+  Update Frequency: Every 5-10 minutes
+  Generations per Round: 2
+  Tree-based Model: Default
+  Seed: 42
+
+Blockchain Integration:
+  Network: Gensyn Testnet
+  Chain ID: 685685
+  Contract: SwarmCoordinator v0.4.2
+
+Swarm Communication:
+  Framework: Hivemind P2P Backend
+  Initial Peers: 3 bootnodes
+  Beam Size: 10
+
+Reward System:
+  Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
+  Reward Function: Adaptive with quality bonus
+  Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
+  Judge API: https://codezero-judge.gensyn.ai
+```
+
+## Model Capabilities
+
+This model excels at:
+
+1. **Basic Python Programming**: Functions, loops, conditionals, data structures
+2. **Algorithm Implementation**: Sorting, searching, graph algorithms
+3. **String Manipulation**: Pattern matching, parsing, formatting
+4. **Mathematical Functions**: Calculations, conversions, formulas
+5. **Code Documentation**: Writing clear, commented functions
+6. **Problem Solving**: Breaking down complex problems into manageable steps
+
+## Limitations
+
+- **Specialized Domain**: Optimized for programming challenges; may underperform on creative writing
+- **Training in Progress**: Weights update every 5-10 minutes; performance varies between commits
+- **Scale**: 0.5B parameters - suitable for edge deployment but not SOTA for complex programming
+- **Experimental**: Decentralized RL training; behavior is less predictable than supervised models
+- **Context**: Best performance within 4K tokens (the full 32K is supported)
+
+## Update Schedule
+
+| Format | Frequency | Trigger |
+|--------|-----------|---------|
+| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
+| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |
+
+**Auto-Conversion Pipeline:**
+1. Monitors the repo for new training commits
+2. Downloads the latest `model.safetensors`
+3. Converts to an F16 GGUF base
+4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M, Q6_K
+5. Publishes the standard formats to the repo
+
+Check the commit history for exact timestamps.
+
+## CodeZero Technical Details
+
+### Architecture Components
+
+1. **Game Manager**: Orchestrates training rounds and swarm coordination
+2. **Trainer**: GRPO implementation for policy optimization
+3. **Data Manager**: Dataset loading with adaptive weighted sampling
+4. **Reward Manager**: Computes rewards via the Ollama evaluator with quality bonuses
+5. **Coordinator**: Blockchain integration for swarm state
+6. **P2P Backend**: Hivemind DHT for model sharing
+
+### Training Process
+
+```text
+1. Agent joins the swarm via the P2P network
+2. Coordinator assigns a round via the smart contract
+3. Agent samples data from the adaptively weighted datasets
+4. Model generates 2 responses
+5. Ollama evaluator assesses the responses and assigns rewards with quality bonuses
+6. GRPO updates the policy based on the rewards
+7. Updated model is shared via the DHT
+8. Best checkpoint is saved to HuggingFace
+9. Repeat
+```
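+
+RL-Swarm wraps GRPO inside its swarm game manager. For readers who want to experiment outside the swarm, a minimal, swarm-free sketch using TRL's `GRPOTrainer` on the same base model might look like this; the toy reward function stands in for the Ollama/judge evaluator and is purely illustrative, not the production reward pipeline:
+
+```python
+from datasets import Dataset
+from trl import GRPOConfig, GRPOTrainer
+
+# Toy prompt set standing in for MBPP/CodeContests sampling.
+train_dataset = Dataset.from_dict({
+    "prompt": [
+        "Write a Python function that returns the factorial of n.",
+        "Write a Python function that checks whether a string is a palindrome.",
+    ]
+})
+
+def toy_reward(completions, **kwargs):
+    # Stand-in evaluator: reward completions that at least define a function.
+    return [1.0 if "def " in c else -1.0 for c in completions]
+
+config = GRPOConfig(
+    output_dir="grpo-sketch",
+    num_generations=2,              # matches "Generations per Round: 2"
+    per_device_train_batch_size=2,  # global batch must be divisible by num_generations
+    bf16=True,                      # matches the bfloat16 training regime (needs GPU support)
+    max_completion_length=256,
+)
+
+trainer = GRPOTrainer(
+    model="Qwen/Qwen2.5-Coder-0.5B-Instruct",
+    reward_funcs=toy_reward,
+    args=config,
+    train_dataset=train_dataset,
+)
+trainer.train()
+```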
+
+### Decentralization Benefits
+
+- **Fault Tolerance**: Multiple agents; no single point of failure
+- **Diverse Exploration**: Different agents explore different strategies
+- **Collective Intelligence**: Agents learn from each other
+- **Transparent**: All rounds verified on-chain
+
+### Software Stack
+
+- **Framework**: Gensyn RL-Swarm v0.7.0
+- **Library**: transformers v4.57.1
+- **P2P**: hivemind
+- **Blockchain**: Gensyn testnet
+- **Config**: Hydra + OmegaConf
+- **Logging**: WandB integration
+
+### Hardware Requirements
+
+**Training (GPU):**
+- GPU: NVIDIA 4090 24GB+ (BF16 training)
+- RAM: 16GB+
+- Cores: 10+
+- Storage: 50GB SSD
+- Network: High bandwidth for P2P
+
+**Training (CPU-optimized):**
+- CPU: Intel or AMD
+- Cores: 10+
+- RAM: 16GB+
+- Storage: 50GB SSD
+- Network: High bandwidth for P2P
+
+**Inference:**
+- Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
+- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
+- GGUF Q3_K_M: 3GB RAM (CPU-only)
 
 ## Evaluation
 
-
+### Training Progress Metrics
 
-### Testing Data, Factors & Metrics
+| Metric | Value | Target |
+|--------|-------|--------|
+| Completed Rounds | 13,054+ | 100,000 |
+| Training Progress | 13.05% | 100% |
+| Update Frequency | 5-10 min | Continuous |
 
-#### Testing Data
+**Note:**
+- **average@k:** Average performance across `k` attempts, measuring consistency.
+- **pass@k:** Probability of at least one correct solution in `k` attempts, measuring capability.
+
+Current metrics track training rounds completed in the decentralized swarm.
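+
+For reference, pass@k is usually computed with the unbiased estimator of Chen et al. (2021), where `n` generations per task contain `c` correct solutions; average@k is simply the mean reward over the `k` attempts:
+
+```python
+from math import comb
+
+def pass_at_k(n: int, c: int, k: int) -> float:
+    """Unbiased pass@k estimator: probability that at least one of k samples
+    drawn from n generations (c of them correct) solves the task."""
+    if n - c < k:
+        return 1.0
+    return 1.0 - comb(n - c, k) / comb(n, k)
+
+# Example: 2 generations per round with 1 correct -> pass@1 = 0.5
+print(pass_at_k(n=2, c=1, k=1))
+```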
+
-
+### Adaptive Reward Performance
 
-[More Information Needed]
+Our adaptive reward system shows a ~174% relative improvement in overall average reward compared to the baseline system:
 
-#### Factors
+```text
+Original System:
+  Overall Avg Reward: 0.039
+  MBPP Avg Reward: 0.234
+  CodeContests Avg Reward: -0.156
+
-
+Adaptive System:
+  Overall Avg Reward: 0.107
+  MBPP Avg Reward: 0.312
+  CodeContests Avg Reward: -0.098
+
-[More Information Needed]
+Improvement: +0.068 (~174% relative increase)
+```
 
-#### Metrics
+## Citation
 
-
+```bibtex
+@misc{qwen2.5-coder-gensyn-swarm-2025,
+  author = {0xgrey},
+  title = {Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm: Continuous RL Training on a Distributed Swarm with Adaptive Rewards},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
+  note = {Agent ID: tall\_tame\_panther}
+}
+
+@misc{gensyn-rl-swarm-2025,
+  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
+  author = {Gensyn AI},
+  year = {2025},
+  url = {https://gensyn.ai}
+}
+
-### Results
+@misc{codezero-2025,
+  title = {CodeZero: A Collaborative Coding Environment for Distributed RL},
+  author = {Gensyn AI},
+  year = {2025},
+  url = {https://docs.gensyn.ai/testnet/rl-swarm/how-it-works/codezero}
+}
+```
 
-[More Information Needed]
+## References
 
-#### Summary
+- **Gensyn Documentation**: https://docs.gensyn.ai/
+- **Gensyn GitHub**: https://github.com/gensyn-ai
+- **RL-Swarm Contracts**: https://github.com/gensyn-ai/rl-swarm-contracts
+- **Qwen2.5-Coder Model Card**: https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct
+- **MBPP Dataset**: https://huggingface.co/datasets/google-research-datasets/mbpp
+- **CodeContests Dataset**: https://huggingface.co/datasets/deepmind/code_contests
+- **arXiv:1910.09700**: ML carbon-emissions methodology (Lacoste et al., 2019)
+
+## Contact
 
-## Model Examination [optional]
+- **Developer**: 0xgrey
+- **Agent ID**: tall_tame_panther
+- **Community**: [Gensyn Discord](https://discord.gg/gensyn)
 
-
+---
 
-[More Information Needed]
+**⚠️ Important**: This is a continuously trained model. For reproducibility, pin a specific commit hash:
 
-## Environmental Impact
+```bash
+git clone https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
+cd Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
+git checkout <commit-hash>
+```
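+
+Alternatively, you can pin a specific revision directly in `transformers` via the standard `revision` argument; replace the placeholder with a real commit hash from this repo's history:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+REPO = "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"
+REV = "main"  # replace with a specific commit hash for reproducibility
+
+model = AutoModelForCausalLM.from_pretrained(REPO, revision=REV, torch_dtype="auto")
+tokenizer = AutoTokenizer.from_pretrained(REPO, revision=REV)
+```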
+
-
+---
 
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
+**🤖 Trained with ❤️ using Gensyn RL-Swarm**
 
-## Technical Specifications [optional]
+[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-orange?style=for-the-badge)](https://gensyn.ai)
 
-### Model Architecture and Objective
-
-[More Information Needed]
-
-### Compute Infrastructure
-
-[More Information Needed]
-
-#### Hardware
-
-[More Information Needed]
-
-#### Software
-
-[More Information Needed]
-
-## Citation [optional]
-
-
-
-**BibTeX:**
-
-[More Information Needed]
-
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
\ No newline at end of file
+
\ No newline at end of file