From 613011f18bf10da88227478335358ca90ab87335 Mon Sep 17 00:00:00 2001
From: grey <0xgr3y@users.noreply.huggingface.co>
Date: Fri, 14 Nov 2025 11:07:43 +0000
Subject: [PATCH] Update README.md

---
 README.md | 146 +++++++++++++++++++++++------------------------------
 1 file changed, 63 insertions(+), 83 deletions(-)

diff --git a/README.md b/README.md
index 67079a6..4b269af 100644
--- a/README.md
+++ b/README.md
@@ -28,20 +28,24 @@ base_model:
  - Qwen/Qwen2.5-Coder-0.5B
 ---

-# Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)
+<div align="center">
+
+# Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)

-## Gensyn RL-Swarm: Training & GGUF Inference for Quantized LLMs
+### Gensyn RL-Swarm: Training & GGUF Inference for Quantized LLMs

-[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther)
-[![GGUF](https://img.shields.io/badge/GGUF-Available-8A2BE2)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main)
+[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther)
+[![GGUF](https://img.shields.io/badge/GGUF-Available-8A2BE2)](https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/tree/main)
-[![Gensyn](https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink)](https://gensyn.ai)
+[![Gensyn](https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-pink)](https://gensyn.ai)
-[![License](https://img.shields.io/badge/License-MIT-green)](https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT)
+[![License](https://img.shields.io/badge/License-MIT-green)](https://github.com/gensyn-ai/rl-swarm/blob/main/LICENSE.TXT)
+
+</div>
+
+---

 ## Model Overview

-Our pick an experimental (advanced) mode this model a continuously trained **Qwen2.5-Coder-0.5B-Instruct** fine-tuned using **Gensyn RL-Swarm** framework with **GRPO (Group Relative Policy Optimization)** and supported format **GGUF (llama.cpp)** for enhanced code generation capabilities. **Note: Current training focuses on programming challenges with adaptive weighted sampling**.
+An experimental (advanced-mode) pick: this is a continuously trained **Qwen2.5-Coder-0.5B-Instruct**, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** and published in **GGUF (llama.cpp)** format for enhanced code generation. **Note: Current training focuses on programming challenges with adaptive weighted sampling**.

 - **Agent ID:** `tall_tame_panther`
 - **Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
@@ -53,7 +57,7 @@ Our pick an experimental (advanced) mode this model a continuously trained **Qwe
 ## Key Features

 - **Real-time Training**: Continuous learning with distributed RL across the Gensyn swarm network
-- **Adaptive Reward System**: Dynamic quality enhanced and dataset weighting for optimal learning
+- **Adaptive Reward System**: Dynamic quality bonuses and dataset weighting for optimal learning
 - **Multi-domain Coding**: Trained on MBPP and CodeContests datasets with adaptive sampling
 - **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M)
 - **llama.cpp Compatible**: Ready for edge deployment and local inference
@@ -72,34 +76,31 @@ The model is trained on a composite dataset with adaptive weighted sampling stra
 **Total Dataset Size:** Streaming datasets with infinite iteration
 **Training Samples per Round:** 2
-**Evaluation:** Real-time via swarm coordination with Ollama-based evaluator else judge
+**Evaluation:** Real-time via swarm coordination, with an Ollama-based evaluator and the Judge API as fallback

-### Adaptive Sampling Strategy
-
-The implementation features an adaptive sampling system that adjusts dataset weights based on performance:
+## Adaptive Sampling Strategy

 > "When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero-blog

-The system monitors performance metrics every 5 rounds and adjusts the dataset weights to maintain optimal learning balance:
-
 ```diff
+The implementation features an adaptive sampling system that adjusts dataset weights based on performance.
+The system monitors performance metrics every 5 rounds and adjusts the weights to maintain a balanced learning signal.
 - Update dataset weights based on recent performance
 - Calculate recent average performance for each dataset
 - Adjust weighted sampling (when adaptive) based on the performance difference
-- Performance better on MBPP
+- Performance better on MBPP (Mostly Basic Python Problems)
 - Performance better on CodeContests
 - Update dataset weights every 5 rounds & keep them balanced
 ```
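+
+As a concrete illustration, the sketch below shows one way such weighting could work in plain Python. The names (`recent_scores`, `update_weights`) and the step sizes are illustrative assumptions for this README, not the actual RL-Swarm internals.
+
+```python
+import random
+
+DATASETS = ["mbpp", "codecontests"]
+weights = {"mbpp": 0.5, "codecontests": 0.5}   # start balanced
+recent_scores = {d: [] for d in DATASETS}      # rolling reward history per dataset
+
+def record_score(dataset: str, reward: float, window: int = 20) -> None:
+    """Keep only the most recent `window` rewards for a dataset."""
+    scores = recent_scores[dataset]
+    scores.append(reward)
+    del scores[:-window]
+
+def update_weights(round_idx: int, step: float = 0.1) -> None:
+    """Every 5 rounds, shift sampling weight toward the weaker dataset."""
+    if round_idx % 5 != 0 or not all(recent_scores.values()):
+        return
+    avg = {d: sum(s) / len(s) for d, s in recent_scores.items()}
+    weaker = min(avg, key=avg.get)              # dataset we currently perform worse on
+    other = "codecontests" if weaker == "mbpp" else "mbpp"
+    weights[weaker] = min(0.8, weights[weaker] + step)
+    weights[other] = 1.0 - weights[weaker]      # keep the weights normalized
+
+def sample_dataset() -> str:
+    """Draw the source dataset for the next training sample by weight."""
+    return random.choices(DATASETS, weights=[weights[d] for d in DATASETS])[0]
+```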

 ## Adaptive Reward System
-
 ### Quality Enhanced Implementation

-The reward system includes a quality data enhanced mechanism that evaluates code structure and documentation

 > "Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero-blog

 ```diff
+The reward system includes a quality bonus that evaluates code structure and documentation.
 - Calculate a quality bonus for well-structured code
 - Documentation bonus
 - Structure bonus
 ```
@@ -109,15 +110,14 @@
 ### Adaptive Threshold System

-The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
 ```diff
+The system also includes an adaptive threshold mechanism that adjusts based on recent performance.
 - Adjust the reward threshold based on recent performance
 - Raise the threshold when performance quality is consistently high
 ```
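+
+The deployed checks live in the RewardManager; the following is only a minimal sketch of the quality-bonus-plus-adaptive-threshold idea, with made-up heuristics and constants.
+
+```python
+def quality_bonus(code: str) -> float:
+    """Small additive bonuses for documentation and structure (illustrative)."""
+    bonus = 0.0
+    if '"""' in code or "# " in code:
+        bonus += 0.05          # documentation bonus: docstring or comments present
+    if "def " in code and "return" in code:
+        bonus += 0.05          # structure bonus: a proper function that returns
+    return bonus
+
+def adaptive_threshold(recent_rewards: list, base: float = 0.5) -> float:
+    """Raise the acceptance threshold when recent quality is consistently high."""
+    if len(recent_rewards) < 5:
+        return base
+    avg = sum(recent_rewards) / len(recent_rewards)
+    return min(0.9, base + 0.2 * avg) if avg > base else base
+
+def final_reward(base_score: float, code: str) -> float:
+    """Combine the validity/alignment score with the quality bonus."""
+    return base_score + quality_bonus(code)
+```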

-## Performance Simulation
-
+## Quick Performance Simulation
 ### Reward Comparison
 Based on our simulation with 1000 samples, the adaptive reward system shows a significant improvement.
@@ -132,7 +132,6 @@ Based on the logs provided, the model shows consistent progress:
 Metric data: train/loss visualized via Weights & Biases (wandb)
-
 - Soon LIVE!
 ```
@@ -147,39 +146,34 @@ New Data Upload : 100%|___| 983MB / 983MB, 94.3MB/s
 [2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.
 ```

-## Quick Start
+## Quick Start: Inference

 ### Standard Transformers

-```
+```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-
 model = AutoModelForCausalLM.from_pretrained(
     "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
     torch_dtype="auto",
     device_map="auto"
 )
 tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
-
 prompt = "Write a function to calculate the factorial of a number."
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_length=256, temperature=0.6, top_p=0.95)
+outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.8)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

 ### Chat Format (Conversational)

-```
+```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-
 model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
 tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
-
 messages = [
     {"role": "system", "content": "You are an expert Python programmer."},
     {"role": "user", "content": "Write a function to check if a string is a palindrome."}
 ]
-
 text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tokenizer(text, return_tensors="pt")
 outputs = model.generate(**inputs, max_length=512)
@@ -188,7 +182,7 @@ print(tokenizer.decode(outputs[0]))
 ### Text Generation Inference (TGI)

-```
+```bash
 docker run -d --gpus all \
   -p 8080:80 \
   -v $PWD/data:/data \
@@ -200,46 +194,44 @@ docker run -d --gpus all \
 ### GGUF with LLAMA.CPP

-```
+```bash
 # Download quantized model (recommended: Q4_K_M)
-wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf
-
+wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Q4_K_M.gguf

 # Run inference
-./llama-cli -m Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf \
+./llama-cli -m Qwen2.5-Coder-0.5B-Q4_K_M.gguf \
   -p "Write a function to implement binary search in Python." \
-  --temp 0.6 --top-p 0.95
+  --temp 0.7 --top-p 0.8
 ```

 ### Ollama

-```
+```bash
 # Create Modelfile
 cat > Modelfile << 'EOF'
-FROM ./Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf
-PARAMETER temperature 0.6
-PARAMETER top_p 0.95
+FROM ./Qwen2.5-Coder-0.5B-Q4_K_M.gguf
+PARAMETER temperature 0.7
+PARAMETER top_p 0.8
 PARAMETER top_k 20
 SYSTEM "You are an expert Python programmer who writes clean, documented code."
 EOF
-
 # Create and run
 ollama create qwen2.5-coder-swarm -f Modelfile
 ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."
 ```

-## Available Formats
+## Available Quantization Formats

 | Format | Size | Precision | Use Case | Download |
 |--------|------|-----------|----------|----------|
 | Safetensors (BF16) | 988 MB | BF16 | Full precision training/fine-tuning | `model.safetensors` |
-| GGUF F16 | 994 MB | FP16 | High quality inference | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-F16.gguf` |
-| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q5_K_M.gguf` |
-| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q4_K_M.gguf` |
-| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-Q3_K_M.gguf` |
+| GGUF F16 | 994 MB | FP16 | High quality inference | `Qwen2.5-Coder-0.5B-F16.gguf` |
+| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | `Qwen2.5-Coder-0.5B-Q5_K_M.gguf` |
+| GGUF Q4_K_M | 398 MB | 4-bit | **Recommended** for production | `Qwen2.5-Coder-0.5B-Q4_K_M.gguf` |
+| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | `Qwen2.5-Coder-0.5B-Q3_K_M.gguf` |

 All GGUF formats are **llama.cpp compatible** and refreshed automatically (see the update schedule below).
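+
+If you prefer the Hugging Face Hub client to `wget`, the same files can be fetched programmatically; this sketch assumes only the repo id and the filenames listed in the table above.
+
+```python
+from huggingface_hub import hf_hub_download
+
+# Fetch the recommended 4-bit quant; swap `filename` for any row in the table above.
+gguf_path = hf_hub_download(
+    repo_id="0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
+    filename="Qwen2.5-Coder-0.5B-Q4_K_M.gguf",
+)
+print(gguf_path)  # local cache path, ready for llama.cpp or an Ollama Modelfile
+```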

-## Chat Format & Conversational Use
+## Chat Format

 This model inherits **Qwen2.5's chat template** for structured conversations.

@@ -266,36 +258,32 @@ This model inherits **Qwen2.5's chat template** for structured conversations.
 **Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **programming challenges**.

-## Training Configuration
-
 ### Gensyn RL-Swarm Quick-Architecture

-```
+```diff
 Training Framework:
-  Method: GRPO (Group Relative Policy Optimization)
-  Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
-  Training Regime: bfloat16 mixed precision
-  Max Rounds: 100000
-  Update Frequency: Every 5-10 minutes
-  Generations per Round: 2
-  Tree-based Model: Default
-  Seed: 42
-
+- Method: GRPO (Group Relative Policy Optimization)
+- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
+- Training Regime: bfloat16 mixed precision
+- Max Rounds: 100000
+- Update Frequency: Every 5-10 minutes
+- Generations per Round: 2
+- Batch size: Combined
+- Tree-based Model: 2 trees
+- Seed: 42
 Blockchain Integration:
-  Network: Gensyn Testnet
-  Chain ID: 685685
-  Contract: SwarmCoordinator v0.4.2
-
+- Network: Gensyn Testnet
+- Chain ID: 685685
+- Contract: SwarmCoordinator v0.4.2
 Swarm Communication:
-  Framework: Hivemind P2P Backend
-  Initial Peers: 3 bootnodes
-  Beam Size: 10
-
+- Framework: Hivemind P2P Backend
+- Initial Peers: 3 bootnodes
+- Beam Size: 10
 Reward System:
-  Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
-  Reward Function: Adaptive with quality enhanced
-  Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
-  Judge API: https://codezero-judge.gensyn.ai
+- Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
+- Reward Function: Adaptive with quality bonus
+- Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
+- Judge API: https://codezero-judge.gensyn.ai
 ```

 ## Model Capabilities
@@ -325,6 +313,7 @@ This model excels at:
 | GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |

 **Auto-Conversion Pipeline:**
+
 1. Monitors repo for new training commits
 2. Downloads latest `model.safetensors`
 3. Converts to F16 GGUF base
@@ -333,8 +322,6 @@ This model excels at:

 Check commit history for exact timestamps.

-## CodeZero Technical Details
-
 ### Architecture Components

 1. **Game Manager**: Orchestrates training rounds and swarm coordination
@@ -395,8 +382,6 @@ Check commit history for exact timestamps.
 - GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
 - GGUF Q3_K_M: 3GB RAM (CPU-only)

-## Evaluation
-
 ### Training Progress Metrics

 | Metric | Value | Target |
@@ -405,23 +390,21 @@ Check commit history for exact timestamps.
 | Training Progress | 13.05% | 100% |
 | Update Frequency | 5-10 min | Continuous |

-**Note**: * **average\@k:** Average performance across `k` attempts, measuring consistency. * **pass\@k:** Probability of at least one correct solution in `k` attempts, measuring capability.Current metrics track training rounds completed in decentralized swarm.
+**Note**: **average\@k:** average performance across `k` attempts, measuring consistency. **pass\@k:** probability of at least one correct solution in `k` attempts, measuring capability. Current metrics track training rounds completed in the decentralized swarm.
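+
+For reference, pass@k is usually computed with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k); a short sketch with illustrative numbers:
+
+```python
+from math import comb
+
+def pass_at_k(n: int, c: int, k: int) -> float:
+    """Unbiased pass@k: chance that at least one of k draws from
+    n generated samples (c of them correct) solves the task."""
+    if n - c < k:
+        return 1.0
+    return 1.0 - comb(n - c, k) / comb(n, k)
+
+def average_at_k(scores, k: int) -> float:
+    """average@k: mean score over the first k attempts."""
+    return sum(scores[:k]) / min(k, len(scores))
+
+print(round(pass_at_k(n=10, c=3, k=5), 4))  # 0.9167 when 3 of 10 samples are correct
+```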

 ### Adaptive Reward Performance

 Our adaptive reward system has shown approximately a 174% improvement in reward scores compared to the baseline system:

 ```
-Original System:
+Original:
 Overall Avg Reward: 0.039
 MBPP Avg Reward: 0.234
 CodeContests Avg Reward: -0.156
-
-Adaptive System:
+Adaptive:
 Overall Avg Reward: 0.107
 MBPP Avg Reward: 0.312
 CodeContests Avg Reward: -0.098
-
 Improvement: 0.068 (~174% increase)
 ```

@@ -436,14 +419,12 @@
   howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
   note = {Agent ID: tall\_tame\_panther}
 }
-
 @misc{gensyn-rl-swarm-2025,
   title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
   author = {Gensyn AI},
   year = {2025},
   url = {https://gensyn.ai}
 }
-
 @misc{codezero-2025,
   title = {CodeZero: A Collaborative Coding Environment for Distributed RL},
   author = {Gensyn AI},
@@ -469,7 +450,6 @@
 - **Agent ID**: tall_tame_panther
 - **Community**: [Gensyn Discord](https://discord.gg/gensyn)

----

 **⚠️ Important**: This is a continuously trained model. For reproducibility, specify commit hash:
@@ -485,6 +465,6 @@ git checkout

 **🤖 Trained with ❤️ using Gensyn RL-Swarm**

-[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-orange?style=for-the-badge)](https://gensyn.ai)
+[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-pink?style=for-the-badge)](https://gensyn.ai)
\ No newline at end of file