From 3cb39a634fac933076d93505b4167749f1674c72 Mon Sep 17 00:00:00 2001
From: grey <0xgr3y@users.noreply.huggingface.co>
Date: Mon, 10 Nov 2025 08:08:36 +0000
Subject: [PATCH] Update: Detail opt-use

---
 README.md | 383 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 202 insertions(+), 181 deletions(-)

diff --git a/README.md b/README.md
index 474353c..dd25ec6 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,6 @@ tags:
- gguf
- conversational
- text-generation-inference
-- I am tall_tame_panther
pipeline_tag: text-generation
license: apache-2.0
language:
@@ -31,16 +30,32 @@ datasets:
- fraction_simplification
- basic_arithmetic
inference: true
-widget:
-- text: What is 15 * 23?
-  example_title: Basic Arithmetic
-- text: Convert decimal 255 to hexadecimal.
-  example_title: Base Conversion
-- text: Simplify the fraction 24/36.
-  example_title: Fraction Simplification
model-index:
- name: Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
-  results: []
+  results:
+  - task:
+      type: text-generation
+      name: Mathematical Reasoning
+    dataset:
+      name: Composite Reasoning Dataset
+      type: custom
+    metrics:
+    - type: training_rounds
+      value: 43610
+      name: Completed Training Rounds
+    - type: total_rounds
+      value: 100000
+      name: Target Rounds
+    - type: progress
+      value: 43.61
+      name: Training Progress (%)
+widget:
+- text: "What is 15 * 23?"
+  example_title: "Basic Arithmetic"
+- text: "Convert decimal 255 to hexadecimal."
+  example_title: "Base Conversion"
+- text: "Simplify the fraction 24/36."
+  example_title: "Fraction Simplification"
---

# Qwen3-0.6B-Gensyn-Swarm (tall_tame_panther)

@@ -52,12 +67,12 @@ model-index:
## Model Overview

-This model is a continuously trained Qwen3-0.6B fine-tuned using **Gensyn RL-Swarm** framework with **GRPO (Generalized Reward Policy Optimization)** for enhanced reasoning and mathematical capabilities.
+This model is a continuously trained Qwen3-0.6B, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** for enhanced reasoning and mathematical capabilities. **Note: Current training focuses on math/reasoning tasks.**
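For orientation, GRPO replaces a learned value baseline with a group-relative one: each sampled response is scored against the other responses generated in the same round. A sketch of the standard formulation (our summary; not taken from the RL-Swarm source):

$$\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \ldots, r_G)}{\operatorname{std}(r_1, \ldots, r_G)}, \qquad i = 1, \ldots, G$$

where $r_i$ is the judge-assigned reward for the $i$-th response and $G$ is the number of generations per round (2 in this run).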
**Agent ID:** `tall_tame_panther`
-**Training Status:** 🔴 LIVE - Model updates automatically every 5-10 minutes
-**Current Progress:** Round 43610+ / 1,000,000
-**Framework Version:** Gensyn RL-Swarm v0.4.2
+**Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes
+**Current Progress:** Round 43,610+ / 100,000 (43.61%)
+**Framework Version:** Gensyn RL-Swarm v0.6.4
**Contract:** SwarmCoordinator v0.4.2

## Key Features

@@ -68,60 +83,24 @@ This model is a continuously trained Qwen3-0.6B fine-tuned using **Gensyn RL-Swa
- **llama.cpp Compatible**: Ready for edge deployment and local inference
- **BF16 Precision**: Trained with bfloat16 for optimal performance
- **TGI Compatible**: Supports Text Generation Inference for production deployment
-- **Conversational**: Can be used for interactive reasoning tasks
+- **Chat Format Support**: Inherits the Qwen3 chat template for conversational use

## Training Data

-The model is trained on a composite dataset (1,000 samples) with weighted sampling strategy defined in `datasets.yaml`:
+The model is trained on a composite dataset (1,000 samples) with a weighted sampling strategy:

-| Dataset | Weight | Samples | Focus Area |
-|---------|--------|---------|------------|
-| Propositional Logic | 7 | 500 | Logical reasoning, truth tables, Boolean operations |
-| Calendar Arithmetic | 6 | 500 | Date calculations, leap years, recurring events |
-| Decimal Arithmetic | 5 | 500 | Multi-term decimal operations with precision |
-| Base Conversion | 4 | 500 | Number system conversions (base 2-16) |
-| Fraction Simplification | 4 | 500 | GCD/LCM, fraction reduction |
-| Basic Arithmetic | 2 | 500 | Foundation operations with parentheses |
+| Dataset | Weight | Focus Area |
+|---------|--------|------------|
+| Propositional Logic | 7 | Logical reasoning, truth tables, Boolean operations |
+| Calendar Arithmetic | 6 | Date calculations, leap years, recurring events |
+| Decimal Arithmetic | 5 | Multi-term decimal operations with precision |
+| Base Conversion | 4 | Number system conversions (base 2-16) |
+| Fraction Simplification | 4 | GCD/LCM, fraction reduction |
+| Basic Arithmetic | 2 | Foundation operations with parentheses |

**Total Dataset Size:** 1,000 composite samples
**Training Samples per Round:** 2
-**Evaluation Samples:** Real-time via swarm coordination
-
-### Dataset Configuration Details
-
-```
-# From rgym_exp/src/datasets.yaml
-Propositional Logic:
-  - Variables: 2-4
-  - Statements: 2-4
-  - Complexity: 1-3
-
-Calendar Arithmetic:
-  - Year: 2023
-  - Offset: up to 100 days
-  - Leap year range: 200 years
-  - Tasks: count_days, weekday_of_date, is_leap_year, recurring_event_day
-
-Decimal Arithmetic:
-  - Terms: 2-6
-  - Decimal places: 1-3
-  - Precision: 5
-
-Base Conversion:
-  - Base range: 2-16
-  - Value range: 0-1000
-
-Fraction Simplification:
-  - Value range: 1-100
-  - Factor range: 2-100
-  - Styles: plain, latex_frac, latex_dfrac
-
-Basic Arithmetic:
-  - Terms: 2-6
-  - Digits: 1-4
-  - Operators: +, -, *, /
-  - Parentheses: enabled
-```
+**Evaluation:** Real-time via swarm coordination

## Quick Start

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
@@ -137,13 +116,31 @@ model = AutoModelForCausalLM.from_pretrained(
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")

-# Example: Math reasoning
prompt = "What is 3/4 simplified to lowest terms?"
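# Sampling settings on the generate() call below mirror the swarm's
# training-time generation config (temperature 0.6, top_p 0.95); see the
# "Generation" block under Training Configuration.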
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

+### Chat Format (Conversational)
+
+```
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")
+tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")
+
+messages = [
+    {"role": "system", "content": "You are a helpful math tutor."},
+    {"role": "user", "content": "Explain how to simplify 24/36 step by step."}
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=512)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
### Text Generation Inference (TGI)

```
@@ -197,205 +194,239 @@ ollama run qwen3-swarm "What is 15 multiplied by 23?"

All GGUF formats are **llama.cpp compatible** and auto-updated hourly.

+### GGUF Quantization Strategy
+
+The Q5_K_M format uses mixed precision for optimal quality:
+
+- **Token Embeddings**: Q6_K (high-quality vocab representation)
+- **Attention Weights**: Q5_K (balanced quality/size)
+- **Feed-Forward**: Q5_K/Q6_K (mixed for optimal performance)
+- **Layer Norms**: F32 (full precision for stability)
+
+This strategy ensures minimal quality loss while maintaining a small file size.
+
+## Chat Format & Conversational Use
+
+This model inherits **Qwen3's chat template** for structured conversations.
+
+### Format Structure
+
+```
+<|im_start|>system
+{system_message}
+<|im_end|>
+<|im_start|>user
+{user_message}
+<|im_end|>
+<|im_start|>assistant
+{assistant_response}
+<|im_end|>
+```
+
+### Chat Template Features
+
+- **System Instructions**: Guide model behavior with system messages
+- **Multi-turn Dialogue**: Maintains conversation context
+- **Tool Calling**: Supports function calling (if enabled in training)
+- **Reasoning Mode**: `<think>` tags for chain-of-thought (experimental)
+
+**Note**: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on **math/reasoning tasks**.
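To run the GGUF builds locally with a chat-style prompt, a minimal `llama-cpp-python` sketch follows. Assumptions: `llama-cpp-python` and `huggingface-hub` are installed, and the repo contains a file matching the glob below; verify the exact filename in the repo's file listing.

```
# Minimal local GGUF inference sketch with llama-cpp-python
# (pip install llama-cpp-python huggingface-hub).
# Assumption: a *Q4_K_M.gguf file exists in the repo; check the "Files" tab.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther",
    filename="*Q4_K_M.gguf",  # glob for the Q4_K_M quant
    n_ctx=4096,               # 4K context; see the Limitations section
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "Convert decimal 255 to hexadecimal."},
    ],
    temperature=0.6,  # mirrors the swarm's generation config
    top_p=0.95,
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```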
+
## Training Configuration

### Gensyn RL-Swarm Architecture

-The model is trained using a decentralized reinforcement learning framework with the following components:
-
```
-# From rgym_exp/config/rg-swarm.yaml
-
Training Framework:
  Method: GRPO (Group Relative Policy Optimization)
  Base Model: Qwen/Qwen3-0.6B
  Training Regime: bfloat16 mixed precision
- Max Rounds: 1,000,000
- Max Stage: 1
+ Max Rounds: 100,000
  Update Frequency: Every 5-10 minutes
  Generations per Round: 2
- Transplant Trees: 1
  Seed: 42

Blockchain Integration:
  Network: Gensyn Testnet
  Chain ID: 685685
- RPC: https://gensyn-testnet.g.alchemy.com/public
  Contract: SwarmCoordinator v0.4.2
- Modal Proxy: http://localhost:3000/api/

Swarm Communication:
  Framework: Hivemind P2P Backend
  Initial Peers: 3 bootnodes
- Bootnodes:
-  - /ip4/38.101.215.12/tcp/30011/p2p/QmQ2gEXoPJg6iMBSUFWGzAabS2VhnzuS782Y637hGjfsRJ
-  - /ip4/38.101.215.13/tcp/30012/p2p/QmWhiaLrx3HRZfgXc2i7KW5nMUNK7P9tRc71yFJdGEZKkC
-  - /ip4/38.101.215.14/tcp/30013/p2p/QmQa1SCfYTxx7RvU7qJJRo79Zm1RAwPpkeLueDVJuBBmFp
- Startup Timeout: 120s
- Beam Size: 25
+ Beam Size: 30

Reward System:
  Manager: DefaultRewardManager
- Function Store: RoundRewardFnStore
- Reward Function: RGRewards (Reasoning Gym Rewards)
- Judge: Swarm Judge API (https://swarm-judge.internal-apps-central1.clusters.gensyn.ai)
+ Reward Function: RGRewards (Reasoning Gym)
+ Judge API: https://swarm-judge.internal-apps-central1.clusters.gensyn.ai
```

-### Training Hyperparameters
+### Model Hyperparameters

```
-Model Architecture:
+Architecture:
  Hidden Size: 1024
  Intermediate Size: 3072
- Num Hidden Layers: 28
- Num Attention Heads: 16
- Num Key-Value Heads: 8
+ Layers: 28
+ Attention Heads: 16
+ KV Heads: 8
  Head Dimension: 128
- Max Position Embeddings: 40,960
- RMS Norm Epsilon: 1e-06
- Rope Theta: 1,000,000
- Vocabulary Size: 151,936
+ Context Length: 40,960 tokens
+ Vocabulary: 151,936 tokens

-GRPO Trainer Config:
+GRPO Config:
  Epsilon: 0.2
  Epsilon High: 0.28
- Generations: 2
  Gradient Checkpointing: Enabled
- Learning Rate: Adaptive

-Generation Config:
+Generation:
  Temperature: 0.6
  Top-K: 20
  Top-P: 0.95
- BOS Token: 151643
- EOS Token: 151645
- Pad Token: 151643
```

## Model Capabilities

This model excels at:

-1. **Logical Reasoning**: Propositional logic, truth evaluation, Boolean algebra, logical equivalences
-2. **Mathematical Operations**: Multi-precision arithmetic, decimal calculations, fraction manipulation
-3. **Number Systems**: Base conversion between binary, octal, decimal, hexadecimal
-4. **Date/Time Calculations**: Calendar arithmetic, leap year detection, day-of-week calculations
-5. **Step-by-step Problem Solving**: Chain-of-thought reasoning for complex multi-step tasks
-6. **Conversational Math Tutoring**: Interactive problem-solving guidance
+1. **Logical Reasoning**: Propositional logic, truth evaluation, Boolean algebra
+2. **Mathematical Operations**: Multi-precision arithmetic, decimal calculations, fractions
+3. **Number Systems**: Base conversion (binary, octal, decimal, hexadecimal)
+4. **Date/Time Calculations**: Calendar arithmetic, leap years, day-of-week
+5. **Step-by-step Problem Solving**: Chain-of-thought reasoning
+6. **Conversational Tutoring**: Interactive problem-solving (via chat format)

## Limitations

-- **Specialized Domain**: Optimized for reasoning/math tasks; may underperform on creative writing or general chat
-- **Training in Progress**: Model weights update every 5-10 minutes; performance may vary between checkpoints
-- **Scale**: 0.6B parameters - suitable for edge devices but not state-of-the-art for complex reasoning
-- **Experimental**: Trained via decentralized RL swarm; behavior may be less predictable than supervised models
-- **Context Length**: 40K tokens supported but best performance within 4K tokens
+- **Specialized Domain**: Optimized for reasoning/math; may underperform on creative writing
+- **Training in Progress**: Weights update every 5-10 minutes; performance varies
+- **Scale**: 0.6B parameters - suitable for edge but not SOTA for complex reasoning
+- **Experimental**: Decentralized RL training; behavior less predictable than supervised models
+- **Context**: Best performance within 4K tokens (full 40K supported)

## Update Schedule

-| Format | Update Frequency | Trigger |
-|--------|------------------|---------|
-| Safetensors (BF16) | Every 5-10 minutes | Automatic via RL-Swarm training |
-| GGUF variants (all) | Every 1 hour | Automatic conversion from latest checkpoint |
+| Format | Frequency | Trigger |
+|--------|-----------|---------|
+| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
+| GGUF (all formats) | Every 1 hour | Auto-conversion pipeline |

**Auto-Conversion Pipeline:**
-- Monitors repo for new training commits
-- Downloads latest `model.safetensors`
-- Converts to F16 GGUF base
-- Quantizes to Q3_K_M, Q4_K_M, Q5_K_M
-- Uploads all formats to repo
+1. Monitors repo for new training commits
+2. Downloads latest `model.safetensors`
+3. Converts to F16 GGUF base
+4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M
+5. Uploads all formats

-Check commit history for exact timestamps of each update.
+Check commit history for exact timestamps.

## Gensyn RL-Swarm Technical Details

-This model is trained using [Gensyn RL-Swarm](https://gensyn.ai), a decentralized reinforcement learning framework:
-
### Architecture Components

-1. **Game Manager** (`rgym_exp/src/manager.py`): Orchestrates training rounds and swarm coordination
-2. **Trainer** (`rgym_exp/src/trainer.py`): GRPO implementation for policy optimization
-3. **Data Manager** (`rgym_exp/src/data.py`): Handles dataset loading and sampling
-4. **Reward Manager** (`rgym_exp/src/rewards.py`): Computes rewards using judge API
-5. **Coordinator** (`rgym_exp/src/coordinator.py`): Blockchain integration for swarm state
-6. **Communication Backend**: Hivemind DHT for peer-to-peer model sharing
+1. **Game Manager**: Orchestrates training rounds and swarm coordination
+2. **Trainer**: GRPO implementation for policy optimization
+3. **Data Manager**: Dataset loading and weighted sampling
+4. **Reward Manager**: Computes rewards via judge API
+5. **Coordinator**: Blockchain integration for swarm state
+6. **P2P Backend**: Hivemind DHT for model sharing

### Training Process

```
1. Agent joins swarm via P2P network
-2. Coordinator assigns training round via smart contract
+2. Coordinator assigns round via smart contract
3. Agent samples data from weighted datasets
-4. Model generates responses (2 generations)
-5. Judge API evaluates quality and assigns rewards
+4. Model generates 2 responses
+5. Judge API evaluates and assigns rewards
6. GRPO updates policy based on rewards
-7. Updated model shared via DHT to swarm
-8. Best model checkpoint saved to HuggingFace
-9. Repeat for next round
+7. Updated model shared via DHT
+8. Best checkpoint saved to HuggingFace
+9. Repeat
```

### Decentralization Benefits

-- **Fault Tolerance**: Multiple agents contribute; single node failure doesn't stop training
+- **Fault Tolerance**: Multiple agents; no single point of failure
- **Diverse Exploration**: Different agents explore different strategies
-- **Collective Intelligence**: Agents learn from each other's experiences
-- **Transparent Verification**: All training rounds verified on-chain
+- **Collective Intelligence**: Agents learn from each other
+- **Transparent**: All rounds verified on-chain

**Swarm Agent:** `tall_tame_panther`
-**Contract:** SwarmCoordinator v0.4.2
-**Testnet Explorer:** https://gensyn-testnet.explorer.com
+**Contract:** SwarmCoordinator v0.4.2

## Technical Specifications

### Software Stack

-- **Training Framework**: Gensyn RL-Swarm v0.4.2
-- **Base Library**: transformers v4.51.3
-- **Communication**: hivemind (P2P backend)
-- **Blockchain**: Web3.py (Gensyn testnet)
-- **Configuration**: Hydra + OmegaConf
+- **Framework**: Gensyn RL-Swarm v0.6.4
+- **Library**: transformers v4.51+
+- **P2P**: hivemind
+- **Blockchain**: Gensyn testnet
+- **Config**: Hydra + OmegaConf
- **Logging**: WandB integration

### Hardware Requirements

-**Training Node:**
-- GPU: NVIDIA A100 40GB or equivalent (for BF16 training)
-- RAM: 32GB+ system memory
+**Training GPU:**
+- GPU: NVIDIA RTX 4090 (24GB) or better, for BF16 training
+- RAM: 16GB+
+- Cores: 10+
- Storage: 50GB SSD
-- Network: High bandwidth for P2P swarm communication
+- Network: High bandwidth for P2P

+**Training CPU (optimized):**
+- CPU: Intel or AMD
+- Cores: 10+
+- RAM: 16GB+
+- Storage: 50GB SSD
+- Network: High bandwidth for P2P
+
**Inference:**
-- Safetensors: 8GB+ VRAM (GPU), 16GB+ RAM (CPU)
-- GGUF Q4_K_M: 4GB RAM (CPU), 2GB VRAM (GPU)
-- GGUF Q3_K_M: 3GB RAM (CPU-only compatible)
+- Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
+- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
+- GGUF Q3_K_M: 3GB RAM (CPU-only)
+
+## Evaluation
+
+### Training Progress Metrics
+
+| Metric | Value | Target |
+|--------|-------|--------|
+| Completed Rounds | 43,610+ | 100,000 |
+| Training Progress | 43.61% | 100% |
+| Update Frequency | 5-10 min | Continuous |
+
+**Note**: Formal evaluation benchmarks (GSM8K, MATH, etc.) will be added as training progresses. Current metrics track training rounds completed in the decentralized swarm.

## Reproducibility

-To reproduce training results:
+To reproduce training:

1. Clone Gensyn RL-Swarm repository
-2. Install dependencies: `pip install -r requirements.txt`
-3. Configure `rgym_exp/config/rg-swarm.yaml` with your settings
-4. Set environment variables:
-   ```
-   export HUGGINGFACE_ACCESS_TOKEN=<your_token>
-   export MODEL_NAME=Qwen/Qwen3-0.6B
-   export ORG_ID=<your_org_id>
-   export SWARM_CONTRACT=<swarm_contract_address>
-   ```
-5. Run: `bash run_rl_swarm.sh`
+2. Install: `pip install -r requirements.txt`
+3. Configure `rgym_exp/config/rg-swarm.yaml`
+4. Configure `rgym_exp/src/datasets.yaml`
+5. Set environment variables:
+```
+export HUGGINGFACE_ACCESS_TOKEN=<your_token>
+export MODEL_NAME=Qwen/Qwen3-0.6B
+export ORG_ID=<your_org_id>
+export SWARM_CONTRACT=<swarm_contract_address>
+```
+6. Run: `bash run_rl_swarm.sh`

-**Note:** Exact reproduction requires same seed (42), dataset configuration, and swarm coordination state.
+**Note**: Exact reproduction requires the same seed (42), dataset config, and swarm state.
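For checkpoint-level reproducibility at inference time, `transformers` can pin a specific repo commit via the `revision` argument. A minimal sketch (the hash below is a placeholder; use one from this repo's commit history):

```
# Load an exact training checkpoint for reproducible evaluation.
# "insert-commit-hash-here" is a placeholder; substitute a real hash
# from the repo's commit history.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther"
COMMIT = "insert-commit-hash-here"  # placeholder commit hash

model = AutoModelForCausalLM.from_pretrained(REPO, revision=COMMIT)
tokenizer = AutoTokenizer.from_pretrained(REPO, revision=COMMIT)
```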
## Citation

```
@misc{qwen3-gensyn-swarm-2025,
-  author = {0xgr3y},
+  author = {0xgrey},
  title = {Qwen3-0.6B-Gensyn-Swarm: Continuous RL Training on Distributed Swarm},
  year = {2025},
  publisher = {HuggingFace},
-  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
}
@@ -404,42 +435,33 @@ To reproduce training results:
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
-  url = {https://gensyn.ai},
-  note = {SwarmCoordinator v0.4.2}
-}
-
-@article{lacoste2019quantifying,
-  title={Quantifying the Carbon Emissions of Machine Learning},
-  author={Lacoste, Alexandre and others},
-  journal={arXiv preprint arXiv:1910.09700},
-  year={2019}
+  url = {https://gensyn.ai}
}
```

## References

-- **arXiv:1910.09700** - ML Carbon Emissions methodology
-- **Gensyn Documentation**: https://docs.gensyn.ai
+- **Gensyn Documentation**: https://docs.gensyn.ai/
+- **Gensyn GitHub**: https://github.com/gensyn-ai
+- **RL-Swarm Contracts**: https://github.com/gensyn-ai/rl-swarm-contracts
- **Qwen3 Model Card**: https://huggingface.co/Qwen/Qwen3-0.6B
-- **Technical Report**: See `technical_report.pdf` in training repository
+- **arXiv:1910.09700**: ML Carbon Emissions methodology

## License

-Apache 2.0 - See [LICENSE](LICENSE) for details
+Apache 2.0 - See [LICENSE](LICENSE)

-## Contact & Support
+## Contact

-- **Developer**: 0xgr3y
+- **Developer**: 0xgrey
- **Agent ID**: tall_tame_panther
-- **Issues**: Open an issue on this repo
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)

---

-**⚠️ Important Note**: This is a continuously trained model. For reproducibility, always specify the exact commit hash:
+**⚠️ Important**: This is a continuously trained model. For reproducibility, specify the commit hash:

```
-# Download specific checkpoint
git clone https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
cd Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
git checkout <commit_hash>
@@ -453,5 +475,4 @@ git checkout

[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-orange?style=for-the-badge)](https://gensyn.ai)
-
-```
\ No newline at end of file
+
\ No newline at end of file