Update: Detail opt-use

2025-11-10 08:08:36 +00:00
parent fe4f136df4
commit 3cb39a634f
1 changed files with 202 additions and 181 deletions
--- a/README.md
+++ b/README.md
@@ -17,7 +17,6 @@ tags:
 - gguf
 - conversational
 - text-generation-inference
- I am tall_tame_panther
 pipeline_tag: text-generation
 license: apache-2.0
 language:
@@ -31,16 +30,32 @@ datasets:
 - fraction_simplification
 - basic_arithmetic
 inference: true
-widget:
- text: What is 15 * 23?
-  example_title: Basic Arithmetic
- text: Convert decimal 255 to hexadecimal.
-  example_title: Base Conversion
- text: Simplify the fraction 24/36.
-  example_title: Fraction Simplification
 model-index:
 - name: Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
-  results: []
+  results:
+  - task:
+      type: text-generation
+      name: Mathematical Reasoning
+    dataset:
+      name: Composite Reasoning Dataset
+      type: custom
+    metrics:
+    - type: training_rounds
+      value: 43610
+      name: Completed Training Rounds
+    - type: total_rounds
+      value: 100000
+      name: Target Rounds
+    - type: progress
+      value: 43.61
+      name: Training Progress (%)
+widget:
+- text: "What is 15 * 23?"
+  example_title: "Basic Arithmetic"
+- text: "Convert decimal 255 to hexadecimal."
+  example_title: "Base Conversion"
+- text: "Simplify the fraction 24/36."
+  example_title: "Fraction Simplification"
 ---

 # Qwen3-0.6B-Gensyn-Swarm (tall_tame_panther)
@@ -52,12 +67,12 @@ model-index:

 ## Model Overview

-This model is a continuously trained Qwen3-0.6B fine-tuned using **Gensyn RL-Swarm** framework with **GRPO (Generalized Reward Policy Optimization)** for enhanced reasoning and mathematical capabilities.
+This model is a continuously trained Qwen3-0.6B, fine-tuned with the **Gensyn RL-Swarm** framework using **GRPO (Group Relative Policy Optimization)** to improve reasoning and mathematical capabilities. **Note: Current training focuses on math/reasoning tasks.**

 **Agent ID:** `tall_tame_panther`  
-**Training Status:** 🔴 LIVE - Model updates automatically every 5-10 minutes  
-**Current Progress:** Round 43610+ / 1,000,000  
-**Framework Version:** Gensyn RL-Swarm v0.4.2  
+**Training Status:** 🟢 LIVE - Model updates automatically every 5-10 minutes  
+**Current Progress:** Round 43,610+ / 100,000 (43.61%)  
+**Framework Version:** Gensyn RL-Swarm v0.6.4  
 **Contract:** SwarmCoordinator v0.4.2

 ## Key Features
@@ -68,60 +83,24 @@ This model is a continuously trained Qwen3-0.6B fine-tuned using **Gensyn RL-Swa
 - **llama.cpp Compatible**: Ready for edge deployment and local inference
 - **BF16 Precision**: Trained with bfloat16 for optimal performance
 - **TGI Compatible**: Supports Text Generation Inference for production deployment
- **Conversational**: Can be used for interactive reasoning tasks
+- **Chat Format Support**: Inherits Qwen3 chat template for conversational use

 ## Training Data

-The model is trained on a composite dataset (1,000 samples) with weighted sampling strategy defined in `datasets.yaml`:
+The model is trained on a composite dataset (1,000 samples) with a weighted sampling strategy:

-| Dataset | Weight | Samples | Focus Area |
-|---------|--------|---------|------------|
-| Propositional Logic | 7 | 500 | Logical reasoning, truth tables, Boolean operations |
-| Calendar Arithmetic | 6 | 500 | Date calculations, leap years, recurring events |
-| Decimal Arithmetic | 5 | 500 | Multi-term decimal operations with precision |
-| Base Conversion | 4 | 500 | Number system conversions (base 2-16) |
-| Fraction Simplification | 4 | 500 | GCD/LCM, fraction reduction |
-| Basic Arithmetic | 2 | 500 | Foundation operations with parentheses |
+| Dataset | Weight | Focus Area |
+|---------|--------|------------|
+| Propositional Logic | 7 | Logical reasoning, truth tables, Boolean operations |
+| Calendar Arithmetic | 6 | Date calculations, leap years, recurring events |
+| Decimal Arithmetic | 5 | Multi-term decimal operations with precision |
+| Base Conversion | 4 | Number system conversions (base 2-16) |
+| Fraction Simplification | 4 | GCD/LCM, fraction reduction |
+| Basic Arithmetic | 2 | Foundation operations with parentheses |

 **Total Dataset Size:** 1,000 composite samples  
 **Training Samples per Round:** 2  
-**Evaluation Samples:** Real-time via swarm coordination
-
-### Dataset Configuration Details
-
-```
-# From rgym_exp/src/datasets.yaml
-Propositional Logic:
-  - Variables: 2-4
-  - Statements: 2-4
-  - Complexity: 1-3
-
-Calendar Arithmetic:
-  - Year: 2023
-  - Offset: up to 100 days
-  - Leap year range: 200 years
-  - Tasks: count_days, weekday_of_date, is_leap_year, recurring_event_day
-
-Decimal Arithmetic:
-  - Terms: 2-6
-  - Decimal places: 1-3
-  - Precision: 5
-
-Base Conversion:
-  - Base range: 2-16
-  - Value range: 0-1000
-
-Fraction Simplification:
-  - Value range: 1-100
-  - Factor range: 2-100
-  - Styles: plain, latex_frac, latex_dfrac
-
-Basic Arithmetic:
-  - Terms: 2-6
-  - Digits: 1-4
-  - Operators: +, -, *, /
-  - Parentheses: enabled
-```
+**Evaluation:** Real-time via swarm coordination
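
The weights in the table can be read as relative sampling frequencies. The sketch below is a minimal illustration of that reading, assuming each training sample's source dataset is drawn with probability proportional to its weight; the dataset keys and the helper names (`sampling_probabilities`, `sample_datasets`) are hypothetical and not taken from the RL-Swarm code.

```python
import random
from collections import Counter

# Weights copied from the table above; how RL-Swarm consumes them is an assumption.
DATASET_WEIGHTS = {
    "propositional_logic": 7,
    "calendar_arithmetic": 6,
    "decimal_arithmetic": 5,
    "base_conversion": 4,
    "fraction_simplification": 4,
    "basic_arithmetic": 2,
}


def sampling_probabilities(weights):
    """Normalize integer weights into probabilities that sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_datasets(weights, k, seed=42):
    """Draw k dataset names with probability proportional to weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=k)


probs = sampling_probabilities(DATASET_WEIGHTS)
draws = Counter(sample_datasets(DATASET_WEIGHTS, k=1000))
for name, p in probs.items():
    print(f"{name:<24} p={p:.3f} drawn={draws[name]}")
```

Under this reading the weights sum to 28, so Propositional Logic accounts for 7/28 = 25% of draws and Basic Arithmetic for 2/28, roughly 7%.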

 ## Quick Start

@@ -137,13 +116,31 @@ model = AutoModelForCausalLM.from_pretrained(
 )
 tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")

-# Example: Math reasoning
 prompt = "What is 3/4 simplified to lowest terms?"
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 outputs = model.generate(**inputs, max_length=256, temperature=0.6, top_p=0.95)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

+### Chat Format (Conversational)
+
+```
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")
+tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")
+
+messages = [
+    {"role": "system", "content": "You are a helpful math tutor."},
+    {"role": "user", "content": "Explain how to simplify 24/36 step by step."}
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
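+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_length=512)
+# generate() returns a batch; decode the first (and only) sequence
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))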
+```
+
 ### Text Generation Inference (TGI)

 ```
@@ -197,205 +194,239 @@ ollama run qwen3-swarm "What is 15 multiplied by 23?"

 All GGUF formats are **llama.cpp compatible** and auto-updated hourly.

+### GGUF Quantization Strategy
+
+The Q5_K_M format uses mixed precision for optimal quality:
+
+- **Token Embeddings**: Q6_K (high quality vocab representation)
+- **Attention Weights**: Q5_K (balanced quality/size)
+- **Feed-Forward**: Q5_K/Q6_K (mixed for optimal performance)
+- **Layer Norms**: F32 (full precision for stability)
+
+This strategy ensures minimal quality loss while maintaining small file size.
+
+## Chat Format & Conversational Use
+
+This model inherits **Qwen3's chat template** for structured conversations.
+
+### Format Structure
+
+```
+<|im_start|>system
+{system_message}<|im_end|>
+<|im_start|>user
+{user_message}<|im_end|>
+<|im_start|>assistant
+{assistant_response}<|im_end|>
+```
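
To make the layout concrete, here is a minimal sketch that renders a message list into this ChatML-style text by hand. It is an illustration only: `render_chatml` is a hypothetical helper, it places `<|im_end|>` directly after the message content (which is how Qwen's template is generally understood to behave), and it ignores tool calls and `<think>` handling. For real use, `tokenizer.apply_chat_template` remains the reliable path.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render role/content dicts into ChatML-style text.

    Simplified illustration: no tool calls and no <think> handling.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "Simplify the fraction 24/36."},
]
print(render_chatml(messages))
```

Comparing this output with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to check whether the simplification holds for a given tokenizer version.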
+
+### Chat Template Features
+
+- **System Instructions**: Guide model behavior with system messages
+- **Multi-turn Dialogue**: Maintains conversation context
+- **Tool Calling**: Supports function calling (if enabled in training)
+- **Reasoning Mode**: `<think>` tags for chain-of-thought (experimental)
+
+**Note**: While the model supports chat format structurally, optimal conversational performance depends on whether training data included formatted dialogues. Current training focuses on **math/reasoning tasks**.
+
 ## Training Configuration

 ### Gensyn RL-Swarm Architecture

-The model is trained using a decentralized reinforcement learning framework with the following components:
-
 ```
-# From rgym_exp/config/rg-swarm.yaml
-
 Training Framework:
  Method: GRPO (Group Relative Policy Optimization)
  Base Model: Qwen/Qwen3-0.6B
  Training Regime: bfloat16 mixed precision
-  Max Rounds: 1,000,000
-  Max Stage: 1
+  Max Rounds: 100,000
  Update Frequency: Every 5-10 minutes
  Generations per Round: 2
-  Transplant Trees: 1
  Seed: 42

 Blockchain Integration:
  Network: Gensyn Testnet
  Chain ID: 685685
-  RPC: https://gensyn-testnet.g.alchemy.com/public
  Contract: SwarmCoordinator v0.4.2
-  Modal Proxy: http://localhost:3000/api/

 Swarm Communication:
  Framework: Hivemind P2P Backend
  Initial Peers: 3 bootnodes
-  Bootnodes:
-    - /ip4/38.101.215.12/tcp/30011/p2p/QmQ2gEXoPJg6iMBSUFWGzAabS2VhnzuS782Y637hGjfsRJ
-    - /ip4/38.101.215.13/tcp/30012/p2p/QmWhiaLrx3HRZfgXc2i7KW5nMUNK7P9tRc71yFJdGEZKkC
-    - /ip4/38.101.215.14/tcp/30013/p2p/QmQa1SCfYTxx7RvU7qJJRo79Zm1RAwPpkeLueDVJuBBmFp
-  Startup Timeout: 120s
-  Beam Size: 25
+  Beam Size: 30

 Reward System:
  Manager: DefaultRewardManager
-  Function Store: RoundRewardFnStore
-  Reward Function: RGRewards (Reasoning Gym Rewards)
-  Judge: Swarm Judge API (https://swarm-judge.internal-apps-central1.clusters.gensyn.ai)
+  Reward Function: RGRewards (Reasoning Gym)
+  Judge API: https://swarm-judge.internal-apps-central1.clusters.gensyn.ai
 ```

-### Training Hyperparameters
+### Model Hyperparameters

 ```
-Model Architecture:
+Architecture:
  Hidden Size: 1024
  Intermediate Size: 3072
-  Num Hidden Layers: 28
-  Num Attention Heads: 16
-  Num Key-Value Heads: 8
+  Layers: 28
+  Attention Heads: 16
+  KV Heads: 8
  Head Dimension: 128
-  Max Position Embeddings: 40,960
-  RMS Norm Epsilon: 1e-06
-  Rope Theta: 1,000,000
-  Vocabulary Size: 151,936
+  Context Length: 40,960 tokens
+  Vocabulary: 151,936 tokens

-GRPO Trainer Config:
+GRPO Config:
  Epsilon: 0.2
  Epsilon High: 0.28
-  Generations: 2
  Gradient Checkpointing: Enabled
-  Learning Rate: Adaptive
  
-Generation Config:
+Generation:
  Temperature: 0.6
  Top-K: 20
  Top-P: 0.95
-  BOS Token: 151643
-  EOS Token: 151645
-  Pad Token: 151643
 ```
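
The `Epsilon` and `Epsilon High` values suggest a clipped policy objective with separate lower and upper clip ranges. The sketch below shows one way such an objective can be computed for a group of two generations. It assumes rewards are normalized within the group and that the two epsilons bound the probability ratio from below and above; the reward values and function names are made up for illustration, and the actual RL-Swarm trainer may differ.

```python
from statistics import mean, pstdev

EPSILON = 0.2        # "Epsilon" from the config above
EPSILON_HIGH = 0.28  # "Epsilon High" from the config above


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(ratio, advantage, eps_low=EPSILON, eps_high=EPSILON_HIGH):
    """PPO-style clipped surrogate with separate lower and upper clip ranges."""
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


rewards = [1.0, 0.0]  # two generations per round; reward values are made up
advantages = group_relative_advantages(rewards)
for ratio, adv in zip([1.5, 0.5], advantages):
    objective = clipped_objective(ratio, adv)
    print(f"ratio={ratio} advantage={adv:+.2f} objective={objective:+.3f}")
```

With only two generations per round, the normalized advantages are always close to +1 and -1 whenever the two rewards differ, and both are 0 when they tie, so a tied round contributes no gradient signal under this formulation.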

 ## Model Capabilities

 This model excels at:

-1. **Logical Reasoning**: Propositional logic, truth evaluation, Boolean algebra, logical equivalences
-2. **Mathematical Operations**: Multi-precision arithmetic, decimal calculations, fraction manipulation
-3. **Number Systems**: Base conversion between binary, octal, decimal, hexadecimal
-4. **Date/Time Calculations**: Calendar arithmetic, leap year detection, day-of-week calculations
-5. **Step-by-step Problem Solving**: Chain-of-thought reasoning for complex multi-step tasks
-6. **Conversational Math Tutoring**: Interactive problem-solving guidance
+1. **Logical Reasoning**: Propositional logic, truth evaluation, Boolean algebra
+2. **Mathematical Operations**: Multi-precision arithmetic, decimal calculations, fractions
+3. **Number Systems**: Base conversion (binary, octal, decimal, hexadecimal)
+4. **Date/Time Calculations**: Calendar arithmetic, leap years, day-of-week
+5. **Step-by-step Problem Solving**: Chain-of-thought reasoning
+6. **Conversational Tutoring**: Interactive problem-solving (via chat format)

 ## Limitations

- **Specialized Domain**: Optimized for reasoning/math tasks; may underperform on creative writing or general chat
- **Training in Progress**: Model weights update every 5-10 minutes; performance may vary between checkpoints
- **Scale**: 0.6B parameters - suitable for edge devices but not state-of-the-art for complex reasoning
- **Experimental**: Trained via decentralized RL swarm; behavior may be less predictable than supervised models
- **Context Length**: 40K tokens supported but best performance within 4K tokens
+- **Specialized Domain**: Optimized for reasoning/math; may underperform on creative writing
+- **Training in Progress**: Weights update every 5-10 minutes; performance varies
+- **Scale**: 0.6B parameters - suitable for edge but not SOTA for complex reasoning
+- **Experimental**: Decentralized RL training; behavior less predictable than supervised models
+- **Context**: Best performance within 4K tokens (full 40K supported)

 ## Update Schedule

-| Format | Update Frequency | Trigger |
-|--------|------------------|---------|
-| Safetensors (BF16) | Every 5-10 minutes | Automatic via RL-Swarm training |
-| GGUF variants (all) | Every 1 hour | Automatic conversion from latest checkpoint |
+| Format | Frequency | Trigger |
+|--------|-----------|---------|
+| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
+| GGUF (all formats) | Every 1 hour | Auto-conversion pipeline |

 **Auto-Conversion Pipeline:**
- Monitors repo for new training commits
- Downloads latest `model.safetensors`
- Converts to F16 GGUF base
- Quantizes to Q3_K_M, Q4_K_M, Q5_K_M
- Uploads all formats to repo
+1. Monitors repo for new training commits
+2. Downloads latest `model.safetensors`
+3. Converts to F16 GGUF base
+4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M
+5. Uploads all formats

-Check commit history for exact timestamps of each update.
+Check commit history for exact timestamps.
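
Steps 3 and 4 of the pipeline map onto two llama.cpp tools: the `convert_hf_to_gguf.py` script and the `llama-quantize` binary. The sketch below only builds the command lines, without running them. The directory layout, output file names, and the `build_conversion_commands` helper are assumptions for illustration; the repository's actual conversion script is not shown in this README.

```python
from pathlib import Path

# Quantization targets listed in the pipeline above.
QUANT_TYPES = ["Q3_K_M", "Q4_K_M", "Q5_K_M"]


def build_conversion_commands(model_dir, out_dir, llama_cpp_dir="llama.cpp"):
    """Return argv lists for the F16 conversion and each quantization step."""
    out = Path(out_dir)
    f16_path = out / "model-f16.gguf"  # hypothetical file name
    commands = [[
        "python", str(Path(llama_cpp_dir) / "convert_hf_to_gguf.py"),
        str(model_dir), "--outfile", str(f16_path), "--outtype", "f16",
    ]]
    for quant in QUANT_TYPES:
        commands.append([
            # Assumes a default CMake build of llama.cpp.
            str(Path(llama_cpp_dir) / "build" / "bin" / "llama-quantize"),
            str(f16_path), str(out / f"model-{quant}.gguf"), quant,
        ])
    return commands


for cmd in build_conversion_commands("checkpoint", "gguf"):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the paths match a local llama.cpp checkout.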

 ## Gensyn RL-Swarm Technical Details

-This model is trained using [Gensyn RL-Swarm](https://gensyn.ai), a decentralized reinforcement learning framework:
-
 ### Architecture Components

-1. **Game Manager** (`rgym_exp/src/manager.py`): Orchestrates training rounds and swarm coordination
-2. **Trainer** (`rgym_exp/src/trainer.py`): GRPO implementation for policy optimization
-3. **Data Manager** (`rgym_exp/src/data.py`): Handles dataset loading and sampling
-4. **Reward Manager** (`rgym_exp/src/rewards.py`): Computes rewards using judge API
-5. **Coordinator** (`rgym_exp/src/coordinator.py`): Blockchain integration for swarm state
-6. **Communication Backend**: Hivemind DHT for peer-to-peer model sharing
+1. **Game Manager**: Orchestrates training rounds and swarm coordination
+2. **Trainer**: GRPO implementation for policy optimization
+3. **Data Manager**: Dataset loading and weighted sampling
+4. **Reward Manager**: Computes rewards via judge API
+5. **Coordinator**: Blockchain integration for swarm state
+6. **P2P Backend**: Hivemind DHT for model sharing

 ### Training Process

 ```
 1. Agent joins swarm via P2P network
-2. Coordinator assigns training round via smart contract
+2. Coordinator assigns round via smart contract
 3. Agent samples data from weighted datasets
-4. Model generates responses (2 generations)
-5. Judge API evaluates quality and assigns rewards
+4. Model generates 2 responses
+5. Judge API evaluates and assigns rewards
 6. GRPO updates policy based on rewards
-7. Updated model shared via DHT to swarm
-8. Best model checkpoint saved to HuggingFace
-9. Repeat for next round
+7. Updated model shared via DHT
+8. Best checkpoint saved to HuggingFace
+9. Repeat
 ```

 ### Decentralization Benefits

- **Fault Tolerance**: Multiple agents contribute; single node failure doesn't stop training
+- **Fault Tolerance**: Multiple agents; no single point of failure
 - **Diverse Exploration**: Different agents explore different strategies
- **Collective Intelligence**: Agents learn from each other's experiences
- **Transparent Verification**: All training rounds verified on-chain
+- **Collective Intelligence**: Agents learn from each other
+- **Transparent**: All rounds verified on-chain

 **Swarm Agent:** `tall_tame_panther`  
-**Contract:** SwarmCoordinator v0.4.2  
-**Testnet Explorer:** https://gensyn-testnet.explorer.com
+**Contract:** SwarmCoordinator v0.4.2

 ## Technical Specifications

 ### Software Stack

- **Training Framework**: Gensyn RL-Swarm v0.4.2
- **Base Library**: transformers v4.51.3
- **Communication**: hivemind (P2P backend)
- **Blockchain**: Web3.py (Gensyn testnet)
- **Configuration**: Hydra + OmegaConf
+- **Framework**: Gensyn RL-Swarm v0.6.4
+- **Library**: transformers v4.51+
+- **P2P**: hivemind
+- **Blockchain**: Gensyn testnet
+- **Config**: Hydra + OmegaConf
 - **Logging**: WandB integration

 ### Hardware Requirements

-**Training Node:**
- GPU: NVIDIA A100 40GB or equivalent (for BF16 training)
- RAM: 32GB+ system memory
+**Training (GPU):**
+- GPU: NVIDIA RTX 4090 with 24GB+ VRAM (BF16 training)
+- RAM: 16GB+
+- Cores: 10+
 - Storage: 50GB SSD
- Network: High bandwidth for P2P swarm communication
+- Network: High bandwidth for P2P

+**Training (CPU-optimized):**
+- CPU: Intel or AMD
+- Cores: 10+
+- RAM: 16GB+
+- Storage: 50GB SSD
+- Network: High bandwidth for P2P
+ 
 **Inference:**
- Safetensors: 8GB+ VRAM (GPU), 16GB+ RAM (CPU)
- GGUF Q4_K_M: 4GB RAM (CPU), 2GB VRAM (GPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only compatible)
+- Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
+- GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
+- GGUF Q3_K_M: 3GB RAM (CPU-only)
+
+## Evaluation
+
+### Training Progress Metrics
+
+| Metric | Value | Target |
+|--------|-------|--------|
+| Completed Rounds | 43,610+ | 100,000 |
+| Training Progress | 43.61% | 100% |
+| Update Frequency | 5-10 min | Continuous |
+
+**Note**: Formal evaluation benchmarks (GSM8K, MATH, etc.) will be added as training progresses. Current metrics track training rounds completed in the decentralized swarm.

 ## Reproducibility

-To reproduce training results:
+To reproduce training:

 1. Clone Gensyn RL-Swarm repository
-2. Install dependencies: `pip install -r requirements.txt`
-3. Configure `rgym_exp/config/rg-swarm.yaml` with your settings
-4. Set environment variables:
-   ```
-   export HUGGINGFACE_ACCESS_TOKEN=<your-token>
-   export MODEL_NAME=Qwen/Qwen3-0.6B
-   export ORG_ID=<your-org-id>
-   export SWARM_CONTRACT=<contract-address>
-   ```
-5. Run: `bash run_rl_swarm.sh`
+2. Install: `pip install -r requirements.txt`
+3. Configure `rgym_exp/config/rg-swarm.yaml`
+4. Configure `rgym_exp/src/datasets.yaml`
+5. Set environment variables:
+   ```
+   export HUGGINGFACE_ACCESS_TOKEN=<token>
+   export MODEL_NAME=Qwen/Qwen3-0.6B
+   export ORG_ID=<org-id>
+   export SWARM_CONTRACT=<contract-address>
+   ```
+6. Run: `bash run_rl_swarm.sh`

-**Note:** Exact reproduction requires same seed (42), dataset configuration, and swarm coordination state.
+**Note**: Exact reproduction requires the same seed (42), the same dataset configuration, and the same swarm coordination state.

 ## Citation

 ```
@misc{qwen3-gensyn-swarm-2025,
-  author = {0xgr3y},
+  author = {0xgrey},
  title = {Qwen3-0.6B-Gensyn-Swarm: Continuous RL Training on Distributed Swarm},
  year = {2025},
  publisher = {HuggingFace},
-  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
 }
@@ -404,42 +435,33 @@ To reproduce training results:
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
-  url = {https://gensyn.ai},
-  note = {SwarmCoordinator v0.4.2}
-}
-
-@article{lacoste2019quantifying,
-  title={Quantifying the Carbon Emissions of Machine Learning},
-  author={Lacoste, Alexandre and others},
-  journal={arXiv preprint arXiv:1910.09700},
-  year={2019}
+  url = {https://gensyn.ai}
 }
 ```

 ## References

- **arXiv:1910.09700** - ML Carbon Emissions methodology
- **Gensyn Documentation**: https://docs.gensyn.ai
+- **Gensyn Documentation**: https://docs.gensyn.ai/
+- **Gensyn GitHub**: https://github.com/gensyn-ai
+- **RL-Swarm Contracts**: https://github.com/gensyn-ai/rl-swarm-contracts
 - **Qwen3 Model Card**: https://huggingface.co/Qwen/Qwen3-0.6B
- **Technical Report**: See `technical_report.pdf` in training repository
+- **arXiv:1910.09700**: ML Carbon Emissions methodology

 ## License

-Apache 2.0 - See [LICENSE](LICENSE) for details
+Apache 2.0 - See [LICENSE](LICENSE)

-## Contact & Support
+## Contact

- **Developer**: 0xgr3y
+- **Developer**: 0xgrey
 - **Agent ID**: tall_tame_panther
- **Issues**: Open an issue on this repo
 - **Community**: [Gensyn Discord](https://discord.gg/gensyn)

 ---

-**⚠️ Important Note**: This is a continuously trained model. For reproducibility, always specify the exact commit hash:
+**⚠️ Important**: This is a continuously trained model. For reproducibility, always specify the exact commit hash:

 ```
-# Download specific checkpoint
 git clone https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
 cd Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
 git checkout <commit-hash>
@@ -453,5 +475,4 @@ git checkout <commit-hash>

 [![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-orange?style=for-the-badge)](https://gensyn.ai)

-</div>
-```
+</div>