---
library_name: transformers
tags:
- text-generation
- qwen3
- rl-swarm
- genrl-swarm
- grpo
- gensyn
- trl
- reasoning
- math
- logic
- continuous-training
- reinforcement-learning
- safetensors
- gguf
- conversational
- text-generation-inference
- I am tall_tame_panther
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-0.6B
datasets:
- propositional_logic
- calendar_arithmetic
- decimal_arithmetic
- base_conversion
- fraction_simplification
- basic_arithmetic
inference: true
widget:
- text: What is 15 * 23?
  example_title: Basic Arithmetic
- text: Convert decimal 255 to hexadecimal.
  example_title: Base Conversion
- text: Simplify the fraction 24/36.
  example_title: Fraction Simplification
model-index:
- name: Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
  results: []
---

# Qwen3-0.6B-Gensyn-Swarm (tall_tame_panther)


## Model Overview

This model is a continuously trained variant of Qwen3-0.6B, fine-tuned with the Gensyn RL-Swarm framework using GRPO (Group Relative Policy Optimization) for enhanced reasoning and mathematical capabilities.

- **Agent ID:** tall_tame_panther
- **Training Status:** 🔴 LIVE - the model updates automatically every 5-10 minutes
- **Current Progress:** Round 43,610+ / 1,000,000
- **Framework Version:** Gensyn RL-Swarm v0.4.2
- **Contract:** SwarmCoordinator v0.4.2

## Key Features

- **Real-time Training:** Continuous learning with distributed RL across the Gensyn swarm network
- **Multi-domain Reasoning:** Trained on logic, arithmetic, and mathematical problem-solving
- **GGUF Support:** Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M)
- **llama.cpp Compatible:** Ready for edge deployment and local inference
- **BF16 Precision:** Trained with bfloat16 for optimal performance
- **TGI Compatible:** Supports Text Generation Inference for production deployment
- **Conversational:** Can be used for interactive reasoning tasks

## Training Data

The model is trained on a composite dataset (1,000 samples) with a weighted sampling strategy defined in `datasets.yaml`:

| Dataset | Weight | Samples | Focus Area |
|---|---|---|---|
| Propositional Logic | 7 | 500 | Logical reasoning, truth tables, Boolean operations |
| Calendar Arithmetic | 6 | 500 | Date calculations, leap years, recurring events |
| Decimal Arithmetic | 5 | 500 | Multi-term decimal operations with precision |
| Base Conversion | 4 | 500 | Number system conversions (base 2-16) |
| Fraction Simplification | 4 | 500 | GCD/LCM, fraction reduction |
| Basic Arithmetic | 2 | 500 | Foundation operations with parentheses |

- **Total Dataset Size:** 1,000 composite samples
- **Training Samples per Round:** 2
- **Evaluation Samples:** Real-time via swarm coordination
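
The weighted draw described by the table can be sketched as follows. This is an illustrative stand-in, not the actual RL-Swarm data manager; the dataset names and the `seed=42` default (matching the run seed in the training configuration) are assumptions:

```python
import random

# Weights from the table above; the names are illustrative stand-ins for
# the actual reasoning-gym generators.
WEIGHTS = {
    "propositional_logic": 7,
    "calendar_arithmetic": 6,
    "decimal_arithmetic": 5,
    "base_conversion": 4,
    "fraction_simplification": 4,
    "basic_arithmetic": 2,
}

def sample_datasets(n, seed=42):
    """Draw n dataset names in proportion to their configured weights."""
    rng = random.Random(seed)
    names = list(WEIGHTS)
    return rng.choices(names, weights=[WEIGHTS[k] for k in names], k=n)

draws = sample_datasets(1000)
# With weight 7 of 28 total, propositional logic should make up roughly a
# quarter of the draws; basic arithmetic (weight 2) far fewer.
```

Fixing the seed makes the draw sequence reproducible across runs.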

### Dataset Configuration Details

```yaml
# From rgym_exp/src/datasets.yaml
Propositional Logic:
  - Variables: 2-4
  - Statements: 2-4
  - Complexity: 1-3

Calendar Arithmetic:
  - Year: 2023
  - Offset: up to 100 days
  - Leap year range: 200 years
  - Tasks: count_days, weekday_of_date, is_leap_year, recurring_event_day

Decimal Arithmetic:
  - Terms: 2-6
  - Decimal places: 1-3
  - Precision: 5

Base Conversion:
  - Base range: 2-16
  - Value range: 0-1000

Fraction Simplification:
  - Value range: 1-100
  - Factor range: 2-100
  - Styles: plain, latex_frac, latex_dfrac

Basic Arithmetic:
  - Terms: 2-6
  - Digits: 1-4
  - Operators: +, -, *, /
  - Parentheses: enabled
```
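
As a concrete illustration of the Basic Arithmetic block (2-6 terms, 1-4 digit operands, optional parentheses), a minimal generator might look like the sketch below. It is a guess at the task shape, not the reasoning-gym implementation:

```python
import random

def gen_basic_arithmetic(seed=None):
    """Generate one task matching the Basic Arithmetic config above:
    2-6 terms, 1-4 digit operands, + - * / and optional parentheses."""
    rng = random.Random(seed)
    n_terms = rng.randint(2, 6)
    expr = str(rng.randint(1, 10 ** rng.randint(1, 4) - 1))
    for _ in range(n_terms - 1):
        op = rng.choice(["+", "-", "*", "/"])
        term = rng.randint(1, 10 ** rng.randint(1, 4) - 1)  # never zero, so / is safe
        expr = f"{expr} {op} {term}"
        if rng.random() < 0.3:  # occasionally group everything built so far
            expr = f"({expr})"
    answer = eval(expr)  # safe here: the string contains only digits and operators
    return f"Compute: {expr} = ?", answer
```

Passing a seed makes each sample reproducible, which is how a fixed-seed run can replay its data.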

## Quick Start

### Standard Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")

# Example: math reasoning
prompt = "What is 3/4 simplified to lowest terms?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Text Generation Inference (TGI)

```bash
docker run -d --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id 0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther \
  --max-input-length 4096 \
  --max-total-tokens 8192
```

### GGUF with llama.cpp

```bash
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf

# Run inference
./llama-cli -m Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf \
  -p "Solve: (5 + 3) * 2 = ?" \
  --temp 0.6 --top-p 0.95
```

### Ollama

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
SYSTEM "You are a helpful assistant specialized in mathematical reasoning and logic."
EOF

# Create and run
ollama create qwen3-swarm -f Modelfile
ollama run qwen3-swarm "What is 15 multiplied by 23?"
```

## Available Formats

| Format | Size | Precision | Use Case | Download |
|---|---|---|---|---|
| Safetensors (BF16) | 1.19 GB | BF16 | Full-precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 1.14 GB | FP16 | High-quality inference | `Qwen3-0.6B-Gensyn-Swarm-F16.gguf` |
| GGUF Q5_K_M | 444 MB | 5-bit | Balanced quality/size | `Qwen3-0.6B-Gensyn-Swarm-Q5_K_M.gguf` |
| GGUF Q4_K_M | 397 MB | 4-bit | Recommended for production | `Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf` |
| GGUF Q3_K_M | 347 MB | 3-bit | Smallest, fastest | `Qwen3-0.6B-Gensyn-Swarm-Q3_K_M.gguf` |

All GGUF formats are llama.cpp compatible and auto-updated hourly.

## Training Configuration

### Gensyn RL-Swarm Architecture

The model is trained using a decentralized reinforcement learning framework with the following components:

```yaml
# From rgym_exp/config/rg-swarm.yaml

Training Framework:
  Method: GRPO (Group Relative Policy Optimization)
  Base Model: Qwen/Qwen3-0.6B
  Training Regime: bfloat16 mixed precision
  Max Rounds: 1,000,000
  Max Stage: 1
  Update Frequency: Every 5-10 minutes
  Generations per Round: 2
  Transplant Trees: 1
  Seed: 42

Blockchain Integration:
  Network: Gensyn Testnet
  Chain ID: 685685
  RPC: https://gensyn-testnet.g.alchemy.com/public
  Contract: SwarmCoordinator v0.4.2
  Modal Proxy: http://localhost:3000/api/

Swarm Communication:
  Framework: Hivemind P2P Backend
  Initial Peers: 3 bootnodes
  Bootnodes:
    - /ip4/38.101.215.12/tcp/30011/p2p/QmQ2gEXoPJg6iMBSUFWGzAabS2VhnzuS782Y637hGjfsRJ
    - /ip4/38.101.215.13/tcp/30012/p2p/QmWhiaLrx3HRZfgXc2i7KW5nMUNK7P9tRc71yFJdGEZKkC
    - /ip4/38.101.215.14/tcp/30013/p2p/QmQa1SCfYTxx7RvU7qJJRo79Zm1RAwPpkeLueDVJuBBmFp
  Startup Timeout: 120s
  Beam Size: 25

Reward System:
  Manager: DefaultRewardManager
  Function Store: RoundRewardFnStore
  Reward Function: RGRewards (Reasoning Gym Rewards)
  Judge: Swarm Judge API (https://swarm-judge.internal-apps-central1.clusters.gensyn.ai)
```
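
The RGRewards function itself is not documented here. As a rough illustration of verifiable-answer rewards of this kind, an exact-match scorer could look like the following; the number-extraction heuristic is an assumption, not the actual implementation:

```python
import re

def extract_answer(completion: str):
    """Pull the last number out of a completion; a crude stand-in for
    whatever parsing the real reward function performs."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None

def exact_match_reward(completion: str, target: str) -> float:
    """1.0 if the final number in the completion equals the target, else 0.0."""
    pred = extract_answer(completion)
    return 1.0 if pred is not None and float(pred) == float(target) else 0.0

print(exact_match_reward("15 * 23 = 345", "345"))      # 1.0
print(exact_match_reward("the answer is 340", "345"))  # 0.0
```

Binary verifiable rewards like this are a common fit for GRPO, since the group normalization only needs a relative ordering of completions.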

### Training Hyperparameters

```yaml
Model Architecture:
  Hidden Size: 1024
  Intermediate Size: 3072
  Num Hidden Layers: 28
  Num Attention Heads: 16
  Num Key-Value Heads: 8
  Head Dimension: 128
  Max Position Embeddings: 40,960
  RMS Norm Epsilon: 1e-06
  Rope Theta: 1,000,000
  Vocabulary Size: 151,936

GRPO Trainer Config:
  Epsilon: 0.2
  Epsilon High: 0.28
  Generations: 2
  Gradient Checkpointing: Enabled
  Learning Rate: Adaptive

Generation Config:
  Temperature: 0.6
  Top-K: 20
  Top-P: 0.95
  BOS Token: 151643
  EOS Token: 151645
  Pad Token: 151643
```
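
To make the epsilon settings concrete: GRPO normalizes rewards within each group of generations, and the separate Epsilon High value suggests an asymmetric (clip-higher) PPO-style band. A toy sketch under those assumptions, not the TRL/RL-Swarm code:

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    """Group-relative advantages: A_i = (r_i - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

def clipped_term(ratio, adv, eps_low=0.2, eps_high=0.28):
    """Per-token surrogate with the asymmetric clip band [1 - 0.2, 1 + 0.28]."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * adv, clipped * adv)

# With two generations per round, rewards [1.0, 0.0] normalize to roughly
# [1.0, -1.0]: the better completion is reinforced, the worse suppressed.
advs = group_advantages([1.0, 0.0])
```

The wider upper bound lets the probability of good completions grow faster than a symmetric 0.2 clip would allow, while still capping downside updates.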

## Model Capabilities

This model excels at:

  1. Logical Reasoning: Propositional logic, truth evaluation, Boolean algebra, logical equivalences
  2. Mathematical Operations: Multi-precision arithmetic, decimal calculations, fraction manipulation
  3. Number Systems: Base conversion between binary, octal, decimal, hexadecimal
  4. Date/Time Calculations: Calendar arithmetic, leap year detection, day-of-week calculations
  5. Step-by-step Problem Solving: Chain-of-thought reasoning for complex multi-step tasks
  6. Conversational Math Tutoring: Interactive problem-solving guidance
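
For the task families above, the Python standard library can serve as a ground-truth oracle when spot-checking model outputs:

```python
import calendar
import datetime
from fractions import Fraction

# Reference answers for the capability areas above, via the standard library.
assert Fraction(24, 36) == Fraction(2, 3)                    # fraction simplification
assert format(255, "x") == "ff"                              # decimal -> hexadecimal
assert int("1011", 2) == 11                                  # binary -> decimal
assert calendar.isleap(2024) and not calendar.isleap(2023)   # leap-year detection
assert datetime.date(2023, 1, 1).strftime("%A") == "Sunday"  # day-of-week
print("all reference checks pass")
```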

## Limitations

- **Specialized Domain:** Optimized for reasoning/math tasks; may underperform on creative writing or general chat
- **Training in Progress:** Model weights update every 5-10 minutes; performance may vary between checkpoints
- **Scale:** 0.6B parameters - suitable for edge devices but not state-of-the-art for complex reasoning
- **Experimental:** Trained via decentralized RL swarm; behavior may be less predictable than supervised models
- **Context Length:** 40K tokens supported, but best performance within 4K tokens

## Update Schedule

| Format | Update Frequency | Trigger |
|---|---|---|
| Safetensors (BF16) | Every 5-10 minutes | Automatic via RL-Swarm training |
| GGUF variants (all) | Every 1 hour | Automatic conversion from latest checkpoint |

**Auto-Conversion Pipeline:**

1. Monitors the repo for new training commits
2. Downloads the latest `model.safetensors`
3. Converts to an F16 GGUF base
4. Quantizes to Q3_K_M, Q4_K_M, and Q5_K_M
5. Uploads all formats to the repo

Check the commit history for exact timestamps of each update.

## Gensyn RL-Swarm Technical Details

This model is trained using Gensyn RL-Swarm, a decentralized reinforcement learning framework:

### Architecture Components

1. **Game Manager** (`rgym_exp/src/manager.py`): Orchestrates training rounds and swarm coordination
2. **Trainer** (`rgym_exp/src/trainer.py`): GRPO implementation for policy optimization
3. **Data Manager** (`rgym_exp/src/data.py`): Handles dataset loading and sampling
4. **Reward Manager** (`rgym_exp/src/rewards.py`): Computes rewards using the judge API
5. **Coordinator** (`rgym_exp/src/coordinator.py`): Blockchain integration for swarm state
6. **Communication Backend:** Hivemind DHT for peer-to-peer model sharing

### Training Process

1. Agent joins swarm via P2P network
2. Coordinator assigns training round via smart contract
3. Agent samples data from weighted datasets
4. Model generates responses (2 generations)
5. Judge API evaluates quality and assigns rewards
6. GRPO updates policy based on rewards
7. Updated model shared via DHT to swarm
8. Best model checkpoint saved to HuggingFace
9. Repeat for next round
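
The round structure above can be condensed into a short skeleton. The function names and the toy judge below are placeholders for the actual manager/trainer/judge components, not their real API:

```python
import random

def one_round(generate, judge, dataset, n_generations=2, rng=None):
    """One simplified swarm round: sample a task, produce n completions,
    score them. A real round would then take a GRPO step from the rewards
    and share the updated weights over the DHT."""
    rng = rng or random.Random()
    task = rng.choice(dataset)
    completions = [generate(task["question"]) for _ in range(n_generations)]
    rewards = [judge(c, task["answer"]) for c in completions]
    return task, completions, rewards

# Toy stand-ins for the model and the judge API:
dataset = [{"question": "What is 2 + 2?", "answer": "4"}]
generate = lambda q: "4"
judge = lambda completion, answer: 1.0 if completion == answer else 0.0
task, completions, rewards = one_round(generate, judge, dataset)
```

Two generations per task matches the "Generations per Round: 2" setting, which is the minimum group size from which GRPO can form relative advantages.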

### Decentralization Benefits

- **Fault Tolerance:** Multiple agents contribute; a single node failure doesn't stop training
- **Diverse Exploration:** Different agents explore different strategies
- **Collective Intelligence:** Agents learn from each other's experiences
- **Transparent Verification:** All training rounds verified on-chain

- **Swarm Agent:** tall_tame_panther
- **Contract:** SwarmCoordinator v0.4.2
- **Testnet Explorer:** https://gensyn-testnet.explorer.com

## Technical Specifications

### Software Stack

- **Training Framework:** Gensyn RL-Swarm v0.4.2
- **Base Library:** transformers v4.51.3
- **Communication:** hivemind (P2P backend)
- **Blockchain:** Web3.py (Gensyn testnet)
- **Configuration:** Hydra + OmegaConf
- **Logging:** WandB integration

### Hardware Requirements

**Training Node:**

- **GPU:** NVIDIA A100 40GB or equivalent (for BF16 training)
- **RAM:** 32GB+ system memory
- **Storage:** 50GB SSD
- **Network:** High bandwidth for P2P swarm communication

**Inference:**

- **Safetensors:** 8GB+ VRAM (GPU), 16GB+ RAM (CPU)
- **GGUF Q4_K_M:** 4GB RAM (CPU), 2GB VRAM (GPU)
- **GGUF Q3_K_M:** 3GB RAM (CPU-only compatible)
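
A quick way to sanity-check figures like these: a GGUF file is roughly parameters x bits-per-weight / 8 bytes. The parameter count and effective bit-width below are assumptions (K-quants mix precisions across layers), so treat this as a ballpark, not the exact table values:

```python
def gguf_size_mb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: params x bits / 8 bytes, converted to MiB,
    ignoring metadata and the mixed-precision layers real K-quants use."""
    return n_params * bits_per_weight / 8 / 1024 ** 2

# Assuming ~0.75B total parameters and ~4.5 effective bits/weight, the
# estimate lands near the Q4_K_M figure listed in Available Formats.
print(round(gguf_size_mb(0.75e9, 4.5)))  # -> 402
```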

## Reproducibility

To reproduce training results:

1. Clone the Gensyn RL-Swarm repository
2. Install dependencies: `pip install -r requirements.txt`
3. Configure `rgym_exp/config/rg-swarm.yaml` with your settings
4. Set environment variables:

   ```bash
   export HUGGINGFACE_ACCESS_TOKEN=<your-token>
   export MODEL_NAME=Qwen/Qwen3-0.6B
   export ORG_ID=<your-org-id>
   export SWARM_CONTRACT=<contract-address>
   ```

5. Run: `bash run_rl_swarm.sh`

**Note:** Exact reproduction requires the same seed (42), dataset configuration, and swarm coordination state.
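
The fixed seed mentioned in the note can be pinned across the usual RNGs with a small helper. This is a convenience sketch, not part of the RL-Swarm scripts; numpy and torch are seeded only if installed:

```python
import random

def set_seed(seed: int = 42) -> None:
    """Seed every RNG a run may touch; 42 matches the training config."""
    random.seed(seed)
    try:  # optional dependencies, seeded only if present
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass

set_seed(42)
```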

## Citation

```bibtex
@misc{qwen3-gensyn-swarm-2025,
  author = {0xgr3y},
  title = {Qwen3-0.6B-Gensyn-Swarm: Continuous RL Training on Distributed Swarm},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
}

@misc{gensyn-rl-swarm-2025,
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
  url = {https://gensyn.ai},
  note = {SwarmCoordinator v0.4.2}
}
```



## License

Apache 2.0 - see LICENSE for details.

## Contact & Support

- **Developer:** 0xgr3y
- **Agent ID:** tall_tame_panther
- **Issues:** Open an issue on this repo
- **Community:** Gensyn Discord

⚠️ **Important Note:** This is a continuously trained model. For reproducibility, always specify the exact commit hash:

```bash
# Download specific checkpoint
git clone https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
cd Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>
```

🤖 Trained with ❤️ using Gensyn RL-Swarm
