Update README.md

This commit is contained in:
grey
2025-11-10 05:51:51 +00:00
committed by system
parent c8072a861d
commit e6fe4e7209

616
README.md
View File

@@ -1,204 +1,456 @@
---
library_name: transformers
tags:
- text-generation
- qwen3
- rl-swarm
- genrl-swarm
- grpo
- gensyn
- I am tall_tame_panther
- trl
- reasoning
- math
- logic
- continuous-training
- reinforcement-learning
- safetensors
- gguf
- conversational
- text-generation-inference
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-0.6B
datasets:
- propositional_logic
- calendar_arithmetic
- decimal_arithmetic
- base_conversion
- fraction_simplification
- basic_arithmetic
inference: true
model-index:
- name: Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
results: []
widget:
- text: "What is 15 * 23?"
example_title: "Basic Arithmetic"
- text: "Convert decimal 255 to hexadecimal."
example_title: "Base Conversion"
- text: "Simplify the fraction 24/36."
example_title: "Fraction Simplification"
---
# Model Card for Model ID
# Qwen3-0.6B-Gensyn-Swarm (tall_tame_panther)
[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue)](https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther)
[![GGUF](https://img.shields.io/badge/GGUF-Available-green)](https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther/tree/main)
[![Gensyn](https://img.shields.io/badge/Trained%20with-Gensyn%20RL--Swarm-orange)](https://gensyn.ai)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
## Model Overview
This model is a continuously trained Qwen3-0.6B fine-tuned using **Gensyn RL-Swarm** framework with **GRPO (Generalized Reward Policy Optimization)** for enhanced reasoning and mathematical capabilities.
**Agent ID:** `tall_tame_panther`
**Training Status:** 🔴 LIVE - Model updates automatically every 5-10 minutes
**Current Progress:** Round 43610+ / 1,000,000
**Framework Version:** Gensyn RL-Swarm v0.4.2
**Contract:** SwarmCoordinator v0.4.2
## Key Features
- **Real-time Training**: Continuous learning with distributed RL across Gensyn swarm network
- **Multi-domain Reasoning**: Trained on logic, arithmetic, and mathematical problem-solving
- **GGUF Support**: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M)
- **llama.cpp Compatible**: Ready for edge deployment and local inference
- **BF16 Precision**: Trained with bfloat16 for optimal performance
- **TGI Compatible**: Supports Text Generation Inference for production deployment
- **Conversational**: Can be used for interactive reasoning tasks
## Training Data
The model is trained on a composite dataset (1,000 samples) with weighted sampling strategy defined in `datasets.yaml`:
| Dataset | Weight | Samples | Focus Area |
|---------|--------|---------|------------|
| Propositional Logic | 7 | 500 | Logical reasoning, truth tables, Boolean operations |
| Calendar Arithmetic | 6 | 500 | Date calculations, leap years, recurring events |
| Decimal Arithmetic | 5 | 500 | Multi-term decimal operations with precision |
| Base Conversion | 4 | 500 | Number system conversions (base 2-16) |
| Fraction Simplification | 4 | 500 | GCD/LCM, fraction reduction |
| Basic Arithmetic | 2 | 500 | Foundation operations with parentheses |
**Total Dataset Size:** 1,000 composite samples
**Training Samples per Round:** 2
**Evaluation Samples:** Real-time via swarm coordination
### Dataset Configuration Details
```
# From rgym_exp/src/datasets.yaml
Propositional Logic:
- Variables: 2-4
- Statements: 2-4
- Complexity: 1-3
Calendar Arithmetic:
- Year: 2023
- Offset: up to 100 days
- Leap year range: 200 years
- Tasks: count_days, weekday_of_date, is_leap_year, recurring_event_day
Decimal Arithmetic:
- Terms: 2-6
- Decimal places: 1-3
- Precision: 5
Base Conversion:
- Base range: 2-16
- Value range: 0-1000
Fraction Simplification:
- Value range: 1-100
- Factor range: 2-100
- Styles: plain, latex_frac, latex_dfrac
Basic Arithmetic:
- Terms: 2-6
- Digits: 1-4
- Operators: +, -, *, /
- Parentheses: enabled
```
## Quick Start
### Standard Transformers
```
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther",
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")
# Example: Math reasoning
prompt = "What is 3/4 simplified to lowest terms?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs, skip_special_tokens=True))
```
### Text Generation Inference (TGI)
```
docker run -d --gpus all \
-p 8080:80 \
-v $PWD/data:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id 0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther \
--max-input-length 4096 \
--max-total-tokens 8192
```
### GGUF with llama.cpp
```
# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf
# Run inference
./llama-cli -m Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf \
-p "Solve: (5 + 3) * 2 = ?" \
--temp 0.6 --top-p 0.95
```
### Ollama
```
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
SYSTEM "You are a helpful assistant specialized in mathematical reasoning and logic."
EOF
# Create and run
ollama create qwen3-swarm -f Modelfile
ollama run qwen3-swarm "What is 15 multiplied by 23?"
```
## Available Formats
| Format | Size | Precision | Use Case | Download |
|--------|------|-----------|----------|----------|
| Safetensors (BF16) | 1.19 GB | BF16 | Full precision training/fine-tuning | `model.safetensors` |
| GGUF F16 | 1.14 GB | FP16 | High quality inference | `Qwen3-0.6B-Gensyn-Swarm-F16.gguf` |
| GGUF Q5_K_M | 444 MB | 5-bit | Balanced quality/size | `Qwen3-0.6B-Gensyn-Swarm-Q5_K_M.gguf` |
| GGUF Q4_K_M | 397 MB | 4-bit | **Recommended** for production | `Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf` |
| GGUF Q3_K_M | 347 MB | 3-bit | Smallest, fastest | `Qwen3-0.6B-Gensyn-Swarm-Q3_K_M.gguf` |
All GGUF formats are **llama.cpp compatible** and auto-updated hourly.
## Training Configuration
### Gensyn RL-Swarm Architecture
The model is trained using a decentralized reinforcement learning framework with the following components:
```
# From rgym_exp/config/rg-swarm.yaml
Training Framework:
Method: GRPO (Generalized Reward Policy Optimization)
Base Model: Qwen/Qwen3-0.6B
Training Regime: bfloat16 mixed precision
Max Rounds: 1,000,000
Max Stage: 1
Update Frequency: Every 5-10 minutes
Generations per Round: 2
Transplant Trees: 1
Seed: 42
Blockchain Integration:
Network: Gensyn Testnet
Chain ID: 685685
RPC: https://gensyn-testnet.g.alchemy.com/public
Contract: SwarmCoordinator v0.4.2
Modal Proxy: http://localhost:3000/api/
Swarm Communication:
Framework: Hivemind P2P Backend
Initial Peers: 3 bootnodes
Bootnodes:
- /ip4/38.101.215.12/tcp/30011/p2p/QmQ2gEXoPJg6iMBSUFWGzAabS2VhnzuS782Y637hGjfsRJ
- /ip4/38.101.215.13/tcp/30012/p2p/QmWhiaLrx3HRZfgXc2i7KW5nMUNK7P9tRc71yFJdGEZKkC
- /ip4/38.101.215.14/tcp/30013/p2p/QmQa1SCfYTxx7RvU7qJJRo79Zm1RAwPpkeLueDVJuBBmFp
Startup Timeout: 120s
Beam Size: 25
Reward System:
Manager: DefaultRewardManager
Function Store: RoundRewardFnStore
Reward Function: RGRewards (Reasoning Gym Rewards)
Judge: Swarm Judge API (https://swarm-judge.internal-apps-central1.clusters.gensyn.ai)
```
### Training Hyperparameters
```
Model Architecture:
Hidden Size: 1024
Intermediate Size: 3072
Num Hidden Layers: 28
Num Attention Heads: 16
Num Key-Value Heads: 8
Head Dimension: 128
Max Position Embeddings: 40,960
RMS Norm Epsilon: 1e-06
Rope Theta: 1,000,000
Vocabulary Size: 151,936
GRPO Trainer Config:
Epsilon: 0.2
Epsilon High: 0.28
Generations: 2
Gradient Checkpointing: Enabled
Learning Rate: Adaptive
Generation Config:
Temperature: 0.6
Top-K: 20
Top-P: 0.95
BOS Token: 151643
EOS Token: 151645
Pad Token: 151643
```
## Model Capabilities
This model excels at:
1. **Logical Reasoning**: Propositional logic, truth evaluation, Boolean algebra, logical equivalences
2. **Mathematical Operations**: Multi-precision arithmetic, decimal calculations, fraction manipulation
3. **Number Systems**: Base conversion between binary, octal, decimal, hexadecimal
4. **Date/Time Calculations**: Calendar arithmetic, leap year detection, day-of-week calculations
5. **Step-by-step Problem Solving**: Chain-of-thought reasoning for complex multi-step tasks
6. **Conversational Math Tutoring**: Interactive problem-solving guidance
## Limitations
- **Specialized Domain**: Optimized for reasoning/math tasks; may underperform on creative writing or general chat
- **Training in Progress**: Model weights update every 5-10 minutes; performance may vary between checkpoints
- **Scale**: 0.6B parameters - suitable for edge devices but not state-of-the-art for complex reasoning
- **Experimental**: Trained via decentralized RL swarm; behavior may be less predictable than supervised models
- **Context Length**: 40K tokens supported but best performance within 4K tokens
## Update Schedule
| Format | Update Frequency | Trigger |
|--------|------------------|---------|
| Safetensors (BF16) | Every 5-10 minutes | Automatic via RL-Swarm training |
| GGUF variants (all) | Every 1 hour | Automatic conversion from latest checkpoint |
**Auto-Conversion Pipeline:**
- Monitors repo for new training commits
- Downloads latest `model.safetensors`
- Converts to F16 GGUF base
- Quantizes to Q3_K_M, Q4_K_M, Q5_K_M
- Uploads all formats to repo
Check commit history for exact timestamps of each update.
## Gensyn RL-Swarm Technical Details
This model is trained using [Gensyn RL-Swarm](https://gensyn.ai), a decentralized reinforcement learning framework:
### Architecture Components
1. **Game Manager** (`rgym_exp/src/manager.py`): Orchestrates training rounds and swarm coordination
2. **Trainer** (`rgym_exp/src/trainer.py`): GRPO implementation for policy optimization
3. **Data Manager** (`rgym_exp/src/data.py`): Handles dataset loading and sampling
4. **Reward Manager** (`rgym_exp/src/rewards.py`): Computes rewards using judge API
5. **Coordinator** (`rgym_exp/src/coordinator.py`): Blockchain integration for swarm state
6. **Communication Backend**: Hivemind DHT for peer-to-peer model sharing
### Training Process
```
1. Agent joins swarm via P2P network
2. Coordinator assigns training round via smart contract
3. Agent samples data from weighted datasets
4. Model generates responses (2 generations)
5. Judge API evaluates quality and assigns rewards
6. GRPO updates policy based on rewards
7. Updated model shared via DHT to swarm
8. Best model checkpoint saved to HuggingFace
9. Repeat for next round
```
### Decentralization Benefits
- **Fault Tolerance**: Multiple agents contribute; single node failure doesn't stop training
- **Diverse Exploration**: Different agents explore different strategies
- **Collective Intelligence**: Agents learn from each other's experiences
- **Transparent Verification**: All training rounds verified on-chain
**Swarm Agent:** `tall_tame_panther`
**Contract:** SwarmCoordinator v0.4.2
**Testnet Explorer:** https://gensyn-testnet.explorer.com
## Technical Specifications
### Software Stack
- **Training Framework**: Gensyn RL-Swarm v0.4.2
- **Base Library**: transformers v4.51.3
- **Communication**: hivemind (P2P backend)
- **Blockchain**: Web3.py (Gensyn testnet)
- **Configuration**: Hydra + OmegaConf
- **Logging**: WandB integration
### Hardware Requirements
**Training Node:**
- GPU: NVIDIA A100 40GB or equivalent (for BF16 training)
- RAM: 32GB+ system memory
- Storage: 50GB SSD
- Network: High bandwidth for P2P swarm communication
**Inference:**
- Safetensors: 8GB+ VRAM (GPU), 16GB+ RAM (CPU)
- GGUF Q4_K_M: 4GB RAM (CPU), 2GB VRAM (GPU)
- GGUF Q3_K_M: 3GB RAM (CPU-only compatible)
<!-- Provide a quick summary of what the model is/does. -->
## Reproducibility
To reproduce training results:
1. Clone Gensyn RL-Swarm repository
2. Install dependencies: `pip install -r requirements.txt`
3. Configure `rgym_exp/config/rg-swarm.yaml` with your settings
4. Set environment variables:
```
export HUGGINGFACE_ACCESS_TOKEN=<your-token>
export MODEL_NAME=Qwen/Qwen3-0.6B
export ORG_ID=<your-org-id>
export SWARM_CONTRACT=<contract-address>
```
5. Run: `bash run_rl_swarm.sh`
## Model Details
**Note:** Exact reproduction requires same seed (42), dataset configuration, and swarm coordination state.
## Citation
### Model Description
```
@misc{qwen3-gensyn-swarm-2025,
author = {0xgr3y},
title = {Qwen3-0.6B-Gensyn-Swarm: Continuous RL Training on Distributed Swarm},
year = {2025},
publisher = {HuggingFace},
journal = {HuggingFace Model Hub},
howpublished = {\url{https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther}},
note = {Agent ID: tall\_tame\_panther}
}
@misc{gensyn-rl-swarm-2025,
title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
author = {Gensyn AI},
year = {2025},
url = {https://gensyn.ai},
note = {SwarmCoordinator v0.4.2}
}
@article{lacoste2019quantifying,
title={Quantifying the Carbon Emissions of Machine Learning},
author={Lacoste, Alexandre and others},
journal={arXiv preprint arXiv:1910.09700},
year={2019}
}
```
## References
- **arXiv:1910.09700** - ML Carbon Emissions methodology
- **Gensyn Documentation**: https://docs.gensyn.ai
- **Qwen3 Model Card**: https://huggingface.co/Qwen/Qwen3-0.6B
- **Technical Report**: See `technical_report.pdf` in training repository
## License
Apache 2.0 - See [LICENSE](LICENSE) for details
## Contact & Support
<!-- Provide a longer summary of what this model is. -->
- **Developer**: 0xgr3y
- **Agent ID**: tall_tame_panther
- **Issues**: Open an issue on this repo
- **Community**: [Gensyn Discord](https://discord.gg/gensyn)
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
---
- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
**⚠️ Important Note**: This is a continuously trained model. For reproducibility, always specify the exact commit hash:
### Model Sources [optional]
```
# Download specific checkpoint
git clone https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
cd Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>
```
<!-- Provide the basic links for the model. -->
---
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
<div align="center">
## Uses
**🤖 Trained with ❤️ using Gensyn RL-Swarm**
[![Gensyn](https://img.shields.io/badge/Powered%20by-Gensyn%20AI-orange?style=for-the-badge)](https://gensyn.ai)
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
[More Information Needed]
#### Hardware
[More Information Needed]
#### Software
[More Information Needed]
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]
</div>
```