初始化项目，由ModelHub XC社区提供模型

Model: djtony707/synapse-3b Source: Original Platform
2026-06-20 17:21:20 +08:00
commit c9176e53f0
10 changed files with 972 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,169 @@
+---
+language:
+- en
+license: apache-2.0
+tags:
+- titan-synapse
+- specialist-swarm
+- continuous-learning
+- merged-model
+- mamba
+- xlstm
+- mixture-of-experts
+- fast-weights
+- brain-inspired
+- rust
+- local-inference
+base_model: Qwen/Qwen2.5-3B-Instruct
+model_type: qwen2
+pipeline_tag: text-generation
+datasets:
+- gsm8k
+- openwebmath
+- microsoft/orca-math-word-problems-200k
+- sahil2801/CodeAlpaca-20k
+- nickrosh/Evol-Instruct-Code-80k-v1
+- iamtarun/python_code_instructions_18k_alpaca
+- Open-Orca/SlimOrca
+- yahma/alpaca-cleaned
+---
+
+# Synapse-3B
+
+**Small models that think together. And learn.**
+
+Synapse-3B is a merged specialist model created by [TITAN Synapse](https://github.com/Djtony707/titan-synapse) — an open-source Rust inference engine that runs a swarm of tiny specialist models that collaborate and learn continuously on your GPU.
+
+This model combines **4 specialist LoRA adapters** (math, code, general, coordinator) trained on curated datasets, then merged into a single model using **TIES merging** (Trim, Elect Sign, Merge) for minimal interference between specializations.
+
+## Key Features
+
+- **4 specialist domains** merged into one model without catastrophic forgetting
+- **TIES merging** — trims small deltas, elects signs by majority vote, merges only agreeing directions
+- **Based on Qwen2.5-3B-Instruct** — strong Apache 2.0 base with multilingual support
+- **Part of the Synapse ecosystem** — designed for the brain-inspired Synapse Architecture (Mamba + xLSTM + Sparse MoE + Fast Weights)
+
+## How This Model Was Made
+
+```
+Base Model: Qwen/Qwen2.5-3B-Instruct (Apache 2.0)
+     |
+     +---> QLoRA (rank 64) ---> Math Specialist (GSM8K + OpenWebMath + Orca-Math, 50k samples)
+     +---> QLoRA (rank 64) ---> Code Specialist (CodeAlpaca + Evol-Instruct + Python-18k, 50k samples)
+     +---> QLoRA (rank 64) ---> General Specialist (SlimOrca + Alpaca-Cleaned, 50k samples)
+     +---> QLoRA (rank 32) ---> Coordinator (Synthetic routing, 5k samples)
+     |
+     +---> TIES Merge (trim 80%, sign election, agreement merge)
+     |
+     = Synapse-3B
+```
+
+### Specialist Details
+
+| Specialist | Datasets | Samples | LoRA Rank | Focus |
+|:---|:---|:---:|:---:|:---|
+| **Math** | GSM8K, OpenWebMath, Orca-Math | 50,000 | 64 | Mathematical reasoning, step-by-step problem solving |
+| **Code** | CodeAlpaca-20k, Evol-Instruct-Code-80k, Python-18k | 50,000 | 64 | Code generation, debugging, Python expertise |
+| **General** | SlimOrca, Alpaca-Cleaned | 50,000 | 64 | General knowledge, instruction following, reasoning |
+| **Coordinator** | Synthetic routing examples | 5,000 | 32 | Task analysis, specialist routing, swarm coordination |
+
+### Merge Method: TIES
+
+[TIES (Trim, Elect Sign, Merge)](https://arxiv.org/abs/2306.01708) is used to combine adapters with minimal interference:
+
+1. **Trim** — Remove small-magnitude deltas (keep top 20% per parameter)
+2. **Elect Sign** — For each parameter, take a majority vote on the sign direction across all specialists
+3. **Merge** — Only average deltas that agree with the elected sign
+
+This produces cleaner merges than simple averaging, preserving each specialist's strengths.
+
+## Usage
+
+### With Transformers
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("djtony707/synapse-3b")
+tokenizer = AutoTokenizer.from_pretrained("djtony707/synapse-3b")
+
+messages = [{"role": "user", "content": "Solve: If a train travels 120km in 2 hours, what is its speed in m/s?"}]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+### With TITAN Synapse Engine (Rust, local inference)
+
+```bash
+# Install
+curl -sSL https://raw.githubusercontent.com/Djtony707/titan-synapse/main/install.sh | bash
+
+# Pull and run
+synapse pull synapse-3b
+synapse up
+
+# OpenAI-compatible API on localhost:6900
+curl http://localhost:6900/v1/chat/completions \
+  -d '{"model":"synapse-3b","messages":[{"role":"user","content":"Hello!"}]}'
+```
+
+## The Synapse Architecture (v1.0 Target)
+
+Synapse-3B is the foundation for the **Synapse Architecture** — a brain-inspired modular model that replaces monolithic transformers:
+
+```
+                    THALAMUS (Mamba Router, O(n))
+                         |
+          +--------------+--------------+
+          |              |              |
+     xLSTM Lang    Sparse MoE     Fast-Weight
+      Module       Expert Pool      Memory
+      O(n)        top-k of 8+     Learn during
+     syntax,      specialists     inference,
+     grammar      activate        no backprop
+```
+
+- **No O(n^2) attention** — Mamba (state-space) + xLSTM (recurrent)
+- **Sparse activation** — only 2-3 of 8+ modules fire per token
+- **Fast-weight memory** — learn new facts in ONE forward pass
+- **Full observability** — every routing decision is transparent, no black box
+
+## Training Details
+
+- **Hardware**: NVIDIA RTX 5090 (32GB VRAM)
+- **Training framework**: QLoRA via TRL SFTTrainer
+- **Quantization**: 4-bit NF4 (for training efficiency)
+- **Learning rate**: 2e-4 with cosine scheduler
+- **Epochs**: 3 per specialist
+- **Batch size**: 2 (gradient accumulation 8, effective batch 16)
+- **Max sequence length**: 2048 tokens
+- **Training time**: ~2 hours per specialist on RTX 5090
+- **Merge method**: TIES (trim ratio 0.8)
+- **Created**: March 21, 2026
+
+## Limitations
+
+- This is a 3B parameter model — it won't match 70B+ models on complex reasoning
+- Trained on English-focused datasets; multilingual performance inherited from Qwen base
+- The coordinator specialist is trained on synthetic routing data; real-world routing improves with use
+- Best used as part of the TITAN Synapse swarm (multiple specialists collaborating)
+
+## Citation
+
+```bibtex
+@misc{synapse3b2026,
+  title={Synapse-3B: A Merged Specialist Model for the TITAN Synapse Engine},
+  author={Tony Elliott},
+  year={2026},
+  url={https://huggingface.co/djtony707/synapse-3b},
+  note={Created with TITAN Synapse — https://github.com/Djtony707/titan-synapse}
+}
+```
+
+## License
+
+Apache 2.0 — use it for anything.
+
+Built by [Tony Elliott](https://github.com/Djtony707) with [TITAN Synapse](https://github.com/Djtony707/titan-synapse).