Initialize project; model provided by the ModelHub XC community
Model: khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Slerp Source: Original Platform
---
base_model:
- khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled
- khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
pipeline_tag: text-generation
language:
- en
datasets:
- khazarai/kimi-2.5-high-reasoning-250x
- khazarai/qwen3.6-plus-high-reasoning-500x
---

# khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Slerp



*Note: The sharp drop in "Creative Writing" is an expected and accepted trade-off to maximize extreme logical reasoning and coding precision.*

This model is a highly experimental and optimized reasoning model created through a surgical SLERP merge of two powerful 4B reasoning models. The goal of this merge was to combine the deep analytical capabilities of Kimi with the mathematical and structural precision of Qwen, while mitigating the catastrophic forgetting commonly seen in SFT model merges.

After multiple iterations and layer-by-layer tensor analysis, we achieved a **"1+1=3 Synergy Effect"** in Logical Inference and Planning, outperforming both base models and the official Qwen Thinking model.
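For intuition, SLERP interpolates each pair of weight tensors along the great-circle arc between them rather than along a straight line, which preserves weight norms better than plain averaging. A minimal NumPy sketch of the idea (illustrative only; mergekit's actual implementation handles more edge cases):

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors,
    falling back to linear interpolation when they are nearly parallel."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_dir = a_flat / (np.linalg.norm(a_flat) + eps)
    b_dir = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = float(np.clip(np.dot(a_dir, b_dir), -1.0, 1.0))
    if abs(dot) > 0.9995:  # nearly colinear: slerp degenerates to lerp
        return (1 - t) * a + t * b
    theta = np.arccos(dot)  # angle between the two tensors
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b

# Interpolating two toy "weight tensors" halfway along the arc:
w0 = np.array([1.0, 0.0])
w1 = np.array([0.0, 1.0])
mid = slerp(0.5, w0, w1)
```

At `t = 0` the result is the first tensor, at `t = 1` the second; intermediate values trace the arc between them.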

### The "Golden Path" (V5) Strategy

Standard SLERP merges often destroy RAG capabilities and syntax adherence. To solve this, this model utilizes a custom merge configuration:

1. **RAG/Vocabulary Fix:** `embed_tokens` and `lm_head` are strictly pinned to `1.0` (Qwen). The model reads and writes purely using Qwen's vocabulary, eliminating the RAG degradation problem.
2. **Gradient Attention:** The intermediate attention and MLP layers follow a smooth gradient `[0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 1]` to prevent weight interference in deep reasoning steps.
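The gradient schedule can be expanded into a per-layer interpolation factor. A hypothetical sketch of that expansion (mergekit's exact anchor-spreading logic may differ; the 36-layer depth matches Qwen3-4B):

```python
# The t-schedule applied to attention/MLP weights in the merge recipe.
ANCHORS = [0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 1]

def t_for_layer(layer_idx: int, num_layers: int) -> float:
    """Linearly interpolate the anchor list across layer depth, so early
    layers stay near the base model (t ~ 0) and late layers near the
    secondary model (t ~ 1)."""
    pos = layer_idx / max(num_layers - 1, 1) * (len(ANCHORS) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ANCHORS) - 1)
    frac = pos - lo
    return (1 - frac) * ANCHORS[lo] + frac * ANCHORS[hi]

# Qwen3-4B has 36 transformer layers.
schedule = [round(t_for_layer(i, 36), 3) for i in range(36)]
```

The resulting schedule rises monotonically from 0 to 1, keeping shallow layers anchored to the base model's representations.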

## Benchmark Performance (Multi-Domain Reasoning)

| Model | Score |
| :--- | :--- |
| **khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Slerp** | **77.18** |
| khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled | 76.09 |
| khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled | 75.64 |
| Qwen/Qwen3-4B-Thinking-2507 | 73.73 |

- **Benchmark**: khazarai/Multi-Domain-Reasoning-Benchmark
- **Total Questions**: 100

## 💡 Intended Use Cases

* **Ideal for:** Complex logical deductions, Python code debugging, mathematical problem-solving, and strict RAG (Retrieval-Augmented Generation) pipelines.
* **Not recommended for:** Creative writing, poetry, or highly imaginative storytelling.
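A minimal generation sketch using Hugging Face `transformers` (assumes the checkpoint is published under this repo id and that you have a GPU or enough RAM for a 4B model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Slerp"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the merged model and answer a single user prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example: a code-debugging prompt, one of the intended use cases.
# print(generate("Find the bug: def add(a, b): return a - b"))
```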

### Models Merged

The following models were included in the merge:

* [khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled](https://huggingface.co/khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled)
* [khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled](https://huggingface.co/khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled
  - model: khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled
merge_method: slerp
base_model: khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled
parameters:
  t:
    - filter: embed_tokens
      value: 1
    - filter: lm_head
      value: 1
    - value: 1
    - filter: self
      value: [0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 1]
dtype: bfloat16
```
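To reproduce the merge locally, the configuration above can be passed to mergekit's CLI (a sketch; assumes the YAML is saved as `slerp.yaml` and the output directory name is your choice):

```shell
pip install mergekit
mergekit-yaml slerp.yaml ./Qwen3-4B-Qwen3.6-plus-Reasoning-Slerp --cuda
```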