Initialize project; model provided by the ModelHub XC community
Model: harryadav3/Qwen3-30B-A3B-REAP-50 Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
333
README.md
Normal file
@@ -0,0 +1,333 @@
---
license: apache-2.0
base_model: Qwen/Qwen3-30B-A3B
tags:
- reap
- moe
- pruning
- expert-pruning
- qwen3
- mixture-of-experts
- compression
- one-shot
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Qwen3-30B-A3B-REAP-50
  results:
  - task:
      type: text-generation
      name: MMLU
    dataset:
      type: cais/mmlu
      name: MMLU
    metrics:
    - type: acc
      value: 49.42
      name: Accuracy (0-shot)
  - task:
      type: text-generation
      name: ARC Challenge
    dataset:
      type: allenai/ai2_arc
      name: ARC-Challenge
    metrics:
    - type: acc_norm
      value: 38.65
      name: Accuracy Normalized (0-shot)
  - task:
      type: text-generation
      name: HellaSwag
    dataset:
      type: Rowan/hellaswag
      name: HellaSwag
    metrics:
    - type: acc_norm
      value: 47.64
      name: Accuracy Normalized (0-shot)
  - task:
      type: text-generation
      name: BoolQ
    dataset:
      type: google/boolq
      name: BoolQ
    metrics:
    - type: acc
      value: 74.22
      name: Accuracy (0-shot)
  - task:
      type: text-generation
      name: WinoGrande
    dataset:
      type: allenai/winogrande
      name: WinoGrande
    metrics:
    - type: acc
      value: 58.80
      name: Accuracy (0-shot)
---

# Qwen3-30B-A3B-REAP-50: 50% Expert-Pruned Qwen3 MoE

This model is a **50% expert-pruned** version of [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B), compressed using **REAP (Router-weighted Expert Activation Pruning)** from [Cerebras Research](https://github.com/CerebrasResearch/reap).

REAP is a one-shot compression technique for Mixture-of-Experts (MoE) models that physically removes low-importance experts based on a saliency criterion combining router gate-values and expert activation norms. The method was published at **ICLR 2026**.

## What Changed

| Property | Original | Pruned |
|----------|----------|--------|
| **Total Experts per Layer** | 128 | **64** |
| **Active Experts per Token** | 8 | 8 (unchanged) |
| **Model Size on Disk** | 57 GB | **30 GB** |
| **Safetensor Shards** | 16 | 7 |
| **Architecture** | Qwen3MoeForCausalLM | Qwen3MoeForCausalLM (unchanged) |
| **Hidden Size** | 2048 | 2048 (unchanged) |
| **Layers** | 48 | 48 (unchanged) |
| **Precision** | bfloat16 | bfloat16 (unchanged) |

The pruned model is a standard HuggingFace model and can be loaded directly with `transformers` -- no custom code required.
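
As a quick sanity check on the size row, the byte counts recorded in the seven Git LFS pointers of this commit can be summed. The sketch below uses only those published sizes; the variable names are illustrative.

```python
# Byte sizes copied from the Git LFS pointers of the seven safetensors shards.
shard_bytes = [
    5000093080, 4997775080, 4997775720, 4997775704,
    4997505344, 4997775712, 2073121000,
]

total_bytes = sum(shard_bytes)
total_gb = total_bytes / 1e9       # decimal gigabytes
total_gib = total_bytes / 2**30    # binary gibibytes

print(f"{len(shard_bytes)} shards: {total_gb:.2f} GB ({total_gib:.2f} GiB)")
```

The total comes to roughly 32 GB in decimal units, or just under 30 GiB, which suggests the "30 GB" figure in the table is expressed in binary units.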

---

## How REAP Works

### The Problem

MoE models like Qwen3-30B-A3B use sparsely activated expert networks: each token is routed to only 8 of the 128 experts available in each layer. Most experts therefore sit idle for any given token, and many contribute little across a given workload. REAP exploits this by identifying and removing the least important experts.
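
To make the sparsity concrete, the fraction of experts touched by a single token can be computed from the two numbers above. This is a small illustrative calculation; `active_fraction` is a hypothetical helper, not part of any library.

```python
def active_fraction(top_k: int, num_experts: int) -> float:
    """Fraction of a layer's experts that process any single token."""
    return top_k / num_experts

original = active_fraction(top_k=8, num_experts=128)
pruned = active_fraction(top_k=8, num_experts=64)

print(f"original: {original:.2%} of experts active per token")
print(f"pruned:   {pruned:.2%} of experts active per token")
```

Because the top-k stays at 8 while the pool shrinks to 64, each token still uses the same number of experts, but they are drawn from a smaller pool.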

### The REAP Saliency Criterion

REAP scores each expert using a dual criterion that captures both **how often an expert is selected** and **how much it contributes when active**:

```
REAP_score(expert_i) = mean over calibration tokens of:
    router_weight(expert_i) * activation_norm(expert_i)
```

Where:
- **Router weight** (`router_weight`): The softmax probability assigned by the gating network when selecting this expert. Higher means the router "prefers" this expert.
- **Expert Activation Norm** (`activation_norm`): The L2 norm of the expert's output vector. Higher means the expert produces larger (more impactful) modifications to the hidden state.

The product captures experts that are both strongly selected AND produce meaningful outputs. An expert with a high router weight but a low activation norm barely changes the hidden state; one with a high activation norm but a low router weight is scaled down to near nothing. REAP keeps the experts that matter on both dimensions.
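
The scoring rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the REAP repository's implementation: `reap_scores` and the record layout are hypothetical, and it assumes the mean is taken over the tokens for which an expert was actually recorded.

```python
import math
from collections import defaultdict

def reap_scores(records):
    """records: iterable of (expert_id, router_weight, expert_output_vector),
    one entry per (token, selected expert) pair seen during calibration."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for expert_id, router_weight, output in records:
        activation_norm = math.sqrt(sum(x * x for x in output))  # L2 norm
        totals[expert_id] += router_weight * activation_norm
        counts[expert_id] += 1
    # Mean of router_weight * activation_norm per expert.
    return {e: totals[e] / counts[e] for e in totals}

# Toy calibration records (made-up numbers).
records = [
    (0, 0.50, [3.0, 4.0]),   # strong weight, norm 5.0
    (0, 0.30, [0.0, 5.0]),   # moderate weight, norm 5.0
    (1, 0.90, [0.3, 0.4]),   # strong weight, tiny output
    (2, 0.05, [6.0, 8.0]),   # large output, barely selected
]
scores = reap_scores(records)
for expert_id, score in sorted(scores.items()):
    print(expert_id, round(score, 3))
```

In this toy data, expert 0 scores highest because it is both well weighted and produces sizeable outputs, while experts 1 and 2 each fail on one of the two dimensions.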

### Why Pruning Beats Merging

The REAP paper (ICLR 2026) demonstrates a key finding: **expert pruning consistently outperforms expert merging** for MoE compression on generative tasks. Merging (combining similar experts into one) degrades all participating experts, while pruning (removing entire experts) preserves the full capacity of remaining experts and the router's ability to select among them.

### The Full Pipeline

```
1. Load Model
       |
2. Attach Observer Hooks to every MoE layer
       |
3. Forward Pass over calibration data (1024 samples)
       |-- Record router weights per expert per token
       |-- Record L2 norm of expert outputs per token
       |
4. Compute REAP saliency score for each expert
       |-- score = mean(router_weight * activation_norm)
       |
5. Rank experts by saliency score (lowest = least important)
       |
6. Prune bottom 50% of experts per layer
       |-- Remove expert modules from ModuleList
       |-- Slice router weight matrix to match
       |
7. Update config.json (num_experts: 128 -> 64)
       |
8. Save compressed model
```
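
Steps 5 and 6 of the pipeline can be sketched for a single layer as follows. This is a simplified illustration using plain lists; `prune_layer` is a hypothetical helper, and the real implementation operates on PyTorch modules and tensors.

```python
def prune_layer(saliency, router_rows, compression_ratio=0.5):
    """saliency: one score per expert.
    router_rows: one router weight row per expert.
    Returns (kept_expert_ids, sliced_router_rows)."""
    num_experts = len(saliency)
    num_to_prune = int(num_experts * compression_ratio)
    # Rank experts from least to most salient and drop the lowest ones.
    ranked = sorted(range(num_experts), key=lambda i: saliency[i])
    pruned = set(ranked[:num_to_prune])
    # Keep survivors in their original order so router rows stay aligned.
    kept = [i for i in range(num_experts) if i not in pruned]
    return kept, [router_rows[i] for i in kept]

saliency = [0.9, 0.1, 0.5, 0.05]
router_rows = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
kept, new_router = prune_layer(saliency, router_rows)
print(kept, new_router)
```

Slicing the router rows with the same index list as the experts is what keeps the gating network and the remaining expert modules consistent after pruning.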

---

## Detailed Parameters Used

### Model Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `model_name` | `Qwen/Qwen3-30B-A3B` | Base model: 30B total params, 3B active per token |
| `num_hidden_layers` | 48 | Number of transformer layers |
| `hidden_size` | 2048 | Hidden dimension |
| `num_attention_heads` | 32 | Multi-head attention heads |
| `num_key_value_heads` | 4 | GQA key-value heads |
| `head_dim` | 128 | Per-head dimension |
| `intermediate_size` | 6144 | Dense FFN intermediate size; unused here because every layer is MoE (`mlp_only_layers` is empty, `decoder_sparse_step` is 1) |
| `moe_intermediate_size` | 768 | Per-expert FFN intermediate size |
| `num_experts` | 128 -> **64** | Experts per MoE layer (before -> after) |
| `num_experts_per_tok` | 8 | Top-K experts activated per token (unchanged) |
| `vocab_size` | 151,936 | Vocabulary size |
| `max_position_embeddings` | 40,960 | Maximum sequence length |
| `torch_dtype` | bfloat16 | Model precision |
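
The configuration values above are enough to estimate how many parameters live in the experts, and therefore how much pruning removes. The calculation below is a rough sketch that assumes each expert is a gated FFN with three bias-free weight matrices (gate, up, down); it ignores attention, embeddings, routers and norms.

```python
hidden_size = 2048
moe_intermediate_size = 768
num_layers = 48

# Assumption: gate, up and down projections, no biases.
params_per_expert = 3 * hidden_size * moe_intermediate_size

def expert_params(num_experts: int) -> int:
    """Total expert parameters across all layers."""
    return params_per_expert * num_experts * num_layers

before = expert_params(128)
after = expert_params(64)
removed_gb = (before - after) * 2 / 1e9   # bfloat16 = 2 bytes per parameter

print(f"per expert:    {params_per_expert:,}")
print(f"expert params: {before:,} -> {after:,}")
print(f"bytes removed: ~{removed_gb:.1f} GB")
```

Under these assumptions the experts account for about 29B of the original parameters, and halving them removes about 14.5B parameters, which is consistent with the drop in on-disk size reported earlier.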

### Pruning Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `prune_method` | `reap` | REAP saliency criterion (router_weight * activation_norm) |
| `compression_ratio` | 0.50 | Remove 50% of experts (128 -> 64 per layer) |
| `seed` | 42 | Random seed for reproducibility |
| `singleton_super_experts` | `false` | Do not force high-activation outlier experts into singleton clusters |
| `singleton_outlier_experts` | `false` | Do not force outlier experts into singleton clusters |

### Observer Configuration (Activation Collection)

| Parameter | Value | Description |
|-----------|-------|-------------|
| `samples_per_category` | 1024 | Number of calibration samples processed |
| `batch_size` | 1 | Samples per forward pass |
| `model_max_length` | 2048 | Maximum sequence length for calibration |
| `distance_measure` | `cosine` | Distance metric for expert similarity |
| `renormalize_router_weights` | `true` | Renormalize the selected experts' router weights after softmax |
| `record_pruning_metrics_only` | `true` | Only collect metrics needed for pruning (skip merging metrics) |
| `overwrite_observations` | `false` | Do not overwrite existing observation files |
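
One plausible reading of `renormalize_router_weights` is the usual top-k renormalization used by MoE routers (the model's own `config.json` sets `norm_topk_prob` to true). The sketch below illustrates that reading only; `topk_router_weights` is a hypothetical helper, and whether the observer does exactly this is an assumption.

```python
import math

def topk_router_weights(logits, top_k, renormalize=True):
    """Softmax over all experts, keep the top_k, optionally rescale to sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    selected = {i: probs[i] for i in top}
    if renormalize:
        norm = sum(selected.values())
        selected = {i: p / norm for i, p in selected.items()}
    return selected

logits = [2.0, 1.0, 0.5, -1.0]
raw = topk_router_weights(logits, top_k=2, renormalize=False)
weights = topk_router_weights(logits, top_k=2)
print(raw)
print(weights)
```

Without renormalization the selected weights sum to less than 1, because probability mass assigned to unselected experts is simply dropped; with it, the selected weights form a proper distribution.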

### Calibration Dataset

| Parameter | Value | Description |
|-----------|-------|-------------|
| `dataset_name` | `theblackcat102/evol-codealpaca-v1` | Code instruction-following dataset |
| `split` | `train` | Dataset split used |
| `shuffle` | `true` | Shuffle before sampling |
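
The combination of `shuffle`, `seed` and `samples_per_category` amounts to a seeded shuffle followed by taking the first 1024 rows. The sketch below shows that idea with the standard library only; `sample_calibration` and the row count are hypothetical, and the actual pipeline's shuffle will select different rows.

```python
import random

def sample_calibration(num_rows: int, num_samples: int = 1024, seed: int = 42):
    """Return the row indices a seeded shuffle-then-take would select."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)   # seeded, so the order is repeatable
    return indices[:num_samples]

picked = sample_calibration(num_rows=10_000)   # 10_000 is a placeholder size
print(len(picked), picked[:5])
```

Fixing the seed means the same calibration subset is drawn on every run, which is what makes the resulting expert ranking reproducible.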

### Clustering Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `cluster_method` | `agglomerative` | Hierarchical agglomerative clustering |
| `expert_sim` | `ttm` | Token-to-token similarity matrix for expert similarity |
| `linkage_method` | `average` | Average linkage for hierarchical clustering |
| `frequency_penalty` | `true` | Penalize frequently-used experts during clustering |

---

### Timing

| Phase | Duration |
|-------|----------|
| Model loading | ~5 seconds |
| Observer pass (1024 samples) | ~6.5 hours |
| Expert pruning (all 48 layers) | < 1 second |
| Model saving | ~26 seconds |
| **Total** | **~6.5 hours** |
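
Since the observer pass dominates the run, its per-sample cost follows directly from the table. This is plain arithmetic on the approximate figures above.

```python
observer_hours = 6.5      # approximate duration from the timing table
num_samples = 1024        # calibration samples processed

seconds_per_sample = observer_hours * 3600 / num_samples
print(f"~{seconds_per_sample:.1f} s per calibration sample")
```

At roughly 23 seconds per sample with a batch size of 1, the calibration pass is the only phase worth optimizing if the run needs to be faster.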

---

## Evaluation Results (0-shot, lm-eval-harness v0.4.11)

| Benchmark | Metric | Score |
|-----------|--------|-------|
| **MMLU** (57 subjects) | acc | **49.42%** |
| -- Humanities | acc | 39.17% |
| -- Social Sciences | acc | 60.38% |
| -- STEM | acc | 56.68% |
| -- Other | acc | 46.73% |
| **ARC Challenge** | acc | 33.62% |
| **ARC Challenge** | acc_norm | 38.65% |
| **ARC Easy** | acc | 53.16% |
| **ARC Easy** | acc_norm | 50.51% |
| **HellaSwag** | acc | 37.70% |
| **HellaSwag** | acc_norm | 47.64% |
| **BoolQ** | acc | **74.22%** |
| **WinoGrande** | acc | 58.80% |
| **OpenBookQA** | acc | 19.80% |
| **OpenBookQA** | acc_norm | 31.20% |
| **RTE** | acc | 58.48% |

### Evaluation Notes

- All benchmarks run at **0-shot** (no few-shot examples)
- Evaluation performed on the **base model** (not instruction-tuned)
- Evaluated using `lm-eval-harness` v0.4.11 with `model="hf"` backend
- Model loaded with `device_map="auto"` across 2 GPUs
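
Several rows report both `acc` and `acc_norm`. For multiple-choice tasks these differ in how the answer is picked: `acc` takes the choice with the highest summed log-likelihood, while `acc_norm` first divides each log-likelihood by the length of the choice text. The sketch below illustrates the idea with made-up numbers; `pick` is a hypothetical helper, and the exact normalization used by the harness should be checked against its documentation.

```python
def pick(loglikelihoods, choices, normalize=False):
    """Index of the predicted answer among `choices`."""
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(loglikelihoods, choices)
    ]
    return max(range(len(scores)), key=lambda i: scores[i])

choices = ["Yes.", "It depends on the temperature of the water."]
loglikelihoods = [-6.0, -20.0]   # hypothetical summed log-probabilities

raw_choice = pick(loglikelihoods, choices)
norm_choice = pick(loglikelihoods, choices, normalize=True)
print(raw_choice, norm_choice)
```

Raw log-likelihood favours the short answer simply because it has fewer tokens to pay for, whereas the normalized score can prefer the longer one. This is why the two metrics can diverge noticeably on tasks such as HellaSwag and OpenBookQA.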

---

## Usage

### Direct Loading with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "harryadav3/Qwen3-30B-A3B-REAP-50"

# The checkpoint uses the stock Qwen3MoeForCausalLM architecture,
# so trust_remote_code is not needed.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",    # spread layers across the available GPUs
    torch_dtype="auto",   # load in the checkpoint's dtype (bfloat16)
)

messages = [{"role": "user", "content": "Write a Python function to check if a number is prime."}]
# return_dict=True returns input_ids and attention_mask together.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```
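
For readers curious what the chat template produces, the sketch below mimics the simplest path through the `chat_template.jinja` shipped with this model: plain text turns, no tools. It is a simplified re-implementation for illustration only; `render_prompt` is hypothetical, and the real template additionally handles tool calls and reasoning content.

```python
def render_prompt(messages, add_generation_prompt=True, enable_thinking=True):
    """Simplified ChatML rendering for plain text turns (no tools)."""
    parts = []
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # The template emits an empty think block when thinking is disabled.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)

prompt = render_prompt(
    [{"role": "user", "content": "Is 97 prime?"}],
    enable_thinking=False,
)
print(prompt)
```

With the real tokenizer, the same switch is exposed by passing `enable_thinking=False` as an extra keyword argument to `apply_chat_template`, which forwards it to the template.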

### Serving with vLLM

```bash
vllm serve harryadav3/Qwen3-30B-A3B-REAP-50 \
    --tensor-parallel-size 2 \
    --port 8000 \
    --trust-remote-code
```
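
Once the server is up, it exposes an OpenAI-compatible HTTP API. The sketch below builds a chat completion request with the standard library only; it assumes the server runs on `localhost` at the port given above, and `build_chat_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="harryadav3/Qwen3-30B-A3B-REAP-50", max_tokens=512):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Send the request and return the first completion's text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

request = build_chat_request("Write a Python function to check if a number is prime.")
# print(send(request))  # uncomment once the vLLM server is running
```

Keeping request construction separate from sending makes it easy to inspect the payload before any network call is made.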

---

## Reproducing This Model

```bash
# Clone REAP
git clone https://github.com/CerebrasResearch/reap.git
cd reap
git submodule init && git submodule update --recursive

# Install
uv venv .venv --seed --python 3.12
source .venv/bin/activate
uv pip install --editable . --native-tls --torch-backend auto

# Download base model
huggingface-cli download Qwen/Qwen3-30B-A3B

# Run REAP pruning
bash experiments/pruning-cli.sh \
    0,1 \
    "Qwen/Qwen3-30B-A3B" \
    "reap" \
    42 \
    0.50 \
    "theblackcat102/evol-codealpaca-v1" \
    false false false false false false false
```

---

## Citation

If you use this model, please cite the REAP paper:

```bibtex
@inproceedings{lasby2026reap,
  title={{REAP} the Experts: Why Pruning Prevails for One-Shot {MoE} Compression},
  author={Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://arxiv.org/abs/2510.13999}
}
```

## Links

- **REAP Paper**: [arXiv:2510.13999](https://arxiv.org/abs/2510.13999)
- **REAP Repository**: [github.com/CerebrasResearch/reap](https://github.com/CerebrasResearch/reap)
- **Base Model**: [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)
- **Cerebras Blog**: [cerebras.ai/blog/reap](https://www.cerebras.ai/blog/reap)
28
added_tokens.json
Normal file
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
89
chat_template.jinja
Normal file
@@ -0,0 +1,89 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
38
config.json
Normal file
@@ -0,0 +1,38 @@
{
  "architectures": [
    "Qwen3MoeForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "max_position_embeddings": 40960,
  "max_window_layers": 48,
  "mlp_only_layers": [],
  "model_type": "qwen3_moe",
  "moe_intermediate_size": 768,
  "norm_topk_prob": true,
  "num_attention_heads": 32,
  "num_experts": 64,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 48,
  "num_key_value_heads": 4,
  "output_router_logits": false,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.55.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
13
generation_config.json
Normal file
@@ -0,0 +1,13 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.55.0"
}
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:343e05b7d1ef0da755b43857cc6fd7a7024e6746f8188421eecadf34c48382fa
size 5000093080
3
model-00002-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8059d687853d725090941451f2e34fb18e9056f5564cce02321f5a48f9c6f1c
size 4997775080
3
model-00003-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a30af721a7d2cbc69f14c1fae5e189b07cb77ea77987ed08d593937f4a5e2ddc
size 4997775720
3
model-00004-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:222a8658711ed92b19469fbea7462e235fd17bc32cb041f8bd181c8bd1a4ae14
size 4997775704
3
model-00005-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:daedf7635da71ec1bef5bf48e4af12de389af8b77d7b133223333c2b8ae6f9ac
size 4997505344
3
model-00006-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eef1450e55d8ddecab0d6dabad8298bfed074bf51c2dd149e048775538e0e06f
size 4997775712
3
model-00007-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8a9ffd412a93627b6aece8f25732b25fc90c4f227cdea03e5060ab0cefc0d30f
size 2073121000
9659
model.safetensors.index.json
Normal file
File diff suppressed because it is too large
77
reap_args.yaml
Normal file
@@ -0,0 +1,77 @@
cluster_args:
  cluster_description: null
  cluster_method: agglomerative
  compression_ratio: 0.5
  expert_sim: ttm
  frequency_penalty: true
  linkage_method: average
  max_cluster_size: null
  multi_layer: null
  num_clusters: null
  singleton_outlier_experts: false
  singleton_super_experts: false
  softmax_temperature: null
ds_args:
  dataset_config_name: null
  dataset_name: theblackcat102/evol-codealpaca-v1
  dataset_test_split: test
  shuffle: true
  split: train
eval_args:
  evalplus_tasks:
  - mbpp
  - humaneval
  greedy: true
  lm_eval_tasks:
  - winogrande
  - arc_challenge
  - arc_easy
  - boolq
  - hellaswag
  - mmlu
  - openbookqa
  - rte
  min_p: 0.0
  parallel_tasks: 32
  results_dir: null
  run_evalplus: true
  run_livecodebench: true
  run_lm_eval: true
  run_math: false
  run_wildbench: false
  server_log_file_name: pruning-cli-0.log
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  use_server: true
  vllm_port: 8000
model_args:
  model_name: Qwen/Qwen3-30B-A3B
  num_experts_per_tok_override: null
obs_args:
  batch_size: 1
  distance_measure: cosine
  model_max_length: 2048
  output_file_name: observations_1024_cosine-seed_42.pt
  overwrite_observations: false
  record_pruning_metrics_only: true
  renormalize_router_weights: true
  return_vllm_tokens_prompt: false
  samples_per_category: 1024
  select_only_categories: null
  split_by_category: false
  truncate: false
prune_args:
  n_experts_to_prune: null
  overwrite_pruned_model: false
  perserve_outliers: false
  perserve_super_experts: false
  prune_method: reap
reap_args:
  debug: false
  do_eval: false
  plot_clusters: true
  profile: false
  run_observer_only: false
  seed: 42
  smoke_test: true
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
239
tokenizer_config.json
Normal file
@@ -0,0 +1,239 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long