Initialize project; model provided by the ModelHub XC community
Model: Mercity/Qwen3-8B-LaCo-Pruned Source: Original Platform
296
README.md
Normal file
@@ -0,0 +1,296 @@
---
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
arxiv: arxiv:2507.02279
tags:
- pruning
- layer-pruning
- laco
- compressed
- qwen3
- llm
- efficient
library_name: transformers
pipeline_tag: text-generation
language:
- en
- zh
- multilingual
datasets:
- wikipedia
model-index:
- name: Qwen3-8B-LaCo-Pruned
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - type: accuracy_norm
      value: 48.52
      name: Accuracy (Normalized)
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PIQA
      type: piqa
    metrics:
    - type: accuracy_norm
      value: 65.67
      name: Accuracy (Normalized)
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BoolQ
      type: boolq
    metrics:
    - type: accuracy
      value: 61.77
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - type: accuracy
      value: 25.12
      name: Accuracy (5-shot)
---

# Qwen3-8B-LaCo-Pruned

This model is a **layer-pruned** version of [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base), produced with the [LaCo (Layer Collapse)](https://arxiv.org/abs/2402.11187) structured pruning method.

## Model Summary

| Attribute | Value |
|-----------|-------|
| **Base Model** | [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base) |
| **Pruning Method** | LaCo (Layer Collapse) |
| **Original Layers** | 36 |
| **Layers After Pruning** | 26 |
| **Layers Removed** | 10 |
| **Compression** | 27.8% (10 of 36 layers) |
| **Parameters** | ~5.8B (reduced from ~8B) |

## Intended Use

- **Research** on model compression and efficiency
- **Fine-tuning base** for domain-specific applications
- **Inference optimization** where speed and memory matter more than factual accuracy
- **Edge deployment** scenarios with limited computational resources

## ⚠️ Important Limitations

This pruned model has **significantly reduced factual knowledge capabilities**. It performs at near-random levels on knowledge-intensive benchmarks such as MMLU.

| Use Case | Status |
|----------|--------|
| Physical reasoning tasks | ✅ Good (82.6% retained) |
| Reading comprehension | ⚠️ Acceptable (74.3% retained) |
| Common sense reasoning | ⚠️ Degraded (61.8% retained) |
| Factual question answering | ❌ Not recommended |
| Knowledge-intensive tasks | ❌ Not recommended |

**Recommendation:** Fine-tune this model on your target domain before deployment.

---

## Pruning Details

### LaCo Hyperparameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| MERGE_LAYERS (C) | 3 | Layers merged per operation |
| LOWEST_LAY (L) | 4 | Minimum layer index for merging |
| HIGHEST_LAY (H) | 28 | Maximum layer index for merging |
| INTERVAL (I) | 2 | Minimum gap between merge points |
| THRESHOLD (T) | 0.85 | Cosine similarity threshold |
| MAX_COMPRESSION | 30% | Maximum allowed compression |

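To make the "layers merged per operation" setting concrete, here is a minimal sketch of a LaCo-style merge on toy data. It is not the code used to produce this model. It assumes the parameter-difference merge from the LaCo paper (the base layer plus the summed differences of the layers folded into it), and it assumes that a group of `count` layers, base included, collapses into one. The function name `merge_layers` and the dict-of-lists layer representation are illustrative only.

```python
def merge_layers(layers, base, count):
    """Collapse layers[base : base + count] into a single layer.

    Each layer is a dict mapping a parameter name to a flat list of floats.
    The merged layer is theta_base + sum_k (theta_{base+k} - theta_base).
    Assumption: `count` includes the base layer, so 3 layers become 1.
    """
    group = layers[base:base + count]
    anchor = group[0]
    merged = {}
    for name, anchor_values in anchor.items():
        values = list(anchor_values)
        for other in group[1:]:
            # Add the difference between the folded layer and the base layer.
            for i, (a, b) in enumerate(zip(anchor_values, other[name])):
                values[i] += b - a
        merged[name] = values
    return layers[:base] + [merged] + layers[base + count:]


# Toy example: four "layers" with one two-element parameter each.
toy_layers = [
    {"w": [1.0, 2.0]},
    {"w": [1.5, 2.0]},
    {"w": [1.0, 3.0]},
    {"w": [0.0, 0.0]},
]
collapsed = merge_layers(toy_layers, base=0, count=3)
print(len(collapsed), collapsed[0]["w"])
```
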
### Pruning Statistics

| Metric | Value |
|--------|-------|
| Successful Merges | 5 |
| Rejected Merges | 0 |
| Total Iterations | 6 |
| Final Compression | 27.8% |

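The figures above are consistent with each merge collapsing three layers into one, which removes two layers per merge. That reading is an inference from the reported numbers rather than something the card states, so treat the short check below as a sketch under that assumption.

```python
original_layers = 36
successful_merges = 5
layers_per_merge = 3  # MERGE_LAYERS (C); assumed to include the base layer

# Each merge keeps one layer out of the group, so it removes C - 1 layers.
removed = successful_merges * (layers_per_merge - 1)
remaining = original_layers - removed
compression = removed / original_layers

print(removed, remaining, f"{compression:.1%}")
```
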
### Hidden State Similarity (Calibration Set)

| Metric | Value |
|--------|-------|
| Average | 0.9680 |
| Min | 0.9492 |
| Max | 0.9766 |

Individual similarities: `[0.9492, 0.9727, 0.9609, 0.9766, 0.9688, 0.9648, 0.9648, 0.9766, 0.9727, 0.9727]`

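For reference, the summary row can be reproduced from the individual values. The sketch below also includes a plain cosine similarity helper; how the hidden states were actually extracted and compared is not described in this card, so `cosine_similarity` is an illustrative stand-in operating on two flat vectors.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Values copied from the list above.
similarities = [0.9492, 0.9727, 0.9609, 0.9766, 0.9688,
                0.9648, 0.9648, 0.9766, 0.9727, 0.9727]
threshold = 0.85  # THRESHOLD (T) from the hyperparameter table

summary = {
    "average": round(mean(similarities), 4),
    "min": min(similarities),
    "max": max(similarities),
}
all_above_threshold = all(s > threshold for s in similarities)
print(summary, all_above_threshold)
```
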
### Perplexity Results

| Model | Perplexity | Ratio |
|-------|------------|-------|
| Original (Qwen3-8B-Base) | 26.19 | 1.00× |
| Pruned (this model) | 71.48 | **2.73×** |

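Perplexity is the exponential of the average negative log-likelihood per token, so a 2.73× ratio corresponds to roughly one extra nat of loss per token. The snippet below is a small sketch of that relationship; the evaluation text and tokenization behind the reported numbers are not specified here, and `perplexity` is a hypothetical helper that takes per-token natural-log probabilities.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


original_ppl = 26.19
pruned_ppl = 71.48

ratio = pruned_ppl / original_ppl
extra_nats_per_token = math.log(ratio)  # difference in mean NLL

print(f"{ratio:.2f}x, +{extra_nats_per_token:.2f} nats per token")
```
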
---

## Benchmark Results

### Comparison with Original Qwen3-8B-Base

| Benchmark | Original | Pruned | Retention | Status |
|-----------|----------|--------|-----------|--------|
| **PIQA** | 79.54% | 65.67% | 82.6% | ✅ Good |
| **BoolQ** | 83.09% | 61.77% | 74.3% | ⚠️ Acceptable |
| **HellaSwag** | 78.55% | 48.52% | 61.8% | ⚠️ Degraded |
| **MMLU (5-shot)** | 76.89% | 25.12% | 32.7% | ❌ Near random |

*Original scores from [Qwen3 Technical Report](https://arxiv.org/abs/2505.09388)*

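The Retention column is the pruned score divided by the original score. The following sketch recomputes it from the table; the variable names are illustrative.

```python
# (original %, pruned %) pairs copied from the table above.
scores = {
    "PIQA": (79.54, 65.67),
    "BoolQ": (83.09, 61.77),
    "HellaSwag": (78.55, 48.52),
    "MMLU (5-shot)": (76.89, 25.12),
}

retention = {
    name: round(100 * pruned / original, 1)
    for name, (original, pruned) in scores.items()
}

for name, value in retention.items():
    print(f"{name}: {value}% retained")
```
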
### Key Findings

1. **Physical reasoning preserved:** PIQA retained 82.6% of original performance
2. **Factual knowledge destroyed:** MMLU collapsed to random chance (25% for 4-way multiple choice)
3. **Perplexity underestimates damage:** the 2.73× PPL ratio doesn't predict the benchmark collapse
4. **Layer-specific knowledge:** Factual knowledge appears to be encoded in specific removed layers

---

## Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Mercity/Qwen3-8B-LaCo-Pruned"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # place weights on available devices (requires accelerate)
    trust_remote_code=True,
)

# Text generation
prompt = "The process of photosynthesis"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With 4-bit Quantization (Further Compression)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit loading requires the bitsandbytes package
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "Mercity/Qwen3-8B-LaCo-Pruned",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)
```

---

## Recovery Recommendations

To recover performance lost to pruning:

### Option 1: LoRA Fine-tuning (Recommended)

```python
from peft import LoraConfig, get_peft_model

# `model` is the pruned model loaded as shown in the Usage section
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Fine-tune on OpenOrca, Alpaca, or domain-specific data
```

### Option 2: Knowledge Distillation

Use the original Qwen3-8B-Base as a teacher to transfer knowledge back into the pruned model.

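This card does not specify a distillation recipe. As one common formulation, the sketch below computes a temperature-scaled KL divergence between teacher and student next-token distributions for a single position. It uses plain Python on toy logits for readability; the function names, the temperature value, and the example logits are all assumptions, and a real setup would compute this over batches of model outputs.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return temperature ** 2 * kl


# Toy logits over a four-token vocabulary (hypothetical values).
teacher_logits = [4.0, 1.0, 0.5, 0.1]
student_logits = [1.0, 1.2, 0.9, 1.1]

loss_mismatch = distillation_loss(student_logits, teacher_logits)
loss_match = distillation_loss(teacher_logits, teacher_logits)
print(loss_mismatch > loss_match, loss_match)
```
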
### Expected Recovery

- With fine-tuning: +15-25% on MMLU
- With knowledge distillation: +25-35% on MMLU

---

## Technical Specifications

| Attribute | Value |
|-----------|-------|
| Architecture | Transformer decoder-only |
| Parameters | ~5.8B |
| Layers | 26 |
| Hidden Size | 4096 |
| Attention Heads (Q) | 32 |
| Attention Heads (KV) | 8 (GQA) |
| Intermediate Size | 12288 |
| Vocabulary Size | 151,669 |
| Max Context Length | 32,768 tokens |
| Precision | bfloat16 |

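Because the KV cache grows linearly with the number of layers, removing layers also shrinks inference memory at long context. The estimate below is a rough sketch from the values in the table. It assumes a head dimension of 128 (hidden size divided by query heads), a single sequence, bfloat16 cache entries, and no cache quantization; `kv_cache_bytes` is an illustrative helper, not a library function.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Size of the key and value caches for one sequence."""
    # Factor 2 covers keys and values.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


head_dim = 4096 // 32  # assumption: hidden size / query heads
context = 32768        # max context length from the table

pruned_bytes = kv_cache_bytes(26, 8, head_dim, context)
original_bytes = kv_cache_bytes(36, 8, head_dim, context)

print(f"pruned:   {pruned_bytes / 2**30:.2f} GiB")
print(f"original: {original_bytes / 2**30:.2f} GiB")
```
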
---

## Citation

If you use this model, please cite the original LaCo paper and Qwen3:

```bibtex
@article{yang2024laco,
  title={LaCo: Large Language Model Pruning via Layer Collapse},
  author={Yang, Yifei and Cao, Zouying and Zhao, Hai},
  journal={arXiv preprint arXiv:2402.11187},
  year={2024}
}

@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}
```

## References

- [LaCo Paper](https://arxiv.org/abs/2402.11187)
- [LaCo Official Implementation](https://github.com/yangyifei729/LaCo)
- [Qwen3 Technical Report](https://arxiv.org/abs/2505.09388)
- [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)

## License

Apache 2.0 (same as base Qwen3 model)

## Acknowledgments

- Qwen Team for the excellent Qwen3-8B-Base model
- LaCo authors for the pruning methodology
- Hugging Face for model hosting