Initialize project; model provided by the ModelHub XC community

Model: Mercity/Qwen3-8B-LaCo-Pruned Source: Original Platform
2026-05-12 06:01:10 +08:00
commit b1d3ea69a0
16 changed files with 152772 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,296 @@
+---
+license: apache-2.0
+base_model: Qwen/Qwen3-8B-Base
+arxiv: arxiv:2507.02279
+tags:
+  - pruning
+  - layer-pruning
+  - laco
+  - compressed
+  - qwen3
+  - llm
+  - efficient
+library_name: transformers
+pipeline_tag: text-generation
+language:
+  - en
+  - zh
+  - multilingual
+datasets:
+  - wikipedia
+model-index:
+  - name: Qwen3-8B-LaCo-Pruned
+    results:
+      - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          name: HellaSwag
+          type: hellaswag
+        metrics:
+          - type: accuracy_norm
+            value: 48.52
+            name: Accuracy (Normalized)
+      - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          name: PIQA
+          type: piqa
+        metrics:
+          - type: accuracy_norm
+            value: 65.67
+            name: Accuracy (Normalized)
+      - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          name: BoolQ
+          type: boolq
+        metrics:
+          - type: accuracy
+            value: 61.77
+            name: Accuracy
+      - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          name: MMLU
+          type: mmlu
+        metrics:
+          - type: accuracy
+            value: 25.12
+            name: Accuracy (5-shot)
+---
+
+# Qwen3-8B-LaCo-Pruned
+
+This model is a **layer-pruned** version of [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base), produced with the [LaCo (Layer Collapse)](https://arxiv.org/abs/2402.11187) structured pruning method.
+
+## Model Summary
+
+| Attribute | Value |
+|-----------|-------|
+| **Base Model** | [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base) |
+| **Pruning Method** | LaCo (Layer Collapse) |
+| **Original Layers** | 36 |
+| **Layers After Pruning** | 26 |
+| **Layers Removed** | 10 |
+| **Compression** | 27.8% of layers (10 of 36) |
+| **Parameters** | ~5.8B (reduced from ~8B) |
+
+## Intended Use
+
+- **Research** on model compression and efficiency
+- **Fine-tuning base** for domain-specific applications
+- **Inference optimization** where speed and memory matter more than factual accuracy
+- **Edge deployment** scenarios with limited computational resources
+
+## ⚠️ Important Limitations
+
+This pruned model retains **far less factual knowledge** than the base model. On knowledge-intensive benchmarks such as MMLU it scores 25.12%, essentially the 25% chance level for four-way multiple choice.
+
+| Use Case | Status |
+|----------|--------|
+| Physical reasoning tasks (PIQA) | ✅ Good (82.6% retained) |
+| Reading comprehension (BoolQ) | ⚠️ Acceptable (74.3% retained) |
+| Common sense reasoning (HellaSwag) | ⚠️ Degraded (61.8% retained) |
+| Factual question answering | ❌ Not recommended |
+| Knowledge-intensive tasks | ❌ Not recommended |
+
+**Recommendation:** Fine-tune this model on your target domain before deployment.
+
+---
+
+## Pruning Details
+
+### LaCo Hyperparameters
+
+| Parameter | Value | Description |
+|-----------|-------|-------------|
+| MERGE_LAYERS (C) | 3 | Layers merged per operation |
+| LOWEST_LAY (L) | 4 | Minimum layer index for merging |
+| HIGHEST_LAY (H) | 28 | Maximum layer index for merging |
+| INTERVAL (I) | 2 | Minimum gap between merge points |
+| THRESHOLD (T) | 0.85 | Minimum cosine similarity required to accept a merge |
+| MAX_COMPRESSION | 30% | Maximum allowed compression |
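+
+The merge step that these parameters control can be illustrated on toy weights. The sketch below assumes the parameter-difference merge described in the LaCo paper, where the layers following a base layer are folded into it by adding their differences from the base. `collapse` and `cosine_similarity` are hypothetical helper names, each "layer" is a short list of floats rather than a real weight tensor, and the hidden states used for the acceptance check are made up.
+
+```python
+import math
+
+def collapse(layers, base, c):
+    """Fold the c - 1 layers after `base` into layers[base], then drop them."""
+    followers = layers[base + 1 : base + c]
+    merged = list(layers[base])
+    for follower in followers:
+        # theta* = theta_base + sum_k (theta_{base+k} - theta_base)
+        merged = [m + (f - b) for m, f, b in zip(merged, follower, layers[base])]
+    return layers[:base] + [merged] + layers[base + c :]
+
+def cosine_similarity(u, v):
+    dot = sum(a * b for a, b in zip(u, v))
+    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
+
+# Toy model: 6 "layers", each a 2-element weight vector.
+toy = [[1.0, 0.0], [1.1, 0.1], [1.2, 0.0], [0.9, 0.2], [1.0, 0.1], [1.0, 0.0]]
+pruned = collapse(toy, base=1, c=3)  # C = 3: layers 1, 2 and 3 become one layer
+
+# A merge is kept only if the pruned model's outputs stay close to the original's.
+THRESHOLD = 0.85
+original_out, pruned_out = [0.8, 0.5, 0.1], [0.7, 0.6, 0.2]  # made-up hidden states
+accepted = cosine_similarity(original_out, pruned_out) > THRESHOLD
+```
+
+In the real procedure the similarity is measured on hidden states produced from calibration text, and a rejected merge leaves the model unchanged before the search moves to another layer.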
+
+### Pruning Statistics
+
+| Metric | Value |
+|--------|-------|
+| Successful Merges | 5 |
+| Rejected Merges | 0 |
+| Total Iterations | 6 |
+| Final Compression | 27.8% |
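+
+The statistics above can be cross-checked with a few lines of arithmetic. This is a sketch under one assumption that the card does not state outright: each successful merge collapses C layers into one, so it removes C - 1 layers.
+
+```python
+ORIGINAL_LAYERS = 36
+MERGE_LAYERS = 3           # C from the hyperparameter table
+SUCCESSFUL_MERGES = 5
+MAX_COMPRESSION = 0.30
+
+# Assumption: each merge collapses C layers into one, removing C - 1 of them.
+removed = SUCCESSFUL_MERGES * (MERGE_LAYERS - 1)
+remaining = ORIGINAL_LAYERS - removed
+compression = removed / ORIGINAL_LAYERS
+
+print(removed, remaining, f"{compression:.1%}")  # 10 26 27.8%
+
+# One more merge would remove 12 of 36 layers (33.3%), which is above the cap.
+next_would_exceed = (removed + MERGE_LAYERS - 1) / ORIGINAL_LAYERS > MAX_COMPRESSION
+```
+
+Under that assumption the numbers agree with the 10 removed layers and 27.8% compression reported here, and a sixth merge would have crossed the 30% cap, which would be consistent with the run ending after five merges.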
+
+### Hidden State Similarity (Calibration Set)
+
+| Metric | Value |
+|--------|-------|
+| Average | 0.9680 |
+| Min | 0.9492 |
+| Max | 0.9766 |
+
+Individual similarities: `[0.9492, 0.9727, 0.9609, 0.9766, 0.9688, 0.9648, 0.9648, 0.9766, 0.9727, 0.9727]`
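+
+The summary row values follow directly from the individual similarities. A minimal check using only the Python standard library:
+
+```python
+from statistics import mean
+
+similarities = [0.9492, 0.9727, 0.9609, 0.9766, 0.9688,
+                0.9648, 0.9648, 0.9766, 0.9727, 0.9727]
+
+average = mean(similarities)
+print(f"avg={average:.4f} min={min(similarities)} max={max(similarities)}")
+# avg=0.9680 min=0.9492 max=0.9766
+
+THRESHOLD = 0.85  # T from the hyperparameter table
+all_above_threshold = all(s > THRESHOLD for s in similarities)
+```
+
+Every value also clears the 0.85 threshold from the hyperparameter table by a wide margin.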
+
+### Perplexity Results
+
+| Model | Perplexity | Ratio |
+|-------|------------|-------|
+| Original (Qwen3-8B-Base) | 26.19 | 1.00× |
+| Pruned (this model) | 71.48 | **2.73×** |
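+
+The ratio column can be reproduced from the two perplexities. The sketch below also converts the ratio into a loss difference, assuming perplexity was computed in the usual way as the exponential of the mean per-token negative log-likelihood (the card does not say how it was measured).
+
+```python
+import math
+
+ppl_original, ppl_pruned = 26.19, 71.48
+
+ratio = ppl_pruned / ppl_original
+# Perplexity is exp(mean per-token negative log-likelihood), so the log of the
+# ratio is the increase in average loss, in nats per token.
+extra_loss = math.log(ratio)
+
+print(f"{ratio:.2f}x perplexity, +{extra_loss:.3f} nats/token")
+# 2.73x perplexity, +1.004 nats/token
+```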
+
+---
+
+## Benchmark Results
+
+### Comparison with Original Qwen3-8B-Base
+
+| Benchmark | Original | Pruned | Retention | Status |
+|-----------|----------|--------|-----------|--------|
+| **PIQA** | 79.54% | 65.67% | 82.6% | ✅ Good |
+| **BoolQ** | 83.09% | 61.77% | 74.3% | ⚠️ Acceptable |
+| **HellaSwag** | 78.55% | 48.52% | 61.8% | ⚠️ Degraded |
+| **MMLU (5-shot)** | 76.89% | 25.12% | 32.7% | ❌ Near random |
+
+*Original scores from [Qwen3 Technical Report](https://arxiv.org/abs/2505.09388)*
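+
+Retention in the table is the pruned score divided by the original score. The values can be recomputed as follows:
+
+```python
+scores = {  # benchmark: (original %, pruned %)
+    "PIQA": (79.54, 65.67),
+    "BoolQ": (83.09, 61.77),
+    "HellaSwag": (78.55, 48.52),
+    "MMLU (5-shot)": (76.89, 25.12),
+}
+
+retention = {name: round(100 * pruned / original, 1)
+             for name, (original, pruned) in scores.items()}
+
+for name, value in retention.items():
+    print(f"{name}: {value}% retained")
+```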
+
+### Key Findings
+
+1. **Physical reasoning largely preserved:** PIQA retains 82.6% of the original score (65.67% vs. 79.54%)
+2. **Factual knowledge destroyed:** MMLU falls from 76.89% to 25.12%, the chance level for 4-way multiple choice (25%)
+3. **Perplexity understates the damage:** a 2.73× perplexity ratio gives little warning of the collapse on MMLU
+4. **Layer-specific knowledge:** the MMLU collapse suggests that factual recall depends heavily on the layers that were merged away
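+
+Raw retention can flatter a score that sits at chance level. One way to see this, offered as an illustrative sketch rather than a metric used in this card, is to rescale accuracy so that random guessing maps to 0 and a perfect score to 1. `chance_adjusted` is a hypothetical helper name.
+
+```python
+def chance_adjusted(accuracy, chance):
+    """Rescale accuracy so that chance maps to 0.0 and a perfect score to 1.0."""
+    return (accuracy - chance) / (1.0 - chance)
+
+MMLU_CHANCE = 0.25  # four answer options per question
+
+original = chance_adjusted(0.7689, MMLU_CHANCE)
+pruned = chance_adjusted(0.2512, MMLU_CHANCE)
+print(f"original: {original:.3f}  pruned: {pruned:.3f}")
+```
+
+On this scale the original model sits near 0.69 and the pruned model near 0.002, so the 32.7% raw retention on MMLU reflects the chance floor rather than retained knowledge.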
+
+---
+
+## Usage
+
+### Basic Inference
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Mercity/Qwen3-8B-LaCo-Pruned"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto",
+    trust_remote_code=True
+)
+
+# Text generation
+prompt = "The process of photosynthesis"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+### With 4-bit Quantization (Further Compression)
+
+```python
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+# 4-bit loading requires the bitsandbytes package to be installed.
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,                   # store weights in 4 bits
+    bnb_4bit_compute_dtype="float16",    # dtype used for matmuls at inference time
+    bnb_4bit_quant_type="nf4",           # NormalFloat4 quantization
+)
+
+model = AutoModelForCausalLM.from_pretrained(
+    "Mercity/Qwen3-8B-LaCo-Pruned",
+    quantization_config=quantization_config,
+    device_map="auto",
+    trust_remote_code=True
+)
+```
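+
+A rough idea of what 4-bit loading saves can be had from the parameter count alone. The sketch below is a back-of-the-envelope estimate for weight storage only. It uses the approximate ~5.8B figure from the Model Summary table and ignores quantization constants, any modules kept in higher precision, activations, and the KV cache. `weight_memory_gb` is a hypothetical helper name.
+
+```python
+def weight_memory_gb(n_params, bits_per_weight):
+    """Storage for the weights alone, in decimal gigabytes."""
+    return n_params * bits_per_weight / 8 / 1e9
+
+N_PARAMS = 5.8e9  # approximate count from the Model Summary table
+
+bf16_gb = weight_memory_gb(N_PARAMS, 16)
+nf4_gb = weight_memory_gb(N_PARAMS, 4)
+print(f"bfloat16: {bf16_gb:.1f} GB, nf4: {nf4_gb:.1f} GB")
+# bfloat16: 11.6 GB, nf4: 2.9 GB
+```
+
+Actual memory use will be higher than these figures once runtime overheads are included.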
+
+---
+
+## Recovery Recommendations
+
+To restore performance after pruning:
+
+### Option 1: LoRA Fine-tuning (Recommended)
+```python
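+from peft import LoraConfig, get_peft_model
+
+# `model` is the pruned model loaded as in the Basic Inference example.
+lora_config = LoraConfig(
+    r=32,
+    lora_alpha=64,
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                    "gate_proj", "up_proj", "down_proj"],
+    lora_dropout=0.05,
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, lora_config)
+model.print_trainable_parameters()  # report how many weights LoRA will train
+# Fine-tune on OpenOrca, Alpaca, or domain-specific data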
+```
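+
+The size of this adapter can be estimated from the dimensions in the Technical Specifications table. The sketch assumes a head dimension of 128 (so the key/value projections output 8 × 128 = 1024 features) and counts only the LoRA matrices: for each adapted linear layer, LoRA adds an `r × in_features` matrix and an `out_features × r` matrix.
+
+```python
+R = 32
+LAYERS = 26
+HIDDEN, KV_OUT, INTERMEDIATE = 4096, 8 * 128, 12288  # KV_OUT assumes head_dim = 128
+
+# (in_features, out_features) for each target module in one decoder layer
+shapes = {
+    "q_proj": (HIDDEN, HIDDEN),
+    "k_proj": (HIDDEN, KV_OUT),
+    "v_proj": (HIDDEN, KV_OUT),
+    "o_proj": (HIDDEN, HIDDEN),
+    "gate_proj": (HIDDEN, INTERMEDIATE),
+    "up_proj": (HIDDEN, INTERMEDIATE),
+    "down_proj": (INTERMEDIATE, HIDDEN),
+}
+
+# LoRA adds A (r x in) and B (out x r) next to each frozen weight matrix.
+per_layer = sum(R * (n_in + n_out) for n_in, n_out in shapes.values())
+trainable = per_layer * LAYERS
+print(f"{trainable:,} trainable LoRA parameters")  # 63,045,632
+```
+
+That is roughly 63M trainable weights, about 1% of the pruned model, which is why LoRA is a comparatively cheap first recovery step.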
+
+### Option 2: Knowledge Distillation
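+Use the original Qwen3-8B-Base as the teacher and this pruned model as the student, training the student to match the teacher's output distribution so that lost knowledge is transferred back.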
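+
+The card does not specify a distillation objective. One common choice, shown here purely as an assumption, is the KL divergence between temperature-softened teacher and student next-token distributions. The sketch uses plain Python and made-up logits, and `distillation_loss` is a hypothetical helper name. A real training loop would compute the same quantity on tensors over full batches.
+
+```python
+import math
+
+def softmax(logits, temperature=1.0):
+    scaled = [x / temperature for x in logits]
+    peak = max(scaled)  # subtract the max for numerical stability
+    exps = [math.exp(x - peak) for x in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+def distillation_loss(teacher_logits, student_logits, temperature=2.0):
+    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
+    p = softmax(teacher_logits, temperature)
+    q = softmax(student_logits, temperature)
+    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
+    return kl * temperature ** 2
+
+teacher = [4.0, 1.0, 0.5, 0.2]  # made-up next-token logits from the teacher
+student = [1.0, 1.2, 0.9, 1.1]  # made-up logits from the pruned student
+loss = distillation_loss(teacher, student)
+```
+
+The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge.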
+
+### Expected Recovery
+- With fine-tuning: +15-25% on MMLU
+- With knowledge distillation: +25-35% on MMLU
+
+---
+
+## Technical Specifications
+
+| Attribute | Value |
+|-----------|-------|
+| Architecture | Decoder-only Transformer |
+| Parameters | ~5.8B |
+| Layers | 26 |
+| Hidden Size | 4096 |
+| Attention Heads (Q) | 32 |
+| Attention Heads (KV) | 8 (GQA) |
+| Intermediate Size | 12288 |
+| Vocabulary Size | 151,669 |
+| Max Context Length | 32,768 tokens |
+| Precision | bfloat16 |
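+
+The parameter count can be sanity-checked against the dimensions in this table. The sketch below is a rough estimate under several assumptions that the card does not state: a head dimension of 128, separate (untied) input embedding and output projection matrices, and normalization weights small enough to ignore. `estimate_parameters` is a hypothetical helper name.
+
+```python
+def estimate_parameters(layers, hidden=4096, q_heads=32, kv_heads=8,
+                        head_dim=128, intermediate=12288, vocab=151_669,
+                        tied_embeddings=False):
+    """Rough weight count for a Qwen3-style decoder; norm weights are ignored."""
+    # q_proj and o_proj use all query heads; k_proj and v_proj use the KV heads.
+    attention = 2 * hidden * q_heads * head_dim + 2 * hidden * kv_heads * head_dim
+    mlp = 3 * hidden * intermediate  # gate, up and down projections
+    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
+    return layers * (attention + mlp) + embeddings
+
+for n_layers in (36, 26):
+    print(n_layers, f"{estimate_parameters(n_layers) / 1e9:.2f}B")
+```
+
+Under these assumptions the estimate is about 8.19B for 36 layers and about 6.26B for 26 layers, somewhat above the ~5.8B listed, because embedding and output-projection weights do not shrink when layers are removed. Counting the loaded checkpoint directly with `sum(p.numel() for p in model.parameters())` would settle the exact figure.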
+
+---
+
+## Citation
+
+If you use this model, please cite the original LaCo paper and Qwen3:
+
+```bibtex
+@article{yang2024laco,
+  title={LaCo: Large Language Model Pruning via Layer Collapse},
+  author={Yang, Yifei and Cao, Zouying and Zhao, Hai},
+  journal={arXiv preprint arXiv:2402.11187},
+  year={2024}
+}
+
+@misc{qwen3technicalreport,
+  title={Qwen3 Technical Report},
+  author={Qwen Team},
+  year={2025},
+  eprint={2505.09388},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2505.09388}
+}
+```
+
+## References
+
+- [LaCo Paper](https://arxiv.org/abs/2402.11187)
+- [LaCo Official Implementation](https://github.com/yangyifei729/LaCo)
+- [Qwen3 Technical Report](https://arxiv.org/abs/2505.09388)
+- [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)
+
+## License
+
+Apache 2.0 (same as the base Qwen3 model)
+
+## Acknowledgments
+
+- Qwen Team for the excellent Qwen3-8B-Base model
+- LaCo authors for the pruning methodology
+- Hugging Face for model hosting