Initialize the project; model provided by the ModelHub XC community
Model: zhangsq-nju/MobileLLM-350M-EdgeRazor-2.79bit Source: Original Platform
This commit is contained in:
README.md (new file, 111 lines)
---
base_model: facebook/MobileLLM-ParetoQ-350M-BF16
library_name: transformers
pipeline_tag: text-generation
tags:
- mobilellm
- edgerazor
- quantization
license: other
license_name: fair-noncommercial-research
license_link: https://huggingface.co/facebook/MobileLLM-ParetoQ-350M-BF16/blob/main/LICENSE
---

<div align="center">
<br/>
<img src="./asset/Logo-HF.png" alt="EdgeRazor Logo" width="60%">
<h3>
EdgeRazor for Lightweight LLMs
</h3>

<p>
<!-- <a href="https://arxiv.org/abs/2604.xxxxx" target="blank">
<img src="https://img.shields.io/badge/arXiv-EdgeRazor-b31b1b?style=flat&logo=arxiv" alt="arXiv EdgeRazor">
</a> -->
<a href="https://github.com/zhangsq-nju/EdgeRazor" target="blank">
<img src="https://img.shields.io/badge/GitHub-EdgeRazor-blue?style=flat&logo=github" alt="GitHub EdgeRazor">
</a>
</p>

</div>

<h1>MobileLLM-350M-EdgeRazor-2.79bit</h1>

## Contents

- [Contents](#contents)
- [Model Overview](#model-overview)
- [Model Bit-Widths](#model-bit-widths)
- [Model Performance](#model-performance)
- [Quickstart](#quickstart)
- [Citation](#citation)

## Model Overview

- Base Model: [facebook/MobileLLM-ParetoQ-350M-BF16](https://huggingface.co/facebook/MobileLLM-ParetoQ-350M-BF16)
- Training: [zhangsq-nju/EdgeRazor](https://github.com/zhangsq-nju/EdgeRazor)
- Quantization: 2.79-bit for all decoder layers; 4-bit for the embedding and lm_head

## Model Bit-Widths

| Mixed-Precision Recipe       | Bit-Width | This Repo |
| ---------------------------- | --------- | --------- |
| 100% 4-bit + 0% 1.58-bit     | 4         |           |
| 50% 4-bit + 50% 1.58-bit     | 2.79      | ✔️        |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88      |           |
| 0% 4-bit + 100% 1.58-bit     | 1.58      |           |

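The bit-widths in the recipe column are the weighted average of the two layer precisions. A minimal sketch of the arithmetic (the function name is illustrative, not part of the EdgeRazor API):

```python
def avg_bitwidth(frac_4bit: float, b_hi: float = 4.0, b_lo: float = 1.58) -> float:
    """Average decoder bit-width for a mix of 4-bit and 1.58-bit layers."""
    return frac_4bit * b_hi + (1.0 - frac_4bit) * b_lo

# Recipes from the table above
print(round(avg_bitwidth(1.0), 2))    # 4.0
print(round(avg_bitwidth(0.5), 2))    # 2.79
print(round(avg_bitwidth(0.125), 2))  # 1.88
print(round(avg_bitwidth(0.0), 2))    # 1.58
```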
## Model Performance

| Models         | W-A-KV     | ARC-e | ARC-c | HellaS. | BoolQ | PIQA  | WinoG. | SIQA  | OBQA  | Tr.QA2 | Ethics | MMLU  | GSM8K | HumanE. | Average (↑) |
| -------------- | ---------- | ----- | ----- | ------- | ----- | ----- | ------ | ----- | ----- | ------ | ------ | ----- | ----- | ------- | ----------- |
| MobileLLM-350M | 16-16-16   | 64.94 | 35.49 | 52.87   | 58.96 | 70.84 | 56.35  | 40.79 | 40.20 | 37.44  | 53.98  | 23.52 | 0.00  | 0.00    | **41.18**   |
| EdgeRazor      | 4-16-16    | 69.19 | 36.26 | 51.91   | 62.26 | 70.40 | 56.20  | 40.74 | 37.40 | 37.96  | 57.41  | 25.00 | 0.53  | 0.00    | **41.94**   |
| EdgeRazor      | 2.79-16-16 | 65.87 | 32.68 | 45.98   | 61.71 | 68.82 | 56.27  | 40.02 | 35.00 | 38.97  | 56.53  | 24.27 | 0.76  | 0.00    | **40.53**   |
| EdgeRazor      | 1.88-16-16 | 61.20 | 28.75 | 40.76   | 58.23 | 66.59 | 55.01  | 39.51 | 33.00 | 40.98  | 56.22  | 25.03 | 0.53  | 0.00    | **38.91**   |
| EdgeRazor      | 1.58-16-16 | 58.63 | 26.19 | 38.95   | 58.07 | 65.29 | 53.04  | 39.30 | 32.20 | 41.97  | 56.26  | 24.12 | 0.53  | 0.00    | **38.04**   |
| EdgeRazor      | 4-8-8      | 69.11 | 35.84 | 51.82   | 62.60 | 70.35 | 56.20  | 40.58 | 37.40 | 37.90  | 57.21  | 24.66 | 0.45  | 0.00    | **41.86**   |
| EdgeRazor      | 2.79-8-8   | 65.99 | 32.68 | 45.99   | 62.11 | 68.55 | 56.51  | 40.07 | 35.20 | 39.05  | 56.51  | 24.41 | 0.99  | 0.00    | **40.62**   |
| EdgeRazor      | 1.88-8-8   | 61.36 | 29.18 | 40.86   | 58.23 | 66.92 | 55.49  | 39.56 | 33.20 | 40.95  | 56.13  | 24.97 | 0.38  | 0.00    | **39.02**   |
| EdgeRazor      | 1.58-8-8   | 58.67 | 26.19 | 38.92   | 58.04 | 65.23 | 53.83  | 39.25 | 32.00 | 42.03  | 56.33  | 24.19 | 0.83  | 0.00    | **38.12**   |

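The Average column is the unweighted mean over the 13 tasks. For instance, for the BF16 baseline row:

```python
# Per-task scores for the MobileLLM-350M 16-16-16 row of the table above
scores = [64.94, 35.49, 52.87, 58.96, 70.84, 56.35, 40.79,
          40.20, 37.44, 53.98, 23.52, 0.00, 0.00]
avg = sum(scores) / len(scores)
print(round(avg, 2))  # 41.18
```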
## Quickstart

Install `EdgeRazor` in advance if you need weight-activation quantization. The provided weights are already quantized (stored as quantized_weights * scaling_bf16); to enable activation and KV-cache quantization, pass `trust_remote_code=True` when loading the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-2.79bit",
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-2.79bit",
    trust_remote_code=True,
)
```

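Since the checkpoint stores weights as quantized_weights * scaling_bf16, each quantized linear layer reconstructs its effective weight by an elementwise multiply of integer codes and a scale. A minimal sketch of that reconstruction, purely illustrative (the names, shapes, and per-channel layout are assumptions, not this repo's actual storage format):

```python
import numpy as np

def dequantize(q_codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an effective weight from integer codes and a per-channel scale.

    q_codes: integer codes, shape (out_features, in_features)
    scale:   per-output-channel scale, shape (out_features, 1)
    """
    # Broadcasting multiplies every row of codes by its channel's scale
    return q_codes.astype(np.float32) * scale

# Toy example: two output channels, three input features
q = np.array([[1, -1, 0], [7, -8, 3]], dtype=np.int8)
s = np.array([[0.05], [0.02]], dtype=np.float32)
w = dequantize(q, s)
```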
Note that the default tokenizer does not define special tokens. For example, you can add them with:

```python
tokenizer.add_special_tokens(
    {
        "eos_token": "</s>",
        "bos_token": "<s>",
        "unk_token": "<unk>",
    }
)
```

## Citation

If you find our project useful in your research, please consider citing our paper ✏️:

```
@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}
```