Initialize the project; model provided by the ModelHub XC community
Model: zhangsq-nju/MobileLLM-350M-EdgeRazor-2.79bit Source: Original Platform
This commit is contained in:
README.md (new file, 111 lines)
---
base_model: facebook/MobileLLM-ParetoQ-350M-BF16
library_name: transformers
pipeline_tag: text-generation
tags:
- mobilellm
- edgerazor
- quantization
license: other
license_name: fair-noncommercial-research
license_link: https://huggingface.co/facebook/MobileLLM-ParetoQ-350M-BF16/blob/main/LICENSE
---

<div align="center">
<br/>
<img src="./asset/Logo-HF.png" alt="EdgeRazor Logo" width="60%">
<h3>
EdgeRazor for Lightweight LLMs
</h3>

<p>
<!-- <a href="https://arxiv.org/abs/2604.xxxxx" target="blank">
<img src="https://img.shields.io/badge/arXiv-EdgeRazor-b31b1b?style=flat&logo=arxiv" alt="arXiv EdgeRazor">
</a> -->
<a href="https://github.com/zhangsq-nju/EdgeRazor" target="blank">
<img src="https://img.shields.io/badge/GitHub-EdgeRazor-blue?style=flat&logo=github" alt="GitHub EdgeRazor">
</a>
</p>

</div>

<h1>MobileLLM-350M-EdgeRazor-2.79bit</h1>

## Contents

- [Contents](#contents)
- [Model Overview](#model-overview)
- [Model Bit-Widths](#model-bit-widths)
- [Model Performance](#model-performance)
- [Quickstart](#quickstart)
- [Citation](#citation)

## Model Overview

- Base Model: [facebook/MobileLLM-ParetoQ-350M-BF16](https://huggingface.co/facebook/MobileLLM-ParetoQ-350M-BF16)
- Training: [zhangsq-nju/EdgeRazor](https://github.com/zhangsq-nju/EdgeRazor)
- Quantization: 2.79-bit for all decoder layers; 4-bit for the embedding and lm_head

## Model Bit-Widths

| Mixed-Precision Recipe       | Bit-Width | This Repo |
| ---------------------------- | --------- | --------- |
| 100% 4-bit + 0% 1.58-bit     | 4         |           |
| 50% 4-bit + 50% 1.58-bit     | 2.79      | ✔️        |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88      |           |
| 0% 4-bit + 100% 1.58-bit     | 1.58      |           |

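The bit-widths in the recipe column are the weighted average of the two layer precisions. A minimal sketch of the arithmetic (the function name is illustrative, not part of the EdgeRazor API):

```python
def avg_bitwidth(frac_4bit: float, b_hi: float = 4.0, b_lo: float = 1.58) -> float:
    """Average decoder bit-width for a mix of 4-bit and 1.58-bit layers."""
    return frac_4bit * b_hi + (1.0 - frac_4bit) * b_lo

# Recipes from the table above
print(round(avg_bitwidth(1.0), 2))    # 4.0
print(round(avg_bitwidth(0.5), 2))    # 2.79
print(round(avg_bitwidth(0.125), 2))  # 1.88
print(round(avg_bitwidth(0.0), 2))    # 1.58
```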
## Model Performance

| Models         | W-A-KV     | ARC-e | ARC-c | HellaS. | BoolQ | PIQA  | WinoG. | SIQA  | OBQA  | Tr.QA2 | Ethics | MMLU  | GSM8K | HumanE. | Average (↑) |
| -------------- | ---------- | ----- | ----- | ------- | ----- | ----- | ------ | ----- | ----- | ------ | ------ | ----- | ----- | ------- | ----------- |
| MobileLLM-350M | 16-16-16   | 64.94 | 35.49 | 52.87   | 58.96 | 70.84 | 56.35  | 40.79 | 40.20 | 37.44  | 53.98  | 23.52 | 0.00  | 0.00    | **41.18**   |
| EdgeRazor      | 4-16-16    | 69.19 | 36.26 | 51.91   | 62.26 | 70.40 | 56.20  | 40.74 | 37.40 | 37.96  | 57.41  | 25.00 | 0.53  | 0.00    | **41.94**   |
| EdgeRazor      | 2.79-16-16 | 65.87 | 32.68 | 45.98   | 61.71 | 68.82 | 56.27  | 40.02 | 35.00 | 38.97  | 56.53  | 24.27 | 0.76  | 0.00    | **40.53**   |
| EdgeRazor      | 1.88-16-16 | 61.20 | 28.75 | 40.76   | 58.23 | 66.59 | 55.01  | 39.51 | 33.00 | 40.98  | 56.22  | 25.03 | 0.53  | 0.00    | **38.91**   |
| EdgeRazor      | 1.58-16-16 | 58.63 | 26.19 | 38.95   | 58.07 | 65.29 | 53.04  | 39.30 | 32.20 | 41.97  | 56.26  | 24.12 | 0.53  | 0.00    | **38.04**   |
| EdgeRazor      | 4-8-8      | 69.11 | 35.84 | 51.82   | 62.60 | 70.35 | 56.20  | 40.58 | 37.40 | 37.90  | 57.21  | 24.66 | 0.45  | 0.00    | **41.86**   |
| EdgeRazor      | 2.79-8-8   | 65.99 | 32.68 | 45.99   | 62.11 | 68.55 | 56.51  | 40.07 | 35.20 | 39.05  | 56.51  | 24.41 | 0.99  | 0.00    | **40.62**   |
| EdgeRazor      | 1.88-8-8   | 61.36 | 29.18 | 40.86   | 58.23 | 66.92 | 55.49  | 39.56 | 33.20 | 40.95  | 56.13  | 24.97 | 0.38  | 0.00    | **39.02**   |
| EdgeRazor      | 1.58-8-8   | 58.67 | 26.19 | 38.92   | 58.04 | 65.23 | 53.83  | 39.25 | 32.00 | 42.03  | 56.33  | 24.19 | 0.83  | 0.00    | **38.12**   |

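The Average column is the unweighted mean over the 13 tasks. For instance, for the BF16 baseline row:

```python
# Per-task scores for the MobileLLM-350M 16-16-16 row of the table above
scores = [64.94, 35.49, 52.87, 58.96, 70.84, 56.35, 40.79,
          40.20, 37.44, 53.98, 23.52, 0.00, 0.00]
avg = sum(scores) / len(scores)
print(round(avg, 2))  # 41.18
```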
## Quickstart

Install `EdgeRazor` in advance if you need weight-activation quantization. The provided weights are already quantized (stored as quantized_weights * scaling_bf16); to enable activation and KV-cache quantization, pass `trust_remote_code=True` when loading the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-2.79bit",
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-2.79bit",
    trust_remote_code=True,
)
```

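Since the checkpoint stores weights as quantized_weights * scaling_bf16, each quantized linear layer reconstructs its effective weight by an elementwise multiply of integer codes and a scale. A minimal sketch of that reconstruction, purely illustrative (the names, shapes, and per-channel layout are assumptions, not this repo's actual storage format):

```python
import numpy as np

def dequantize(q_codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an effective weight from integer codes and a per-channel scale.

    q_codes: integer codes, shape (out_features, in_features)
    scale:   per-output-channel scale, shape (out_features, 1)
    """
    # Broadcasting multiplies every row of codes by its channel's scale
    return q_codes.astype(np.float32) * scale

# Toy example: two output channels, three input features
q = np.array([[1, -1, 0], [7, -8, 3]], dtype=np.int8)
s = np.array([[0.05], [0.02]], dtype=np.float32)
w = dequantize(q, s)
```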
Note that the default tokenizer does not define special tokens. For example, you can add them with:

```python
tokenizer.add_special_tokens(
    {
        "eos_token": "</s>",
        "bos_token": "<s>",
        "unk_token": "<unk>",
    }
)
```

## Citation

If you find our project useful in your research, please consider citing our paper ✏️:

```
@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}
```