---
base_model: facebook/MobileLLM-ParetoQ-350M-BF16
library_name: transformers
pipeline_tag: text-generation
tags:
- mobilellm
- edgerazor
- quantization
license: other
license_name: fair-noncommercial-research
license_link: https://huggingface.co/facebook/MobileLLM-ParetoQ-350M-BF16/blob/main/LICENSE
---

<div align="center">
<br/>
<img src="./asset/Logo-HF.png" alt="EdgeRazor Logo" width="60%">
<h3>
EdgeRazor for Lightweight LLMs
</h3>

<p>
<!-- <a href="https://arxiv.org/abs/2604.xxxxx" target="_blank">
<img src="https://img.shields.io/badge/arXiv-EdgeRazor-b31b1b?style=flat&logo=arxiv" alt="arXiv EdgeRazor">
</a> -->
<a href="https://github.com/zhangsq-nju/EdgeRazor" target="_blank">
<img src="https://img.shields.io/badge/GitHub-EdgeRazor-blue?style=flat&logo=github" alt="GitHub EdgeRazor">
</a>
</p>

</div>
<h1>MobileLLM-350M-EdgeRazor-1.88bit</h1>

## Contents

- [Contents](#contents)
- [Model Overview](#model-overview)
- [Model Bit-Widths](#model-bit-widths)
- [Model Performance](#model-performance)
- [Quickstart](#quickstart)
- [Citation](#citation)

## Model Overview

- Base Model: [facebook/MobileLLM-ParetoQ-350M-BF16](https://huggingface.co/facebook/MobileLLM-ParetoQ-350M-BF16)
- Training: [zhangsq-nju/EdgeRazor](https://github.com/zhangsq-nju/EdgeRazor)
- Quantization: 1.88-bit for all decoder layers; 4-bit for embedding and lm_head

## Model Bit-Widths

| Mixed-Precision Recipe       | Bit-Width | This Repo |
| ---------------------------- | --------- | --------- |
| 100% 4-bit + 0% 1.58-bit     | 4         |           |
| 50% 4-bit + 50% 1.58-bit     | 2.79      |           |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88      | ✔️        |
| 0% 4-bit + 100% 1.58-bit     | 1.58      |           |

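The bit-widths in the table are the fraction-weighted averages of the two precisions over the decoder layers. A minimal sketch of that arithmetic (our own illustration, not code from the EdgeRazor repo):

```python
def avg_bits(frac_4bit: float, low_bits: float = 1.58) -> float:
    """Average bit-width of a recipe mixing 4-bit and low-bit decoder layers."""
    return frac_4bit * 4.0 + (1.0 - frac_4bit) * low_bits

# The 12.5% 4-bit + 87.5% 1.58-bit recipe used by this repo:
print(round(avg_bits(0.125), 2))  # 1.88
```

The same formula reproduces every row: `avg_bits(0.5)` gives 2.79 and the two pure recipes give 4 and 1.58. Note the embedding and lm_head are quantized to 4-bit separately, as listed in the overview above.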
## Model Performance

| Models         | W-A-KV     | ARC-e | ARC-c | HellaS. | BoolQ | PIQA  | WinoG. | SIQA  | OBQA  | Tr.QA2 | Ethics | MMLU  | GSM8K | HumanE. | Average (↑) |
| -------------- | ---------- | ----- | ----- | ------- | ----- | ----- | ------ | ----- | ----- | ------ | ------ | ----- | ----- | ------- | ----------- |
| MobileLLM-350M | 16-16-16   | 64.94 | 35.49 | 52.87   | 58.96 | 70.84 | 56.35  | 40.79 | 40.20 | 37.44  | 53.98  | 23.52 | 0.00  | 0.00    | **41.18**   |
| EdgeRazor      | 4-16-16    | 69.19 | 36.26 | 51.91   | 62.26 | 70.40 | 56.20  | 40.74 | 37.40 | 37.96  | 57.41  | 25.00 | 0.53  | 0.00    | **41.94**   |
| EdgeRazor      | 2.79-16-16 | 65.87 | 32.68 | 45.98   | 61.71 | 68.82 | 56.27  | 40.02 | 35.00 | 38.97  | 56.53  | 24.27 | 0.76  | 0.00    | **40.53**   |
| EdgeRazor      | 1.88-16-16 | 61.20 | 28.75 | 40.76   | 58.23 | 66.59 | 55.01  | 39.51 | 33.00 | 40.98  | 56.22  | 25.03 | 0.53  | 0.00    | **38.91**   |
| EdgeRazor      | 1.58-16-16 | 58.63 | 26.19 | 38.95   | 58.07 | 65.29 | 53.04  | 39.30 | 32.20 | 41.97  | 56.26  | 24.12 | 0.53  | 0.00    | **38.04**   |
| EdgeRazor      | 4-8-8      | 69.11 | 35.84 | 51.82   | 62.60 | 70.35 | 56.20  | 40.58 | 37.40 | 37.90  | 57.21  | 24.66 | 0.45  | 0.00    | **41.86**   |
| EdgeRazor      | 2.79-8-8   | 65.99 | 32.68 | 45.99   | 62.11 | 68.55 | 56.51  | 40.07 | 35.20 | 39.05  | 56.51  | 24.41 | 0.99  | 0.00    | **40.62**   |
| EdgeRazor      | 1.88-8-8   | 61.36 | 29.18 | 40.86   | 58.23 | 66.92 | 55.49  | 39.56 | 33.20 | 40.95  | 56.13  | 24.97 | 0.38  | 0.00    | **39.02**   |
| EdgeRazor      | 1.58-8-8   | 58.67 | 26.19 | 38.92   | 58.04 | 65.23 | 53.83  | 39.25 | 32.00 | 42.03  | 56.33  | 24.19 | 0.83  | 0.00    | **38.12**   |

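The Average column is the unweighted mean over the 13 tasks; for instance, the BF16 baseline row can be reproduced as:

```python
# Per-task scores from the MobileLLM-350M (16-16-16) row of the table above.
scores = [64.94, 35.49, 52.87, 58.96, 70.84, 56.35, 40.79,
          40.20, 37.44, 53.98, 23.52, 0.00, 0.00]
average = round(sum(scores) / len(scores), 2)
print(average)  # 41.18
```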
## Quickstart

Make sure `EdgeRazor` is installed before using weight-activation quantization. The provided weights are already quantized (stored as `quantized_weights * scaling_bf16`); to enable activation and KV-cache quantization, pass `trust_remote_code=True` when loading the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-1.88bit",
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-1.88bit",
    trust_remote_code=True,
)
```

Note that the default tokenizer does not define any special tokens. You can add them yourself, for example:

```python
tokenizer.add_special_tokens(
    {
        "eos_token": "</s>",
        "bos_token": "<s>",
        "unk_token": "<unk>",
    }
)
```

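The `quantized_weights * scaling_bf16` storage mentioned above can be sketched in plain Python. This is an illustrative per-tensor ternary (1.58-bit) scheme of our own; EdgeRazor's actual grouping and packing may differ:

```python
def quantize_ternary(weights):
    """Quantize to ternary codes {-1, 0, +1} with one per-tensor scale (illustrative)."""
    scale = sum(abs(w) for w in weights) / len(weights)  # mean-absolute scale
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate weights as quantized_weights * scaling."""
    return [c * scale for c in codes]

codes, scale = quantize_ternary([0.31, -0.02, -0.45, 0.12])
print(codes)                     # [1, 0, -1, 1]
print(dequantize(codes, scale))  # each code times the shared scale
```

Each weight then costs log2(3) ≈ 1.58 bits of code plus a shared bf16 scale, which is where the 1.58-bit figure in the tables above comes from.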
## Citation

If you find our project useful in your research, please consider citing our paper ✏️:

```bibtex
@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}
```