---
base_model: facebook/MobileLLM-ParetoQ-350M-BF16
library_name: transformers
pipeline_tag: text-generation
tags:
- mobilellm
- edgerazor
- quantization
license: other
license_name: fair-noncommercial-research
license_link: https://huggingface.co/facebook/MobileLLM-ParetoQ-350M-BF16/blob/main/LICENSE
---

[EdgeRazor logo]

**EdgeRazor for Lightweight LLMs** · GitHub: EdgeRazor

# MobileLLM-350M-EdgeRazor-1.88bit

## Contents

- Model Overview
- Quickstart
- Citation

## Model Overview

### Model Bit-Widths

| Mixed-Precision Recipe | Bit-Width | This Repo |
| --- | --- | --- |
| 100% 4-bit + 0% 1.58-bit | 4 | |
| 50% 4-bit + 50% 1.58-bit | 2.79 | |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88 | ✔️ |
| 0% 4-bit + 100% 1.58-bit | 1.58 | |
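
The listed averages are consistent with a fraction-weighted mean of the 4-bit and 1.58-bit layers; a minimal sketch verifying the arithmetic (the helper below is illustrative, not part of the EdgeRazor codebase):

```python
# Average bit-width of a mixed-precision recipe: the fraction of 4-bit layers
# times 4 plus the remaining (1.58-bit) fraction times 1.58.
def avg_bits(frac_4bit: float) -> float:
    return frac_4bit * 4.0 + (1.0 - frac_4bit) * 1.58

for frac in (1.0, 0.5, 0.125, 0.0):
    print(f"{frac:.3f} of layers at 4-bit -> {avg_bits(frac):.2f} bits")
# 1.000 -> 4.00, 0.500 -> 2.79, 0.125 -> 1.88 (this repo), 0.000 -> 1.58
```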

### Model Performance

W-A-KV denotes the bit-widths of the weights, activations, and KV cache, respectively.

| Models | W-A-KV | ARC-e | ARC-c | HellaS. | BoolQ | PIQA | WinoG. | SIQA | OBQA | Tr.QA2 | Ethics | MMLU | GSM8K | HumanE. | Average (↑) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MobileLLM-350M | 16-16-16 | 64.94 | 35.49 | 52.87 | 58.96 | 70.84 | 56.35 | 40.79 | 40.20 | 37.44 | 53.98 | 23.52 | 0.00 | 0.00 | 41.18 |
| EdgeRazor | 4-16-16 | 69.19 | 36.26 | 51.91 | 62.26 | 70.40 | 56.20 | 40.74 | 37.40 | 37.96 | 57.41 | 25.00 | 0.53 | 0.00 | 41.94 |
| EdgeRazor | 2.79-16-16 | 65.87 | 32.68 | 45.98 | 61.71 | 68.82 | 56.27 | 40.02 | 35.00 | 38.97 | 56.53 | 24.27 | 0.76 | 0.00 | 40.53 |
| EdgeRazor | 1.88-16-16 | 61.20 | 28.75 | 40.76 | 58.23 | 66.59 | 55.01 | 39.51 | 33.00 | 40.98 | 56.22 | 25.03 | 0.53 | 0.00 | 38.91 |
| EdgeRazor | 1.58-16-16 | 58.63 | 26.19 | 38.95 | 58.07 | 65.29 | 53.04 | 39.30 | 32.20 | 41.97 | 56.26 | 24.12 | 0.53 | 0.00 | 38.04 |
| EdgeRazor | 4-8-8 | 69.11 | 35.84 | 51.82 | 62.60 | 70.35 | 56.20 | 40.58 | 37.40 | 37.90 | 57.21 | 24.66 | 0.45 | 0.00 | 41.86 |
| EdgeRazor | 2.79-8-8 | 65.99 | 32.68 | 45.99 | 62.11 | 68.55 | 56.51 | 40.07 | 35.20 | 39.05 | 56.51 | 24.41 | 0.99 | 0.00 | 40.62 |
| EdgeRazor | 1.88-8-8 | 61.36 | 29.18 | 40.86 | 58.23 | 66.92 | 55.49 | 39.56 | 33.20 | 40.95 | 56.13 | 24.97 | 0.38 | 0.00 | 39.02 |
| EdgeRazor | 1.58-8-8 | 58.67 | 26.19 | 38.92 | 58.04 | 65.23 | 53.83 | 39.25 | 32.00 | 42.03 | 56.33 | 24.19 | 0.83 | 0.00 | 38.12 |

## Quickstart

We recommend installing EdgeRazor in advance if you want weight-activation quantization. The weights in this repository are already quantized (stored as `quantized_weights * scaling_bf16`); to enable activation and KV-cache quantization, pass `trust_remote_code=True` when loading the model.
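
As a rough illustration of that storage scheme (the tensor names, shapes, and dtypes below are assumptions made for this sketch, not the repo's actual layout), dequantization is just an elementwise multiply; the actual loading code follows.

```python
import torch

# Hypothetical per-row layout: low-bit integer codes plus bf16 scaling factors.
quantized_weights = torch.randint(-1, 2, (256, 256), dtype=torch.int8)  # e.g. ternary codes
scaling_bf16 = torch.rand(256, 1, dtype=torch.bfloat16)                 # one scale per output row

# Reconstruct a usable weight matrix as quantized_weights * scaling_bf16.
dequantized = quantized_weights.to(torch.bfloat16) * scaling_bf16
```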

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-1.88bit",
    use_fast=False,
)
# trust_remote_code=True loads the custom modeling code needed for
# activation and KV-cache quantization.
model = AutoModelForCausalLM.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-1.88bit",
    trust_remote_code=True,
)
```

Note that the default tokenizer does not contain special tokens. You can add them, for example, as follows:

```python
tokenizer.add_special_tokens(
    {
        "eos_token": "</s>",
        "bos_token": "<s>",
        "unk_token": "<unk>",
    }
)
```
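
With the model and tokenizer loaded, a minimal generation sketch (the prompt and decoding settings here are illustrative, not from the original card):

```python
import torch

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of a short continuation.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```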

## Citation

If you find our project useful in your research, please consider citing our paper ✏️:

```bibtex
@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}
```