Model: zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF Source: Original Platform
| base_model | pipeline_tag | tags | license | license_link |
|---|---|---|---|---|
| Qwen/Qwen3-1.7B | text-generation | | apache-2.0 | https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE |
Model Overview
- Base Model: Qwen/Qwen3-1.7B
- Training: zhangsq-nju/EdgeRazor
- Inference: ggml-org/llama.cpp
Model Bit-Widths
| Mixed-Precision Recipe | Bit-Width | This Repo | GGUF Type |
|---|---|---|---|
| 100% 4-bit + 0% 1.58-bit | 4 | ✔️ | Q4_0 |
| 50% 4-bit + 50% 1.58-bit | 2.79 | ✖️ | Not supported |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88 | ✖️ | Not supported |
| 0% 4-bit + 100% 1.58-bit | 1.58 | ✔️ | TQ1_0, TQ2_0 |
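The Bit-Width column is consistent with a weighted average of the two precisions in each recipe. The sketch below reproduces the table values under that assumption; the function name `average_bit_width` is hypothetical and is not part of EdgeRazor or llama.cpp.

```python
def average_bit_width(frac_4bit, high_bits=4.0, low_bits=1.58):
    """Weighted average bit-width of a mixed-precision recipe.

    Assumption: the table's Bit-Width is the average of 4-bit and 1.58-bit
    weights, weighted by the share of each. This is inferred from the table,
    not taken from the EdgeRazor code.
    """
    if not 0.0 <= frac_4bit <= 1.0:
        raise ValueError("frac_4bit must be between 0 and 1")
    return frac_4bit * high_bits + (1.0 - frac_4bit) * low_bits


# Recipes from the table, expressed as the fraction of 4-bit weights.
for frac in (1.0, 0.5, 0.125, 0.0):
    print(f"{frac:.1%} 4-bit -> {average_bit_width(frac):.2f} bits")
```

For example, the 50/50 recipe gives 0.5 × 4 + 0.5 × 1.58 = 2.79, which matches the second row.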
Get Started
Use llama.cpp to run efficient inference on edge devices.
See the cli.sh script for basic usage.
Model list:
- Qwen3-1.7B-BF16.gguf: BF16 model from the original Qwen3-1.7B
- Qwen3-1.7B-EdgeRazor-Q4_0.gguf: Q4_0 model from Qwen3-1.7B-EdgeRazor-4bit
- Qwen3-1.7B-EdgeRazor-TQ1_0.gguf: TQ1_0 model from Qwen3-1.7B-EdgeRazor-1.58bit
- Qwen3-1.7B-EdgeRazor-TQ2_0.gguf: TQ2_0 model from Qwen3-1.7B-EdgeRazor-1.58bit
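As a rough illustration of how one of these files might be passed to llama.cpp, the sketch below builds a `llama-cli` command line for a chosen GGUF type. It assumes a standard llama.cpp build with `llama-cli` on the PATH and the model files in the current directory; the helper `build_command` and the `MODELS` mapping are hypothetical, and the repository's cli.sh may use different options.

```python
import shlex

# File names come from the model list; the keys are the GGUF types.
MODELS = {
    "BF16": "Qwen3-1.7B-BF16.gguf",
    "Q4_0": "Qwen3-1.7B-EdgeRazor-Q4_0.gguf",
    "TQ1_0": "Qwen3-1.7B-EdgeRazor-TQ1_0.gguf",
    "TQ2_0": "Qwen3-1.7B-EdgeRazor-TQ2_0.gguf",
}


def build_command(gguf_type, prompt, max_tokens=128, binary="llama-cli"):
    """Return an argument list for llama-cli (hypothetical helper).

    -m selects the model file, -p sets the prompt, and -n limits the
    number of generated tokens.
    """
    model_path = MODELS[gguf_type]  # raises KeyError for unknown types
    return [binary, "-m", model_path, "-p", prompt, "-n", str(max_tokens)]


cmd = build_command("TQ2_0", "Briefly introduce yourself.")
print(shlex.join(cmd))  # copy-pasteable shell command
```

The returned list can also be handed to `subprocess.run(cmd, check=True)` once llama.cpp is installed and the model file has been downloaded.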
Citation
If you find our project useful in your research, please consider citing our paper ✏️:
@article{zhangsh-edgerazor,
title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
year={2026},
journal={arXiv preprint arXiv:2605.04062}
}
