---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- quantization
- reasoning
- qat
language:
- en
---
# ReasoningQAT-Qwen3-1.7B-3bit
This model is a **3-bit pseudo-quantized** version of [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), trained with Quantization-Aware Training (QAT) for reasoning tasks.
## Details
- **Base model:** Qwen3-1.7B
- **Quantization:** W3G128 (3-bit weights, group size 128)
- **Format:** Pseudo-quantized (stored in FP16; weights lie on 3-bit quantization grids)
- **Method:** ReasoningQAT: QAT combining knowledge distillation with a teacher-confidence-weighted DFT loss, trained end-to-end on reasoning data
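
To make the W3G128 pseudo-quantized format concrete, here is a minimal, dependency-free sketch of group-wise fake quantization. It is an illustration under stated assumptions, not the authors' quantizer: the function names are hypothetical, the grid is assumed to be an asymmetric min-max uniform grid (this card does not say which grid is used), and plain Python floats stand in for FP16 tensors.

```python
def fake_quantize_group(weights, bits=3):
    """Snap one group of weights onto a uniform grid with 2**bits points.

    Assumption: asymmetric min-max grid; the actual grid may differ.
    """
    levels = 2 ** bits - 1  # 7 steps between 8 grid points for 3 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:
        # Constant group: every weight already sits on a one-point grid.
        return list(weights)
    out = []
    for w in weights:
        # Quantize to an integer code in [0, levels] ...
        code = round((w - w_min) / scale)
        code = min(max(code, 0), levels)
        # ... then dequantize back to a float that lies on the grid.
        out.append(w_min + code * scale)
    return out


def fake_quantize(weights, bits=3, group_size=128):
    """Fake-quantize a flat row of weights, one consecutive group at a time."""
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        out.extend(fake_quantize_group(group, bits))
    return out
```

Under these assumptions, each group of 128 weights ends up with at most 8 distinct values while remaining ordinary floating-point numbers. This is the sense in which a pseudo-quantized checkpoint can be stored in FP16 while its weights lie on 3-bit grids.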
## Citation
```bibtex
@inproceedings{
okoshi2026towards,
title={Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning {LLM}s},
author={Yasuyuki Okoshi and Hikari Otsuka and Daichi Fujiki and Masato Motomura},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=Azsd2qyK6C}
}
```