This model is a 3-bit pseudo-quantized version of Qwen3-1.7B, trained with Quantization-Aware Training (QAT) for reasoning tasks.
Details
Base model: Qwen3-1.7B
Quantization: W3G128 (3-bit weights, group size 128)
Format: Pseudo-quantized (stored in FP16; weights lie on 3-bit quantization grids)
Method: ReasoningQAT, a QAT method that combines knowledge distillation with a teacher-confidence-weighted DFT loss and is trained end-to-end on reasoning data
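To make "pseudo-quantized" and "W3G128" concrete, here is a minimal sketch of group-wise 3-bit fake quantization in plain Python. It assumes an asymmetric min/max uniform grid per group; the card does not say which grid (symmetric or asymmetric, or how scales are learned) the released weights actually use, so `fake_quantize_group` and `fake_quantize` are hypothetical helpers for illustration only, not the ReasoningQAT procedure.

```python
import random

BITS = 3             # W3: 3-bit weights
GROUP_SIZE = 128     # G128: one scale / offset per 128 consecutive weights
LEVELS = 2 ** BITS   # 8 representable values per group


def fake_quantize_group(group):
    """Snap one group of floats onto a uniform 3-bit grid.

    Assumption: asymmetric min/max grid. The values are returned as
    floats, mirroring a pseudo-quantized checkpoint that keeps FP16
    storage while every weight sits on a quantization level.
    """
    lo, hi = min(group), max(group)
    if hi == lo:
        return list(group)  # constant group: nothing to quantize
    scale = (hi - lo) / (LEVELS - 1)
    out = []
    for w in group:
        q = round((w - lo) / scale)        # integer code
        q = max(0, min(LEVELS - 1, q))     # clamp to [0, 7]
        out.append(lo + q * scale)         # dequantized value
    return out


def fake_quantize(weights, group_size=GROUP_SIZE):
    """Apply fake quantization independently to each group of weights."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(fake_quantize_group(weights[start:start + group_size]))
    return out


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(2 * GROUP_SIZE)]
dq = fake_quantize(row)

for g in range(0, len(dq), GROUP_SIZE):
    distinct = len(set(dq[g:g + GROUP_SIZE]))
    print(f"group {g // GROUP_SIZE}: {distinct} distinct values")
```

Each group of 128 weights ends up with at most 8 distinct values, which is what "weights lie on 3-bit quantization grids" means in practice. Because the values are still stored as ordinary floats, a checkpoint in this format loads like any FP16 model but does not by itself reduce memory use.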
Citation
@inproceedings{okoshi2026towards,
  title={Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning {LLM}s},
  author={Yasuyuki Okoshi and Hikari Otsuka and Daichi Fujiki and Masato Motomura},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=Azsd2qyK6C}
}