7.4 KiB
7.4 KiB
library_name, license, license_link, pipeline_tag, base_model
| library_name | license | license_link | pipeline_tag | base_model | |
|---|---|---|---|---|---|
| transformers | apache-2.0 | https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE | text-generation |
|
Qwen3-8B-Instruct-2512-SFT
NOTE:This model is the Instruct-aligned variant, and it will not generate <think></think> blocks in its outputs.
Additionally, there is no need to specify enable_thinking=False anymore.
Among them, the 8B and 14B SFT and DFT variants are obtained via full-parameter fine-tuning, while the 32B models are trained using LoRA due to hardware resource constraints.
The dataset used is the Chinese Distillation Dataset based on Qwen3-235B-2507.available at:Chinese-Qwen3-235B-2507-Distill-data-110k
For details and code regarding model training and quantization, please seeTraining and Quantization Guide
Here is the list of models released in this version:
| Model | 4-bit AWQ | 8-bit FP8 | GPTQ | NVIDIA FP4 | Weight-Activation | ||||
|---|---|---|---|---|---|---|---|---|---|
| AWQ | AWQ-asym | INT4 | INT8 | NVFP4 | NVFP4-A16 | W4A16 | W8A8 | ||
| Qwen3-8B-Instruct-2512-DFT | AWQ | awq-asym | FP8 | GPTQ(int4) | GPTQ(int8) | NVFP4 | NVFP4A16 | W4A16 | W8A8 |
| Qwen3-8B-Instruct-2512-SFT | AWQ | awq-asym | FP8 | GPTQ(int4) | GPTQ(int8) | NVFP4 | NVFP4A16 | W4A16 | W8A8 |
| Qwen3-14B-Instruct-2512-DFT | AWQ | awq-asym | FP8 | GPTQ(int4) | GPTQ(int8) | NVFP4 | NVFP4A16 | W4A16 | W8A8 |
| Qwen3-14B-Instruct-2512-SFT | AWQ | awq-asym | FP8 | GPTQ(int4) | GPTQ(int8) | NVFP4 | NVFP4A16 | W4A16 | W8A8 |
| Qwen3-32B-Instruct-2512-DFT | AWQ | awq-asym | FP8 | GPTQ(int4) | GPTQ(int8) | NVFP4 | NVFP4A16 | W4A16 | W8A8 |
【Dependencies】
vllm>=0.10.2
transformers>=4.56.1