[doc]Modify quantization tutorials (#5026)

### What this PR does / why we need it?

Modify the quantization tutorials Qwen3-32B-W4A4.md and Qwen3-8B-W4A8.md to correct a few mistakes:

- Qwen3-8B-W4A8: one idle NPU card needs to be set.
- Qwen3-32B-W4A4: two idle NPU cards need to be set for the FlatQuant training, and the calib_file path is updated because the old one did not match the ModelSlim version.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: IncSec <1790766300@qq.com>
2025-12-15 20:12:06 +08:00
parent e90e8afc94
commit a5cb8e40f5
2 changed files with 5 additions and 1 deletions
--- a/docs/source/tutorials/Qwen3-32B-W4A4.md
+++ b/docs/source/tutorials/Qwen3-32B-W4A4.md
@@ -55,10 +55,12 @@ cd example/Qwen
 MODEL_PATH=/home/models/Qwen3-32B
 # Path to save converted weight, Replace with your local path
 SAVE_PATH=/home/models/Qwen3-32B-w4a4
+# Set two idle NPU cards
+export ASCEND_RT_VISIBLE_DEVICES=0,1
 python3 w4a4.py --model_path $MODEL_PATH \
                --save_directory $SAVE_PATH \
-                --calib_file ../common/qwen_qwen3_cot_w4a4.json \
+                --calib_file ./calib_data/qwen3_cot_w4a4.json \
                --trust_remote_code True \
                --batch_size 1
 ```
--- a/docs/source/tutorials/Qwen3-8B-W4A8.md
+++ b/docs/source/tutorials/Qwen3-8B-W4A8.md
@@ -47,6 +47,8 @@ cd example/Qwen
 MODEL_PATH=/home/models/Qwen3-8B
 # Path to save converted weight, Replace with your local path
 SAVE_PATH=/home/models/Qwen3-8B-w4a8
+# Set an idle NPU card
+export ASCEND_RT_VISIBLE_DEVICES=0
 python quant_qwen.py \
          --model_path $MODEL_PATH \