[doc]Modify quantization tutorials (#5026)
### What this PR does / why we need it?
Correct a few mistakes in the quantization tutorials Qwen3-32B-W4A4.md and Qwen3-8B-W4A8.md:
- Qwen3-8B-W4A8: one idle NPU card must be set before running quantization.
- Qwen3-32B-W4A4: two idle NPU cards must be set for the FlatQuant training, and the calib_file path is updated because it did not match the ModelSlim version.
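
The card requirement above can be guarded with a small pre-flight check. This is a minimal sketch, not part of the tutorials: the helper names `count_devices` and `require_devices` are hypothetical, and the check only counts the IDs listed in `ASCEND_RT_VISIBLE_DEVICES`. Whether those cards are actually idle still has to be confirmed separately, for example with `npu-smi info`.

```shell
# Count the device IDs in a comma-separated list such as "0,1".
count_devices() {
    list="$1"
    if [ -z "$list" ]; then
        echo 0
        return 0
    fi
    # Split the list on commas into the positional parameters.
    old_ifs="$IFS"
    IFS=','
    set -- $list
    IFS="$old_ifs"
    echo "$#"
}

# Fail if fewer than the requested number of devices are visible.
# (Hypothetical helper; it does not check that the cards are idle.)
require_devices() {
    needed="$1"
    have="$(count_devices "${ASCEND_RT_VISIBLE_DEVICES:-}")"
    if [ "$have" -lt "$needed" ]; then
        echo "need $needed visible NPU(s), ASCEND_RT_VISIBLE_DEVICES lists $have" >&2
        return 1
    fi
    return 0
}

# Example: the W4A4 flow expects two cards, the W4A8 flow one.
export ASCEND_RT_VISIBLE_DEVICES=0,1
if require_devices 2; then
    echo "two devices visible, W4A4 quantization can start"
fi
```

Calling `require_devices 1` in the same way would cover the Qwen3-8B-W4A8 case.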
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: IncSec <1790766300@qq.com>
@@ -55,10 +55,12 @@ cd example/Qwen
 MODEL_PATH=/home/models/Qwen3-32B
 # Path to save converted weight, Replace with your local path
 SAVE_PATH=/home/models/Qwen3-32B-w4a4
+# Set two idle NPU cards
+export ASCEND_RT_VISIBLE_DEVICES=0,1
 
 python3 w4a4.py --model_path $MODEL_PATH \
 --save_directory $SAVE_PATH \
---calib_file ../common/qwen_qwen3_cot_w4a4.json \
+--calib_file ./calib_data/qwen3_cot_w4a4.json \
 --trust_remote_code True \
 --batch_size 1
 ```
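
Because the calibration file location depends on the ModelSlim version, a script can look the file up instead of hard-coding one path. The following is a rough sketch under that assumption: `first_existing_file` is a hypothetical helper, and the two candidate paths are simply the old and new values of `--calib_file` from the hunk above, relative to `example/Qwen`.

```shell
# Print the first argument that exists as a regular file.
# Returns non-zero if none of the candidates exist.
first_existing_file() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate paths taken from the diff; which one exists depends on
# the ModelSlim checkout in use (assumption).
if CALIB_FILE="$(first_existing_file \
        ./calib_data/qwen3_cot_w4a4.json \
        ../common/qwen_qwen3_cot_w4a4.json)"; then
    echo "using calibration file: $CALIB_FILE"
else
    echo "no calibration file found, check the ModelSlim checkout" >&2
fi
```

The resolved `$CALIB_FILE` could then be passed to `--calib_file` in place of the literal path.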
@@ -47,6 +47,8 @@ cd example/Qwen
 MODEL_PATH=/home/models/Qwen3-8B
 # Path to save converted weight, Replace with your local path
 SAVE_PATH=/home/models/Qwen3-8B-w4a8
+# Set an idle NPU card
+export ASCEND_RT_VISIBLE_DEVICES=0
 
 python quant_qwen.py \
 --model_path $MODEL_PATH \