[DOC] update modelslim version (#908)
1. Update the modelslim version to fix DeepSeek-related issues.
2. Add a note for "--quantization ascend".

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
@@ -36,7 +36,7 @@ see https://www.modelscope.cn/models/vllm-ascend/QwQ-32B-W8A8
 
 ```bash
 # (Optional)This tag is recommended and has been verified
-git clone https://gitee.com/ascend/msit -b modelslim-VLLM-8.1.RC1.b020
+git clone https://gitee.com/ascend/msit -b modelslim-VLLM-8.1.RC1.b020_001
 
 cd msit/msmodelslim
 # Install by run this script
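This hunk pins the msit checkout to the `modelslim-VLLM-8.1.RC1.b020_001` tag in place of the older `b020` tag. If you already have a clone, you can check that it sits on the new tag before running the install script. The following is a minimal sketch that assumes `git` is on `PATH`. The function name `check_msit_tag` and its directory argument are hypothetical; only the tag names come from the diff.

```bash
#!/usr/bin/env bash
# Tag pinned by this commit (replaces modelslim-VLLM-8.1.RC1.b020).
EXPECTED_TAG="modelslim-VLLM-8.1.RC1.b020_001"

# Hypothetical helper: succeed only if the repository in $1 (default: ./msit)
# has HEAD exactly at EXPECTED_TAG; prints the tag on success.
check_msit_tag() {
  local repo_dir="${1:-msit}"
  local current
  # --exact-match makes git describe fail unless HEAD itself is tagged;
  # --tags also considers lightweight (non-annotated) tags.
  if ! current="$(git -C "$repo_dir" describe --tags --exact-match 2>/dev/null)"; then
    echo "HEAD of $repo_dir is not at any tag" >&2
    return 1
  fi
  if [[ "$current" != "$EXPECTED_TAG" ]]; then
    echo "expected $EXPECTED_TAG, found $current" >&2
    return 1
  fi
  echo "$current"
}
```

Running `git clone -b <tag>` leaves a detached HEAD at the tagged commit. As a result, a fresh clone made with the command in the diff should pass `check_msit_tag msit`, while a clone of the old `b020` tag should fail it.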
@@ -68,6 +68,10 @@ The converted model files looks like:
 ```
 
 Run the following script to start the vLLM server with quantize model:
 
+:::{note}
+The value "ascend" for "--quantization" argument will be supported after [a specific PR](https://github.com/vllm-project/vllm-ascend/pull/877) is merged and released, you can cherry-pick this commit for now.
+:::
 ```bash
 vllm serve /home/models/QwQ-32B-w8a8 --tensor-parallel-size 4 --served-model-name "qwq-32b-w8a8" --max-model-len 4096 --quantization ascend
 ```
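This hunk leaves the serve command unchanged but adds a note that `--quantization ascend` depends on vllm-ascend PR #877. When you try different settings, it helps to keep the flags in one place and print the command before launching it. The following is a minimal sketch. `serve_quantized_model` and the `MODEL_PATH`, `TP_SIZE`, `SERVED_NAME`, `MAX_LEN` and `DRY_RUN` variables are hypothetical names; the flag values are the ones from the diff.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the `vllm serve` command shown in the diff.
# Defaults match the documented values; override them via environment variables.
serve_quantized_model() {
  local model_path="${MODEL_PATH:-/home/models/QwQ-32B-w8a8}"
  local tp_size="${TP_SIZE:-4}"
  local served_name="${SERVED_NAME:-qwq-32b-w8a8}"
  local max_len="${MAX_LEN:-4096}"

  # Build the argument list as an array so a model path containing spaces
  # is still passed as a single argument.
  # Note: "ascend" is only accepted once vllm-ascend PR #877 is included.
  local -a cmd=(
    vllm serve "$model_path"
    --tensor-parallel-size "$tp_size"
    --served-model-name "$served_name"
    --max-model-len "$max_len"
    --quantization ascend
  )

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the assembled command (space-joined) instead of starting the server.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 MAX_LEN=8192 serve_quantized_model` prints the command with `--max-model-len 8192` without starting anything. Drop `DRY_RUN` to launch the server in the foreground.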