[doc][main] Correct mistakes in doc (#4945)

### What this PR does / why we need it?
Correct mistakes in the docs: an outdated MoE model example, stale ModelSlim links, and a missing KV-connector config field.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
This commit is contained in:
lilinsiman <lilinsiman@gmail.com>, 2025-12-12 19:17:10 +08:00 (committed by GitHub)
parent f708d919f8, commit fc818f1509
9 changed files with 18 additions and 28 deletions


@@ -76,7 +76,7 @@ vllm serve Qwen/Qwen3-235B-A22 \
- Network bandwidth must support expert redistribution traffic (≥ 10 Gbps recommended).
3. Model Compatibility:
-   - Only MoE models with explicit expert parallelism support (e.g., Qwen3-235B-A22) are compatible.
+   - Only MoE models with explicit expert parallelism support (e.g., Qwen3 MoE models) are compatible.
- Verify model architecture supports dynamic expert routing through --enable-expert-parallel.
4. Gating Configuration:

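The constraints above apply to a `vllm serve` launch like the one in the hunk header. A minimal sketch for illustration: the model name and `--enable-expert-parallel` come from the doc, while the remaining flags and their values are assumptions, not the doc's actual command.

```shell
# Hypothetical launch sketch; only the model name and
# --enable-expert-parallel are taken from the doc above,
# the other flags/values are illustrative assumptions.
vllm serve Qwen/Qwen3-235B-A22 \
    --tensor-parallel-size 4 \
    --enable-expert-parallel \
    --max-model-len 32768
```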

@@ -113,6 +113,7 @@ python3 -m vllm.entrypoints.openai.api_server \
"kv_role": "kv_producer",
"kv_port": "20001",
"kv_connector_extra_config": {
"use_ascend_direct": true,
"prefill": {
"dp_size": 1,
"tp_size": 1

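For context, the `use_ascend_direct` field added in this hunk sits inside the connector's extra config. A sketch of the surrounding `--kv-transfer-config` JSON, reassembled from the fields shown in the hunk (the outer braces and nesting are assumed; any fields not shown in the diff are omitted):

```json
{
  "kv_role": "kv_producer",
  "kv_port": "20001",
  "kv_connector_extra_config": {
    "use_ascend_direct": true,
    "prefill": {
      "dp_size": 1,
      "tp_size": 1
    }
  }
}
```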

@@ -7,6 +7,11 @@ You can refer to [Supported Models](https://docs.vllm.ai/en/latest/models/suppor
You can run LoRA with ACLGraph mode now. Please refer to [Graph Mode Guide](./graph_mode.md) for a better LoRA performance.
Address for downloading models:\
base model: https://www.modelscope.cn/models/vllm-ascend/Llama-2-7b-hf/files \
lora model: https://www.modelscope.cn/models/vllm-ascend/llama-2-7b-sql-lora-test/files
## Example
We provide a simple LoRA example here, which enables the ACLGraph mode by default.
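The example the doc refers to can be sketched with vLLM's offline LoRA API. Everything below except the two ModelScope model names is an assumption for illustration (local paths, prompt, and sampling settings are hypothetical, and running it requires the downloaded weights and supported hardware):

```python
# Hypothetical sketch of an offline LoRA run with vLLM; the base and
# LoRA model names match the ModelScope downloads listed above, but
# the local paths and prompt are illustrative assumptions.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora=True lets generate() accept per-request LoRA adapters.
llm = LLM(model="./Llama-2-7b-hf", enable_lora=True)
sampling = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Write a SQL query that lists all users."],
    sampling,
    # (adapter name, adapter id, local adapter path)
    lora_request=LoRARequest("sql-lora", 1, "./llama-2-7b-sql-lora-test"),
)
print(outputs[0].outputs[0].text)
```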


@@ -6,13 +6,13 @@ Since version 0.9.0rc2, the quantization feature is experimentally supported by
## Install ModelSlim
- To quantize a model, you should install [ModelSlim](https://gitee.com/ascend/msit/blob/master/msmodelslim/README.md) which is the Ascend compression and acceleration tool. It is an affinity-based compression tool designed for acceleration, using compression as its core technology and built upon the Ascend platform.
+ To quantize a model, you should install [ModelSlim](https://gitcode.com/Ascend/msit/tree/master) which is the Ascend compression and acceleration tool. It is an affinity-based compression tool designed for acceleration, using compression as its core technology and built upon the Ascend platform.
Install ModelSlim:
```bash
# The branch(br_release_MindStudio_8.1.RC2_TR5_20260624) has been verified
- git clone -b br_release_MindStudio_8.1.RC2_TR5_20260624 https://gitee.com/ascend/msit
+ git clone -b br_release_MindStudio_8.1.RC2_TR5_20260624 https://gitcode.com/Ascend/msit/tree/master
cd msit/msmodelslim