[doc][main] Correct mistakes in doc (#4945)
### What this PR does / why we need it?
Correct mistakes in doc
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
@@ -33,12 +33,14 @@ The following table lists additional configuration options available in vLLM Ascend
 | `expert_map_path` | str | `None` | When using expert load balancing for an MoE model, an expert map path needs to be passed in. |
 | `kv_cache_dtype` | str | `None` | When using the KV cache quantization method, the KV cache dtype needs to be set; currently only int8 is supported. |
 | `enable_shared_expert_dp` | bool | `False` | When the expert is shared in DP, it delivers better performance but consumes more memory. Currently only DeepSeek series models are supported. |
-| `multistream_overlap_shared_expert` | bool | `False` | Whether to enable multistream shared expert. This option only takes effects on MoE models with shared experts. |
+| `lmhead_tensor_parallel_size` | int | `None` | The custom tensor parallel size of lmhead. Restriction: can only be used when the tensor parallel size is 1. |
+| `oproj_tensor_parallel_size` | int | `None` | The custom tensor parallel size of oproj. |
+| `multistream_overlap_shared_expert` | bool | `False` | Whether to enable multistream shared expert. This option only takes effect on MoE models with shared experts. |
 | `dynamic_eplb` | bool | `False` | Whether to enable dynamic EPLB. |
 | `num_iterations_eplb_update` | int | `400` | Forward iterations when EPLB begins. |
 | `gate_eplb` | bool | `False` | Whether to enable EPLB only once. |
 | `num_wait_worker_iterations` | int | `30` | The number of forward iterations within which the EPLB worker should finish its CPU tasks. In our tests, the default value of 30 covers most cases. |
-| `expert_map_record_path` | str | `None` | When dynamic EPLB is completed, save the current expert load heatmap to the specified path. |
+| `expert_map_record_path` | str | `None` | Save the expert load calculation results to a new expert table in the specified directory. |
 | `init_redundancy_expert` | int | `0` | Specify the number of redundant experts during initialization. |
 | `dump_config` | str | `None` | Configuration file path for msprobe dump (eager mode). |
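For context, these options are not standalone CLI flags; they are passed to vLLM as a single JSON object through the `--additional-config` argument. A minimal sketch, assuming that flag, an illustrative model name, and illustrative option values (none of which come from this commit):

```bash
# Hedged sketch: pass vLLM Ascend additional-config options as JSON.
# The model name and option values below are illustrative assumptions.
vllm serve Qwen/Qwen3-235B-A22 \
    --additional-config '{
        "multistream_overlap_shared_expert": true,
        "kv_cache_dtype": "int8"
    }'
```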
@@ -76,7 +76,7 @@ vllm serve Qwen/Qwen3-235B-A22 \
 - Network bandwidth must support expert redistribution traffic (≥ 10 Gbps recommended).

 3. Model Compatibility:
-   - Only MoE models with explicit expert parallelism support (e.g., Qwen3-235B-A22) are compatible.
+   - Only MoE models with explicit expert parallelism support (e.g., Qwen3 MoE models) are compatible.
    - Verify that the model architecture supports dynamic expert routing through `--enable-expert-parallel`.

 4. Gating Configuration:
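As a concrete illustration of the constraints above, a hedged launch sketch for dynamic EPLB; the model name and values are illustrative, not part of this commit:

```bash
# Hedged sketch: dynamic EPLB on a Qwen3 MoE model, per the constraints above.
# Iteration counts mirror the table defaults; adjust for your workload.
vllm serve Qwen/Qwen3-235B-A22 \
    --enable-expert-parallel \
    --additional-config '{
        "dynamic_eplb": true,
        "num_iterations_eplb_update": 400,
        "num_wait_worker_iterations": 30
    }'
```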
@@ -113,6 +113,7 @@ python3 -m vllm.entrypoints.openai.api_server \
     "kv_role": "kv_producer",
     "kv_port": "20001",
     "kv_connector_extra_config": {
+        "use_ascend_direct": true,
         "prefill": {
             "dp_size": 1,
             "tp_size": 1
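The hunk cuts off mid-object; for readability, here is a hedged reconstruction of the complete flag this fragment belongs to. The connector name, model name, and surrounding command are assumptions, not part of this commit:

```bash
# Hedged reconstruction: the kv_connector_extra_config fragment above,
# completed with closing braces and wrapped in --kv-transfer-config.
# "LLMDataDistCMgrConnector" and the model name are assumptions.
python3 -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-V3 \
    --kv-transfer-config '{
        "kv_connector": "LLMDataDistCMgrConnector",
        "kv_role": "kv_producer",
        "kv_port": "20001",
        "kv_connector_extra_config": {
            "use_ascend_direct": true,
            "prefill": {
                "dp_size": 1,
                "tp_size": 1
            }
        }
    }'
```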
@@ -7,6 +7,11 @@ You can refer to [Supported Models](https://docs.vllm.ai/en/latest/models/suppor

+You can run LoRA with ACLGraph mode now. Please refer to the [Graph Mode Guide](./graph_mode.md) for better LoRA performance.
+
+Addresses for downloading the models:\
+base model: https://www.modelscope.cn/models/vllm-ascend/Llama-2-7b-hf/files \
+lora model: https://www.modelscope.cn/models/vllm-ascend/llama-2-7b-sql-lora-test/files

 ## Example

 We provide a simple LoRA example here, which enables ACLGraph mode by default.
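A minimal sketch of such an example, assuming vLLM's standard LoRA flags and the ModelScope sources linked above; the adapter name `sql-lora` is illustrative:

```bash
# Hedged sketch: serve the base model with the SQL LoRA adapter attached.
# VLLM_USE_MODELSCOPE=1 pulls both repos from ModelScope instead of HF.
VLLM_USE_MODELSCOPE=1 vllm serve vllm-ascend/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=vllm-ascend/llama-2-7b-sql-lora-test
```

Requests can then target the adapter by setting `"model": "sql-lora"` in the OpenAI-compatible API.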
@@ -6,13 +6,13 @@ Since version 0.9.0rc2, the quantization feature is experimentally supported by

 ## Install ModelSlim

-To quantize a model, you should install [ModelSlim](https://gitee.com/ascend/msit/blob/master/msmodelslim/README.md) which is the Ascend compression and acceleration tool. It is an affinity-based compression tool designed for acceleration, using compression as its core technology and built upon the Ascend platform.
+To quantize a model, you should install [ModelSlim](https://gitcode.com/Ascend/msit/tree/master), which is the Ascend compression and acceleration tool. It is an affinity-based compression tool designed for acceleration, using compression as its core technology and built upon the Ascend platform.

 Install ModelSlim:

 ```bash
 # The branch (br_release_MindStudio_8.1.RC2_TR5_20260624) has been verified
-git clone -b br_release_MindStudio_8.1.RC2_TR5_20260624 https://gitee.com/ascend/msit
+git clone -b br_release_MindStudio_8.1.RC2_TR5_20260624 https://gitcode.com/Ascend/msit

 cd msit/msmodelslim
 ```
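To round out the snippet, a hedged sketch of finishing and verifying the install; the `install.sh` script name is an assumption based on common ModelSlim setups, not part of this commit:

```bash
# Hedged continuation: run the ModelSlim installer and confirm the package.
cd msit/msmodelslim
bash install.sh
pip show msmodelslim   # should print the installed version
```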
@@ -2,6 +2,8 @@

 The feature support principle of vLLM Ascend is: **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.

+Function calling: https://docs.vllm.ai/en/latest/features/tool_calling/
+
 You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:

 | Feature | Status | Next Step |