[main][Docs] Fix typos across documentation (#6728)

## Summary

Fix typos and improve grammar consistency across 50 documentation files.
 
### Changes include:
- Spelling corrections (e.g., "Facotory" → "Factory", "certainty" →
"determinism")
- Grammar improvements (e.g., "multi-thread" → "multi-threaded",
"re-routed" → "re-run")
- Punctuation fixes (semicolon consistency in filter parameters)
- Code style fixes (correct flag name `--num-prompts` instead of
`--num-prompt`)
- Capitalization consistency (e.g., "python" → "Python", "ascend" →
"Ascend")
- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Author: Cao Yi
Date: 2026-02-13 15:50:05 +08:00
Committed by: GitHub
Parent commit: b6bc3d2f9d
Commit: 6de207de88
50 changed files with 273 additions and 272 deletions


@@ -1,6 +1,6 @@
# Additional Configuration
-Additional configuration is a mechanism provided by vLLM to allow plugins to control inner behavior by themselves. VLLM Ascend uses this mechanism to make the project more flexible.
+Additional configuration is a mechanism provided by vLLM to allow plugins to control internal behavior by themselves. VLLM Ascend uses this mechanism to make the project more flexible.
## How to use
@@ -26,25 +26,25 @@ The following table lists additional configuration options available in vLLM Asc
| Name | Type | Default | Description |
|-------------------------------------|------|---------|-----------------------------------------------------------------------------------------------------------|
-| `xlite_graph_config` | dict | `{}` | Configuration options for xlite graph mode |
+| `xlite_graph_config` | dict | `{}` | Configuration options for Xlite graph mode |
| `weight_prefetch_config` | dict | `{}` | Configuration options for weight prefetch |
| `finegrained_tp_config` | dict | `{}` | Configuration options for module tensor parallelism |
| `ascend_compilation_config` | dict | `{}` | Configuration options for ascend compilation |
| `eplb_config` | dict | `{}` | Configuration options for ascend compilation |
-| `npugraph_ex_config` | dict | `{}` | Configuration options for npugraph_ex backend |
+| `npugraph_ex_config` | dict | `{}` | Configuration options for Npugraph_ex backend |
| `refresh` | bool | `false` | Whether to refresh global Ascend configuration content. This is usually used by rlhf or ut/e2e test case. |
| `dump_config_path` | str | `None` | Configuration file path for msprobe dump(eager mode). |
-| `enable_async_exponential` | bool | `False` | Whether to enable async exponential overlap. To enable async exponential, set this config to True. |
+| `enable_async_exponential` | bool | `False` | Whether to enable asynchronous exponential overlap. To enable asynchronous exponential, set this config to True. |
| `enable_shared_expert_dp` | bool | `False` | When the expert is shared in DP, it delivers better performance but consumes more memory. Currently only DeepSeek series models are supported. |
-| `multistream_overlap_shared_expert` | bool | `False` | Whether to enable multistream shared expert. This option only takes effect on MoE models with shared experts. |
-| `multistream_overlap_gate` | bool | `False` | Whether to enable multistream overlap gate. This option only takes effect on MoE models with shared experts. |
+| `multistream_overlap_shared_expert` | bool | `False` | Whether to enable multi-stream shared expert. This option only takes effect on MoE models with shared experts. |
+| `multistream_overlap_gate` | bool | `False` | Whether to enable multi-stream overlap gate. This option only takes effect on MoE models with shared experts. |
| `recompute_scheduler_enable` | bool | `False` | Whether to enable recompute scheduler. |
-| `enable_cpu_binding` | bool | `False` | Whether to enable CPU binding. |
-| `SLO_limits_for_dynamic_batch` | int | `-1` | SLO limits for dynamic batch. This is new scheduler to support dynamic feature |
-| `enable_npugraph_ex` | bool | `False` | Whether to enable npugraph ex graph mode. |
+| `enable_cpu_binding` | bool | `False` | Whether to enable CPU Binding. |
+| `SLO_limits_for_dynamic_batch` | int | `-1` | SLO limits for dynamic batch. This is new scheduler to support dynamic batch feature |
+| `enable_npugraph_ex` | bool | `False` | Whether to enable npugraph_ex graph mode. |
| `pa_shape_list` | list | `[]` | The custom shape list of page attention ops. |
-| `enable_kv_nz` | bool | `False` | Whether to enable kvcache NZ layout. This option only takes effects on models using MLA (e.g., DeepSeek). |
-| `layer_sharding` | dict | `{}` | Configuration options for layer sharding linear |
+| `enable_kv_nz` | bool | `False` | Whether to enable KV cache NZ layout. This option only takes effects on models using MLA (e.g., DeepSeek). |
+| `layer_sharding` | dict | `{}` | Configuration options for Layer Sharding Linear |
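The options in the table above are passed to vLLM Ascend through vLLM's `additional_config` mechanism. As a hedged sketch (the model name and option values below are illustrative, not recommendations), the dictionary would be assembled like this and handed to the engine, e.g. as `LLM(model=..., additional_config=...)` in Python or as a JSON string on the command line:

```python
import json

# Illustrative additional_config for vLLM Ascend; key names follow the
# table above, values are examples only.
additional_config = {
    "xlite_graph_config": {"enabled": True, "full_mode": False},
    "multistream_overlap_shared_expert": True,
    "enable_kv_nz": False,
}

# On the CLI the same settings would be supplied as a JSON string, e.g.
#   vllm serve <model> --additional-config '<json>'
additional_config_json = json.dumps(additional_config)
print(additional_config_json)
```

The JSON form and the Python dict are interchangeable; nested dicts such as `xlite_graph_config` carry the sub-options documented in the detail tables below.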
The details of each configuration option are as follows:
@@ -52,8 +52,8 @@ The details of each configuration option are as follows:
| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
-| `enabled` | bool | `False` | Whether to enable xlite graph mode. Currently only Llama, Qwen dense series models, and Qwen3-vl are supported. |
-| `full_mode` | bool | `False` | Whether to enable xlite for both the prefill and decode stages. By default, xlite is only enabled for the decode stage. |
+| `enabled` | bool | `False` | Whether to enable Xlite graph mode. Currently only Llama, Qwen dense series models, and Qwen3-VL are supported. |
+| `full_mode` | bool | `False` | Whether to enable Xlite for both the prefill and decode stages. By default, Xlite is only enabled for the decode stage. |
**weight_prefetch_config**
@@ -66,8 +66,8 @@ The details of each configuration option are as follows:
| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
-| `lmhead_tensor_parallel_size` | int | `0` | The custom tensor parallel size of lmhead. |
-| `oproj_tensor_parallel_size` | int | `0` | The custom tensor parallel size of oproj. |
+| `lmhead_tensor_parallel_size` | int | `0` | The custom tensor parallel size of lm_head. |
+| `oproj_tensor_parallel_size` | int | `0` | The custom tensor parallel size of o_proj. |
| `embedding_tensor_parallel_size` | int | `0` | The custom tensor parallel size of embedding. |
| `mlp_tensor_parallel_size` | int | `0` | The custom tensor parallel size of mlp. |
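The four per-module sizes above override the global tensor parallel degree for individual modules. As a minimal sketch of how such a config could be sanity-checked before launch (the function name and the divisibility rule are illustrative assumptions, not vLLM Ascend's actual validation logic):

```python
# Hypothetical pre-flight check for a finegrained_tp_config dict. Assumed
# rule: each non-zero per-module TP size must evenly divide the number of
# devices; 0 (the table's default) means "use the global TP size".
def check_finegrained_tp(config: dict, world_size: int) -> list[str]:
    """Return a list of human-readable problems found in the config."""
    problems = []
    for key in ("lmhead_tensor_parallel_size",
                "oproj_tensor_parallel_size",
                "embedding_tensor_parallel_size",
                "mlp_tensor_parallel_size"):
        size = config.get(key, 0)
        if size and world_size % size != 0:
            problems.append(
                f"{key}={size} does not divide world_size={world_size}")
    return problems

# Example: lm_head TP of 4 divides 8 devices, o_proj TP of 3 does not.
print(check_finegrained_tp(
    {"lmhead_tensor_parallel_size": 4, "oproj_tensor_parallel_size": 3},
    world_size=8,
))
```

A check of this shape catches misconfigured module-level TP sizes at startup rather than at first collective call.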