[main][Docs] Fix spelling errors across documentation (#6649)

Fix various spelling mistakes in the project documentation to improve
clarity and correctness.
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Author: Cao Yi
Date: 2026-02-10 11:14:57 +08:00
Committed by: GitHub
Parent: 5b8e47cb68
Commit: 1c7d1163f5
30 changed files with 67 additions and 67 deletions

View File

@@ -96,7 +96,7 @@ There are some other rules about ACL Graph and stream. Currently, we use func `u
We will expand the stream resource limitation in the future.
## Limitation
## Limitations
1. `FULL` and `FULL_AND_PIECEWISE` are not supported yet;
2. When using ACL Graph with MTP and `num_speculative_tokens > 1`, we need to set `cudagraph_capture_sizes` explicitly, since vLLM does not support this case in v0.11.0 (see the sketch below).
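A minimal offline-inference sketch of what this can look like, assuming a recent vLLM API where `LLM(...)` accepts `speculative_config` and `compilation_config` dicts; the model name, MTP keys, and capture sizes below are illustrative placeholders rather than recommendations:
```python
# Illustrative only: explicitly set cudagraph_capture_sizes when combining
# ACL Graph with MTP and num_speculative_tokens > 1.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # hypothetical model, replace with yours
    speculative_config={                # assumed MTP setup; keys vary by version
        "method": "mtp",
        "num_speculative_tokens": 2,
    },
    compilation_config={
        # Explicit capture sizes, since vLLM does not derive them for this case.
        "cudagraph_capture_sizes": [1, 2, 4, 8, 16],
    },
)

print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```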

View File

@@ -12,7 +12,7 @@ vLLM Ascend currently supports [MooncakeStore](https://github.com/kvcache-ai/Moo
While one can utilize mooncake store in vLLM V1 engine by setting it as a remote backend of LMCache with GPU (see [Tutorial](https://github.com/LMCache/LMCache/blob/dev/examples/kv_cache_reuse/remote_backends/mooncakestore/README.md)), we find it would be better to integrate a connector that directly supports Mooncake Store and can adapt the data transfer strategy to one that best fits Huawei NPU hardware.
Hence, we propose to integrate Mooncake Store with a brand new **MooncakeStoreConnectorV1**, which is indeed largly inspired by **LMCacheConnectorV1** (see the `How is MooncakestoreConnectorV1 Implemented?` section).
Hence, we propose to integrate Mooncake Store with a brand new **MooncakeStoreConnectorV1**, which is indeed largely inspired by **LMCacheConnectorV1** (see the `How is MooncakeStoreConnectorV1 Implemented?` section).
## Usage
@@ -47,7 +47,7 @@ By introducing KV Connector V1, users can seamlessly combine HBM-based Prefix Ca
When used together with Mooncake PD (Prefill-Decode) Disaggregation, the KV Cache Pool can further decouple prefill and decode stages across devices or nodes.
Currently, we only perform put and get operation of KV Pool for **Prefiil Nodes**, and Decode Nodes get their KV Cache from Mooncake P2P KV Connector, i.e. MooncakeConnector.
Currently, we only perform put and get operation of KV Pool for **Prefill Nodes**, and Decode Nodes get their KV Cache from Mooncake P2P KV Connector, i.e. MooncakeConnector.
The key benefit of doing this is that Prefill Nodes keep the performance gain from computing less via Prefix Caching from HBM and the KV Pool, while the data transfer efficiency between Prefill and Decode nodes is not sacrificed, since the P2P KV Connector transfers KV Caches between NPU devices directly.
@@ -55,11 +55,11 @@ To Enable this feature, we need to setup both Mooncake Connector and Mooncake St
For details, please also refer to the Mooncake Connector Store Deployment Guide.
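A hedged sketch of what the vLLM side of that setup can look like; the connector name, role, and extra-config key below are assumptions based on this document, and the authoritative values are in the deployment guide:
```python
# Illustrative only: wire a KV-pool connector into vLLM via kv_transfer_config.
from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",             # hypothetical model
    kv_transfer_config=KVTransferConfig(
        kv_connector="MooncakeStoreConnectorV1",  # assumed registered name
        kv_role="kv_both",                        # put and get on Prefill Nodes
        kv_connector_extra_config={
            # Hypothetical key: where the Mooncake Store endpoint lives.
            "mooncake_store_url": "127.0.0.1:50051",
        },
    ),
)
```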
## How is MooncakestoreConnectorV1 Implemented?
## How is MooncakeStoreConnectorV1 Implemented?
**MooncakestoreConnectorV1** inhereits the KV Connector V1 class in vLLM V1: through implementing the required methods defined in the KV connector V1 base class, one can integrate a thrid-party KV cache transfer/storage backend into the vLLM framework.
**MooncakeStoreConnectorV1** inherits the KV Connector V1 class in vLLM V1: through implementing the required methods defined in the KV connector V1 base class, one can integrate a third-party KV cache transfer/storage backend into the vLLM framework.
MooncakeStoreConnectorV1 is also largly inspried by LMCacheConnectorV1 in term of the `Lookup Engine`/`Lookup Client` design for looking up KV cache keys, and the `ChunkedTokenDatabase` class for processing tokens into prefix-aware hashes as well as other hashing related designs. On top of this, we have also added our own design including `KVTransferThread` that allows async `get` and `put` of KV caches with multi-threading, and NPU-related data transfer optimization such as removing the `LocalBuffer` in LMCache to remove redundant data transfer.
MooncakeStoreConnectorV1 is also largely inspired by LMCacheConnectorV1 in terms of the `Lookup Engine`/`Lookup Client` design for looking up KV cache keys, and the `ChunkedTokenDatabase` class for processing tokens into prefix-aware hashes as well as other hashing related designs. On top of this, we have also added our own design including `KVTransferThread` that allows async `get` and `put` of KV caches with multi-threading, and NPU-related data transfer optimization such as removing the `LocalBuffer` in LMCache to remove redundant data transfer.
The KV Connector methods that need to be implemented can be categorized into scheduler-side methods that are called in the V1 scheduler and worker-side methods that are called in the V1 worker, namely (a skeleton sketch follows the list below):
@@ -77,14 +77,14 @@ The KV Connector methods that need to be implemented can be categorized into sch
`wait_for_layer_load`: Optional; Wait for layer load in layerwise + async KV load scenario.
`save_kv_layer`: Optional; Do layerwise KV cache put into KV Pool.
`wait_for_save`: Wait for KV Save to finish if async KV cache save/put.
`get_finished` Get request that finished KV transfer, `done_sending` if `put` finished, `done_reciving` if `get` finished.
`get_finished` Get request that finished KV transfer, `done_sending` if `put` finished, `done_receiving` if `get` finished.
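A skeleton sketch of a connector implementing the worker-side hooks above; the base-class import path and the exact signatures are assumptions that may differ across vLLM versions, so treat this as a shape rather than a drop-in implementation:
```python
# Worker-side hooks of a custom V1 KV connector (scheduler-side hooks omitted).
from vllm.distributed.kv_transfer.kv_connector.v1.base import KVConnectorBase_V1


class MyPoolConnector(KVConnectorBase_V1):
    def start_load_kv(self, forward_context, **kwargs) -> None:
        # Kick off (possibly async) KV cache "get" from the external pool.
        ...

    def wait_for_layer_load(self, layer_name: str) -> None:
        # Block until this layer's KV has landed (layerwise async load).
        ...

    def save_kv_layer(self, layer_name, kv_layer, attn_metadata, **kwargs) -> None:
        # Optionally "put" this layer's KV into the pool as it is produced.
        ...

    def wait_for_save(self) -> None:
        # Block until all pending async "put" operations have finished.
        ...

    def get_finished(self, finished_req_ids):
        # Return (done_sending, done_receiving) request-id sets.
        return set(), set()
```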
## DFX
1. When looking up a key in KV Pool, if we cannot find the key, there is no Cache Hit for this specific block; we return no hit for this block and do not look up further blocks for the current request.
2. Similaly, when we are trying to put a block into KV Pool and failed, we do not put further blocks (subject to change).
2. Similarly, when we are trying to put a block into KV Pool and failed, we do not put further blocks (subject to change).
## Limitation
## Limitations
1. Currently, Mooncake Store for vLLM-Ascend only supports DRAM as the storage for KV Cache pool.

View File

@@ -243,7 +243,7 @@ During the update process, if only a few layers or the expert table of a certain
When updating expert weights, ensure that the memory allocated for the expert weights has been released, or that the expert (referring to the old version) is no longer in use.
## Limitation
## Limitations
Before using EPLB, start the script and add `export DYNAMIC_EPLB="true"`.
Before performing load data collection (or performance data collection), start the script and add `export EXPERT_MAP_RECORD="true"`.

View File

@@ -94,7 +94,7 @@ with fake_mode:
)
```
The default fusion pass in npugraph_ex is also implemented based on this API. You can see more examples of using this API in the vllm-ascned and nugraph_ex code repositories.
The default fusion pass in npugraph_ex is also implemented based on this API. You can see more examples of using this API in the vllm-ascend and npugraph_ex code repositories.
### DFX

View File

@@ -70,7 +70,7 @@ Before writing a patch, following the principle above, we should patch the least
7. Add the Unit Test and E2E Test. Any newly added code in vLLM Ascend should be covered by Unit Tests and E2E Tests as well. You can find more details in the [test guide](../contribution/testing.md)
## Limitation
## Limitations
1. In the V1 Engine, vLLM starts three kinds of processes: the Main process, the EngineCore process and the Worker process. Currently, vLLM Ascend can only patch code in the Main process and Worker process by default. If you want to patch code running in the EngineCore process, you should patch the EngineCore process entirely during setup. Find the full code in `vllm.v1.engine.core` and override `EngineCoreProc` and `DPEngineCoreProc` entirely.
2. If you are running edited vLLM code, the reported vLLM version may change automatically. For example, if you run edited vLLM based on v0.9.n, the version may become v0.9.nxxx. In this case, the patch for v0.9.n in vLLM Ascend would not work as expected, because vLLM Ascend can't determine which version of vLLM you're using. You can set the environment variable `VLLM_VERSION` to specify the vLLM version you're using, and then the matching patch should be applied as expected (see the sketch below).
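A hedged sketch of that patching pattern; `patch_symbol` is a hypothetical helper written for illustration, not a vLLM Ascend API, and the version string is a placeholder:
```python
# Illustrative patching pattern: replace as few symbols as possible.
import os
import types
from typing import Callable

# If your edited vLLM reports a mangled version, pin it before vLLM Ascend
# inspects it (placeholder value; use the base version you actually built from).
os.environ.setdefault("VLLM_VERSION", "0.9.2")


def patch_symbol(module: types.ModuleType, name: str, replacement: Callable) -> None:
    """Replace module.<name> with `replacement`, keeping the original reachable."""
    original = getattr(module, name)
    replacement.__wrapped_original__ = original  # lets the patch delegate to it
    setattr(module, name, replacement)


# Usage (hypothetical target):
#   import vllm.some_module
#   patch_symbol(vllm.some_module, "some_function", my_npu_friendly_function)
```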

View File

@@ -26,7 +26,7 @@ The `create_weights` method is used for weight initialization; the `process_weig
We need to implement the `create_weights`, `process_weights_after_loading`, and `apply` methods for different **layers** (**attention**, **mlp**, **moe**).
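A simplified sketch of these three hooks for a linear layer; the class name, signatures, and the W8A8-style math are illustrative assumptions rather than the exact vLLM/vLLM Ascend interface:
```python
# Illustrative quantization-method shape: create weights, fix layouts after
# loading, then apply the quantized computation at runtime.
import torch


class AscendW8A8LinearMethodSketch:
    def create_weights(self, layer: torch.nn.Module, input_size: int,
                       output_size: int, params_dtype: torch.dtype) -> None:
        # Register int8 weights and per-channel scales to be filled by the
        # weight loader from the quantized checkpoint.
        layer.weight = torch.nn.Parameter(
            torch.empty(output_size, input_size, dtype=torch.int8),
            requires_grad=False)
        layer.weight_scale = torch.nn.Parameter(
            torch.empty(output_size, 1, dtype=params_dtype), requires_grad=False)

    def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
        # Re-layout weights into the format the NPU kernel expects.
        layer.weight.data = layer.weight.data.contiguous()

    def apply(self, layer: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        # Dequantize-and-matmul stand-in; a real method calls a fused kernel.
        w = layer.weight.to(x.dtype) * layer.weight_scale.to(x.dtype)
        return torch.nn.functional.linear(x, w)
```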
**Supplemnet**: When loading the model, the quantized model's description file **quant_model_description.json** needs to be read. This file describes the quantization configuration and parameters for each part of the model weights, for example:
**Supplement**: When loading the model, the quantized model's description file **quant_model_description.json** needs to be read. This file describes the quantization configuration and parameters for each part of the model weights, for example:
```json
{