[Doc] Fix doc typo (#1424)
1. Fix the typo
2. Fix 404 url
3. Update graph mode and additional config user guide

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -48,7 +48,7 @@ Before writing a patch, following the principle above, we should patch the least
1. Decide which version of vLLM we should patch. For example, after analysis, here we want to patch both 0.9.1 and main of vLLM.
2. Decide which process we should patch. For example, here `distributed` belongs to the vLLM main process, so we should patch `platform`.
-3. Create the patch file in the write folder. The file should be named as `patch_{module_name}.py`. The example here is `vllm_ascend/patch/platform/patch_common/patch_distributed.py`.
+3. Create the patch file in the right folder. The file should be named as `patch_{module_name}.py`. The example here is `vllm_ascend/patch/platform/patch_common/patch_distributed.py`.
4. Write your patch code in the new file. Here is an example:

```python
import vllm
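The hunk above is truncated after `import vllm`. As a rough, hypothetical sketch of the monkey-patch pattern a `patch_{module_name}.py` file follows — the module and function names below are stand-ins for illustration, not taken from vLLM:

```python
import types

# Stand-in for the vLLM module being patched; a real patch file imports vllm
# itself. Everything named "fake_*" here is hypothetical, for illustration only.
fake_distributed = types.ModuleType("fake_distributed")

def original_world_size() -> int:
    # Original upstream behavior (illustrative).
    return 1

fake_distributed.get_world_size = original_world_size

def patched_world_size() -> int:
    # Platform-specific replacement installed by the patch file.
    return 8

# The patch file's only job: rebind the attribute when it is imported,
# so every later caller of fake_distributed.get_world_size sees the new code.
fake_distributed.get_world_size = patched_world_size

print(fake_distributed.get_world_size())  # 8
```

Because the rebinding happens at import time, such a patch file only takes effect if it is imported before the patched function is first called.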
@@ -28,10 +28,10 @@ The following table lists the additional configuration options available in vLLM
| Name | Type | Default | Description |
|-------------------------------| ---- |------|-----------------------------------------------------------------------------------------------|
| `torchair_graph_config` | dict | `{}` | The config options for torchair graph mode |
| `ascend_scheduler_config` | dict | `{}` | The config options for ascend scheduler |
-| `expert_tensor_parallel_size` | str | `0` | Expert tensor parallel size the model to use. |
-| `refresh` | bool | `false` | Whether to refresh global ascend config content. This value is usually used by rlhf case. |
-| `expert_map_path` | str | None | When using expert load balancing for the MOE model, an expert map path needs to be passed in. |
-| `chunked_prefill_for_mla` | bool | `False` | Whether to enable the fused operator-like chunked_prefill. |
+| `expert_tensor_parallel_size` | str | `0` | Expert tensor parallel size the model to use. |
+| `refresh` | bool | `false` | Whether to refresh global ascend config content. This value is usually used by rlhf or ut/e2e test case. |
+| `expert_map_path` | str | `None` | When using expert load balancing for the MOE model, an expert map path needs to be passed in. |
+| `chunked_prefill_for_mla` | bool | `False` | Whether to enable the fused operator-like chunked_prefill. |

The details of each config option are as follows:
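To make the table rows concrete, here is a minimal sketch of an `additional_config` dict built from the options above. The keys and defaults come straight from the table; whether each key is honored depends on your vllm-ascend version:

```python
# Keys come from the configuration table; values are the defaults it lists.
additional_config = {
    "torchair_graph_config": {},         # dict, default {}
    "ascend_scheduler_config": {},       # dict, default {}
    "expert_tensor_parallel_size": "0",  # str, default 0
    "refresh": False,                    # bool, default false
    "expert_map_path": None,             # str, default None
    "chunked_prefill_for_mla": False,    # bool, default False
}

print(len(additional_config))  # 6
```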
@@ -58,7 +58,7 @@ ascend_scheduler_config also support the options from [vllm scheduler config](ht

### Example

-A full example of additional configuration is as follows:
+An example of additional configuration is as follows:

```
{
@@ -1,9 +1,10 @@
# Graph Mode Guide

```{note}
This feature is currently experimental. In future versions, there may be behavioral changes around configuration, coverage, performance improvement.
```

-This guide provides instructions for using Ascend Graph Mode with vLLM Ascend. Please note that graph mode is only available on V1 Engine. And only Qwen, DeepSeek series models are well tested in 0.9.0rc1. We'll make it stable and generalize in the next release.
+This guide provides instructions for using Ascend Graph Mode with vLLM Ascend. Please note that graph mode is only available on V1 Engine. And only Qwen, DeepSeek series models are well tested from 0.9.0rc1. We'll make it stable and generalize in the next release.

## Getting Started
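As a hedged sketch of where Getting Started leads: graph mode is driven through the `additional_config` mechanism covered earlier. The `torchair_graph_config` key comes from the configuration table, while the nested `enabled` flag is an assumption not confirmed by this diff:

```python
# Hypothetical: enable torchair graph mode via additional_config.
# "enabled" is an assumed sub-key; check the vllm-ascend docs for your
# release before relying on it.
graph_mode_config = {
    "torchair_graph_config": {
        "enabled": True,
    },
}

# An engine would receive this dict as, e.g. (constructor sketched, not run):
#   LLM(model="...", additional_config=graph_mode_config)
print(graph_mode_config["torchair_graph_config"]["enabled"])  # True
```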