[Doc][Misc] Comprehensive documentation cleanup and grammatical fixes (#8073)
What this PR does / why we need it?

This pull request performs a comprehensive cleanup of the vLLM Ascend documentation. It fixes numerous typos, grammatical errors, and phrasing issues across community guidelines, developer documents, hardware tutorials, and feature guides. Key improvements include correcting hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code examples (removing duplicate flags and trailing commas), and improving the clarity of technical explanations. These changes ensure the documentation is professional, accurate, and easy for users to follow.

Does this PR introduce any user-facing change?

No, this PR contains documentation-only updates.

How was this patch tested?

The changes were manually reviewed for accuracy and grammatical correctness. No functional code changes were introduced.

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
@@ -328,7 +328,7 @@ vllm serve Qwen/Qwen3-VL-8B-Instruct \
```

:::{note}
-Add `--max_model_len` option to avoid ValueError that the Qwen3-VL-8B-Instruct model's max seq len (256000) is larger than the maximum number of tokens that can be stored in KV cache. This will differ with different NPU series base on the HBM size. Please modify the value according to a suitable value for your NPU series.
+Add the `--max_model_len` option to avoid a ValueError raised when the Qwen3-VL-8B-Instruct model's max seq len (256000) is larger than the maximum number of tokens that can be stored in the KV cache. This limit differs across NPU series based on the HBM size. Please set a value suitable for your NPU series.
:::

If your service starts successfully, you can see the info shown below:
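The note about KV-cache capacity can be made concrete with a back-of-the-envelope estimate. This is a hedged sketch: the layer, head, and dimension numbers below are illustrative placeholders, not the actual Qwen3-VL-8B-Instruct configuration — read the real values from the model's `config.json`.

```python
# Back-of-the-envelope estimate of how many tokens fit in the KV cache.
# All model numbers are assumptions for illustration, not the real
# Qwen3-VL-8B-Instruct config; substitute values from config.json.

def kv_cache_tokens(hbm_free_bytes: int,
                    num_layers: int = 36,    # assumed
                    num_kv_heads: int = 8,   # assumed (grouped-query attention)
                    head_dim: int = 128,     # assumed
                    dtype_bytes: int = 2) -> int:  # bf16/fp16
    # Each token stores one key and one value vector per layer per KV head.
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return hbm_free_bytes // bytes_per_token

# e.g. 40 GiB of HBM left for KV cache after weights and activations:
print(kv_cache_tokens(40 * 1024**3))  # 291271 with these assumed numbers
```

If the printed capacity is smaller than the model's max seq len (256000 here), serving fails unless `--max_model_len` is lowered accordingly.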
@@ -474,8 +474,6 @@ The accuracy of some models is already within our CI monitoring scope, including
- `Qwen2.5-VL-7B-Instruct`
- `Qwen3-VL-8B-Instruct`

You can refer to the [monitoring configuration](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test_nightly_a2.yaml).

:::::{tab-set}
:sync-group: install
@@ -486,28 +484,28 @@ As an example, take the `mmmu_val` dataset as a test dataset, and run accuracy e

1. Refer to [Using lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for more details on `lm_eval` installation.

   ```shell
   pip install lm_eval
   ```

2. Run `lm_eval` to execute the accuracy evaluation.

   ```shell
   lm_eval \
     --model vllm-vlm \
     --model_args pretrained=Qwen/Qwen3-VL-8B-Instruct,max_model_len=8192,gpu_memory_utilization=0.7 \
     --tasks mmmu_val \
     --batch_size 32 \
     --apply_chat_template \
     --trust_remote_code \
     --output_path ./results
   ```

3. After execution, you can get the result. Here is the result of `Qwen3-VL-8B-Instruct` on `vllm-ascend:0.11.0rc3`, for reference only.

   | Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
   |---------|------:|------|-----:|------|---|-----:|---|-----:|
   |mmmu_val | 0|none | |acc |↑ |0.5389|± |0.0159|
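The numbers in the table above come from the JSON file that `lm_eval` writes under `--output_path`. As a hedged sketch, assuming the usual lm-evaluation-harness layout (`"results"` mapping task names to `"metric,filter"` keys — verify against your own file, whose name includes a timestamp), the result can be summarized like this:

```python
import json  # for json.load(open(path)) on a real results file

# Sketch: summarize an lm_eval results dict. The key layout below is an
# assumption based on common lm-evaluation-harness output; check your file.

def summarize(results: dict) -> list[str]:
    lines = []
    for task, metrics in results["results"].items():
        acc = metrics.get("acc,none")
        stderr = metrics.get("acc_stderr,none")
        lines.append(f"{task}: acc={acc:.4f} ± {stderr:.4f}")
    return lines

# Inline sample mirroring the table above; normally: json.load(open(path))
sample = {"results": {"mmmu_val": {"acc,none": 0.5389, "acc_stderr,none": 0.0159}}}
print("\n".join(summarize(sample)))  # mmmu_val: acc=0.5389 ± 0.0159
```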
::::
::::{tab-item} Qwen2.5-VL-32B-Instruct
@@ -517,27 +515,27 @@ As an example, take the `mmmu_val` dataset as a test dataset, and run accuracy e

1. Refer to [Using lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for more details on `lm_eval` installation.

   ```shell
   pip install lm_eval
   ```

2. Run `lm_eval` to execute the accuracy evaluation.

   ```shell
   lm_eval \
     --model vllm-vlm \
     --model_args pretrained=Qwen/Qwen2.5-VL-32B-Instruct,max_model_len=8192,tensor_parallel_size=2 \
     --tasks mmmu_val \
     --apply_chat_template \
     --trust_remote_code \
     --output_path ./results
   ```

3. After execution, you can get the result. Here is the result of `Qwen2.5-VL-32B-Instruct` on `vllm-ascend:0.11.0rc3`, for reference only.

   | Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
   |---------|------:|------|-----:|------|---|-----:|---|-----:|
   |mmmu_val | 0|none | |acc |↑ |0.5744|± |0.0158|

::::
:::::
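The `tensor_parallel_size=2` in the 32B run follows from simple weight-memory arithmetic: a 32B-parameter model in bf16 does not fit comfortably on a single device, so its weights are split across two. A rough sketch (illustrative only — real usage adds KV cache, activations, and framework overhead on top of weights):

```python
# Why tensor_parallel_size=2 for a 32B model: per-device weight memory.
# Numbers are a rough illustration, not an exact memory model.

def weight_gib(params_billion: float, dtype_bytes: int = 2, tp: int = 1) -> float:
    """Approximate per-device weight memory in GiB under tensor parallelism."""
    total_bytes = params_billion * 1e9 * dtype_bytes
    return total_bytes / tp / 1024**3

print(round(weight_gib(32, tp=1), 1))  # 59.6 GiB on a single device
print(round(weight_gib(32, tp=2), 1))  # 29.8 GiB per device with TP=2
```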
@@ -546,7 +544,7 @@ lm_eval \

### Using vLLM Benchmark

-Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
+Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) for more details.

There are three `vllm bench` subcommands: