[Doc][Misc] Comprehensive documentation cleanup and grammatical fixes (#8073)
What this PR does / why we need it?

This pull request performs a comprehensive cleanup of the vLLM Ascend documentation. It fixes numerous typos, grammatical errors, and phrasing issues across community guidelines, developer documents, hardware tutorials, and feature guides. Key improvements include correcting hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code examples (removing duplicate flags and trailing commas), and improving the clarity of technical explanations. These changes are necessary to ensure the documentation is professional, accurate, and easy for users to follow.

Does this PR introduce any user-facing change?

No, this PR contains documentation-only updates.

How was this patch tested?

The changes were manually reviewed for accuracy and grammatical correctness. No functional code changes were introduced.

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
@@ -122,7 +122,7 @@ The parameters are explained as follows:
- Setting the environment variable `VLLM_ASCEND_BALANCE_SCHEDULING=1` enables balance scheduling. This may increase output throughput and reduce TPOT in the v1 scheduler, though TTFT may degrade in some scenarios. Enabling this feature is also not recommended in prefill/decode (PD) disaggregation scenarios. A combined usage sketch follows this list.
- For single-node deployment, we recommend using `dp4tp4` instead of `dp2tp8`.
-- `--max-model-len` specifies the maximum context length - that is, the sum of input and output tokens for a single request. For performance testing with an input length of 3.5K and output length of 1.5K, a value of `16384` is sufficient, however, for precision testing, please set it at least `35000`.
+- `--max-model-len` specifies the maximum context length - that is, the sum of input and output tokens for a single request. For performance testing with an input length of 3.5K and output length of 1.5K, a value of `16384` is sufficient, however, for precision testing, please set it to at least `35000`.
- `--no-enable-prefix-caching` indicates that prefix caching is disabled. To enable it, remove this option.
- If you use the w4a8 weight, more memory will be allocated to the KV cache, which you can leverage to achieve greater throughput.
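As a minimal sketch of how these options combine (the model path, parallel sizes, and port below are illustrative assumptions, not values from this guide):

```shell
# Hedged example only: flag values must match your own deployment.
export VLLM_ASCEND_BALANCE_SCHEDULING=1   # balance scheduling; may trade TTFT for TPOT
vllm serve vllm-ascend/DeepSeek-R1-W8A8 \
    --tensor-parallel-size 4 \
    --data-parallel-size 4 \
    --max-model-len 16384 \
    --no-enable-prefix-caching \
    --port 8000
```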
@@ -181,7 +181,7 @@ vllm serve vllm-ascend/DeepSeek-R1-W8A8 \
```shell
#!/bin/sh
-# this obtained through ifconfig
+# this is obtained through ifconfig
# nic_name is the network interface name corresponding to the local_ip of the current node
nic_name="xxxx"
local_ip="xxxx"
```
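For reference, one way to look these values up on the node (interface names vary per machine, so which NIC to pick is your own decision):

```shell
# Show all interfaces with their IPv4 addresses; pick the one that
# carries this node's local IP and use its name as nic_name.
ifconfig
# Or, on systems with iproute2:
ip -4 addr show
```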
@@ -258,10 +258,10 @@ Here are two accuracy evaluation methods.
2. After execution, you can get the result. Here is the result of `DeepSeek-R1-W8A8` on `vllm-ascend:0.11.0rc2`, for reference only.
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024dataset | - | accuracy | gen | 80.00 |
| gpqadataset | - | accuracy | gen | 72.22 |

### Using Language Model Evaluation Harness
@@ -271,13 +271,13 @@ As an example, take the `gsm8k` dataset as a test dataset, and run accuracy eval
2. Run `lm_eval` to execute the accuracy evaluation.
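If the harness is not already present in your environment, it can typically be installed from PyPI first (exact version requirements are not covered by this guide):

```shell
pip install lm-eval
```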
```shell
lm_eval \
--model local-completions \
--model_args model=path/DeepSeek-R1-W8A8,base_url=http://<node0_ip>:<port>/v1/completions,tokenized_requests=False,trust_remote_code=True \
--tasks gsm8k \
--output_path ./
```

3. After execution, you can get the result.
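For `gsm8k`, the harness prints a summary table whose metric is `exact_match`; it looks roughly like the following (the values shown are placeholders, not measured results):

| Tasks | Version | Filter | n-shot | Metric | Value |
|-------|---------|--------|--------|--------|-------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | xx.xx |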
@@ -291,7 +291,7 @@ Refer to [Using AISBench for performance evaluation](../../developer_guide/evalu
Run a performance evaluation of `DeepSeek-R1-W8A8` as an example.
-Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) for more details.
+Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) for more details.
There are three `vllm bench` subcommands:
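In current vLLM releases these are `latency`, `throughput`, and `serve`. A minimal sketch of a serving benchmark against the endpoint started above (the dataset choice, request count, and input/output lengths are assumptions matching the 3.5K-in/1.5K-out test profile mentioned earlier):

```shell
# Hedged example: adjust host, port, and model path to your deployment.
vllm bench serve \
    --model vllm-ascend/DeepSeek-R1-W8A8 \
    --host <node0_ip> --port <port> \
    --dataset-name random \
    --random-input-len 3584 \
    --random-output-len 1536 \
    --num-prompts 200
```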