[Doc][Misc] Comprehensive documentation cleanup and grammatical fixes (#8073)

What this PR does / why we need it?
This pull request performs a comprehensive cleanup of the vLLM Ascend documentation. It fixes numerous typos, grammatical errors, and phrasing issues across community guidelines, developer documents, hardware tutorials, and feature guides. Key improvements include correcting hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code examples (removing duplicate flags and trailing commas), and improving the clarity of technical explanations. These changes are necessary to ensure the documentation is professional, accurate, and easy for users to follow.

Does this PR introduce any user-facing change?
No, this PR contains documentation-only updates.

How was this patch tested?
The changes were manually reviewed for accuracy and grammatical correctness. No functional code changes were introduced.

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
2026-04-09 15:37:57 +08:00
parent c40a387f63
commit 0d1424d81a
71 changed files with 1295 additions and 1296 deletions
--- a/docs/source/tutorials/features/suffix_speculative_decoding.md
+++ b/docs/source/tutorials/features/suffix_speculative_decoding.md
@@ -12,8 +12,8 @@ This document provides step-by-step guidance on how to deploy and benchmark the
 | Common Sense Reasoning         | ARC              |
 | Mathematical Reasoning         | gsm8k            |
 | Natural Language Understanding | SuperGLUE_BoolQ  |
-| Comprehensive Examination      | agieval          |
-| Multi-turn Dialogue            | sharegpt         |
+| Comprehensive Examination      | AGIEval          |
+| Multi-turn Dialogue            | ShareGPT         |

 The benchmarking tool used in this tutorial is AISBench, which supports performance testing for all the datasets listed above. The final section of this tutorial presents a performance comparison between enabling and disabling Suffix Decoding under the condition of satisfying an SLO TPOT < 50ms across different datasets and concurrency levels. Validations demonstrate that the Qwen3-32B model achieves a throughput improvement of approximately 20% to 80% on various real-world datasets when Suffix Decoding is enabled.

@@ -171,7 +171,7 @@ Below is the raw detailed test results:
 | 1                   | 207       | 314        | 100      | 54.1          | 18.4                 | 36.1            | 26.8                   | 33.4%       | 49.8%     | 45.6%    |
 | 16                  | 207       | 314        | 100      | 60.0          | 229.7                | 43.5            | 303.9                  | 33.4%       | 38.0%     | 32.3%    |
 | 32                  | 207       | 314        | 100      | 62.7          | 396.4                | 47.8            | 507.5                  | 33.4%       | 31.3%     | 28.0%    |
-| **Agieval**         |           |            |          |               |                      |                 |                        |             |           |          |
+| **AGIEval**         |           |            |          |               |                      |                 |                        |             |           |          |
 | 1                   | 735       | 1880       | 100      | 53.1          | 18.7                 | 31.8            | 34.1                   | 50.3%       | 66.8%     | 81.9%    |
 | 24                  | 735       | 1880       | 100      | 64.0          | 381.2                | 43.3            | 629.0                  | 50.3%       | 47.8%     | 65.0%    |
 | 34                  | 735       | 1880       | 100      | 70.0          | 494.6                | 50.2            | 768.4                  | 50.3%       | 39.4%     | 55.3%    |