xc-llm-ascend/source at 41d48cb9745a3bb8024d72938f367813e2417bed - xc-llm-ascend

EngineX/xc-llm-ascend

zhangguinan be5b66de6d [Doc] Contributing a Benchmark Tutorial for Suffix Speculative Decoding (#6323)

### What this PR does / why we need it?
Suffix Decoding is a CPU-based speculative decoding optimization that
accelerates inference by pattern matching and frequency-based prediction
from both prompts and generated content.

This document provides a step-by-step guide for deploying and evaluating
**Suffix Speculative Decoding** on the **Ascend** platform. It compares
performance across diverse datasets to show how much this technique
accelerates inference. The goal is to help developers optimize model
inference efficiently on Ascend hardware.

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: zhangmuzhibangde <1037640609@qq.com>

2026-02-03 14:52:38 +08:00

_templates/sections

[Doc] Update doc url link (#5781)

2026-01-12 11:21:31 +08:00

[doc](cp) correct the prefill of GQA and adjust desc of block table. (#5697)

2026-01-19 18:53:48 +08:00

[Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470)

2026-02-02 15:57:55 +08:00

developer_guide

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

locale/zh_CN/LC_MESSAGES

implement model runner v2 basic framework (#5051)

2025-12-18 15:51:54 +08:00

[Doc] Add sphinx build for vllm-ascend (#55)

2025-02-13 18:44:17 +08:00

[Doc] Contributing a Benchmark Tutorial for Suffix Speculative Decoding (#6323)

2026-02-03 14:52:38 +08:00

[doc][npugraph_ex] add npugraph_ex introduction doc (#6306)

2026-01-30 11:21:37 +08:00

conf.py

[Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470)

2026-02-02 15:57:55 +08:00

faqs.md

[Doc] add release note for v0.14.0rc1 (#6225)

2026-01-26 14:22:40 +08:00

index.md

[Doc] Added deploying on k8s with kthena (#4674)

2025-12-23 17:46:04 +08:00

installation.md

[CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (#6145)

2026-01-22 19:50:54 +08:00

quick_start.md

[Doc] Update max_tokens to max_completion_tokens in all docs (#6248)

2026-01-26 11:57:40 +08:00