xc-llm-ascend/docs/source/tutorials/index.md
zhangguinan be5b66de6d [Doc] Contributing a Benchmark Tutorial for Suffix Speculative Decoding (#6323)
### What this PR does / why we need it?
Suffix Decoding is a CPU-based speculative decoding optimization that
accelerates inference by pattern matching and frequency-based prediction
from both prompts and generated content.
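The description above covers the idea only at a high level. The following toy Python sketch illustrates suffix-based drafting on a plain list of token IDs. It finds the longest recent suffix of the context that already appeared earlier in the prompt or output, then proposes the continuation that most often followed it. This is an assumption-laden illustration, not the vLLM or vllm-ascend implementation. The names `propose_draft`, `max_pattern_len`, `num_draft`, and `min_pattern_len` are hypothetical. As in other speculative decoding methods, the target model would then verify the proposed tokens.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional, Sequence


def _predict_next(context: list[int], max_pattern_len: int, min_pattern_len: int) -> Optional[int]:
    """Return the most frequent token that followed the longest matching suffix (hypothetical helper)."""
    n = len(context)
    # Try the longest suffix first: longer matches are more specific predictors.
    for k in range(min(max_pattern_len, n - 1), min_pattern_len - 1, -1):
        pattern = context[n - k:]
        followers: Counter = Counter()
        # Scan every earlier occurrence of the pattern that has a following token.
        for start in range(n - k):
            if context[start:start + k] == pattern:
                followers[context[start + k]] += 1
        if followers:
            # Frequency-based choice among all observed continuations.
            return followers.most_common(1)[0][0]
    return None


def propose_draft(
    tokens: Sequence[int],
    max_pattern_len: int = 8,
    num_draft: int = 4,
    min_pattern_len: int = 1,
) -> list[int]:
    """Toy suffix-matching drafter over prompt + generated token IDs (hypothetical)."""
    context = list(tokens)
    draft: list[int] = []
    for _ in range(num_draft):
        nxt = _predict_next(context, max_pattern_len, min_pattern_len)
        if nxt is None:
            break  # No earlier match: stop drafting and fall back to normal decoding.
        draft.append(nxt)
        # Extend the context greedily so the next prediction can continue the pattern.
        context.append(nxt)
    return draft


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 1, 2, 3]  # prompt + generated tokens so far
    print(propose_draft(history))  # [4, 5, 1, 2]
```

The sketch rescans the whole history for every predicted token to keep the logic readable. A practical drafter would index the history instead, for example with a suffix tree, so that lookups stay cheap on the CPU as sequences grow.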

This document provides a step-by-step guide to deploying and evaluating
**Suffix Speculative Decoding** on the **Ascend** platform. It benchmarks
the technique across diverse datasets to quantify the inference speedup
it delivers. Our goal is to help developers optimize model inference
efficiently on Ascend hardware.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: zhangmuzhibangde <1037640609@qq.com>
2026-02-03 14:52:38 +08:00

# Tutorials

:::{toctree}
:caption: Models
:maxdepth: 1
Qwen2.5-Omni.md
Qwen2.5-7B.md
Qwen3-Dense.md
Qwen-VL-Dense.md
Qwen3-30B-A3B.md
Qwen3-235B-A22B.md
Qwen3-VL-30B-A3B-Instruct.md
Qwen3-VL-235B-A22B-Instruct.md
Qwen3-Coder-30B-A3B.md
Qwen3_embedding.md
Qwen3-VL-Embedding.md
Qwen3_reranker.md
Qwen3-VL-Reranker.md
Qwen3-8B-W4A8.md
Qwen3-32B-W4A4.md
Qwen3-Next.md
Qwen3-Omni-30B-A3B-Thinking.md
DeepSeek-V3.1.md
DeepSeek-V3.2.md
DeepSeek-R1.md
GLM4.x.md
Kimi-K2-Thinking.md
PaddleOCR-VL.md
:::

:::{toctree}
:caption: Features
:maxdepth: 1
pd_colocated_mooncake_multi_instance.md
pd_disaggregation_mooncake_single_node.md
pd_disaggregation_mooncake_multi_node.md
long_sequence_context_parallel_single_node.md
long_sequence_context_parallel_multi_node.md
suffix_speculative_decoding.md
ray
:::

:::{toctree}
:caption: Hardware
:maxdepth: 1
310p.md
:::