### What this PR does / why we need it?
Add a user guide for the **Fine-Grained Tensor Parallelism** feature.
The guide documents usage, supported components (`embedding`, `lm_head`, `o_proj`,
`mlp`/`dense_ffn`), model compatibility, and deployment guidelines.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Signed-off-by: chenxiao <Jaychou1620@Gmail.com>
Signed-off-by: 秋刀鱼 <jaychou1620@Gmail.com>
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
# Feature Guide

This section provides a detailed usage guide of vLLM Ascend features.

:::{toctree}
:caption: Feature Guide
:maxdepth: 1
graph_mode
quantization
quantization-llm-compressor
sleep_mode
structured_output
lora
eplb_swift_balancer
netloader
dynamic_batch
kv_pool
external_dp
large_scale_ep
ucm_deployment
Fine_grained_TP
speculative_decoding
:::