### What this PR does / why we need it?
Add a user guide for the **Fine-Grained Tensor Parallelism** feature.
The guide documents usage, supported components (`embedding`, `lm_head`, `o_proj`,
`mlp`/`dense_ffn`), model compatibility, and deployment guidelines.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Signed-off-by: chenxiao <Jaychou1620@Gmail.com>
Signed-off-by: 秋刀鱼 <jaychou1620@Gmail.com>
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
# Feature Guide
This section provides detailed usage guides for vLLM Ascend features.
:::{toctree}
:caption: Feature Guide
:maxdepth: 1
graph_mode
quantization
quantization-llm-compressor
sleep_mode
structured_output
lora
eplb_swift_balancer
netloader
dynamic_batch
kv_pool
external_dp
large_scale_ep
ucm_deployment
Fine_grained_TP
speculative_decoding
:::