xc-llm-ascend/docs/source/tutorials/index.md
wangxiaoteng888 ca05f7d632 [Bugfix] TP size larger than KV cache head causes accuracy issues (#3366)
### What this PR does / why we need it?
Resolves an issue with unequal TP (Tensor Parallelism): when the TP
size is larger than the number of the model's attention KV cache heads,
the KV cache heads are duplicated across ranks, which caused
transmission errors in the original code.
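To illustrate the failure mode, here is a minimal sketch (not the actual vLLM or vllm-ascend code; the function name and rank layout are assumptions): when `tp_size` exceeds `num_kv_heads`, each KV head is replicated across `tp_size // num_kv_heads` ranks, so only one rank per head should transmit its KV cache, otherwise duplicates are sent.

```python
# Illustrative sketch, not the real implementation: pick one sender rank
# per KV head when heads are replicated under tensor parallelism.

def kv_head_owner_ranks(tp_size: int, num_kv_heads: int) -> list[int]:
    """Return one representative rank per KV head.

    Assumes the common layout where ranks [k*r, (k+1)*r) all hold copies
    of KV head k, with r = tp_size // num_kv_heads.
    """
    assert tp_size % num_kv_heads == 0, "tp_size must be a multiple of num_kv_heads"
    replication = tp_size // num_kv_heads  # ranks sharing the same KV head
    # The first rank of each replication group is chosen as the sender,
    # so each KV head's cache is transmitted exactly once.
    return [head * replication for head in range(num_kv_heads)]

print(kv_head_owner_ranks(8, 2))  # 8-way TP, 2 KV heads -> ranks [0, 4] send
```

With equal TP (`tp_size == num_kv_heads`) every rank owns a distinct head and all ranks send, so the duplicate-transmission problem only appears in the unequal case this PR targets.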
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By CI.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>
2025-10-11 11:22:23 +08:00


# Tutorials
:::{toctree}
:caption: Deployment
:maxdepth: 1
single_npu
single_npu_multimodal
single_npu_audio
single_npu_qwen3_embedding
single_npu_qwen3_quantization
multi_npu_qwen3_next
multi_npu
multi_npu_moge
multi_npu_qwen3_moe
multi_npu_quantization
single_node_300i
multi-node_dsv3.2
multi_node
multi_node_kimi
multi_node_qwen3vl
multi_node_pd_disaggregation_llmdatadist
multi_node_pd_disaggregation_mooncake
multi_node_ray
:::