xc-llm-ascend

Author	SHA1	Message	Date
Li Wang	4d780a8b01	[Misc] Revert "[Misc] Bump mooncake version to v0.3.8.post1 (#6110)" (#6164) ### What this PR does / why we need it? The new version of mooncake led to an image build failure (see https://github.com/vllm-project/vllm-ascend/actions/runs/21236469259/job/61105443733), so we should revert it first. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-23 09:53:32 +08:00
Li Wang	37a9cf818a	[Misc] Bump mooncake version to v0.3.8.post1 (#6110) ### What this PR does / why we need it? Since mooncake has a newer [release](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.8.post1), we pin the tag to the latest release. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-22 11:03:16 +08:00
liziyu	451bbdc292	[Doc] add tls check to pd disaggregation readme (#5638) ### What this PR does / why we need it? Update the PD disaggregation multi_node README: update the environment check command for A3 and add a TLS check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `8be6432bda` Signed-off-by: liziyu <liziyu16@huawei.com>	2026-01-12 15:49:18 +08:00
zhangmuzhi_yuwan	6c1a685b30	[Doc] add new doc for mooncake: PD-Colocated cross-node multi-instance validation of Mooncake's KV Cache reuse and performance. (#5415) ### What this PR does / why we need it? This documentation provides a comprehensive technical guide for deploying vLLM-Ascend using a Prefill-Decode (PD) colocated architecture integrated with Mooncake, a high-performance distributed KV Cache transfer engine. As Large Language Model (LLM) serving scales, managing KV Cache efficiently across distributed nodes is essential for reducing latency and optimizing hardware utilization. The tutorial focuses on a multi-instance setup using Huawei Atlas 800T A2 nodes. By leveraging Mooncake’s distributed memory pooling, vLLM instances can achieve seamless cross-node KV Cache reuse. This capability allows an instance to retrieve precomputed cache from a remote node's DRAM via high-speed RoCE networks, effectively bypassing redundant prefill computations. ### Does this PR introduce _any_ user-facing change? No - vLLM version: release/v0.13.0 - vLLM main: `0bfd7484fd` --------- Signed-off-by: zhangmuzhibangde <1037640609@qq.com> Signed-off-by: zhangmuzhi_yuwan <1037640609@qq.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2026-01-05 14:19:57 +08:00

4 Commits