xc-llm-ascend

Author	SHA1	Message	Date
UnifiedCacheManager	195eac665b	[Core][Worker] Add UCMConnector for KV Cache Offloading (#4411) ### What this PR does / why we need it? This PR introduces the initial integration of UCM (Unified Cache Management) into the vllm-ascend distributed KV-cache system. Specifically, it adds: - A new `UCMConnector` implementation under the distributed KV-transfer framework. - Support for offloading KV-cache blocks to external UCM backends (DRAM / NFS / Localdisk, depending on UCM configuration). - Integration with the vLLM V1 KV connector interface, including metadata handling and role registration. Why it is needed: - UCM provides a unified, high-performance storage layer for KV-cache externalization. - This enables vllm-ascend to support out-of-core KV-cache workloads, improve memory efficiency, and leverage hardware-accelerated storage paths (RDMA / NFS / hybrid modes). - This connector is a required component for future work on multi-node inference and UCM-based scaling. --- ### Does this PR introduce _any_ user-facing change? Yes, but limited: - A new `kv_connector=UCMConnector` option becomes available through the configuration interface. - When selected, vllm-ascend workers may initialize UCM and offload KV-cache blocks externally. - No default behaviors are changed; users must explicitly enable this connector. This PR does not modify: - existing APIs, - default execution paths, - model runner behavior, - user workflow unless `UCMConnector` is configured. --- ### How was this patch tested? --- ### Prefix Caching Benchmark We provide preliminary measurements for TTFT (ms) under the vLLM benchmark. Tests were run on 2 × Ascend 910B3 with vllm-ascend 0.11.0, tensor parallel size 2, and UCM (Localdisk) enabled. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: UnifiedCacheManager <unifiedcachem@163.com>	2025-12-16 10:53:30 +08:00
wangxiaoteng888	a77045f355	[P/D][main] Retire the llmdatadist connector code and related files (#4780) ### What this PR does / why we need it? Now that the mooncake connector is supported, the llmdatadist connector is no longer maintained, so the llmdatadist-related files need to be retired. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By CI - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-12-09 22:36:43 +08:00
fems14	5447a039b9	[Feature][main] Refactor the kvpool connector into the Ascend connector (#4438) ### What this PR does / why we need it? 1. In short, we renamed the existing MooncakeStoreConnector to AscendStoreConnector and extracted the storage engine interaction logic into a new Backend class. Associated RFC: https://github.com/vllm-project/vllm-ascend/issues/4329 2. Fixed the incorrect number of input parameters passed to the connector, an issue introduced in vLLM 0.11.2. ### Does this PR introduce _any_ user-facing change? Yes: MooncakeStoreConnector is renamed to AscendStoreConnector. ### How was this patch tested? - vLLM version: v0.11.2 --------- Signed-off-by: fems14 <1804143737@qq.com>	2025-11-28 18:08:37 +08:00
fems14	55e23fabec	[Bugfix] Fix connector registration failure (#3335) ### What this PR does / why we need it? Register the connector in the plugin. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: fems14 <1804143737@qq.com>	2025-10-09 21:09:54 +08:00
Chao Lei	a486ff8c11	KVCache Transfer via Layer-wise Strategy in Disaggregation (#2602) ### What this PR does / why we need it? See RFC: https://github.com/vllm-project/vllm-ascend/issues/2470 This PR adds a new KV connector for layer-wise KV transfer. ### Does this PR introduce _any_ user-facing change? Yes, a new KV connector is added, so users can now use the layer-wise transfer feature. ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 --------- Signed-off-by: leichao.lc <leichao139636@163.com> Signed-off-by: CaveNightingale <2859066733@qq.com> Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com> Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: hanxinlong <50882499@qq.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: CaveNightingale <2859066733@qq.com> Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com> Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com> Co-authored-by: hanxinlong <50882499@qq.com>	2025-09-30 15:10:29 +08:00
Chao Lei	cef43b524e	[Feat] A Connector that supports Mooncake store (#2913) ### What this PR does / why we need it? Adds a new connector for Mooncake store integration to enable KV cache reuse in scenarios with system prompts or multi-turn dialogues. ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: `5963b98b46` --------- Signed-off-by: LCAIZJ <leichao139636@163.com> Signed-off-by: fems14 <1804143737@qq.com> Co-authored-by: fems14 <1804143737@qq.com> Co-authored-by: Dreamerleader <2270923832@qq.com> Co-authored-by: Pz1116 <zpbzpb123123@gmail.com> Co-authored-by: lizy124 <1950471827@qq.com> Co-authored-by: zouyida2052 <zouyida2002@gmail.com>	2025-09-18 14:04:45 +08:00
Chao Lei	03ca2b26ca	[P/D] Mooncake Connector for v1 distributed (#1568) ### What this PR does / why we need it? This PR adopts Mooncake TransferEngine to implement KV cache registration and `pull_blocks`-style disaggregated prefill. ### Does this PR introduce any user-facing change? No ### Dependencies 1. CANN dependencies: using Mooncake TransferEngine with Ascend Transport requires CANN 8.2.RC1 or higher (see Mooncake [#502](https://github.com/kvcache-ai/Mooncake/pull/502) for details). 2. vllm-ascend: this PR depends on changes introduced by #950 (modifications to `model_runner_v1`) and #1361 (updates to `schedule`), both of which have been merged into the `v0.9.1-dev` branch and are expected to land in `main` shortly. ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `1c859a1387` --------- Signed-off-by: leichao.lc <leichao139636@163.com> Co-authored-by: jianzs <zheng.shoujian@outlook.com> Co-authored-by: zzy-ContiLearn <1831242919@qq.com> Co-authored-by: fems14 <1804143737@qq.com> Co-authored-by: Dreamerleader <2270923832@qq.com> Co-authored-by: chris668899 <15105191595@126.com> Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>	2025-08-18 14:30:07 +08:00
wangxiyuan	0190b68f51	[Misc] Remove PD v0 code (#2047) Clean up the disaggregated prefill code for the V0 engine. Part of https://github.com/vllm-project/vllm-ascend/issues/1620 TODO: enable v1 e2e test. - vLLM version: v0.10.0 - vLLM main: `2cc571199b` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-28 19:09:22 +08:00
Pleaplusone	df0ec55162	Disaggregated prefill for KV cache register style (#950) ### What this PR does / why we need it? This PR adopts `LLMDataDist` to implement KV cache registration and `pull_blocks`-style disaggregated prefill. The interface implementation mainly follows the design of NIXL PR https://github.com/vllm-project/vllm/pull/17751/files#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953 . This PR can be tested with the following steps: - Generate the rank table for all machines. - Execute `toy_proxy.py` to launch the disaggregated prefill proxy server, specifying the prefill IP and port and the decode IP and port. - Run the prefill server and the decode server. - Send requests to the disaggregated prefill proxy. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.9.2 - vLLM main: `8d0a01a5f2` --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com> Signed-off-by: machenglong <machenglong_yewu@cmss.chinamobile.com> Signed-off-by: liziyu179 <3475441767@qq.com> Signed-off-by: underfitc <hucong24@huawei.com> Signed-off-by: zouyida2052 <zouyida@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: underfituu <hzhucong@163.com> Co-authored-by: machenglong <machenglong_yewu@cmss.chinamobile.com> Co-authored-by: liziyu179 <3475441767@qq.com> Co-authored-by: underfitc <hucong24@huawei.com> Co-authored-by: zouyida2052 <zouyida@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com> Co-authored-by: underfituu <hzhucong@163.com>	2025-07-26 17:15:47 +08:00
whx	8b194ad12e	[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694) ### What this PR does / why we need it? - This PR proposes a P2P version of Disaggregated Prefill based on llm_datadist, which manages data transfer. - This solution reworks the previous offline single-node Disaggregated Prefill solution and now supports multi-node and online serving. - Currently this solution supports the 1P1D scenario of DeepSeek hybrid parallelism (P: TP+EP, D: DP+EP). Note that the xPyD scenario is considered in the solution design and will be supported soon within the v1 engine. --------- Signed-off-by: hw_whx <wanghexiang7@huawei.com> Signed-off-by: ganyi <pleaplusone.gy@gmail.com> Co-authored-by: hw_whx <wanghexiang7@huawei.com> Co-authored-by: ganyi <pleaplusone.gy@gmail.com>	2025-05-01 22:31:36 +08:00
eeethenQ	44a8301424	[Feature] Add PD separation feature (#432) ### What this PR does / why we need it? Adapt the Disaggregated Prefill feature to Ascend devices. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The test usage is provided along with the PR in examples/offline_disaggregated_prefill_npu.py. To run it: `export PROMPT_DEVICE_ID=0,1; export DECODE_DEVICE_ID=2,3; python examples/offline_disaggregated_prefill_npu.py` --------- Signed-off-by: ZihuiQian <qianzihui@huawei.com> Co-authored-by: ZihuiQian <qianzihui@huawei.com>	2025-04-15 15:11:35 +08:00

11 Commits