xc-llm-ascend

Author	SHA1	Message	Date
yuxinshan	0bb1f91c2c	[Feature] Mooncake connector get remote ptp size (#5822) ### What this PR does / why we need it? To support elastic scaling with the mooncake connector, different nodes must be configurable with different TP sizes. To that end, prefill node information such as the TP size is transferred through the request's kv_transfer_params. Decode nodes read the prefill TP size from the request's kv_transfer_params instead of from the mooncake connector configuration. - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: yuxinshan <syx_ctyg@126.com> Signed-off-by: CalvinXKY <kyxiezju@163.com>	2026-01-26 14:28:33 +08:00
SILONG ZENG	153da1a669	[Lint]Style: Convert `vllm-ascend/` to ruff format (Batch #4) (#6200) ### What this PR does / why we need it? Scope of changes (file paths): `vllm_ascend/distributed/kv_transfer/__init__.py`, `vllm_ascend/distributed/kv_transfer/kv_p2p/mooncake_connector.py`, `vllm_ascend/distributed/kv_transfer/kv_p2p/mooncake_layerwise_connector.py` ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.0 - vLLM main: `d68209402d` Signed-off-by: MrZ20 <2609716663@qq.com>	2026-01-24 20:40:48 +08:00
liziyu	f66bcdfb29	[P/D] Mooncake connector add zmq socket fail log (#6155) Mooncake connector add zmq socket fail log - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: liziyu <liziyu16@huawei.com>	2026-01-24 12:06:42 +08:00
weiguihua2	4173255c0c	[main][Bugfix] fix kv pcp+pooling+pd separation bug (#6153) ### What this PR does / why we need it? Fix a failure in the scenario that combines pcp, pd separation, and kv pooling. In the pooling scenario, multi_nodes_meta_mapping is empty, so an error is reported when the remote_host information is obtained through the get_remote_port_send_num method. ### Does this PR introduce _any_ user-facing change? No - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2026-01-23 16:15:04 +08:00
wangxiaoteng888	82a2b3bcc7	[P/D]Add ssl cert for metaserver proxy (#5875) ### What this PR does / why we need it? Add the SSL certificate and the CA certificate path when the P node accesses the proxy metaserver, to improve security. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.13.0 - vLLM main: `bde38c11df` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>	2026-01-23 11:11:44 +08:00
zhangxinyuehfad	819a4459ce	Drop vLLM 0.13.0 support (#6069) ### What this PR does / why we need it? Drop vLLM 0.13.0 support, upgrade to 0.14.0 - vLLM version: v0.13.0 - vLLM main: `d68209402d` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-01-23 09:45:08 +08:00
wangxiaoteng888	f2c0ced06d	[P/D][PCP]bugfix pcp force free twice caused logger error (#6124) ### What this PR does / why we need it? Fix an issue where the D node mistakenly sent the pull-end signal twice, which caused the P node to log spurious errors. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>	2026-01-22 16:24:33 +08:00
Li Wang	484e7c59dc	[CI] optimize lint term (#5986) ### What this PR does / why we need it? This patch optimizes the lint check step. The main idea is to reduce unnecessary installation time. 1. Installing vllm is not required; appending the path of the vllm source to `PYTHONPATH` is sufficient. 2. Installing `requirements-dev.txt` is not required either, because a pre-built image, `quay.io/ascend-ci/vllm-ascend:lint`, has all the requirements installed in advance. NOTE: image builds are triggered by: 1) a daily scheduled build; 2) a build when requirements are modified; 3) a manual build. This keeps the dependencies in the image as up to date as possible. 3. `mypy` was separated from the `pre-commit` hook for performance reasons; integrating `mypy` into the `pre-commit` hook resulted in poor performance. 4. Reduce the CPU core consumption from 16 to 8. ### Does this PR introduce _any_ user-facing change? The end-to-end lint time was reduced from 20 min per PR to 8 min per PR. ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `2c24bc6996` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-22 15:46:59 +08:00
JiangWeixiang	cef04b3555	[bugfix] adapt_remote_request_id (#6051) This PR addresses a request ID mismatch issue in the PD (Prefill-Decoding) separation deployment scenario for vllm-ascend. Upstream vLLM recently mitigated request ID collisions by appending a random suffix to each request_id (e.g., req-123 → req-123-abc), refer to [PR-27987](https://github.com/vllm-project/vllm/pull/27987) & [PR-29665](https://github.com/vllm-project/vllm/pull/29665). While this works in single-node deployments, it breaks compatibility in PD-separated setups: the Producer (Prefill node) and Consumer (Decoding node) end up with different request_id values, preventing the Consumer from correctly retrieving the KV cache generated by the Producer. To resolve this, this PR introduces a new field remote_request_id in the metadata passed via mooncake_connector. The Producer preserves and forwards the original (unmodified) request_id as remote_request_id. The Consumer then uses this remote_request_id, instead of its locally generated suffixed ID, to fetch the correct KV cache from the Prefill node. This ensures consistent request identification across PD nodes while maintaining compatibility with upstream vLLM’s request ID deduplication mechanism. - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: ghphotoframe <854746559@qq.com> Co-authored-by: jiangweixiang <jwx02384838@antgroup.com>	2026-01-22 10:48:40 +08:00
wangxiaochao6	bc486d9530	[main][bugfix] fix mooncake kv cache transfer when one P has multi nodes (#5960) ### What this PR does / why we need it? In the PD disaggregation case, when P spans multiple nodes, mooncake fails to send data. This PR fixes the issue. Details: if a P rank does not need to transfer kv cache to any D rank, the D node should send a message to the P node to release the kv cache held there. If P spans multiple nodes, the D node must know the IP of each P node so that it can send the message to the right one; otherwise, a data-sending error occurs. This PR fixes the issue by providing the P node IPs to the D node through the parameter `remote_port_send_num`. - vLLM version: v0.13.0 - vLLM main: `2c24bc6996` --------- Signed-off-by: wangxiaochao <w00642655@china.huawei.com> Co-authored-by: wangxiaochao <w00642655@china.huawei.com>	2026-01-19 16:35:13 +08:00
wangxiaoteng888	fff5df3efe	[P/D] Fix node crash caused by a second force-free release request (#5968) ### What this PR does / why we need it? A second force-free release of a request causes the node to crash. When requests are pulled too quickly, they should not be added to the delay-free queue. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.13.0 - vLLM main: `2c24bc6996` Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>	2026-01-17 18:49:27 +08:00
wjunLu	c11a05c4e1	[Main2Main] Upgrade vllm commit to 0113 (#5839) ### What this PR does / why we need it? Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9) - Modify import paths due to the refactors https://github.com/vllm-project/vllm/pull/31916 https://github.com/vllm-project/vllm/pull/32054 - Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional arguments but 3 were given` due to https://github.com/vllm-project/vllm/pull/24498 - Skip the async-scheduling tests in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never verified https://github.com/vllm-project/vllm/pull/31998 - Skip some pooling tests, which fail because of https://github.com/vllm-project/vllm/pull/32148, where vllm itself also fails https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4 We will re-enable those tests when main2main reaches https://github.com/vllm-project/vllm/pull/32243 - Skip some cases in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are broken by https://github.com/vllm-project/vllm/pull/32118 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: wjunLu <wjunlu217@gmail.com> Signed-off-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: hfadzxy <starmoon_zhang@163.com>	2026-01-15 09:48:53 +08:00
lty	295018ec0f	[Refactor]Refactor of vllm_ascend/distributed module (#5719) ### What this PR does / why we need it? Based on the RFC: https://github.com/vllm-project/vllm-ascend/issues/5604 This PR refactors vllm_ascend/distributed, moving all kv_transfer related code into a dedicated folder, as has already been done in vLLM ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` --------- Signed-off-by: lty <linhebiwen@gmail.com>	2026-01-15 08:57:40 +08:00
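
The `remote_request_id` change described in cef04b3555 (#6051) can be illustrated with a minimal, self-contained sketch. It is not the connector's code: `PrefillNode`, `DecodeNode`, `with_random_suffix`, and the toy in-memory KV store are hypothetical names introduced only for illustration. The only detail taken from the commit message is that the Producer forwards its request ID as `remote_request_id` and the Consumer uses that ID for the lookup.

```python
import uuid


def with_random_suffix(request_id: str) -> str:
    """Mimic per-node ID de-duplication by appending a random suffix (illustrative)."""
    return f"{request_id}-{uuid.uuid4().hex[:8]}"


class PrefillNode:
    """Hypothetical Producer: keeps KV data keyed by the request ID it knows."""

    def __init__(self) -> None:
        self.kv_cache: dict[str, bytes] = {}

    def prefill(self, request_id: str) -> dict[str, str]:
        self.kv_cache[request_id] = b"kv-for-" + request_id.encode()
        # Forward the Producer-side ID so the Consumer can find the same entry.
        return {"remote_request_id": request_id}


class DecodeNode:
    """Hypothetical Consumer: has its own, differently suffixed request ID."""

    def pull(
        self,
        prefill: PrefillNode,
        local_request_id: str,
        kv_transfer_params: dict[str, str],
    ) -> bytes:
        # Use the forwarded ID for the lookup; falling back to the local ID
        # when the field is absent is an assumption of this sketch.
        lookup_id = kv_transfer_params.get("remote_request_id", local_request_id)
        return prefill.kv_cache[lookup_id]


prefill_id = with_random_suffix("req-123")  # ID as seen on the prefill node
decode_id = with_random_suffix("req-123")   # ID as seen on the decode node

producer = PrefillNode()
params = producer.prefill(prefill_id)
kv = DecodeNode().pull(producer, decode_id, params)
```

Looking up `decode_id` directly would generally miss, because each node generates its own suffix; carrying the Producer's ID in the transfer metadata avoids that mismatch.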

13 Commits
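
The idea behind 0bb1f91c2c (#5822), reading the prefill TP size from each request's `kv_transfer_params` rather than from static connector configuration, might look roughly like the sketch below. The key name `remote_ptp_size`, the helper `resolve_prefill_tp_size`, and the fallback to a configured value are all assumptions made for illustration; the commit log does not show the actual field name or fallback behaviour.

```python
from typing import Any, Mapping, Optional

# Hypothetical key name; the real field used by the connector is not shown in the log.
REMOTE_PTP_SIZE_KEY = "remote_ptp_size"


def resolve_prefill_tp_size(
    kv_transfer_params: Optional[Mapping[str, Any]],
    configured_tp_size: int,
) -> int:
    """Prefer the per-request prefill TP size over the static configuration.

    Falling back to the configured value is an assumption of this sketch.
    """
    value = (kv_transfer_params or {}).get(REMOTE_PTP_SIZE_KEY, configured_tp_size)
    tp_size = int(value)
    if tp_size <= 0:
        raise ValueError(f"prefill TP size must be positive, got {tp_size}")
    return tp_size


# Two prefill nodes with different TP sizes can serve the same decode node,
# because the size travels with each request.
small = resolve_prefill_tp_size({"remote_ptp_size": 4}, configured_tp_size=8)
large = resolve_prefill_tp_size({"remote_ptp_size": 16}, configured_tp_size=8)
default = resolve_prefill_tp_size(None, configured_tp_size=8)
print(small, large, default)  # prints: 4 16 8
```

Resolving the size per request is what lets prefill nodes be added or resized without changing the decode side's configuration.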