xc-llm-ascend

Author	SHA1	Message	Date
wangxiyuan	eeedf7c503	[Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470 ) ### What this PR does / why we need it? This PR upgrades the vLLM dependency from `v0.14.1` to `v0.15.0`. This involves: - Updating the `VLLM_TAG` in all `Dockerfile`. - Updating the vLLM version in `docs/source/conf.py`. - Removing conditional code paths specific to `v0.14.1` across the codebase, which simplifies maintenance. - Fix `TypeError: MMEncoderAttention.__init__() got an unexpected keyword argument 'multimodal_config'` due to https://github.com/vllm-project/vllm/pull/31972. - Fix `_shared_experts: 'NoneType' object is not callable` due to https://github.com/vllm-project/vllm/pull/32082 by https://github.com/vllm-project/vllm-ascend/pull/6335. - Fix `ReshapeAndCacheOperation setup failed!` due to https://github.com/vllm-project/vllm/pull/25954 by overriding attention metadata slots. This upgrade is necessary to keep the project aligned with the latest features, bug fixes, and API changes in the vLLM project. ### Does this PR introduce _any_ user-facing change? No, this is an internal dependency update and does not introduce any user-facing changes. ### How was this patch tested? CI is expected to pass with these changes, ensuring that all existing tests are successful with the new vLLM version. - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` co-authored-by: shen-shanshan <467638484@qq.com> --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-02-02 15:57:55 +08:00
meihanc	fea197ad50	[Main2Main] Upgrade vllm commit to 0123 (#6169 ) ### What this PR does / why we need it? 1. ✅ Upgrade vllm commit to: 0115 (8471b27df97c3eb79f891802fc0e858f8f7ac6a0) Modify import paths due to the refactors： https://github.com/vllm-project/vllm/pull/32245 https://github.com/vllm-project/vllm/pull/32060 Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21034239336/job/60490156965?pr=5913 2. ✅Upgrade vllm commit to: 0119 (9a1f16da1e423ede2c2f52a9850cbfbb39cefe96) Fix `WorkerProc.__init__() missing 1 required positional argument: 'is_driver_worker'` due to https://github.com/vllm-project/vllm/pull/28506 Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21156263050/job/60841668755?5569 3. ✅Upgrade vllm commit to: 0120(148117ea2e689cd43df4be6892671a17cdae5833) 1. Add `skip_compiled` param in `set_forward_context` due to https://github.com/vllm-project/vllm/pull/30385 2. Modify `tests/ut/spec_decode/test_eagle_proposer.py` due to https://github.com/vllm-project/vllm/pull/24322 change `self.max_num_tokens = vllm_config.scheduler_config.max_num_batched_tokens + max_batch_size` 3. Modify UT import paths due to the refactors：https://github.com/vllm-project/vllm/pull/32060 Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21204851770/job/60999046946 4. ✅Upgrade vllm commit to: 0121(f23fb5a7c1b61350c5c40ca1115d3bf8cf2b8cc9) 1. vLLM switched `uses_mrope` from target to draft model config, making `positions`/`mrope_positions` mutually exclusive, breaking vllm-ascend's direct self.positions access and tests missing `draft_model_config.uses_mrope`. https://github.com/vllm-project/vllm/pull/32048 2. Moved bs_to_padded_graph_size from CompilationConfig to CudagraphDispatcher due to the refactor https://github.com/vllm-project/vllm/pull/30143 3. Remove unused `maybe_setup_kv_connector` due to https://github.com/vllm-project/vllm/pull/32077 Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21217728738/job/61043738834 6. ✅Upgrade vllm commit to: 0122(8ebf271bb6d1e7e9b1a55be73d755ef1a57dbbe5) Updating FusedMoEParallelConfig (added enable_eplb) and FusedMoEConfig due to https://github.com/vllm-project/vllm/pull/32414 Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21249922546/job/61148613054 8. ✅Upgrade vllm commit to: 0123(dc917cceb877dfd13f98c538c4c96158047d98bd) Setting temperature=0.0 due to the removal of the default temperature value in https://github.com/vllm-project/vllm/pull/32723 Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21280796875 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.0 - vLLM main: `d68209402d` --------- Signed-off-by: wjunLu <wjunlu217@gmail.com> Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com> Co-authored-by: wjunLu <wjunlu217@gmail.com>	2026-01-27 08:44:36 +08:00
LICO67373	2a6d95c389	[Cleanup] Remove dead code make_attention_mask function (#5818 ) ### What this PR does / why we need it? This PR removes the unused `make_attention_mask` function from `vllm_ascend/worker/v2/attn_utils.py`. Why it's dead code: - After PR #4870 (attention mask unification refactor), attention mask generation has been centralized in the `AttentionMaskBuilder` singleton class - The mask is now generated directly by metadata builders when needed (e.g., `AscendAttentionMetadataBuilder`, `AscendMLAMetadataBuilder`) - The `make_attention_mask` function is no longer called anywhere in the codebase - The function's parameters (including `attn_mask` and `spec_attn_mask`) were also removed from `build_attn_metadata` in the same refactor Changes: - Remove `make_attention_mask` function (24 lines) from `vllm_ascend/worker/v2/attn_utils.py` ### Does this PR introduce _any_ user-facing change? No. This is a code cleanup that removes dead code. No user-facing behavior changes. ### How was this patch tested? - Verified that `make_attention_mask` is not called anywhere in the codebase (via `grep`) - CI tests pass to ensure no regressions - The function has been unused since PR #4870 was merged - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: lico67373 <918688502@qq.com> Co-authored-by: weijinqian0 <1184188277@qq.com>	2026-01-14 16:52:51 +08:00
Ronald	e20813f441	[Feature] implement eagle spec decoding for model runner v2 (#5840 ) ### What this PR does / why we need it? this pr implement eagle spec decoding for model runner v2, please see RFC https://github.com/vllm-project/vllm-ascend/issues/5208 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? vLLM version: v0.13.0 --------- Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	2026-01-14 09:18:05 +08:00
LICO67373	380f089fbf	[Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask (#4870 ) ## What this PR does / why we need it? This PR fixes the `AttentionMaskBuilder` singleton initialization issue introduced in PR #4779 and removes the unused `pcp_prefill_mask` field. ### Background After PR #4779 made `AttentionMaskBuilder` a singleton with `@singleton` decorator, the class constructor now requires a `device` parameter. However, two initialization sites were still using the old parameterless constructor, causing failures. ### Changes 1. Fix singleton initialization - Fixed `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)` in `AscendMLAMetadataBuilder.__init__()` - Fixed `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)` in `AscendAttentionMetadataBuilder.__init__()` 2. Remove unused field - Removed `pcp_prefill_mask` field from `AscendPrefillContextParallelMetadata` (never used in codebase) - Updated related test assertions ### Related - Issue #5463 - PR #4779 (Unify all mask generation methods) - PR #5389 (Make AttentionMaskBuilder singleton) ## Does this PR introduce _any_ user-facing change? No. This is an internal refactoring. ## How was this patch tested? - ✅ Local testing: No linter errors - ✅ Unit tests for attention modules verified - ⏳ CI pipeline Signed-off-by: lico67373 <918688502@qq.com> Co-authored-by: weijinqian0 <1184188277@qq.com>	2026-01-07 17:09:52 +08:00
Ronald	e7e1a7dc05	[Feature] support eager mode in model runner v2 (#5210 ) ### What this PR does / why we need it? #5051 only implement a basic framework for model runner v2, but there are still some bugs for e2e functionality, this PR aim to enable basic functionality. model runner v2 plans: https://github.com/vllm-project/vllm-ascend/issues/5208 - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	2025-12-29 15:28:34 +08:00
weijinqian0	95e8a52156	[Refactor] move the metadata from attention_v1 to util(ready for extract common_cp) & realize Ascendmetadata inherit from the parent class. (#5203 ) RFC: https://github.com/vllm-project/vllm-ascend/issues/4629 1. Remove the pcp-related code from attention_v1. 2. Establish the inheritance relationship of CommonAttentionMetadata. TODO 1. extract common_cp 2. move cp metadata to common_cp. 3. remove commonAttentionMetadata for aclgraph. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: weijinqian_v1 <weijinqian@huawei.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>	2025-12-23 00:10:52 +08:00
weijinqian0	35ad11b637	[Refactor] remove some metadata variables in attention_v1. (#5160 ) RFC: https://github.com/vllm-project/vllm-ascend/issues/4629 Reason: The metadata data class contains an excessive number of variables. We will inherit the metadata of the community and simultaneously remove some variables that are no longer needed at present. Todo: 1. remove attn_state partly. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: weijinqian_v1 <weijinqian@huawei.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>	2025-12-19 14:57:09 +08:00
Ronald	b69b04d3a9	implement model runner v2 basic framework (#5051 ) ### What this PR does / why we need it? This PR aim to implement model runner v2 basic framework in vllm-ascend, the e2e function is not guaranteed by this pr. ### Does this PR introduce _any_ user-facing change? use envs.VLLM_USE_V2_MODEL_RUNNER to decide if choose model_runenr_v2. ### How was this patch tested? - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	2025-12-18 15:51:54 +08:00

9 Commits