xc-llm-ascend

Author	SHA1	Message	Date
Li Wang	83a4065b4b	[CI] Add pre-commit check for patch logger (#7446 ) ### What this PR does / why we need it? See https://github.com/vllm-project/vllm-ascend/pull/7402, pre-commit hook will forbid init_logger(__name__) in vllm_ascend patch modules - vLLM version: v0.17.0 - vLLM main: `8a680463fa` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-19 16:53:20 +08:00
wangxiyuan	13777bf3f0	[Spec Decode]clean up spec decode interface (#6947 ) This pull request refactors the speculative decoding proposer interface to align with upstream vLLM, removing the local `Proposer` interface and renaming methods to `propose`. This is the first step. In the future we should remove the class register and just add few Ascend specified method once the arch in vLLM is ready. - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-03-05 14:30:10 +08:00
SILONG ZENG	06aa6036f6	[Lint]Style: Convert `vllm-ascend/` to ruff format(new Batch #8 ) (#6604 ) ### What this PR does / why we need it? Scope of Changes: \| File Path \| \| :--- \| \| vllm_ascend/ops/\_\_init\_\_.py \| \| vllm_ascend/ops/activation.py \| \| vllm_ascend/ops/flashcomm2_oshard_manager.py \| \| vllm_ascend/ops/layernorm.py \| \| vllm_ascend/ops/mla.py \| \| vllm_ascend/ops/mm_encoder_attention.py \| \| vllm_ascend/ops/register_custom_ops.py \| \| vllm_ascend/ops/vocab_parallel_embedding.py \| \| vllm_ascend/ops/weight_prefetch.py \| \| vllm_ascend/spec_decode/\_\_init\_\_.py \| \| vllm_ascend/spec_decode/eagle_proposer.py \| \| vllm_ascend/spec_decode/interface.py \| \| vllm_ascend/spec_decode/mtp_proposer.py \| \| vllm_ascend/spec_decode/ngram_proposer.py \| \| vllm_ascend/spec_decode/suffix_proposer.py \| ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` Signed-off-by: MrZ20 <2609716663@qq.com>	2026-02-07 09:16:07 +08:00
wangxiyuan	06c0aed124	[CI] Fix broken CI (#6599 ) Revert `4fb3d5e1b2` it breaks E2E Test - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd`	2026-02-06 17:23:58 +08:00
SILONG ZENG	4fb3d5e1b2	[Lint]Style: Convert `vllm-ascend/` to ruff format(Batch #8 ) (#6129 ) ### What this PR does / why we need it? Scope of Changes: \| File Path \| \| :--- \| \| vllm_ascend/ops/\_\_init\_\_.py \| \| vllm_ascend/ops/activation.py \| \| vllm_ascend/ops/flashcomm2_oshard_manager.py \| \| vllm_ascend/ops/layernorm.py \| \| vllm_ascend/ops/mla.py \| \| vllm_ascend/ops/mm_encoder_attention.py \| \| vllm_ascend/ops/register_custom_ops.py \| \| vllm_ascend/ops/vocab_parallel_embedding.py \| \| vllm_ascend/ops/weight_prefetch.py \| \| vllm_ascend/spec_decode/\_\_init\_\_.py \| \| vllm_ascend/spec_decode/eagle_proposer.py \| \| vllm_ascend/spec_decode/interface.py \| \| vllm_ascend/spec_decode/mtp_proposer.py \| \| vllm_ascend/spec_decode/ngram_proposer.py \| \| vllm_ascend/spec_decode/suffix_proposer.py \| ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: MrZ20 <2609716663@qq.com> Signed-off-by: SILONG ZENG <2609716663@qq.com>	2026-02-06 15:25:08 +08:00
simplzyu	f8d03d21f1	Add Medusa speculative decoding support for vllm_ascend (#5668 ) ### What this PR does / why we need it? `vllm_ascend` already supports several speculative decoding strategies such as MTP, EAGLE, N-gram, and suffix decoding. However, Medusa is not yet supported. Medusa is an efficient speculative decoding framework that leverages a lightweight draft model to propose multiple tokens in a single step, which can significantly improve decoding throughput and reduce latency. To enable Medusa-based speculative decoding on Ascend hardware and provide more decoding options for users, this PR adds Medusa support into the `vllm_ascend` speculative decoding pipeline. ### Does this PR introduce _any_ user-facing change? This PR introduces Medusa speculative decoding as an additional speculative decoding method: ✔ Adds `MedusaProposer` and integrates it into the speculative decoding registry ✔ Extends `SpecDcodeType` with a `MEDUSA` enum entry ✔ Updates `NPUModelRunner` to recognize and invoke Medusa during decoding ✔ Adds Medusa-specific handling in the draft token generation logic ✔ Ensures backward compatibility — Medusa is only used when explicitly enabled Key code changes include: * New file: `vllm_ascend/spec_decode/medusa_proposer.py` * Register Medusa in `get_spec_decode_method` * Extend proposer type hints to include `MedusaProposer` * Add a Medusa-specific branch in `generate_draft_token_ids` * Pass `sample_hidden_states` required by Medusa ### How was this patch tested? Medusa is implemented as a new proposer class (`MedusaProposer`) following the existing speculative decoding interface. The integration works as follows: 1. Users enable Medusa via the speculative decoding configuration. 2. `get_spec_decode_method()` returns a `MedusaProposer` instance when `method="medusa"`. 3. During decoding, `NPUModelRunner` detects that the active drafter is a `MedusaProposer`. 4. Instead of the generic speculative decoding path, the Medusa-specific `generate_token_ids()` method is invoked, which consumes: * `valid_sampled_token_ids` * `sampling_metadata` * `spec_decode_metadata` * `sample_hidden_states` 5. The proposed tokens are validated by the target model as usual. When Medusa is not enabled, the decoding pipeline behaves exactly as before, ensuring full backward compatibility. - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: simplzyu <191163281@qq.com> Signed-off-by: simplzyu <zhenyuguo@cmbchina.com>	2026-01-23 14:14:23 +08:00

6 Commits