xc-llm-ascend/vllm_ascend at 775fbc4cd21718b53a033856822ff9f53b6b28cc

EngineX/xc-llm-ascend

fems14 775fbc4cd2 【main】【bugfix】fix: restrict default MLAPO activation to Decode nodes only (#6451 )

### What this PR does / why we need it?
The current default logic for MLAPO (MLA Policy Optimization) has a bug.
By design, MLAPO should be enabled by default only on Decode (D) nodes.
In hybrid scenarios (collocated prefill and decode), however, it is
erroneously activated during the Prefill stage as well.
This PR corrects the default so that MLAPO is enabled only for the
Decode phase. This prevents unexpected interference during Prefill and
is intended to preserve performance in hybrid deployment environments.
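
The intended default can be illustrated with a minimal sketch. This is not the code from this PR: `NodeRole`, `mlapo_enabled_by_default`, and the `user_override` parameter are hypothetical names introduced only for illustration, and treating hybrid nodes as "off by default" is an assumption based on the PR title.

```python
from enum import Enum
from typing import Optional


class NodeRole(Enum):
    """Hypothetical deployment roles; not an actual vllm-ascend type."""
    PREFILL = "prefill"
    DECODE = "decode"
    HYBRID = "hybrid"  # collocated prefill and decode


def mlapo_enabled_by_default(role: NodeRole,
                             user_override: Optional[bool] = None) -> bool:
    """Return whether MLAPO should be active for the given role.

    Assumption: an explicit user setting (if any) wins; otherwise
    MLAPO defaults to on for Decode nodes only.
    """
    if user_override is not None:
        return user_override
    return role is NodeRole.DECODE


if __name__ == "__main__":
    for role in NodeRole:
        print(role.value, mlapo_enabled_by_default(role))
```

Under these assumptions, only a dedicated Decode node gets MLAPO without explicit configuration; a hybrid or Prefill node would need the (hypothetical) override to turn it on.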
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: fems14 <1804143737@qq.com>

2026-01-31 22:44:56 +08:00

[Fixbugs]: fix refactor cause to 310p chunkprefill error (#6340 )

2026-01-28 16:41:32 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

【main】【bugfix】fix: restrict default MLAPO activation to Decode nodes only (#6451 )

2026-01-31 22:44:56 +08:00

[e2e Test][npugraph_ex]add static kernel e2e test case (#6320 )

2026-01-30 16:24:48 +08:00

[0.14.1][bugfix][sched] fix incompatibility of RecomputeScheduler with vllm v0.14.1 (#6286 )

2026-01-28 20:16:58 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2 ) (#5977 )

2026-01-19 08:59:46 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2 ) (#5977 )

2026-01-19 08:59:46 +08:00

[P/D] Using the cache load operator to replace the index select operator. (#6295 )

2026-01-30 14:27:53 +08:00

[EPLB][Bugfix] EPLB support fp/bf16 (#5531 )

2026-01-26 14:28:16 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5 ) (#5996 )

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5 ) (#5996 )

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6 ) (#6001 )

2026-01-24 22:08:33 +08:00

[Bugfix]Modify NPU rotary encoding parameter fields，fix RopeOperation setup failed in condition of self.rotary_dim < self.head_size (#6310 )

2026-01-30 21:25:04 +08:00

[Misc] Drop deepseek patch (#6288 )

2026-01-29 14:45:50 +08:00

[Refactor] Quantization Module Refactor (#5738 )

2026-01-23 14:13:47 +08:00

[ops] support advanced apply_top_k_top_p without top_k constraint (#6098 )

2026-01-26 09:08:42 +08:00

Qwen3-VL-MoE EAGLE support for vLLM-Ascend (#6327 )

2026-01-29 16:44:30 +08:00

[ModelRunner] Revert "[Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6459 )

2026-01-31 16:33:34 +08:00

[CI] optimize lint term (#5986 )

2026-01-22 15:46:59 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

ascend_config.py

[BugFix] Disable enable_shared_expert_dp by default if tensor_parallel_size=1 (#6361 )

2026-01-28 22:01:01 +08:00

ascend_forward_context.py

[Main2Main] Upgrade vllm commit to 0123 (#6169 )

2026-01-27 08:44:36 +08:00

batch_invariant.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2 ) (#5977 )

2026-01-19 08:59:46 +08:00

cpu_binding.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

envs.py

Default enable MLAPO (#5952 )

2026-01-22 09:26:39 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

meta_registration.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

platform.py

[Misc] Removes unnecessary graph size re-initialization (#6280 )

2026-01-27 14:38:07 +08:00

profiling_config.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

utils.py

[Misc] Removes unnecessary graph size re-initialization (#6280 )

2026-01-27 14:38:07 +08:00