xc-llm-ascend/ut at 9e2965bae2403d1b9a18fdcd03d94704e1f9f01c

EngineX/xc-llm-ascend

Qi Mao 9d0b7c8e98 [Platform][BugFix] Preserve hybrid block size on Ascend (#7528)

### What this PR does / why we need it
This PR fixes a startup regression for Ascend hybrid attention + mamba
models after upgrading to vLLM `0.18.0`.
After the upgrade, worker initialization still calls the generic
platform hook, whose block-size fallback is not meant for hybrid models:
- `current_platform.update_block_size_for_backend(vllm_config)`

### How this PR fixes it

This PR keeps the fix strictly inside `vllm-ascend`.

It adds an Ascend override for
`NPUPlatform.update_block_size_for_backend()`:

- for hybrid models, do not run the generic upstream block-size fallback
- preserve the block size that was already computed by the hybrid
model-specific config logic
- for non-hybrid models, keep the original upstream behavior unchanged
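
The override described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Platform`, `VllmConfig`, `ModelConfig`, and `CacheConfig` classes are simplified stand-ins, and the `is_hybrid` flag and `DEFAULT_BLOCK_SIZE` value are hypothetical names chosen for the example. Only the method name `update_block_size_for_backend` and the class name `NPUPlatform` come from the description.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    block_size: int


@dataclass
class ModelConfig:
    is_hybrid: bool  # hypothetical flag standing in for the real hybrid check


@dataclass
class VllmConfig:
    model_config: ModelConfig
    cache_config: CacheConfig


class Platform:
    """Stand-in for the generic upstream platform (assumption)."""

    DEFAULT_BLOCK_SIZE = 16  # hypothetical fallback value

    @classmethod
    def update_block_size_for_backend(cls, vllm_config: VllmConfig) -> None:
        # Generic fallback: overwrite the block size with a backend default.
        vllm_config.cache_config.block_size = cls.DEFAULT_BLOCK_SIZE


class NPUPlatform(Platform):
    """Stand-in for the Ascend platform with the override applied."""

    @classmethod
    def update_block_size_for_backend(cls, vllm_config: VllmConfig) -> None:
        if vllm_config.model_config.is_hybrid:
            # Hybrid model: keep the block size already computed by the
            # hybrid model-specific config logic; skip the generic fallback.
            return
        # Non-hybrid model: defer to the upstream behavior unchanged.
        super().update_block_size_for_backend(vllm_config)
```

With these stand-ins, a hybrid config that arrives with `block_size=128` leaves the hook unchanged, while a non-hybrid config is handed to the generic fallback exactly as before.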

- vLLM version: v0.18.0
- vLLM main: 8b6325758c
---------
Signed-off-by: maoxx241 <maomaoyu870@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>

2026-03-22 11:21:49 +08:00

[refactor] replace scattered business kwargs with typed request objects and explicit stage boundaries (#7024)

2026-03-20 23:23:57 +08:00

[Feature]Supports DSv3.1 PD separation and C8 quantization (#7222)

2026-03-16 22:49:05 +08:00

batch_invariant

[Feature] Add docs of batch invariance and make some extra operators patch (#6910)

2026-03-05 09:12:40 +08:00

[Feature] support aclgraph for model runner v2 (#7110)

2026-03-13 09:11:46 +08:00

[MM][Bugfix] Update hf_config to hf_text_config (#5319)

2026-01-06 16:41:39 +08:00

device_allocator

[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945)

2026-03-03 17:20:52 +08:00

[Main2Main] Upgrade vLLM to 0303 (#6944)

2026-03-06 09:08:52 +08:00

[EPLB] Reduce the memory used for batch_isend_irecv (#7344)

2026-03-20 12:25:58 +08:00

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

[P/D][PCP] mooncake layerwise support pcp function (#6627)

2026-02-12 11:02:25 +08:00

model_loader/netloader

Revert "moe_gating_top_k" (#5512)

2025-12-30 15:05:47 +08:00

[refactor] replace scattered business kwargs with typed request objects and explicit stage boundaries (#7024)

2026-03-20 23:23:57 +08:00

patch/worker/patch_common

[Feat] Support routing replay (#6696)

2026-02-26 10:22:47 +08:00

[refactor] replace scattered business kwargs with typed request objects and explicit stage boundaries (#7024)

2026-03-20 23:23:57 +08:00

[Refactor] Import global var form vllm instead of overwirte it (#5469)

2026-01-07 18:41:45 +08:00

[Bugfix] Fix padding logic in eagle proposer for kimi25 (#7348)

2026-03-21 16:57:22 +08:00

[Feat] Support separate attention backend for target and draft model. (#7342)

2026-03-21 10:48:01 +08:00

__init__.py

[2/4][Refactor] Refactor torchair utils (#1892)

2025-07-21 19:43:30 +08:00

base.py

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926)

2025-07-28 15:13:37 +08:00

conftest.py

[Main2Main] Upgrade vllm commit to 0105 (#5595)

2026-01-06 08:44:29 +08:00

test_ascend_config.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

test_envs.py

[Misc] Remove redundant imported envs, using envs_ascend instead (#2193)

2025-08-14 09:33:39 +08:00

test_platform.py

[Platform][BugFix] Preserve hybrid block size on Ascend (#7528)

2026-03-22 11:21:49 +08:00

test_utils.py

[300I][Bugfix] fix unquant model weight nd2nz error (#6851)

2026-03-03 15:57:26 +08:00