xc-llm-ascend/vllm_ascend at c860535246cc751b6be7d1da2092e4380013598c

EngineX/xc-llm-ascend

yesyue-w c860535246 【A5】【Qwen VL】Qwen VL adapt for A5 (#7046)

### What this PR does / why we need it?
Replace the '_npu_flash_attention_unpad' operator with the
'npu_fusion_attention' operator so that Qwen VL models can run in the
A5 environment, and remove the restriction on calling the 'mrope'
operator on A5.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: 汪越 <wangyue361@h-partners.com>

2026-03-20 16:56:12 +08:00

[CI] Add pre-commit check for patch logger (#7446)

2026-03-19 16:53:20 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[bugfix][accuracy] Fix ds indexer accuracy problem caused by k rope (#7341)

2026-03-18 14:20:21 +08:00

[main2main] upgrade vllm to 0308 (#7213)

2026-03-18 09:24:43 +08:00

[CI] Add pre-commit check for patch logger (#7446)

2026-03-19 16:53:20 +08:00

GMM custom operator optimization in small batch scenarios (#7100)

2026-03-19 16:10:30 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[EPLB] Reduce the memory used for batch_isend_irecv (#7344)

2026-03-20 12:25:58 +08:00

[EPLB] Reduce the memory used for batch_isend_irecv (#7344)

2026-03-20 12:25:58 +08:00

[CI] Add pre-commit check for patch logger (#7446)

2026-03-19 16:53:20 +08:00

[Bugfix][LoRA] Fix the bug when runs Qwen3-Reranker-0.6B with LoRA. (#7156)

2026-03-15 17:55:42 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

【A5】【Qwen VL】Qwen VL adapt for A5 (#7046)

2026-03-20 16:56:12 +08:00

[Bugfix] Restore balance scheduling patch for v0.17.0 (#7479)

2026-03-19 20:12:57 +08:00

Adapt w8a8mxfp8 quantization for Qwen VL models (#7417)

2026-03-20 16:18:58 +08:00

[Feature] Add docs of batch invariance and make some extra operators patch (#6910)

2026-03-05 09:12:40 +08:00

[CI] Add pre-commit check for patch logger (#7446)

2026-03-19 16:53:20 +08:00

[EPLB][Bugfix] Set parallel_config.enable_eplb to true to load redundant experts (#7470)

2026-03-20 15:22:55 +08:00

Main2main upgrade to vllm 0317 afternoon (#7409)

2026-03-18 23:24:27 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[Misc] Drop Prefetch MLP Env (#7357)

2026-03-19 14:27:27 +08:00

ascend_forward_context.py

[BugFix]A2 MOE method&& layerwise MTP bugfix && Mamba gdn_metadata bugfix (#7364)

2026-03-17 23:03:45 +08:00

batch_invariant.py

[CI] Add pre-commit check for patch logger (#7446)

2026-03-19 16:53:20 +08:00

cpu_binding.py

[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945)

2026-03-03 17:20:52 +08:00

envs.py

[Misc] Drop Prefetch MLP Env (#7357)

2026-03-19 14:27:27 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

[Feature][Quant] Reapply auto-detect quantization format and support remote model ID (#7111)

2026-03-13 22:53:25 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[EPLB] Reduce the memory used for batch_isend_irecv (#7344)

2026-03-20 12:25:58 +08:00