xc-llm-ascend/vllm_ascend at 52f0f9b5e431119d0970f784d9e8a39892393d9b

EngineX/xc-llm-ascend

1kzk 52f0f9b5e4 [0.18.0][BugFix]: order acl graph updates before model forward for ENPU (#8317)

### What this PR does / why we need it?
In the ENPU scenario, device events must follow a "record first, wait later"
order; otherwise the inference process may hang. In the current model_forward
function, however, event.wait is issued before event.record. For the ENPU
scenario, this PR therefore performs the ACL graph parameter updates before
model execution, so every event is recorded before anything waits on it.
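The following is a minimal, self-contained sketch of why the ordering matters. It assumes a simulated event whose wait() blocks until record() has been called, matching the ENPU behavior described above. SimulatedDeviceEvent, update_graph_params, toy_model_forward and run are hypothetical names for illustration only. They are not vllm-ascend or ACL APIs, and the timeout exists only to stand in for a hang.

```python
import threading


class SimulatedDeviceEvent:
    """Hypothetical stand-in for a device event with ENPU-style semantics:
    wait() blocks until record() has been called; it is never a no-op."""

    def __init__(self) -> None:
        self._recorded = threading.Event()

    def record(self) -> None:
        self._recorded.set()

    def wait(self, timeout: float) -> bool:
        # True if the event was recorded; False on timeout, which
        # stands in for the inference process getting stuck.
        return self._recorded.wait(timeout)


def update_graph_params(event: SimulatedDeviceEvent, log: list) -> None:
    # Hypothetical: refresh captured-graph parameters, then record the event.
    log.append("update_graph_params")
    event.record()


def toy_model_forward(event: SimulatedDeviceEvent, log: list, timeout: float = 0.1) -> bool:
    # Hypothetical: the forward pass waits on the event before launching work.
    ok = event.wait(timeout)
    log.append("forward" if ok else "stuck")
    return ok


def run(update_first: bool) -> list:
    event = SimulatedDeviceEvent()
    log: list = []
    if update_first:
        # Fixed order: record first, wait later.
        update_graph_params(event, log)
        toy_model_forward(event, log)
    else:
        # Old order: the wait is issued before anyone has recorded the event.
        toy_model_forward(event, log)
        update_graph_params(event, log)
    return log


if __name__ == "__main__":
    print(run(update_first=False))  # ['stuck', 'update_graph_params']
    print(run(update_first=True))   # ['update_graph_params', 'forward']
```

In this single-threaded toy, the old order can only end in a timeout, because the record happens after the wait returns. On real hardware there is no timeout, so the process would simply stay blocked.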

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

---------

Signed-off-by: 1zzk <785396250@qq.com>
Signed-off-by: 1kzk <785396250@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

2026-04-16 16:26:59 +08:00

[BugFix][0.18.0][310p] fix post-sampling not working in graph mode on 310p (#8077)

2026-04-09 16:31:38 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[0.18.0][BugFix] Add PrefillNoCache state in mla _forward_decode for short prompt (#8264)

2026-04-15 09:23:52 +08:00

[Feat][SP] Support SP for VL MoE models (#7044)

2026-03-24 17:16:00 +08:00

[v0.18.0][Misc] Recompute scheduler upgrade to vLLM 0.18.0 (#7720)

2026-03-27 18:24:53 +08:00

[A5][bugfix] Fix fused MoE A5 MXFP8 scale normalization, load-balance routing and gating_topk ops (#7573)

2026-03-25 17:20:28 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[BugFix][v0.18.0] require piecewise cudagraph for layerwise AscendSto… (#8282)

2026-04-16 10:40:14 +08:00

[V0.18.0][EPLB][BugFix] Fix moe_load precision in allgather (#7890)

2026-04-02 09:20:31 +08:00

upgrade to 0.18.0 (#7502)

2026-03-21 16:05:38 +08:00

[Bugfix][LoRA] Fix the bug when running Qwen3-Reranker-0.6B with LoRA. (#7156)

2026-03-15 17:55:42 +08:00

[ModelLoader][Feature] Add rfork support for fast model loading (#7392)

2026-03-25 16:40:30 +08:00

[v0.18.0][BugFix] Fix Qwen3.5 MoE flash comm v1 shared expert shape error of mtp layer on A2 (#8004)

2026-04-13 17:36:09 +08:00

[BugFix][Platform] Fix extra function name in final chunk of streaming tool calls (#8178)

2026-04-15 17:50:10 +08:00

[v0.18.0]feat(quant): add C8 INT8 KV cache support for GQA attention models (#7474) (#8007)

2026-04-08 10:51:58 +08:00

[releases/v0.18.0][Triton][Sampler] Add penalty-related Triton kernel for better performance of penalties (#7794)

2026-03-31 19:01:51 +08:00

[v0.18.0][BugFix] Fix Qwen3.5 MoE flash comm v1 shared expert shape error of mtp layer on A2 (#8004)

2026-04-13 17:36:09 +08:00

[0.18.0][BugFix]: order acl graph updates before model forward for ENPU (#8317)

2026-04-16 16:26:59 +08:00

Main2main upgrade to vllm 0317 afternoon (#7409)

2026-03-18 23:24:27 +08:00

__init__.py

[ModelLoader][Feature] Add rfork support for fast model loading (#7392)

2026-03-25 16:40:30 +08:00

ascend_config.py

[feat] support dispatch_v2/combine_v2 hierarchy communication (#7698)

2026-03-27 09:20:16 +08:00

ascend_forward_context.py

[Bugfix][eager][oom] fix rank0 load imbalance by no padding when multi dp (#7297)

2026-03-23 17:05:02 +08:00

batch_invariant.py

[CI] Add pre-commit check for patch logger (#7446)

2026-03-19 16:53:20 +08:00

cpu_binding.py

[BugFix] Enforce C locale for CPU binding subprocess parsing (#8261)

2026-04-16 16:17:10 +08:00

envs.py

[Misc] Drop Prefetch MLP Env (#7357)

2026-03-19 14:27:27 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

[BugFix] Add async communication check for capturing mode (#8149)

2026-04-12 21:52:54 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[0.18.0][BugFix]: order acl graph updates before model forward for ENPU (#8317)

2026-04-16 16:26:59 +08:00