### What this PR does / why we need it?
In the graph + RL scenario, the graph is captured only once, and the
weight addresses are expected to stay the same across iterations.
However, calling `.contiguous()` on a weight tensor may allocate new
memory, so the captured graph ends up referencing stale weight
addresses. This PR changes the weight update logic in `AscendMLAImpl`
and `AscendSFAImpl` to use `copy_()` instead of reassignment, keeping
the weight addresses stable across iterations.
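
A minimal sketch of the difference (hypothetical tensors, not the actual `AscendMLAImpl` code): reassigning the result of `.contiguous()` can move the weight to a new address, while `copy_()` into a pre-allocated contiguous buffer leaves the address unchanged.

```python
import torch

# Hypothetical non-contiguous weight, e.g. a transposed view.
weight = torch.randn(4, 8).t()

# Reassignment: .contiguous() allocates new storage, so the new tensor
# lives at a different address than the one the captured graph recorded.
new_weight = weight.contiguous()
assert new_weight.data_ptr() != weight.data_ptr()

# In-place update: allocate a contiguous buffer once, then copy_() into
# it on every weight update; the buffer's address never changes.
buffer = torch.empty_like(weight, memory_format=torch.contiguous_format)
ptr_before = buffer.data_ptr()
buffer.copy_(weight)
assert buffer.data_ptr() == ptr_before  # address stable across iterations
```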
Details in #7473.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Signed-off-by: Debonex <719893090@qq.com>