pppeng 9a0b786f2b [bugfix][0.18.0] Fix race in non-blocking num_accepted_tokens (#8764)
### What this PR does / why we need it?
This ports the same fix as https://github.com/vllm-project/vllm/pull/36013.
In `_update_states_after_model_execute`, `num_accepted_tokens` is copied
from GPU to pinned CPU memory with `non_blocking=True`. The CPU-side numpy
view is later read in `_build_attention_metadata` during the next
`execute_model` call. With async scheduling, `_bookkeeping_sync`
deliberately avoids any CUDA synchronization (that is the whole point of
async scheduling), so there is no guarantee the DMA has landed before the
CPU read occurs.
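The hazard can be sketched in plain Python, with a background thread standing in for the DMA engine and a `threading.Event` playing the role of a CUDA event (this is a hedged analogy of the pattern, not the actual vLLM/vllm-ascend code; all names below are illustrative):

```python
import threading
import time

class PinnedBuffer:
    """Stand-in for pinned host memory filled by an asynchronous copy."""
    def __init__(self):
        self.value = None
        self.copy_done = threading.Event()  # analogue of a CUDA event

    def async_copy(self, value, delay=0.05):
        # Analogue of tensor.copy_(src, non_blocking=True): returns
        # immediately while the "DMA" completes in the background.
        def dma():
            time.sleep(delay)     # copy still in flight
            self.value = value    # data finally lands in host memory
            self.copy_done.set()  # event recorded after the copy
        threading.Thread(target=dma, daemon=True).start()

def racy_read(buf):
    # Reads immediately, like the unsynchronized numpy view of
    # num_accepted_tokens: may observe stale data.
    return buf.value

def safe_read(buf):
    # Waits on the event first, the analogue of synchronizing on a
    # CUDA event before the CPU-side read.
    buf.copy_done.wait()
    return buf.value

buf = PinnedBuffer()
buf.async_copy(42)
stale = racy_read(buf)  # likely None: the "DMA" has not landed yet
fresh = safe_read(buf)  # guaranteed 42
```

The fix follows the safe pattern: make the consumer wait on an event recorded after the non-blocking copy, rather than assuming the copy has completed by the time of the next `execute_model` call.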

Signed-off-by: ppppeng <zepengliu912@qq.com>
2026-04-27 23:28:52 +08:00