xc-llm-ascend/vllm_ascend at 18495f44b23754e7fafdf8f04f855e201e26aebe

EngineX/xc-llm-ascend

Angazenn 18495f44b2 [BugFix] Fix max_num_tokens_across_dp calculation bugs in attention_v1_torchair (#1636)

### What this PR does / why we need it?
This PR fixes a bug in the calculation of max_num_tokens_across_dp.
In earlier versions, this value was computed as graph_pad_size plus the
actual max_num_tokens, which yields different max_num_tokens_across_dp
values on different DP ranks. If padding is required, this can lead to
incorrect padding.
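
The mismatch described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that `graph_pad_size` on each rank is the gap between a shared padded graph size and that rank's own token count; the function names and numbers are illustrative and are not taken from the patch itself.

```python
# Hypothetical illustration, not the code from this PR.
# Assumption: graph_pad_size = padded_size - local token count on each rank.

def per_rank_max_tokens_old(local_tokens, all_tokens, padded_size):
    """Old-style calculation: local graph_pad_size plus the actual max."""
    graph_pad_size = padded_size - local_tokens
    return graph_pad_size + max(all_tokens)


def per_rank_max_tokens_consistent(all_tokens, padded_size):
    """One possible rank-independent value: depends only on shared inputs."""
    return max(padded_size, max(all_tokens))


tokens_per_rank = [3, 5]  # actual token counts on two DP ranks (made up)
padded_size = 8           # shared padded graph size (made up)

old = [
    per_rank_max_tokens_old(t, tokens_per_rank, padded_size)
    for t in tokens_per_rank
]
consistent = [
    per_rank_max_tokens_consistent(tokens_per_rank, padded_size)
    for _ in tokens_per_rank
]

print(old)         # [10, 8]: the ranks disagree
print(consistent)  # [8, 8]: the ranks agree
```

Because the old-style value depends on each rank's local token count, any padding derived from it can differ between ranks, which is the failure mode the commit message describes.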

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed normally.

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>

2025-07-07 20:03:02 +08:00

[BugFix] Fix max_num_tokens_across_dp calculation bugs in attention_v1_torchair (#1636)

2025-07-07 20:03:02 +08:00

[CI] Upgrade vllm to 0.9.1 (#1165)

2025-06-11 16:33:11 +08:00

[ModelRunner] Use shared CachedRequestData cross request to fix ci (#1546)

2025-07-02 06:05:21 +08:00

device_allocator

[Build] Add build info (#1386)

2025-06-27 09:14:43 +08:00

[bugfix] some bugs maybe fail to run (#896)

2025-06-03 11:07:33 +08:00

[Bugfix] fix import error (#600)

2025-04-22 08:57:25 +08:00

[CORE]initial support for torchair with non-mla backend (#1506)

2025-07-03 22:21:42 +08:00

[perf]: support dual-batch overlap(dbo) for deepseek (#941)

2025-06-07 16:46:58 +08:00

[Bugfix] Support Qwen3-MOE on aclgraph mode (#1381)

2025-07-06 15:29:36 +08:00

[CORE]initial support for torchair with non-mla backend (#1506)

2025-07-03 22:21:42 +08:00

[Bugfix] Add func swap_states to fix MLA attention (#1580)

2025-07-02 17:42:53 +08:00

[CORE]initial support for torchair with non-mla backend (#1506)

2025-07-03 22:21:42 +08:00

Spec decode support for V1 Engine (#874)

2025-05-23 14:25:46 +08:00

[Performance] Disable JIT and nd2nz to improve performance for Atlas 300I series (#1591)

2025-07-05 16:29:21 +08:00

__init__.py

[CI] Patch torch.library.infer_schema for fused moe ops to fix CI (#854)

2025-05-14 19:49:09 +08:00

ascend_config.py

[CI] Follow vLLM FusedMoEParallelConfig interface change and clean up unused config (#1625)

2025-07-04 17:54:33 +08:00

envs.py

support fused_moe_allgather_ep (#1335)

2025-06-23 22:03:38 +08:00

platform.py

[CORE]initial support for torchair with non-mla backend (#1506)

2025-07-03 22:21:42 +08:00

utils.py

[Quantization]300I Duo support w8a8 quantization (#1560)

2025-07-03 22:12:46 +08:00