xc-llm-ascend/ut at 6972df59514c2f04eb8e5dd6b7b2c25e0276a230

EngineX/xc-llm-ascend

weijinqian0 6972df5951 [Feature] optimize sp & qwen3 next support sp. (#3225)

This PR accomplishes the following:
**Optimize SP**
In the previous implementation, the first layer performed an all_reduce, and
the RMS norm step then split the result into chunks. We now perform
reduce_scatter on the embedding side, replacing one all_reduce operation and
one chunk operation with a single reduce_scatter operation.
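
The equivalence this change relies on can be sketched in plain Python, with each rank simulated as a list. This is an illustration only: the function names and sample data are hypothetical, no real distributed backend or project code is used, and the vector length is assumed to be divisible by the number of ranks.

```python
# Simulated collectives over plain Python lists (no real communication).

def all_reduce(per_rank):
    """Every rank ends up with the element-wise sum of all ranks' vectors."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def chunk(vec, world_size, rank):
    """Return the rank-th of world_size equal slices of vec."""
    size = len(vec) // world_size
    return vec[rank * size:(rank + 1) * size]


def reduce_scatter(per_rank):
    """Rank r ends up with only the r-th slice of the element-wise sum."""
    world_size = len(per_rank)
    total = [sum(vals) for vals in zip(*per_rank)]
    return [chunk(total, world_size, r) for r in range(world_size)]


# Hypothetical partial outputs held by 2 ranks (4 values each).
partials = [[1, 2, 3, 4], [10, 20, 30, 40]]

# Old path: all_reduce first, then each rank keeps only its own chunk.
old_path = [chunk(vec, 2, rank) for rank, vec in enumerate(all_reduce(partials))]

# New path: a single reduce_scatter yields the same per-rank slices.
new_path = reduce_scatter(partials)

assert old_path == new_path
```

In the simulated old path every rank materializes the full summed vector before discarding most of it; in the new path each rank only ever holds its own slice.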
**Support Qwen3 Next**
Qwen3 Next includes a linear attention module, so the prefix name of
this module does not take effect directly.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>

2025-10-13 23:02:12 +08:00

[Feat][Graph] Support FULL_DECODE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

[Test] Add unit test for compilation/acl_graph.py (#3039)

2025-09-19 21:31:17 +08:00

[CORE] concurrent partial prefills (#2372)

2025-09-24 17:12:55 +08:00

device_allocator

add UT for device allocator/camem and multistream/layers (#2037)

2025-07-31 19:17:27 +08:00

[Bugfix] TP size larger than KV cache head causes accuracy issues (#3366)

2025-10-11 11:22:23 +08:00

[BugFix] Fix eplb problems when using dynamic eplb. (#3364)

2025-10-11 14:04:02 +08:00

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

[Bugfix] TP size larger than KV cache head causes accuracy issues (#3366)

2025-10-11 11:22:23 +08:00

[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176)

2025-10-09 14:12:46 +08:00

add UT for device allocator/camem and multistream/layers (#2037)

2025-07-31 19:17:27 +08:00

[Feature] optimize sp & qwen3 next support sp. (#3225)

2025-10-13 23:02:12 +08:00

patch/worker/patch_common

[feat] support customized and separated hccl_buffer_size for process group initialization (#3073)

2025-10-11 15:55:22 +08:00

[1/N][Feat] Add weight prefetch feature for Attention layers (#3146)

2025-10-09 20:38:39 +08:00

[main] add pd transfer for ascend scheduler (#2753)

2025-09-10 08:46:39 +08:00

[Feature] optimize sp & qwen3 next support sp. (#3225)

2025-10-13 23:02:12 +08:00

[Aclgraph][DP] Fix dp dummy run not in aclgraph error (#3208)

2025-09-30 11:14:51 +08:00

__init__.py

[2/4][Refactor] Refactor torchair utils (#1892)

2025-07-21 19:43:30 +08:00

base.py

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926)

2025-07-28 15:13:37 +08:00

conftest.py

[1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841)

2025-07-18 23:07:14 +08:00

test_ascend_config.py

[Feature] Support moe multi-stream for aclgraph. (#2946)

2025-09-19 11:06:45 +08:00

test_envs.py

[Misc] Remove redundant imported envs, using envs_ascend instead (#2193)

2025-08-14 09:33:39 +08:00

test_platform.py

[refactor] refactor deepseek-related files (#2849)

2025-09-16 14:13:07 +08:00

test_utils.py

[BugFix] Fix ACLgraph bug in Qwen3_32b_int8 case (#3204)

2025-09-28 17:44:04 +08:00