xc-llm-ascend/singlecard at ec9ec78b53aa96e366ff16a438e44af3c272a547

EngineX/xc-llm-ascend

whx 1b270a64bd [MoE][Multistream] Avoid performing communication in extra stream. (#3582)

This PR moves the communication operation of the shared experts out of the
extra stream, because I found that running it there can cause
rtMemcpy-related errors when shared-expert multistream is used with aclgraph.

Furthermore, I use a global variable to hold the extra stream object, which
avoids allocating a separate stream for each layer in full-graph mode.
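
The global-stream idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FakeStream`, `get_shared_expert_stream`, and `Layer` are hypothetical names, and `FakeStream` is a plain Python stand-in for a real device stream so the sketch runs without an NPU.

```python
import threading
from typing import Optional


class FakeStream:
    """Hypothetical stand-in for a device stream (assumption, not a real API)."""

    created = 0  # counts how many stream objects were ever allocated

    def __init__(self) -> None:
        FakeStream.created += 1


# Module-level (global) holder for the single extra stream.
_SHARED_EXPERT_STREAM: Optional[FakeStream] = None
_STREAM_LOCK = threading.Lock()


def get_shared_expert_stream() -> FakeStream:
    """Return the one shared extra stream, creating it on first use."""
    global _SHARED_EXPERT_STREAM
    with _STREAM_LOCK:
        if _SHARED_EXPERT_STREAM is None:
            _SHARED_EXPERT_STREAM = FakeStream()
        return _SHARED_EXPERT_STREAM


class Layer:
    """Hypothetical MoE layer that reuses the global stream."""

    def __init__(self) -> None:
        # No per-layer allocation: every layer gets the same object.
        self.stream = get_shared_expert_stream()


layers = [Layer() for _ in range(4)]
```

With this pattern, constructing any number of layers allocates only one stream object, which is the property the commit message describes for full-graph mode.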

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: whx-sjtu <2952154980@qq.com>

2025-10-24 10:44:38 +08:00

Reapply "[MoE] [Refactor] Remove manual memory cleanup (#3365)" (#3483) (#3512)

2025-10-22 11:41:30 +08:00

[Fix] Fixes attribute error in MLA implementation (#3618)

2025-10-23 09:12:50 +08:00

__init__.py

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

test_aclgraph.py

[Test] Temporarily skip flaky ACL graph test (#3577)

2025-10-21 17:16:15 +08:00

test_ascend_scheduler.py

[Feat] Dynamic Batch Feature (#3490)

2025-10-22 14:13:32 +08:00

test_bge_model.py

[Feat] Supports Aclgraph for bge-m3 (#3171)

2025-10-14 23:07:45 +08:00

test_camem.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_chunked.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_embedding_aclgraph.py

[Feat] Supports Aclgraph for bge-m3 (#3171)

2025-10-14 23:07:45 +08:00

test_embedding.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_guided_decoding.py

[Misc] Clean up useless patch (#3320)

2025-10-09 14:07:26 +08:00

test_ilama_lora.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_multistream_overlap_shared_expert.py

[MoE][Multistream] Avoid performing communication in extra stream. (#3582)

2025-10-24 10:44:38 +08:00

test_profile_execute_duration.py

Refactor e2e CI (#2276)

2025-09-02 09:02:22 +08:00

test_quantization.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00

test_sampler.py

Refactor e2e CI (#2276)

2025-09-02 09:02:22 +08:00

test_vlm.py

ACLgraph enable: Test cases revisions for all features (#3388)

2025-10-17 17:15:19 +08:00