xc-llm-ascend/examples at 1cd27da5fb6bae2376c93e1e50bef08eea29ab12

EngineX/xc-llm-ascend

ttanzhiqiang 60519c71bd shared_experts+router_experts merge all_reduce(Improve TTOP 5ms) (#1395)

### What this PR does / why we need it?
When all_reduce_merge is in effect, shared_experts skips the all_reduce
inside its MLP. The outputs of shared_experts and router_experts are
combined first, and a single all_reduce is then performed on the result.
This helps in both prefill and decode, whenever shared_experts and
router_experts would otherwise each perform their own all_reduce.
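
The merge is valid because a sum all_reduce is linear: reducing the two partial outputs separately and then adding them gives the same result as adding them locally and reducing once. The sketch below is a toy simulation in plain Python, not code from this repository. The helper names (`all_reduce_sum`, `add`) and the tensor values are hypothetical, and ranks are modelled as list entries rather than real devices.

```python
# Toy model: per_rank[i] is the partial output held by rank i.
def all_reduce_sum(per_rank):
    """Simulate a sum all_reduce: every rank receives the elementwise sum."""
    total = [sum(values) for values in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def add(a, b):
    """Elementwise addition of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical partial outputs on two ranks.
shared_partial = [[1.0, 2.0], [3.0, 4.0]]   # shared experts: rank 0, rank 1
routed_partial = [[0.5, 0.5], [1.5, 2.5]]   # routed experts: rank 0, rank 1

# Unmerged path: two collectives, then a local add on each rank.
separate = [
    add(s, r)
    for s, r in zip(all_reduce_sum(shared_partial),
                    all_reduce_sum(routed_partial))
]

# Merged path: local add on each rank, then a single collective.
merged = all_reduce_sum(
    [add(s, r) for s, r in zip(shared_partial, routed_partial)]
)

assert separate == merged
```

Both paths produce the same values on every rank, but the merged path issues one collective instead of two, which is where the latency saving would come from.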
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
bash examples/run_dp_attention_etp16.sh
bash examples/run_dp_attention_etp16_benmark.sh
- vLLM version: v0.9.1
- vLLM main: 977180c912

---------

Signed-off-by: ttanzhiqiang <389825161@qq.com>

2025-07-10 12:07:05 +08:00

disaggregated_prefill

[fix] fix bug in 1p1d disaggregated_prefill example (#1184)

2025-06-12 19:40:58 +08:00

[EPLB] support deepseek eplb strategy (#1196)

2025-07-07 17:22:08 +08:00

offline_data_parallel.py

[DP] Tiny fix of dp and update example (#1273)

2025-06-25 11:03:04 +08:00

offline_disaggregated_prefill_npu.py

[Feature] Add PD separation feature (#432)

2025-04-15 15:11:35 +08:00

offline_distributed_inference_npu.py

[CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460)

2025-04-17 14:59:56 +08:00

offline_dualbatch_overlap_npu.py

[perf]: support dual-batch overlap(dbo) for deepseek (#941)

2025-06-07 16:46:58 +08:00

offline_embed.py

Fix lint in examples/offline_embed.py (#1618)

2025-07-03 21:40:29 +08:00

offline_inference_audio_language.py

[Doc] Add qwen2-audio eager mode tutorial (#1371)

2025-06-26 16:56:05 +08:00

offline_inference_npu_v0.py

feat: Improve the offline_inference npu v0/v1 scripts (#1669)

2025-07-09 17:03:53 +08:00

offline_inference_npu_v1.py

[CI] Fix lint in CI (#1712)

2025-07-10 10:47:18 +08:00

offline_inference_sleep_mode_npu.py

[Doc] Add sleep mode doc (#1295)

2025-06-25 14:07:14 +08:00

offline_multi_step_custom_ops.py

Fix the device error when using ray as vllm-acend backend (#884)

2025-06-16 21:03:16 +08:00

prompt_embedding_inference.py

[ModelRunner] Support embedding inputs (#916)

2025-06-06 20:21:13 +08:00

run_dp_attention_etp16_benmark.sh

shared_experts+router_experts merge all_reduce(Improve TTOP 5ms) (#1395)

2025-07-10 12:07:05 +08:00

run_dp_attention_etp16.sh

shared_experts+router_experts merge all_reduce(Improve TTOP 5ms) (#1395)

2025-07-10 12:07:05 +08:00

run_dp_server.sh

[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694)

2025-05-01 22:31:36 +08:00