xc-llm-ascend/examples at 0c4aa2b4f1d20cb7ee94e1657832dc5569cd9617

EngineX/xc-llm-ascend

xleoken 3ef45d0cc2 feat: Improve the offline_inference npu v0/v1 scripts (#1669)

### What this PR does / why we need it?

Improvements:
- Keep the same file name format as v1: `offline_inference_npu_v0.py`,
`offline_inference_npu_v1.py`.
- Set `VLLM_USE_V1` to 0 or 1 explicitly in the Python scripts.
- Fix some runtime errors in `offline_inference_npu_v1.py`, e.g.
`deepseekv3-lite-base-latest` does not exist on ModelScope or Hugging Face.
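
As a minimal sketch of the second point, the engine version can be pinned by setting `VLLM_USE_V1` at the top of a script, before vLLM is imported. The helper name `select_engine` is hypothetical and is not taken from the scripts in this directory; the vLLM import is left commented out so the snippet runs without vLLM installed.

```python
import os


def select_engine(use_v1: bool) -> str:
    """Set VLLM_USE_V1 for this process and return the value that was set.

    Hypothetical helper for illustration only; the actual example scripts
    may simply assign os.environ["VLLM_USE_V1"] directly.
    """
    value = "1" if use_v1 else "0"
    os.environ["VLLM_USE_V1"] = value
    return value


# Choose the V1 engine; pass use_v1=False to choose V0 instead.
select_engine(use_v1=True)

# Import vLLM only after the variable is set, so the choice is visible
# when the library reads its environment, e.g.:
# from vllm import LLM, SamplingParams
```

Setting the variable inside the script, rather than relying on the caller's shell, makes each example unambiguous about which engine it targets.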

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- vLLM version: v0.9.2
- vLLM main: baed180aa0

Signed-off-by: xleoken <xleoken@163.com>

2025-07-09 17:03:53 +08:00

disaggregated_prefill

[fix] fix bug in 1p1d disaggregated_prefill example (#1184)

2025-06-12 19:40:58 +08:00

[EPLB] support deepseek eplb strategy (#1196)

2025-07-07 17:22:08 +08:00

offline_data_parallel.py

[DP] Tiny fix of dp and update example (#1273)

2025-06-25 11:03:04 +08:00

offline_disaggregated_prefill_npu.py

[Feature] Add PD separation feature (#432)

2025-04-15 15:11:35 +08:00

offline_distributed_inference_npu.py

[CI] Add model basic accuracy test (Qwen2.5-0.5B-Instruct) (#460)

2025-04-17 14:59:56 +08:00

offline_dualbatch_overlap_npu.py

[perf]: support dual-batch overlap (dbo) for deepseek (#941)

2025-06-07 16:46:58 +08:00

offline_embed.py

Fix lint in examples/offline_embed.py (#1618)

2025-07-03 21:40:29 +08:00

offline_inference_audio_language.py

[Doc] Add qwen2-audio eager mode tutorial (#1371)

2025-06-26 16:56:05 +08:00

offline_inference_npu_v0.py

feat: Improve the offline_inference npu v0/v1 scripts (#1669)

2025-07-09 17:03:53 +08:00

offline_inference_npu_v1.py

feat: Improve the offline_inference npu v0/v1 scripts (#1669)

2025-07-09 17:03:53 +08:00

offline_inference_sleep_mode_npu.py

[Doc] Add sleep mode doc (#1295)

2025-06-25 14:07:14 +08:00

offline_multi_step_custom_ops.py

Fix the device error when using ray as vllm-ascend backend (#884)

2025-06-16 21:03:16 +08:00

prompt_embedding_inference.py

[ModelRunner] Support embedding inputs (#916)

2025-06-06 20:21:13 +08:00

run_dp_attention_etp16_benmark.sh

etp best a2 (#1101)

2025-06-11 10:40:50 +08:00

run_dp_attention_etp16.sh

[Doc] remove environment variable VLLM_ENABLE_MC2 (#1406)

2025-06-24 21:18:10 +08:00

run_dp_server.sh

[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694)

2025-05-01 22:31:36 +08:00