xc-llm-ascend/singlecard at 7a6fde80b1d06bd6b28d4c551023dca67e5a71a1

EngineX/xc-llm-ascend

gh924 6880c1b383 [Feature] Support for cross-attention and whisper model (#5592)

### What this PR does / why we need it?
To solve the problem described in issue https://github.com/vllm-project/vllm-ascend/issues/2262:

- Support cross-attention when the model is encoder-decoder
- Support the Whisper model

- vLLM version: v0.13.0
- vLLM main: 7157596103

Signed-off-by: gh924 <guihao2@huawei.com>
Co-authored-by: Aoxuan Chen <43376869+chenaoxuan@users.noreply.github.com>

2026-01-11 11:38:45 +08:00

..

[CI] cleanup single/multi-card test (#5623)

2026-01-07 14:13:34 +08:00

model_runner_v2

[Feature] support eager mode in model runner v2 (#5210)

2025-12-29 15:28:34 +08:00

[E2E] Optimize the E2E test time. (#5294)

2025-12-26 14:17:50 +08:00

[Feat][Bugfix][main] Adapted SP to eagle3 (#5562)

2026-01-08 15:33:52 +08:00

__init__.py

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

test_aclgraph_accuracy.py

[OP] Enable custom op aclnnMoeInitRoutingCustom (#5332)

2026-01-09 09:35:18 +08:00

test_aclgraph_mem.py

[refactor] refactor model runner capture model (#5230)

2025-12-30 08:32:14 +08:00

test_async_scheduling.py

[CI] Add Triton Ascend in CI (#4921)

2025-12-23 12:47:35 +08:00

test_batch_invariant.py

[Feature] implement basic framework for batch invariant (#5517)

2026-01-07 09:11:26 +08:00

test_camem.py

[E2E] Optimize the E2E test time. (#5294)

2025-12-26 14:17:50 +08:00

test_completion_with_prompt_embeds.py

[E2E] Optimize the E2E test time. (#5294)

2025-12-26 14:17:50 +08:00

test_cpu_offloading.py

[BugFix] Fix npu-cpu offloading interface change bug. (#5290)

2025-12-27 10:21:20 +08:00

test_guided_decoding.py

[E2E] Optimize the E2E test time. (#5294)

2025-12-26 14:17:50 +08:00

test_ilama_lora.py

[BugFix] Fix the error when using Ascend custom operators with rank=128 (#5394)

2026-01-09 15:57:43 +08:00

test_models.py

[Feature] Support for cross-attention and whisper model (#5592)

2026-01-11 11:38:45 +08:00

test_multistream_overlap_shared_expert.py

[CI] Add skipped testcases. (#5254)

2025-12-24 10:41:32 +08:00

test_profile_execute_duration.py

Refactor e2e CI (#2276)

2025-09-02 09:02:22 +08:00

test_quantization.py

[CI] cleanup single/multi-card test (#5623)

2026-01-07 14:13:34 +08:00

test_sampler.py

[E2E] Optimize the E2E test time. (#5294)

2025-12-26 14:17:50 +08:00

test_vlm.py

[E2E] Optimize the E2E test time. (#5294)

2025-12-26 14:17:50 +08:00

test_xlite.py

[1/N][CI] Refactor accuracy test (#5400)

2026-01-07 20:58:15 +08:00

utils.py

[1/N][CI] Refactor accuracy test (#5400)

2026-01-07 20:58:15 +08:00