xc-llm-ascend/ut at 554f16ae1fb89b35ad82b24e7f7fe5eaba0e80d0 - xc-llm-ascend

EngineX/xc-llm-ascend

Shanshan Shen e52ebf8674 [MM][Model][Perf] Remove Qwen2.5-VL modeling files and add patch for VisionAttention (#4349 )

### What this PR does / why we need it?

- [x] Patch `Qwen2_5_VisionAttention` with
`AscendQwen2_5_VisionAttention`.
- [x] Replace `AscendQwen2_5_VisionTransformer` with
`Qwen2_5_VisionTransformer` in vllm.
- [x] Move padding logic (q/k/v and cos/sin) before FA to `forward()` of
`Qwen2_5_VisionAttention`.
- [x] Convert `cu_seqlens` in `Qwen2_5_VisionAttention` from cumulative
form to intervals and move it to CPU (for compatibility with NPU FA).
- [x] Remove Qwen2.5-VL modeling files.
- [x] Remove Qwen2.5-VL (without padding) modeling files.
- [x] Remove related UT.
- [x] Make `set_forward_context` pluggable when getting MM embedding.
Find more details at https://github.com/vllm-project/vllm/pull/29388.
- [x] Simplify padding logic for FA.
- [x] Add patch for https://github.com/vllm-project/vllm/pull/28798.
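
The `cu_seqlens` conversion listed above can be illustrated with a minimal sketch. It assumes that "intervals" means per-sequence lengths, i.e. the differences between adjacent cumulative offsets. The helper name `cu_seqlens_to_lengths` is hypothetical, and plain Python lists stand in for the tensors the patch actually handles, so this is not the code from the PR.

```python
from typing import List


def cu_seqlens_to_lengths(cu_seqlens: List[int]) -> List[int]:
    """Convert cumulative sequence offsets into per-sequence lengths.

    Hypothetical helper for illustration only; the real patch works on
    tensors inside `Qwen2_5_VisionAttention` and may differ in detail.
    """
    if not cu_seqlens:
        return []
    # Adjacent differences: lengths[i] = cu_seqlens[i + 1] - cu_seqlens[i].
    lengths = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]
    if any(n < 0 for n in lengths):
        raise ValueError("cu_seqlens must be non-decreasing")
    return lengths


# Example: three images contributing 4, 6, and 2 patch tokens.
cu_seqlens = [0, 4, 10, 12]
seq_lens = cu_seqlens_to_lengths(cu_seqlens)
print(seq_lens)  # [4, 6, 2]
```

With PyTorch tensors, the same idea could presumably be written as `torch.diff(cu_seqlens).cpu()`, which takes adjacent differences and copies the result to host memory. Whether the patch uses exactly that call is an assumption.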

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- [x] Functional test (eager mode)
- [x] Functional test (graph mode)
- [x] Benchmark


- vLLM version: v0.11.2

---------

Signed-off-by: shen-shanshan <467638484@qq.com>

2025-11-28 14:23:00 +08:00

[UT] Fix ut test (#4472 )

2025-11-26 21:37:47 +08:00

upgrade to vllm 0.11.2 (#4400 )

2025-11-26 11:48:58 +08:00

upgrade to vllm 0.11.2 (#4400 )

2025-11-26 11:48:58 +08:00

device_allocator

add ut for device allocator/camem and mutistream/layers (#2037 )

2025-07-31 19:17:27 +08:00

[Feat] flashcomm_v2 optim solution (#3232 )

2025-11-10 11:01:45 +08:00

eplb redundant expert bugfix (#4291 )

2025-11-21 14:24:35 +08:00

[CI] Add unit test framework (#1201 )

2025-06-16 18:32:28 +08:00

upgrade to vllm 0.11.2 (#4400 )

2025-11-26 11:48:58 +08:00

model_loader/netloader

Drop 0.11.0 support (#4377 )

2025-11-24 17:08:20 +08:00

[MM][Model][Perf] Remove Qwen2.5-VL modeling files and add patch for VisionAttention (#4349 )

2025-11-28 14:23:00 +08:00

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

patch/worker/patch_common

[Refactor] refactor patch module (#3555 )

2025-10-21 20:19:46 +08:00

[Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036 )

2025-11-28 14:09:39 +08:00

[UT] Fix test_sample_recovered_tokens_pytorch_autoregressive (#3434 )

2025-10-24 11:20:57 +08:00

[TEST] Add eagle proposer ut (#4447 )

2025-11-27 21:59:31 +08:00

[bugfix] fix ray start failed: local_world_size cannot little than visible device count error (#4457 )

2025-11-27 21:18:32 +08:00

[bugfix] fix ray start failed: local_world_size cannot little than visible device count error (#4457 )

2025-11-27 21:18:32 +08:00

__init__.py

[2/4][Refactor] Refactor torchair utils (#1892 )

2025-07-21 19:43:30 +08:00

base.py

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926 )

2025-07-28 15:13:37 +08:00

conftest.py

[1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841 )

2025-07-18 23:07:14 +08:00

test_ascend_config.py

oproj TP support acl graph (#4073 )

2025-11-11 19:39:06 +08:00

test_envs.py

[Misc] Remove redundant imported envs, using envs_ascend instead (#2193 )

2025-08-14 09:33:39 +08:00

test_platform.py

[Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036 )

2025-11-28 14:09:39 +08:00

test_utils.py

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00