xc-llm-ascend/vllm_ascend at 434059e4179371f196671ba878e56115e0c790df

EngineX/xc-llm-ascend

fan2956 434059e417 [BugFix] Fix multimodal model support fullgraph error (#3425)

### What this PR does / why we need it?
The update_attn_params function requires a num_tokens argument, which
was obtained from positions.shape[0]. However, multimodal models use
mrope (Multidimensional Rotary Position Embedding), which makes the
positions tensor two-dimensional. Consequently, positions.shape[0]
returns an incorrect value. We resolve this issue by replacing
positions.shape[0] with maybe_padded_num_tokens.
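
The shape problem described above can be illustrated with a minimal sketch. It models tensor shapes as plain tuples rather than real tensors, and the constants `NUM_TOKENS` and `MROPE_ROWS` and the helper `leading_dim` are hypothetical names chosen for illustration; they are not taken from the vllm_ascend code.

```python
# Illustrative sketch: shapes are modelled as plain tuples, not real tensors.
NUM_TOKENS = 8   # hypothetical (possibly padded) token count for one step
MROPE_ROWS = 3   # hypothetical size of the extra leading mrope dimension


def leading_dim(shape):
    """Return shape[0], which is what positions.shape[0] reads."""
    return shape[0]


# Text-only model: positions is 1-D, so the leading dimension is the token count.
text_positions_shape = (NUM_TOKENS,)

# mrope model: positions gains a leading dimension, so shape[0] is no longer
# the token count.
mrope_positions_shape = (MROPE_ROWS, NUM_TOKENS)

num_tokens_text = leading_dim(text_positions_shape)
num_tokens_mrope_buggy = leading_dim(mrope_positions_shape)

# The fix described above: use a token count that is tracked separately
# from the positions tensor.
maybe_padded_num_tokens = NUM_TOKENS
num_tokens_mrope_fixed = maybe_padded_num_tokens
```

Under these assumptions, the leading dimension only matches the token count in the 1-D case, which is why a separately tracked count is the safer input.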

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: fan2956 <zhoufan53@huawei.com>

2025-10-14 21:51:09 +08:00

..

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

fix pagedattention to support fullgraph. (#3436)

2025-10-14 16:10:09 +08:00

[BugFix] Fix ascend scheduler assert error (#3191)

2025-09-28 18:22:08 +08:00

device_allocator

[Misc] Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438)

2025-10-14 21:28:41 +08:00

Bugfix: Expose the user policy type interface (#3336)

2025-10-11 16:28:57 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

[2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203)

2025-10-14 20:16:33 +08:00

[KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438)

2025-10-14 21:28:41 +08:00

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

bugfix for mtp (#3300)

2025-10-09 19:22:46 +08:00

[bugfix][torchair] fix missing weight nz cast for w13_weight in torchair_w8a8_dynamic.py (#3446)

2025-10-14 21:11:05 +08:00

[BugFix] Fix multimodal model support fullgraph error (#3425)

2025-10-14 21:51:09 +08:00

__init__.py

【bugfix】fix connector register failed (#3335)

2025-10-09 21:09:54 +08:00

ascend_config.py

[2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203)

2025-10-14 20:16:33 +08:00

ascend_forward_context.py

[2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203)

2025-10-14 20:16:33 +08:00

envs.py

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

[Feat][Graph] Support FULL_DECODE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

utils.py

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00