xc-llm-ascend

Files

rjg-lyh 585a494baa [Core] Disable the chunked prefill feature in Non-MLA LLMs (#2894)

### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in Non-MLA
models, because the operators that support it currently perform
suboptimally. The feature remains available only when the user has
explicitly enabled chunked prefill in the ascend_scheduler_config.
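
The rule described above can be sketched as a small decision function. This is only an illustration of the stated behavior, not the code from the PR: the helper name `resolve_chunked_prefill`, its parameters, and the config key `enable_chunked_prefill` are assumptions made for the example.

```python
from typing import Any, Mapping, Optional


def resolve_chunked_prefill(
    is_mla_model: bool,
    default_enabled: bool,
    ascend_scheduler_config: Optional[Mapping[str, Any]] = None,
) -> bool:
    """Hypothetical helper: decide whether chunked prefill stays enabled."""
    config = ascend_scheduler_config or {}
    # Assumed key name: an explicit opt-in set by the user.
    user_opted_in = bool(config.get("enable_chunked_prefill", False))

    if is_mla_model:
        # The PR only targets Non-MLA models, so MLA models keep the default.
        return default_enabled

    # Non-MLA models: force-disable unless the user explicitly opted in.
    return user_opted_in


# Non-MLA model, no explicit opt-in: the feature is turned off.
print(resolve_chunked_prefill(False, True))
# Non-MLA model with an explicit opt-in: the feature is kept.
print(resolve_chunked_prefill(False, True, {"enable_chunked_prefill": True}))
```

Keeping the opt-in explicit means the default path avoids the slower operators, while users who still want chunked prefill can request it.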

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

Related: https://github.com/vllm-project/vllm-ascend/pull/2659

- vLLM version: main
- vLLM main: d21a36f5f9

Signed-off-by: rjg-lyh <1318825571@qq.com>

2025-09-12 23:17:09 +08:00

e2e

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

[Core] Disable the chunked prefill feature in Non-MLA LLMs (#2894)

2025-09-12 23:17:09 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00