xc-llm-ascend/vllm_ascend at 367edff5af202e378ad088090439d6389b40fa5d

EngineX/xc-llm-ascend

Mengqing Cao 367edff5af [HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)

### What this PR does / why we need it?
This PR fixes a few issues with prefill disaggregation:
1. Fix the kvcache address alignment issue: llmdatadist needs tensor
addresses to be aligned to 2M.
2. Fix the kvcache shape error: llmdatadist requires k/v tensors with
shape [num_blocks, ...], but the implementation before this PR used
[2, num_blocks, ...], which breaks prefill disaggregation.
3. Use the hybrid kv cache only when running qwen3_next, to fix an
accuracy issue with prefill disaggregation.
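
The first two fixes can be illustrated with a small sketch. This is not the code from the PR: the helper names `align_up`, `padded_allocation_size`, and `split_kv_shape` are hypothetical, and reading "2M" as 2 MiB (2 * 1024 * 1024 bytes) is an assumption. It only shows the arithmetic of rounding an address up to an aligned boundary and of describing k and v as two separate `[num_blocks, ...]` tensors rather than one fused `[2, num_blocks, ...]` tensor.

```python
# Illustrative sketch only; names and the 2 MiB value are assumptions.
ALIGNMENT = 2 * 1024 * 1024  # assumed meaning of "2M"


def align_up(addr: int, alignment: int = ALIGNMENT) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


def padded_allocation_size(nbytes: int, alignment: int = ALIGNMENT) -> int:
    """Bytes to over-allocate so an aligned region of nbytes always fits."""
    return nbytes + alignment


def split_kv_shape(fused_shape):
    """Map a fused (2, num_blocks, ...) shape to two (num_blocks, ...) shapes."""
    if len(fused_shape) < 2 or fused_shape[0] != 2:
        raise ValueError("expected a shape of the form (2, num_blocks, ...)")
    per_tensor = tuple(fused_shape[1:])
    return per_tensor, per_tensor


if __name__ == "__main__":
    base = 0x7F00_0012_3456            # hypothetical unaligned base address
    start = align_up(base)
    print(hex(start), start % ALIGNMENT == 0)
    print(split_kv_shape((2, 128, 16, 8, 64)))
```

Over-allocating by one alignment unit and then offsetting to the aligned start is one common way to obtain an aligned buffer when the allocator itself gives no alignment guarantee; whether the PR does this is not stated in the commit message.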

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested locally by @liziyu179 

- vLLM version: v0.10.2
- vLLM main: 4f02b77de4

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-09-18 21:43:22 +08:00

Remove chunked_prefill_for_mla and fix ring_mla bug (#2781)

2025-09-18 19:43:26 +08:00

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

main add ascend scheduler support multimodal (#2844)

2025-09-14 09:38:51 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[Feat] A Connector that supports Mooncake store (#2913)

2025-09-18 14:04:45 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956)

2025-09-17 10:36:43 +08:00

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

[main] addrmsnorm + quant fusion optim in Dense Models (#2772)

2025-09-16 22:31:38 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

refactor linear (#2867)

2025-09-18 14:09:19 +08:00

[refactor] refactor deepseek-related files (#2849)

2025-09-16 14:13:07 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956)

2025-09-17 10:36:43 +08:00

[main] add pd transfer for ascend scheduler (#2753)

2025-09-10 08:46:39 +08:00

[Feat][Graph] Support MTP for ACL Graph (#2932)

2025-09-18 14:05:33 +08:00

Remove chunked_prefill_for_mla and fix ring_mla bug (#2781)

2025-09-18 19:43:26 +08:00

[HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)

2025-09-18 21:43:22 +08:00

__init__.py

Bump torch version to 2.7.1 (#1562)

2025-08-05 08:43:24 +08:00

ascend_config.py

Add an option of enable frozen parameter (#2869)

2025-09-17 12:00:44 +08:00

ascend_forward_context.py

[main] addrmsnorm + quant fusion optim in Dense Models (#2772)

2025-09-16 22:31:38 +08:00

envs.py

[Ops] Fix bug in register_custom_ops without forward_context (#2883)

2025-09-12 16:58:08 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

[HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)

2025-09-18 21:43:22 +08:00

utils.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00