xc-llm-ascend/vllm_ascend at 2a87b4cecbe94208aba8d12cd3b32eb6913c0f48

EngineX/xc-llm-ascend

xuyexiong 2a87b4cecb [Bugfix] Fix specdecoding in chunkedprefill scenario (#3025)

### What this PR does / why we need it?

In the chunked-prefill scenario, the speculative decoding phase took an
incorrect path. Speculative decoding should always use the TND layout.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main: 6d8246aaff

Signed-off-by: xuyexiong <xuyexiong@huawei.com>

2025-09-19 14:05:08 +08:00

[Bugfix] Fix specdecoding in chunkedprefill scenario (#3025)

2025-09-19 14:05:08 +08:00

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

main add ascend scheduler support multimodal (#2844)

2025-09-14 09:38:51 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[Feat] A Connector that supports Mooncake store (#2913)

2025-09-18 14:04:45 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956)

2025-09-17 10:36:43 +08:00

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

[main] addrmsnorm + quant fusion optim in Dense Models (#2772)

2025-09-16 22:31:38 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

[Feature] Support moe multi-stream for aclgraph. (#2946)

2025-09-19 11:06:45 +08:00

[refactor] refactor deepseek-related files (#2849)

2025-09-16 14:13:07 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956)

2025-09-17 10:36:43 +08:00

[main] add pd transfer for ascend scheduler (#2753)

2025-09-10 08:46:39 +08:00

[Feat][Graph] Support MTP for ACL Graph (#2932)

2025-09-18 14:05:33 +08:00

[Bugfix] Fix specdecoding in chunkedprefill scenario (#3025)

2025-09-19 14:05:08 +08:00

[BugFix] Async scheduling and PP compatibility with DP (#2796)

2025-09-19 11:29:50 +08:00

__init__.py

Bump torch version to 2.7.1 (#1562)

2025-08-05 08:43:24 +08:00

ascend_config.py

[Feature] Support moe multi-stream for aclgraph. (#2946)

2025-09-19 11:06:45 +08:00

ascend_forward_context.py

[main] addrmsnorm + quant fusion optim in Dense Models (#2772)

2025-09-16 22:31:38 +08:00

envs.py

[Ops] Fix bug in register_custom_ops without forward_context (#2883)

2025-09-12 16:58:08 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

[HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)

2025-09-18 21:43:22 +08:00

utils.py

[Feature] Support moe multi-stream for aclgraph. (#2946)

2025-09-19 11:06:45 +08:00