xc-llm-ascend/vllm_ascend at 8bcc0ccd571a001bcf1f428aceb2445ba0375fac

EngineX/xc-llm-ascend

linfeng-yuan 8bcc0ccd57 [bugfix] fix shared expert dp with hybrid kvcache (#2964 )

### What this PR does / why we need it?
https://github.com/vllm-project/vllm-ascend/pull/2849 moves the
implementation of `shared_expert_dp` to torchair deepseek_modeling.
However, calling `set_forward_context` with `enforce_eager` and
`shared_expert_dp` falls back to the implementation in
model_runner_v1.py and sets the global attn_metadata to a dictionary.
This leads to a RuntimeError when attn_metadata is retrieved from the
forward context and used in torchair_deepseek_v2.py. This PR fixes the
problem by introducing a transformation of attn_metadata in this file.
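
The kind of transformation described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: the helper name `normalize_attn_metadata`, the `FakeAttnMetadata` class, and the layer-name keys are all hypothetical, and the sketch assumes the dictionary maps layer names to per-layer metadata objects.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeAttnMetadata:
    """Hypothetical stand-in for a per-layer attention metadata object."""
    num_decode_tokens: int


def normalize_attn_metadata(attn_metadata: Any,
                            layer_name: Optional[str] = None) -> Any:
    """Return a single metadata object whether the forward context
    holds one object or a dict of them (assumed keyed by layer name).

    Hypothetical helper: the name and lookup policy are assumptions.
    """
    if isinstance(attn_metadata, dict):
        if not attn_metadata:
            # Empty dict: nothing to unwrap (e.g. a profiling run).
            return None
        if layer_name is not None and layer_name in attn_metadata:
            # Prefer the entry for the layer that is asking.
            return attn_metadata[layer_name]
        # Otherwise fall back to the first entry in the dict.
        return next(iter(attn_metadata.values()))
    # Already a single object (or None): pass it through unchanged.
    return attn_metadata


meta = FakeAttnMetadata(num_decode_tokens=4)
per_layer = {"layers.0.attn": meta}
unwrapped = normalize_attn_metadata(per_layer, "layers.0.attn")
```

Code that previously assumed a single metadata object can call such a helper first, so it works with either representation.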

Note that the current E2E tests do not cover DeepSeek with
`shared_expert_dp`. We need to add an ST with `shared_expert_dp` to the
testing workflow.

### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
e2e vllm serving with `enable_shared_expert_dp: true` passed.

- vLLM version: v0.10.2
- vLLM main: de3e53a75b

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2025-09-17 20:01:47 +08:00

[Bugfix] Fix mtp torchair in pd Disaggregation scenario (#2951 )

2025-09-17 09:07:58 +08:00

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

main add ascend scheduler support multimodal (#2844 )

2025-09-14 09:38:51 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049 )

2025-07-28 16:01:59 +08:00

fix mooncake connector adxl hostname usage (#2824 )

2025-09-13 14:38:48 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956 )

2025-09-17 10:36:43 +08:00

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

[main] addrmsnorm + quant fusion optim in Dense Models (#2772 )

2025-09-16 22:31:38 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367 )

2025-08-15 07:35:27 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956 )

2025-09-17 10:36:43 +08:00

[refactor] refactor deepseek-related files (#2849 )

2025-09-16 14:13:07 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956 )

2025-09-17 10:36:43 +08:00

[main] add pd transfer for ascend scheduler (#2753 )

2025-09-10 08:46:39 +08:00

[Main] [Refactor] Enable MoECommMethod in Eager Mode (#2791 )

2025-09-16 11:06:00 +08:00

[bugfix] fix shared expert dp with hybrid kvcache (#2964 )

2025-09-17 20:01:47 +08:00

Dynamic Expert Load Balance with Zero-like-overhead (#2956 )

2025-09-17 10:36:43 +08:00

__init__.py

Bump torch version to 2.7.1 (#1562 )

2025-08-05 08:43:24 +08:00

ascend_config.py

Add an option of enable frozen parameter (#2869 )

2025-09-17 12:00:44 +08:00

ascend_forward_context.py

[main] addrmsnorm + quant fusion optim in Dense Models (#2772 )

2025-09-16 22:31:38 +08:00

envs.py

[Ops] Fix bug in register_custom_ops without forward_context (#2883 )

2025-09-12 16:58:08 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

platform.py

[Feat][Graph] Support DeepSeek with ACL Graph (#2707 )

2025-09-16 17:50:17 +08:00

utils.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00