xc-llm-ascend

Files

jinyuxin 583ad8f347 [main][refactor] Refactor forward metadata retrieval across DP nodes to reduce redundant padding. (#2062)

Before refactoring cross-DP decoding metadata aggregation, this PR cleans up the
token-padding logic.
### What this PR does:

1. Checks first whether any DP instance is in the `prefill` phase.

2. If every DP instance is in the `decode` phase and `torchair_graph_enabled` is true, pads
each DP instance's token count up to the global maximum across DP instances.

3. If any DP instance is in the `prefill` phase, or all are decoding with graph mode
**disabled**, returns each DP instance's original token count without
padding.
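The three-step decision above can be sketched in plain Python. This is a hypothetical, simplified illustration, not the code in `model_runner_v1.py`: the function name `get_token_counts_across_dp` and its parameters are invented for the example, and the cross-DP communication that would gather each instance's token count and phase is replaced by plain lists.

```python
from typing import List


def get_token_counts_across_dp(
    num_tokens_per_dp: List[int],
    in_prefill_per_dp: List[bool],
    torchair_graph_enabled: bool,
) -> List[int]:
    """Hypothetical sketch of the padding decision described in the PR.

    num_tokens_per_dp: token count of each DP instance (non-empty). In the
        real runner this would come from a cross-DP collective; here it is
        just a list.
    in_prefill_per_dp: whether each DP instance is in the prefill phase.
    """
    # Step 1: is any DP instance still in the prefill phase?
    with_prefill = any(in_prefill_per_dp)

    # Step 2: every instance is decoding and graph mode is on,
    # so pad each instance's token count up to the global maximum.
    if not with_prefill and torchair_graph_enabled:
        max_tokens = max(num_tokens_per_dp)
        return [max_tokens] * len(num_tokens_per_dp)

    # Step 3: prefill present, or graph mode off: keep the original counts.
    return list(num_tokens_per_dp)


if __name__ == "__main__":
    print(get_token_counts_across_dp([3, 7, 5], [False, False, False], True))   # [7, 7, 7]
    print(get_token_counts_across_dp([3, 7, 5], [False, True, False], True))    # [3, 7, 5]
    print(get_token_counts_across_dp([3, 7, 5], [False, False, False], False))  # [3, 7, 5]
```

Because the prefill check runs first, the padding branch is entered only in the single case that needs it, so no counts have to be padded and then unpadded afterwards.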

This reordering removes the previous two-step pad-then-unpad flow and
ensures that padding occurs only when strictly necessary.

- vLLM version: v0.10.0
- vLLM main: bd3db7f469

Signed-off-by: yx0716 <jinyx1007@foxmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>

2025-08-05 17:03:36 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

eagle_proposer_v1.py

[Misc] Fix logger bug (#2024)

2025-07-28 15:59:09 +08:00

model_runner_v1.py

[main][refactor] Refactor forward metadata retrieval across DP nodes to reduce redundant padding. (#2062)

2025-08-05 17:03:36 +08:00

mtp_proposer_v1.py

[main][refactor] Refactoring forward_context and model_runner_v1 (#1979)