xc-llm-ascend

Files

Jingchun Gao b390e0ef78 [Bugfix] Fix PP+PCP and PP+flashcomm1 bugs (#5416)

- Fixed the computation of the final hidden_states when pipeline parallel and prefill context parallel are enabled at the same time. Only the last PP rank requires hidden_states, and only there do they have the correct tensor type.
- Fixed the shape of intermediate_tensors in dummy_run when pipeline parallel and flashcomm1 are enabled together. The intermediate_tensors must be divided by tp_size; otherwise, the MoE layers raise errors.
- Fixed the shape of self.intermediate_tensors so that it leaves sufficient space for slicing.
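The three fixes above come down to shape arithmetic and a last-rank check. The sketch below illustrates that logic in plain Python under stated assumptions. It is not the vllm-ascend implementation: every name (`DummyShapeConfig`, `pad_to_multiple`, `intermediate_tensor_shape`, `buffer_shape`, `select_output`) is hypothetical. It assumes that flashcomm1 shards the token dimension across TP ranks and that token counts are padded to a multiple of `tp_size` so they split evenly; the commit message itself only states that the tensors are divided by `tp_size`.

```python
from dataclasses import dataclass


@dataclass
class DummyShapeConfig:
    # Hypothetical config fields, for illustration only.
    num_tokens: int
    hidden_size: int
    tp_size: int
    flashcomm1_enabled: bool


def pad_to_multiple(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def intermediate_tensor_shape(cfg: DummyShapeConfig) -> tuple[int, int]:
    """Per-rank shape of hidden states passed between PP stages (hypothetical)."""
    if cfg.flashcomm1_enabled:
        # Assumption: flashcomm1 splits the (padded) token dimension across
        # TP ranks, so each rank holds only 1/tp_size of the tokens.
        tokens = pad_to_multiple(cfg.num_tokens, cfg.tp_size) // cfg.tp_size
    else:
        tokens = cfg.num_tokens
    return (tokens, cfg.hidden_size)


def buffer_shape(max_num_tokens: int, hidden_size: int, tp_size: int) -> tuple[int, int]:
    """Size a persistent buffer for the padded maximum so any later slice [:n] fits."""
    # Assumption: padding to a multiple of tp_size is what "sufficient slice space" needs.
    return (pad_to_multiple(max_num_tokens, tp_size), hidden_size)


def select_output(pp_rank: int, pp_size: int, hidden_states, intermediate):
    """Return final hidden_states only on the last PP rank; otherwise forward intermediates."""
    if pp_rank == pp_size - 1:
        return hidden_states
    return intermediate
```

For example, with 10 tokens, `tp_size=4`, and flashcomm1 enabled, each rank's dummy intermediate tensor holds 3 tokens (10 padded to 12, then divided by 4), whereas allocating the full 10 tokens per rank would mismatch what the MoE layers expect.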

- vLLM version: release/v0.13.0
- vLLM main: 81786c8774

---------

Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>

2026-01-26 16:53:07 +08:00

model runner v2 support triton of penalty (#5854)

2026-01-20 12:26:05 +00:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[feature] support pcp + mtp in full graph (#4572)

2025-12-22 16:13:39 +08:00

model_runner_v1.py

[Bugfix] Fix PP+PCP and PP+flashcomm1 bugs (#5416)

2026-01-26 16:53:07 +08:00