xc-llm-ascend

Files

wangbj127 6bdc72949b Revert "[v0.18.0][BugFix] Fix dimension mismatch error when SP padding causes num_tokens_padded != num_tokens_unpadded" (#8413)

Reverts vllm-project/vllm-ascend#8133
- Reversion of Logic: This pull request reverts the changes introduced
in a previous commit that attempted to handle dimension mismatches
during SP padding.

Signed-off-by: Wangbingjie <wangbj1207@126.com>

2026-04-18 20:43:42 +08:00

[v0.18.0][CI] Fix releases/v0.18.0 ci test only support vllm v0.18.0 (#7686)

2026-03-26 18:36:04 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[Hybrid] support prefix cache for Qwen3.5/Next with --mamba-cache-mode align (#7103)

2026-03-15 09:44:09 +08:00

model_runner_v1.py

Revert "[v0.18.0][BugFix] Fix dimension mismatch error when SP padding causes num_tokens_padded != num_tokens_unpadded" (#8413)

2026-04-18 20:43:42 +08:00

npu_input_batch.py

[Hybrid] support prefix cache for Qwen3.5/Next with --mamba-cache-mode align (#7103)

2026-03-15 09:44:09 +08:00

pcp_utils.py

feat(attention_cp): support chunked prefill for Qwen3Next with PCP&DCP (#6900)

2026-03-09 17:55:09 +08:00

worker.py

[0.18.0][profiler] profile AICore and MTE time with torch profiler (#7730)

2026-03-27 16:37:54 +08:00