weiguihua2 d752c030e9 [Bugfix] fix pcp 128K break (#5266)
### What this PR does / why we need it?
[Bugfix] Fixes the issue where a 128K context does not work in
long-sequence scenarios.

The issue is caused by not splitting `num_tokens` by `pcp_size` during
`profile_run`.
During `profile_run`, a warm-up pass is performed based on
`self.max_num_tokens`. When PCP is enabled, however, each PCP group
schedules at most `self.max_num_tokens / pcp_size` tokens, so the
warm-up must use the split size. After `profile_run` completes, the
original scheduling size needs to be restored.
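A minimal sketch of the workaround described above: temporarily cap
`max_num_tokens` to its per-PCP-group share during `profile_run`, then
restore the original value afterwards. All names here are illustrative
placeholders, not the actual vllm-ascend implementation.

```python
from contextlib import contextmanager


class ModelRunner:
    """Hypothetical runner illustrating the PCP profile_run workaround."""

    def __init__(self, max_num_tokens: int, pcp_size: int):
        self.max_num_tokens = max_num_tokens
        self.pcp_size = pcp_size

    @contextmanager
    def _pcp_profile_tokens(self):
        """Temporarily split max_num_tokens across PCP ranks."""
        original = self.max_num_tokens
        if self.pcp_size > 1:
            # Each PCP group only schedules up to this many tokens.
            self.max_num_tokens = original // self.pcp_size
        try:
            yield self.max_num_tokens
        finally:
            # Restore the original scheduling size after profile_run.
            self.max_num_tokens = original

    def profile_run(self) -> int:
        with self._pcp_profile_tokens() as num_tokens:
            # The warm-up dummy forward pass would use `num_tokens` here.
            return num_tokens


runner = ModelRunner(max_num_tokens=131072, pcp_size=2)
assert runner.profile_run() == 65536    # 128K split across 2 PCP ranks
assert runner.max_num_tokens == 131072  # original size restored
```

Using a context manager keeps the restore step on every exit path, so
the scheduling size cannot be left in the split state if the warm-up
raises.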

This is a temporary workaround; once
https://github.com/vllm-project/vllm/pull/28988/files is merged, this
part can be removed.

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-12-25 11:58:52 +08:00