SILONG ZENG
78d5ce3e01
[Lint]Style: Convert example to ruff format (#5863)
...
### What this PR does / why we need it?
This PR fixes linting issues in the `example/` directory to align with the
project's Ruff configuration.
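The Ruff setup such a cleanup targets normally lives in `pyproject.toml`; the rule selection below is an illustrative sketch, not vllm-ascend's actual configuration:

```toml
# Illustrative Ruff configuration (assumed; check the repo's own
# pyproject.toml for the real rule set and line length).
[tool.ruff]
line-length = 88

[tool.ruff.lint]
select = ["E", "F", "I"]  # pycodestyle errors, pyflakes, isort-style imports
```

With a configuration like this in place, `ruff check example/ --fix` and `ruff format example/` bring the directory into compliance.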
- vLLM version: v0.13.0
- vLLM main: bde38c11df
Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
2026-01-13 20:46:50 +08:00
wangxiyuan
cb33b09179
[Doc]clean up ascend scheduler config from doc (#4612)
...
clean up ascend scheduler config from doc
- vLLM version: v0.11.2
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-02 14:22:56 +08:00
Mengqing Cao
517fd9272d
Revert "drop ascend scheduler" (#4580)
...
Reverts vllm-project/vllm-ascend#4498
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
2025-11-29 22:20:48 +08:00
wangxiyuan
f10acddb78
drop ascend scheduler (#4498)
...
The Ascend scheduler was originally added for the non-chunked-prefill case,
because the NPU ops did not work well with chunked prefill at the time.
Now that the ops work well with chunked prefill, it is time to remove the
Ascend scheduler and use vLLM's default scheduler.
- vLLM version: v0.11.2
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-29 16:18:34 +08:00
LookAround0301
b54d44e664
support cp&dcp (#3260)
...
### What this PR does / why we need it?
This PR adds the Prefill Context Parallelism (PCP) feature, the
prefill-stage counterpart of DCP (Decode Context Parallelism). For
implementation details, please refer to the RFC:
https://github.com/vllm-project/vllm/issues/25749
TL;DR: PCP enhances long-sequence inference capabilities by partitioning
the sequence dimension during the prefill stage.
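As a hedged illustration of the idea only (not the PR's actual code; the function and variable names below are invented), partitioning a prompt's tokens along the sequence dimension across CP ranks can be sketched as:

```python
# Hypothetical sketch of prefill context-parallel (PCP) token partitioning.
# Real vllm-ascend code shards inside the model runner and attention backends;
# this just shows the contiguous-chunk split along the sequence dimension.

def partition_prefill_tokens(token_ids: list[int], cp_size: int) -> list[list[int]]:
    """Split a prompt's tokens into contiguous chunks, one per CP rank."""
    n = len(token_ids)
    chunk = (n + cp_size - 1) // cp_size  # ceil division so every token lands somewhere
    return [token_ids[i * chunk:(i + 1) * chunk] for i in range(cp_size)]

# Example: 10 prompt tokens spread over 4 CP ranks.
shards = partition_prefill_tokens(list(range(10)), cp_size=4)
# Each rank runs prefill attention over its shard; KV results are then
# combined across ranks so decode sees the full context.
```

Each rank computes attention for its shard only, which is what shrinks per-device memory and compute for long prompts.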
### Does this PR introduce _any_ user-facing change?
The current implementation primarily includes the following changes:
- Modified `ModelRunner.py` to add the CP partitioning logic for tokens;
- Modified `attention_v1.py` and `mla_v1.py` to adapt the GQA/MLA backends to PCP;
- Modified `block_tables.py` to extend KV cache storage for DCP & PCP;
- Added the necessary command-line arguments to control PCP parallelism.
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: chenjie <chenjie137@huawei.com>
Signed-off-by: Delphine-Nic <tanwenqin@huawei.com>
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: Feng Liu <liufeng248@huawei.com>
Signed-off-by: gaojc <1055866782@qq.com>
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Signed-off-by: z50049692 <zhangmingwei11@huawei.com>
Co-authored-by: chenjie <chenjie137@huawei.com>
Co-authored-by: Delphine-Nic <tanwenqin@huawei.com>
Co-authored-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Co-authored-by: Feng Liu <liufeng248@huawei.com>
Co-authored-by: gaojc <1055866782@qq.com>
Co-authored-by: weiguihua2 <weiguihua2@huawei.com>
Co-authored-by: z50049692 <zhangmingwei11@huawei.com>
Co-authored-by: w00896881 <wangzixuan40@huawei.com>
2025-10-24 10:32:01 +08:00