Commit Graph

6 Commits

Author SHA1 Message Date
Li Wang
02f89d166f [CI] Update vllm version to 20250922(5aeb925) (#3091)
### What this PR does / why we need it?
This PR bumps the vLLM commit hash to
5aeb925452
and fixes the following issues:
1. https://github.com/vllm-project/vllm/pull/25345 removed the v0
metadata
2. https://github.com/vllm-project/vllm/pull/25332
3. https://github.com/vllm-project/vllm/pull/25334
4. https://github.com/vllm-project/vllm/pull/23558, note that this vLLM
commit updates the model registration logic to check that every
registered model lives under the `vllm.model_executor.models` path,
which breaks our custom registration of the deepseek_v3 model (it does
not exist in the vLLM model path). As a temporary fix, I moved the
deepseek_v3 model registration into deepseek_v2 (see the sketch below).
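
For context, a minimal sketch of the out-of-tree registration pattern involved, using vLLM's documented `ModelRegistry.register_model` API; the class path shown is illustrative, not the exact code in this PR:

```python
# Sketch: register the custom DeepSeek-V3 architecture under a module
# that actually resolves (here, the deepseek_v2 module) so the stricter
# path check introduced by vllm#23558 does not reject it.
# The class path below is illustrative.
from vllm import ModelRegistry

ModelRegistry.register_model(
    "DeepseekV3ForCausalLM",
    "vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM",
)
```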

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
9607d5eb44

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-22 22:18:13 +08:00
fan2956
c5a502fd2e main add ascend scheduler support multimodal (#2844)
### What this PR does / why we need it?
On main, AscendScheduler does not support multimodal models, because it
lacks scheduled_encoder_inputs, which is needed for multimodal inference
(see the sketch below).
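
A simplified sketch of the missing field, assuming the shape of vLLM v1's SchedulerOutput (the exact upstream dataclass carries more fields and may differ):

```python
# Simplified sketch of the scheduler output field in question; the real
# vLLM v1 SchedulerOutput contains many more fields than shown here.
from dataclasses import dataclass, field


@dataclass
class SchedulerOutput:
    # Maps request_id -> indices of the multimodal encoder inputs that
    # the model runner must execute this step. If a scheduler never
    # populates this, multimodal requests never get their encoder
    # inputs run.
    scheduled_encoder_inputs: dict[str, list[int]] = field(
        default_factory=dict)
```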

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
vLLM version: main@93e28e6862669e3b5cf47cea9f782a65ec47e155

- vLLM version: v0.10.2rc2
- vLLM main:
15b8fef453

---------

Signed-off-by: fan2956 <zhoufan53@huawei.com>
Co-authored-by: zhoufan2956 <zhoufan2956@163.com>
2025-09-14 09:38:51 +08:00
rjg-lyh
585a494baa [Core] Disable the chunked prefill feature in Non-MLA LLMs (#2894)
### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in non-MLA models,
since the operators supporting this functionality currently perform
suboptimally. The feature remains available only when the user has
explicitly enabled chunked prefill in the ascend_scheduler_config, as
shown in the sketch below.
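
A hedged example of how a user might opt back in through the Ascend scheduler config; the model name is a placeholder, and the keys follow the config snippets used elsewhere in this log:

```python
from vllm import LLM

# Opt-in sketch: explicitly enable chunked prefill in the Ascend
# scheduler config so it is not force-disabled on non-MLA models.
# The model name below is a placeholder, not from this PR.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    additional_config={
        "ascend_scheduler_config": {
            "enabled": True,
            "enable_chunked_prefill": True,
        }
    },
)
```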

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

Related: https://github.com/vllm-project/vllm-ascend/pull/2659

- vLLM version: main
- vLLM main:
d21a36f5f9

Signed-off-by: rjg-lyh <1318825571@qq.com>
2025-09-12 23:17:09 +08:00
CaranLic
168ad600b5 [main] add pd transfer for ascend scheduler (#2753)
### What this PR does / why we need it?
For offline scenarios, adjust the scheduling process to prioritize the
prefill phase of all requests, then process the decode phase of all
requests.

### How was this patch tested?

```
max_num_seqs=24,
additional_config={
    "ascend_scheduler_config":{
        "enabled": True,
        "enable_pd_transfer": True,
        "decode_max_num_seqs": 24,
        "enable_chunked_prefill": False
    }
},
```
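
For reference, a runnable sketch of how the settings above might be passed through vLLM's offline entrypoint; the model name and prompt are placeholders, not from the original test:

```python
from vllm import LLM, SamplingParams

# Sketch: offline run with PD transfer enabled in the Ascend scheduler,
# mirroring the settings above. Model and prompt are placeholders.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",
    max_num_seqs=24,
    additional_config={
        "ascend_scheduler_config": {
            "enabled": True,
            "enable_pd_transfer": True,
            "decode_max_num_seqs": 24,
            "enable_chunked_prefill": False,
        }
    },
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=2048))
```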
| input | output | num prompts | max_num_seqs | dp | tp | scheduler | tps |
| ------ | ------ | ----------- | ------------ | ---- | ---- | ----------- | --------------- |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | v1 | 234.06 |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | pd transfer | 239.59 (+2.4%) |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | v1 | 222.85 |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | pd transfer | 225.81 (+1.3%) |


- vLLM version: v0.10.1.1
- vLLM main:
6fb2788163

---------

Signed-off-by: CaranLic <740821011@qq.com>
2025-09-10 08:46:39 +08:00
linfeng-yuan
4af5b80606 [Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434)
### What this PR does / why we need it?
Add configuration check logic for the Ascend scheduler: if chunked
prefill is disabled, max_num_batched_tokens cannot be less than
max_model_len, following vLLM (see the sketch below).
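
A minimal sketch of the check described above, written as a standalone helper rather than the exact method added to AscendSchedulerConfig:

```python
# Standalone sketch of the validation rule; the actual check lives in
# AscendSchedulerConfig and its exact signature may differ.
def validate_batched_tokens(enable_chunked_prefill: bool,
                            max_num_batched_tokens: int,
                            max_model_len: int) -> None:
    if not enable_chunked_prefill and max_num_batched_tokens < max_model_len:
        raise ValueError(
            f"max_num_batched_tokens ({max_num_batched_tokens}) must be "
            f">= max_model_len ({max_model_len}) when chunked prefill "
            "is disabled.")
```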

### Does this PR introduce _any_ user-facing change?
Users cannot set max_num_batched_tokens smaller than max_model_len with
the Ascend scheduler.
### How was this patch tested?
CI and vllm serving passed

- vLLM version: v0.10.0
- vLLM main:
f77a0802b7

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-08-23 19:39:44 +08:00
JohnJan
ce4970eee0 [Test] Add unit test for schedule_config.py (#1590)
### What this PR does / why we need it?
According to issue
https://github.com/vllm-project/vllm-ascend/issues/1298, this pull
request adds unit tests for schedule_config.py (see the sketch below).
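
An illustrative pytest sketch of the kind of test added; the module path and constructor arguments are assumptions, not the exact code from this PR:

```python
# Illustrative only: module path and constructor kwargs are assumed.
import pytest

from vllm_ascend.core.schedule_config import AscendSchedulerConfig


def test_rejects_max_num_batched_tokens_below_max_model_len():
    # With chunked prefill disabled, max_num_batched_tokens smaller
    # than max_model_len should be rejected (see the validation
    # commit above).
    with pytest.raises(ValueError):
        AscendSchedulerConfig(
            enable_chunked_prefill=False,
            max_num_batched_tokens=2048,
            max_model_len=4096,
        )
```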

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with new added/existing test.

- vLLM version: v0.9.2
- vLLM main:
8d0a01a5f2
2025-07-22 11:43:25 +08:00