Commit Graph

6 Commits

Author SHA1 Message Date
Li Wang
02f89d166f [CI] Update vllm version to 20250922(5aeb925) (#3091)
### What this PR does / why we need it?
This PR bumps the vLLM commit hash to
5aeb925452
and fixes the following issues:
1. https://github.com/vllm-project/vllm/pull/25345 removed the v0
metadata
2. https://github.com/vllm-project/vllm/pull/25332
3. https://github.com/vllm-project/vllm/pull/25334
4. https://github.com/vllm-project/vllm/pull/23558, note that this vLLM
commit updates the model registration logic to check that every
registered model lives under the `vllm.model_executor.models` path,
which breaks our custom registration of the deepseek_v3 model (it does
not exist in the vLLM model path). As a temporary fix, I moved the
deepseek_v3 model registration into deepseek_v2 (see the sketch below).
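
For context, a minimal sketch of the out-of-tree registration pattern involved, using vLLM's documented `ModelRegistry.register_model` API; the class path shown is illustrative, not the exact code in this PR:

```python
# Sketch: register the custom DeepSeek-V3 architecture under a module
# that actually resolves (here, the deepseek_v2 module) so the stricter
# path check introduced by vllm#23558 does not reject it.
# The class path below is illustrative.
from vllm import ModelRegistry

ModelRegistry.register_model(
    "DeepseekV3ForCausalLM",
    "vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM",
)
```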

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
9607d5eb44

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-22 22:18:13 +08:00
fan2956
c5a502fd2e main add ascend scheduler support multimodal (#2844)
### What this PR does / why we need it?
On main, AscendScheduler does not support multimodal models, because it
lacks scheduled_encoder_inputs, which is needed for multimodal inference
(see the sketch below).
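
A simplified sketch of the missing field, assuming the shape of vLLM v1's SchedulerOutput (the exact upstream dataclass carries more fields and may differ):

```python
# Simplified sketch of the scheduler output field in question; the real
# vLLM v1 SchedulerOutput contains many more fields than shown here.
from dataclasses import dataclass, field


@dataclass
class SchedulerOutput:
    # Maps request_id -> indices of the multimodal encoder inputs that
    # the model runner must execute this step. If a scheduler never
    # populates this, multimodal requests never get their encoder
    # inputs run.
    scheduled_encoder_inputs: dict[str, list[int]] = field(
        default_factory=dict)
```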

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
vLLM version: main@93e28e6862669e3b5cf47cea9f782a65ec47e155

- vLLM version: v0.10.2rc2
- vLLM main:
15b8fef453

---------

Signed-off-by: fan2956 <zhoufan53@huawei.com>
Co-authored-by: zhoufan2956 <zhoufan2956@163.com>
2025-09-14 09:38:51 +08:00
rjg-lyh
585a494baa [Core] Disable the chunked prefill feature in Non-MLA LLMs (#2894)
### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in non-MLA models,
since the operators supporting this functionality currently perform
suboptimally. The feature remains available only when the user has
explicitly enabled chunked prefill in the ascend_scheduler_config, as
shown in the sketch below.
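
A hedged example of how a user might opt back in through the Ascend scheduler config; the model name is a placeholder, and the keys follow the config snippets used elsewhere in this log:

```python
from vllm import LLM

# Opt-in sketch: explicitly enable chunked prefill in the Ascend
# scheduler config so it is not force-disabled on non-MLA models.
# The model name below is a placeholder, not from this PR.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    additional_config={
        "ascend_scheduler_config": {
            "enabled": True,
            "enable_chunked_prefill": True,
        }
    },
)
```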

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

Related: https://github.com/vllm-project/vllm-ascend/pull/2659

- vLLM version: main
- vLLM main:
d21a36f5f9

Signed-off-by: rjg-lyh <1318825571@qq.com>
2025-09-12 23:17:09 +08:00
CaranLic
168ad600b5 [main] add pd transfer for ascend scheduler (#2753)
### What this PR does / why we need it?
For offline scenarios, adjust the scheduling process to prioritize the
prefill phase of all requests, then process the decode phase of all
requests.

### How was this patch tested?

```
max_num_seqs=24,
additional_config={
    "ascend_scheduler_config":{
        "enabled": True,
        "enable_pd_transfer": True,
        "decode_max_num_seqs": 24,
        "enable_chunked_prefill": False
    }
},
```
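
For reference, a runnable sketch of how the settings above might be passed through vLLM's offline entrypoint; the model name and prompt are placeholders, not from the original test:

```python
from vllm import LLM, SamplingParams

# Sketch: offline run with PD transfer enabled in the Ascend scheduler,
# mirroring the settings above. Model and prompt are placeholders.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",
    max_num_seqs=24,
    additional_config={
        "ascend_scheduler_config": {
            "enabled": True,
            "enable_pd_transfer": True,
            "decode_max_num_seqs": 24,
            "enable_chunked_prefill": False,
        }
    },
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=2048))
```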
| input | output | num prompts | max_num_seqs | dp | tp | scheduler | tps |
| ------ | ------ | ----------- | ------------ | ---- | ---- | ----------- | --------------- |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | v1 | 234.06 |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | pd transfer | 239.59 (+2.4%) |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | v1 | 222.85 |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | pd transfer | 225.81 (+1.3%) |


- vLLM version: v0.10.1.1
- vLLM main:
6fb2788163

---------

Signed-off-by: CaranLic <740821011@qq.com>
2025-09-10 08:46:39 +08:00
linfeng-yuan
4af5b80606 [Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434)
### What this PR does / why we need it?
Add configuration check logic for the Ascend scheduler: if chunked
prefill is disabled, max_num_batched_tokens cannot be less than
max_model_len, following vLLM (see the sketch below).
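
A minimal sketch of the check described above, written as a standalone helper rather than the exact method added to AscendSchedulerConfig:

```python
# Standalone sketch of the validation rule; the actual check lives in
# AscendSchedulerConfig and its exact signature may differ.
def validate_batched_tokens(enable_chunked_prefill: bool,
                            max_num_batched_tokens: int,
                            max_model_len: int) -> None:
    if not enable_chunked_prefill and max_num_batched_tokens < max_model_len:
        raise ValueError(
            f"max_num_batched_tokens ({max_num_batched_tokens}) must be "
            f">= max_model_len ({max_model_len}) when chunked prefill "
            "is disabled.")
```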

### Does this PR introduce _any_ user-facing change?
Users cannot set max_num_batched_tokens smaller than max_model_len with
the Ascend scheduler.
### How was this patch tested?
CI and vllm serving passed

- vLLM version: v0.10.0
- vLLM main:
f77a0802b7

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-08-23 19:39:44 +08:00
JohnJan
ce4970eee0 [Test] Add unit test for schedule_config.py (#1590)
### What this PR does / why we need it?
According to issue
https://github.com/vllm-project/vllm-ascend/issues/1298, this pull
request adds unit tests for schedule_config.py (see the sketch below).
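
An illustrative pytest sketch of the kind of test added; the module path and constructor arguments are assumptions, not the exact code from this PR:

```python
# Illustrative only: module path and constructor kwargs are assumed.
import pytest

from vllm_ascend.core.schedule_config import AscendSchedulerConfig


def test_rejects_max_num_batched_tokens_below_max_model_len():
    # With chunked prefill disabled, max_num_batched_tokens smaller
    # than max_model_len should be rejected (see the validation
    # commit above).
    with pytest.raises(ValueError):
        AscendSchedulerConfig(
            enable_chunked_prefill=False,
            max_num_batched_tokens=2048,
            max_model_len=4096,
        )
```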

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with new added/existing test.

- vLLM version: v0.9.2
- vLLM main:
8d0a01a5f2
2025-07-22 11:43:25 +08:00