xc-llm-ascend

Author	SHA1	Message	Date
realliujiaxu	d3c3538ddc	[Bugfix] Fix bug when graph_size is not divisible by tp_size (#2719) ### What this PR does / why we need it? Fixes https://github.com/vllm-project/vllm-ascend/issues/2702 - A2: skip the graph_size update that aligns it to tp_size, because the dispatch/combine ops support different batch sizes across EP ranks - A3: add `max_num_reqs = max(new_graph_batch_sizes)` to fix the mismatch between graph_size and max_num_reqs ### Does this PR introduce _any_ user-facing change? Nope ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `e599e2c65e` --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2025-09-08 14:52:33 +08:00
1092626063	5b3646ab21	[FEATURE][MTP] Support MTP > 1 (#2708) ### What this PR does / why we need it? [RFC: Support MTP > 1 for DeepSeek](https://github.com/vllm-project/vllm-ascend/issues/2745) - [x] dp1 tp16 - [x] dp4 tp4 - [x] dp2 tp8 - [x] torchair graph - vLLM version: v0.10.1.1 - vLLM main: `c9f7081f9c` Signed-off-by: 1092626063 <1092626063@qq.com>	2025-09-05 09:11:22 +08:00
linfeng-yuan	90a75a90a9	[bugfix] Fix torchair runtime error caused by configuration mismatches and missing files (#2532) ### What this PR does / why we need it? This PR ports #2312 #2506 #2531 to the main branch. The original implementation of torchair caching forces users to prepare everything in advance, fix all the configuration, and enable `use_cached_npu_graph`, which can cause problems that are hard for users to understand and resolve. It is better to compile the graph twice than to reuse the old kvcaches and cached torchair graph, and the extra time is acceptable. Additionally, this PR fixes a recompilation problem in torchair graph mode caused by the `running_in_graph` variable in `AscendMLATorchairImpl`. ### Does this PR introduce _any_ user-facing change? Users who want to enable torchair.cache_compile with fast compilation are advised to enable both `use_cached_kv_cache_bytes` and `use_cached_graph` in `torchair_graph_config`. Without `use_cached_kv_cache_bytes`, the torchair computation graph is compiled twice to avoid runtime errors caused by configuration mismatches (the second compilation is much faster). Additionally, the TORCHAIR_CACHE_HOME environment variable is now used with a suffix directory appended, to enhance safety and prevent accidental file deletion. ### How was this patch tested? CI and e2e vllm serving pass. - vLLM version: v0.10.1.1 - vLLM main: `70549c1245` --------- Signed-off-by: linfeng-yuan <1102311262@qq.com>	2025-09-03 17:56:12 +08:00
panchao-hub	ea53f9076e	Support torchair mode (#2641) ### What this PR does / why we need it? Support torchair mode ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `5438967fbc` Signed-off-by: zhangdepeng <zhangdepeng2@huawei.com> Signed-off-by: p00465316 <panchao13@huawei.com> Co-authored-by: zhangdepeng <zhangdepeng2@huawei.com>	2025-09-01 15:49:07 +08:00
Wang Yixuan	c2c97f3079	[5/N][refactor] Add torchair rotary ops (#2559) ### What this PR does / why we need it? Move torchair-related rotary ops into the torchair dir to make the code clearer. As a next step, we'll remove all torchair-related code outside of torchair rotary ops. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? vLLM version: main vLLM main: `ab9f2cfd19` - vLLM version: v0.10.1.1 - vLLM main: `81eea3d348` Signed-off-by: hust17yixuan <303660421@qq.com>	2025-09-01 09:09:21 +08:00
yiz-liu	aadc75c247	[Fix] Resolve data-parallel (DP) assertion errors in TorchAir (#2626) ### What this PR does / why we need it? It is confirmed that `num_input_tokens` must be assigned the value of `maybe_padded_num_tokens` under all circumstances. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? Waiting for daily test for TorchAir. - vLLM version: v0.10.1.1 - vLLM main: `006477e60b` Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-08-29 16:06:49 +08:00
yiz-liu	dfc7eb39ad	[Fix] Fix DP-related padding logic (#2582) ### What this PR does / why we need it? The determination of attention state, padding, and other forward metadata has been moved to an earlier stage within the input preparation process. This change enables us to utilize a single all-reduce operation, maximizing synchronization efficiency as early as possible. The logic for synchronizing metadata (such as the number of tokens, prefill status, and DBO status) across data parallel (DP) ranks has now been unified and simplified. For performance improvements, the all-reduce operation has been switched from the `gloo` backend to the `npu` backend, which results in a reduction of several milliseconds per step (approximately 10% performance gain for TPOT!). Additionally, the multi-DP server hang issue has been resolved, ensuring no more hangs occur when `num_requests < dp_size`. Alas, a relief. Finally, the miscalculated memory usage issue has been addressed by removing the unnecessary `DummyCommImpl`, allowing the system to use the real communication method when determining available memory. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? Maybe we should add a test case for multi-DP online server? @MengqingCao - vLLM version: v0.10.1.1 - vLLM main: `c5d004aaaf` --------- Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-08-28 19:39:58 +08:00
Wang Yixuan	20a7bc4b71	[3/N][refactor] Refactor quantization (#2504) ### What this PR does / why we need it? Move the torchair-related quantization section into the torchair dir to make the code clearer. As a next step, we'll remove all torchair-related code outside of torchair quantization. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? vLLM version: main vLLM main: `ab9f2cfd19` - vLLM version: v0.10.1.1 - vLLM main: `959783fb99` Signed-off-by: hust17yixuan <303660421@qq.com>	2025-08-27 10:45:50 +08:00
weiguihua2	acdc53c2f6	[Bugfix] Fix the invalid cos shape bug under DP (#2558) ### What this PR does / why we need it? Fix the invalid cos shape bug under DP ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `1fdc732419` Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-08-27 10:36:23 +08:00
wangxiyuan	de7649492d	[Refactor] cleanup converting_weight_acl_format_format (#2482) Move maybe_converting_weight_acl_format_format to the torchair module; it is only used with 310p+torchair - vLLM version: v0.10.1.1 - vLLM main: `49ab23b3cc` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-25 19:48:55 +08:00
weiguihua2	0dca4c6dbd	Refactor model runner v1 (#2461) Refactor model runner v1 ### What this PR does / why we need it? 1. Separate the execute-model logic from the prepare-input logic 2. Split the torchair code out of model runner v1 - vLLM version: v0.10.0 - vLLM main: `68fcd3fa73` --------- Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-08-21 08:54:57 +08:00
Mengqing Cao	1327f9be1c	Fix some CI issues and refactor modelrunner (#2445) ### What this PR does / why we need it? Fix some CI issues and refactor modelrunner ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing tests. - vLLM version: v0.10.0 - vLLM main: `4d9c61993a` --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: weiguihua2 <weiguihua2@huawei.com> Co-authored-by: wangli <wangli858794774@gmail.com> Co-authored-by: weiguihua2 <weiguihua2@huawei.com>	2025-08-20 09:01:04 +08:00
linfeng-yuan	3fc31ee1cb	[1/N][refactor] torchair deepseek modeling refactor (#2384) ### What this PR does / why we need it? Move the torchair-related model arch into the torchair module to make the code clearer. As a next step, we'll remove all torchair-related code outside of the torchair module. ### Does this PR introduce _any_ user-facing change? No. - vLLM version: v0.10.0 - vLLM main: `08d5f7113a` Signed-off-by: linfeng-yuan <1102311262@qq.com>	2025-08-18 15:00:37 +08:00
wangxiyuan	1a70564e7c	[5/N][Refactor] torchair model runner refactor (#2216) There is a lot of torchair code in the model runner, which makes the code hard to maintain. We'll create a new torchair_model_runner to split out the torchair-related logic. Following the workflow #2203. What this PR does: create a common function `_capture_model` for capture_model - vLLM version: v0.10.0 - vLLM main: `1891a265d3` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-12 14:24:50 +08:00
wangxiyuan	c8b0f5f799	[4/N][Refactor] torchair model runner refactor (#2208) There is a lot of torchair code in the model runner, which makes the code hard to maintain. We'll create a new torchair_model_runner to split out the torchair-related logic. Following the workflow #2203, this is the first PR. What this PR does: create a common function `_convert_torch_foramt` for initialize_kv_cache - vLLM version: v0.10.0 - vLLM main: `14a5d903ab` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-11 21:39:24 +08:00
wangxiyuan	881e36d6a9	[3/N][Refactor] torchair model runner refactor (#2207) There is a lot of torchair code in the model runner, which makes the code hard to maintain. We'll create a new torchair_model_runner to split out the torchair-related logic. Following the workflow #2203, this is the first PR. What this PR does: create common functions `_build_attention_metadata` and `_generate_dummy_run_hidden_states` for dummy_run - vLLM version: v0.10.0 - vLLM main: `ebf7605b0d` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-11 18:03:19 +08:00
wangxiyuan	1ab15414bb	[2/N][Refactor] torchair model runner refactor (#2204) There is a lot of torchair code in the model runner, which makes the code hard to maintain. We'll create a new torchair_model_runner to split out the torchair-related logic. Following the workflow #2203. What this PR does: move `torchair`-related logic into `_get_forward_metadata_across_dp` and override it in the torchair model runner - vLLM version: v0.10.0 - vLLM main: `1b99028069` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-11 14:06:49 +08:00
wangxiyuan	292fb8f696	[1/N][Refactor] torchair model runner refactor (#2205) There is a lot of torchair code in the model runner, which makes the code hard to maintain. We'll create a new torchair_model_runner to split out the torchair-related logic. Following the workflow #2203, this is the first PR. What this PR does: create the new torchair model runner; more functions will be added later - vLLM version: v0.10.0 - vLLM main: `586f286789` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-05 18:43:04 +08:00

18 Commits