There is a lot of hack code for v0.11.0, which makes the codebase hard to
upgrade to newer vLLM versions. Since v0.11.2 will be released soon, let's
drop v0.11.0 support first; we'll then upgrade to v0.11.2.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
This PR fixes two issues:
1. Fixes the assertion error thrown when weight prefetching runs in a
multimodal scenario, which previously blocked weight prefetching for
multimodal models.
2. Standardizes the grid_thw data type of Qwen2-VL to torch.int32 (a
minimal sketch follows this list).
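A minimal sketch of point 2, assuming an int64-producing preprocessor; the helper name `normalize_grid_thw` is illustrative, not the actual patch:

```python
import torch

def normalize_grid_thw(grid_thw: torch.Tensor) -> torch.Tensor:
    # grid_thw carries one (t, h, w) row per image or video clip; upstream
    # preprocessing may produce int64, so cast once at the model boundary.
    return grid_thw if grid_thw.dtype == torch.int32 else grid_thw.to(torch.int32)

grid_thw = torch.tensor([[1, 32, 32]])     # dtype defaults to torch.int64
print(normalize_grid_thw(grid_thw).dtype)  # torch.int32
```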
### Does this PR introduce _any_ user-facing change?
None.
### How was this patch tested?
- ci & e2e
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: elilzhu <2435754260@qq.com>
Co-authored-by: zhulei (AK) <z00692222@china.huawei.com>
### What this PR does / why we need it?
Fix a bug with the quant_config input parameter in the Qwen-VL series:
the quant_config variable should be passed through uninstantiated (it may
be None) rather than replaced with a locally instantiated value, as
sketched below.
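A hedged sketch of the intended pattern; the class names are illustrative, not the real Qwen-VL modules:

```python
from typing import Optional

class VisionTower:
    def __init__(self, quant_config: Optional[object] = None):
        # None means "unquantized"; submodules interpret the config, so the
        # constructor must not substitute a locally instantiated default.
        self.quant_config = quant_config

class QwenVLModel:
    def __init__(self, quant_config: Optional[object] = None):
        # Forward the received quant_config unchanged, instantiated or not.
        self.visual = VisionTower(quant_config=quant_config)

model = QwenVLModel(quant_config=None)
print(model.visual.quant_config)  # None, passed through as-is
```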
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
Signed-off-by: booker123456 <945658361@qq.com>
### What this PR does / why we need it?
- Pin the vLLM commit to the releases/v0.11.0 branch.
- Fix the breaking change introduced by vLLM commit
d4d9899860
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
17b4c6685c
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
### What this PR does / why we need it?
Fix the duplicated 'torch.' prefix in qwen2-vl and qwen2.5-vl (illustrated below).
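An illustration of the cleanup (hedged: the real call sites are in the qwen2-vl and qwen2.5-vl model files, and the surrounding code here is invented):

```python
import torch

# before: x = x.to(torch.torch.int32)   # duplicated 'torch.' prefix
# after:
x = torch.tensor([[1, 32, 32]])
x = x.to(torch.int32)  # single, correct prefix
print(x.dtype)  # torch.int32
```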
- vLLM version: v0.9.2
- vLLM main:
dde295a934
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
### What this PR does / why we need it?
Add the prefix parameter to the parent class initialization to avoid
parameter naming conflicts; a minimal sketch follows.
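A minimal sketch of the fix, assuming vLLM's convention of threading a `prefix` string through module constructors so weight names stay unique; the class names are illustrative:

```python
class Parent:
    def __init__(self, prefix: str = ""):
        # The prefix namespaces this module's parameter names.
        self.prefix = prefix

class VisionBlock(Parent):
    def __init__(self, prefix: str = ""):
        # Forward prefix to the parent instead of letting it default to "",
        # which previously made sibling modules claim the same names.
        super().__init__(prefix=prefix)

blk = VisionBlock(prefix="visual.blocks.0")
print(blk.prefix)  # visual.blocks.0
```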
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.9.2
- vLLM main:
32142b3c62
### What this PR does / why we need it?
Optimize qwen2_vl and qwen2_5_vl.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Tested this PR on a 1080p picture with tp=1, bs=1 on Qwen2-VL and
Qwen2.5-VL: each FA op's duration dropped from 11 ms to 9 ms, roughly a
22% perf boost.
---------
Signed-off-by: zouyida2052 <zouyida@huawei.com>
Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
Co-authored-by: zouyida2052 <zouyida@huawei.com>
### What this PR does / why we need it?
torch-npu 2.5.1 has been published:
https://pypi.org/project/torch-npu/2.5.1/
It's time to remove all torch-npu dev-version pins from the vllm-ascend
code base (a quick local check is sketched below).
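A quick local check (illustrative, not part of the patch) that the release build, not a dev build, is installed:

```python
import importlib.metadata

# Query the installed torch-npu distribution and reject dev builds.
version = importlib.metadata.version("torch-npu")
assert "dev" not in version, f"dev build still installed: {version}"
print(version)  # expect 2.5.1
```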
### Does this PR introduce _any_ user-facing change?
Yes, using torch-npu 2.5.1
### How was this patch tested?
- [ ] CI passed
- [ ] Manually test
- [ ] Grep all `dev2025`
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
We propose the FastPatch method, which optimizes the patch embedding
(Conv3D) for Qwen2-VL; a baseline sketch follows.
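For context, a hedged sketch of the baseline patch embedding that FastPatch targets; the sizes follow common Qwen2-VL vision configs, and the PR's faster replacement itself is not reproduced here. Because stride equals kernel size, the Conv3D is a non-overlapping projection of flattened patches:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch_size=14, temporal_patch_size=2,
                 in_channels=3, embed_dim=1280):
        super().__init__()
        k = (temporal_patch_size, patch_size, patch_size)
        self.proj = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=k, stride=k, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, H, W); T/H/W must be multiples of the patch sizes.
        return self.proj(x).flatten(2).transpose(1, 2)  # (N, patches, dim)

x = torch.randn(1, 3, 2, 28, 28)
print(PatchEmbed()(x).shape)  # torch.Size([1, 4, 1280])
```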
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
We benchmarked it; the results meet our expectations and outperform the
original patch_embed layer.
---------
Signed-off-by: baifanxxx <baifanxxx@gmail.com>
Signed-off-by: zouyida <zouyida@huawei.com>
Co-authored-by: zouyida <zouyida@huawei.com>
### What this PR does / why we need it?
This PR fixes an error raised while running inference on Qwen2-VL.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
We benchmarked it; the results meet our expectations and the outputs
match the GPU results.
---------
Signed-off-by: zouyida <zouyida@huawei.com>