xc-llm-ascend

Author	SHA1	Message	Date
6lazijiamo	bd3dedea61	Support Qwen2.5-VL W8A8 quantization (#2778) ### What this PR does / why we need it? Support Qwen2.5-VL W8A8 quantization. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `62f66be1f7` --------- Signed-off-by: lijiaojiao <lijiaojiao990304@163.com> Co-authored-by: lijiaojiao <lijiaojiao990304@163.com>	2025-09-11 16:40:51 +08:00
Mengqing Cao	61866b8ac6	[Quickfix] Update CachedRequestState as NewRequestData changed (#2367) ### What this PR does / why we need it? 1. Update `CachedRequestState` as `NewRequestData` changed in https://github.com/vllm-project/vllm/pull/22570 2. Drop maintenance of vLLM v0.10.0 in the main branch ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing tests. - vLLM version: v0.10.0 - vLLM main: `92ff41abea` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-08-15 07:35:27 +08:00
Li Wang	ad366bf908	[Bugfix] Follow vLLM Qwen-Moe/VL and KV Connector change to fix broken CI (#2181) ### What this PR does / why we need it? This PR fixes broken CI: 1. Adapt to vLLM commit `ee2eb6ecd8`, which fused the gate and up projections in the vision MLP; this can improve performance by removing one matrix multiplication. This PR therefore does the following: - Specifies that the two linear layers are fused as `mlp.gate_up_proj` when loading the weights. - Uses a SiluAndMul activation function. 2. Adapt to `aefeea0fde` by updating the ModelRunnerOutput parameters to match its changes. 3. Adapt to [vllm-commit](https://github.com/vllm-project/vllm/pull/20815/files#diff-3ffb829a39ab2b3e4706aa28f5e476815f36c3a87b98d6a66514ebedc8f3ffb4R354-R356) to fix Qwen MoE. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `fed5849d3f` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-08-04 21:37:50 +08:00
JohnJan	cfdd45ed00	[Bug] Fix duplicate 'torch.' prefix in qwen-vl (#1986) Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com> ### What this PR does / why we need it? Fix the duplicate 'torch.' prefix in qwen2-vl and qwen2.5-vl. - vLLM version: v0.9.2 - vLLM main: `dde295a934`	2025-07-24 20:16:00 +08:00
zouyida2052	05a471001b	Bugfix for qwen2_5_vl (#805) ### What this PR does / why we need it? The qwen2.5vl interface changed from column linear to qkv linear, which broke our weight padding function, so we optimize the split_qkv function to fix this bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? With CI. Signed-off-by: zouyida2052 <zouyida2002@gmail.com>	2025-05-29 17:20:39 +08:00
wangxiyuan	7326644513	[CI] Fix qwen2.5 vl CI failure (#888) The vLLM commit `67da5720d4` changed the input and rotary position embedding for qwen 2.5 vl, which broke CI. This PR fixes the CI failure for qwen2.5 vl in quick Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-17 05:13:32 +08:00
zouyida2052	ba9714ccee	Optimize qwen2_vl and qwen2_5_vl (#701) ### What this PR does / why we need it? Optimize qwen2_vl and qwen2_5_vl. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested on a 1080p picture with tp=1, bs=1 on Qwen2-VL and Qwen2.5-VL: the duration of each FA op dropped from 11 ms to 9 ms, a roughly 22% perf boost. --------- Signed-off-by: zouyida2052 <zouyida@huawei.com> Signed-off-by: zouyida2052 <zouyida2002@gmail.com> Co-authored-by: zouyida2052 <zouyida@huawei.com>	2025-04-30 14:22:38 +08:00

7 Commits
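
Commit ad366bf908 mentions that the vision MLP's gate and up projections were fused into `mlp.gate_up_proj` and paired with a SiluAndMul activation. The following is a minimal pure-Python sketch of that idea, not the vLLM or vllm-ascend implementation: the helper names (`matvec`, `mlp_separate`, `mlp_fused`, `silu_and_mul`) and the toy weights are hypothetical, chosen only to show that stacking the two weight matrices and splitting the result in half gives the same output with one matrix product instead of two.

```python
import math


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def matvec(weight, x):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mlp_separate(x, w_gate, w_up):
    # Unfused path: two separate matrix products.
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    return [silu(g) * u for g, u in zip(gate, up)]


def silu_and_mul(fused):
    # Split the fused output in half: the first half is the gate, the second is up.
    half = len(fused) // 2
    return [silu(g) * u for g, u in zip(fused[:half], fused[half:])]


def mlp_fused(x, w_gate_up):
    # Fused path: a single matrix product over the stacked weights.
    return silu_and_mul(matvec(w_gate_up, x))


# Toy weights (hypothetical values, 3 output features, 2 input features).
w_gate = [[0.5, -1.0], [1.5, 0.25], [0.0, 2.0]]
w_up = [[1.0, 0.0], [-0.5, 0.5], [2.0, 1.0]]
# Stack gate rows on top of up rows, as a fused weight load would need to do.
w_gate_up = w_gate + w_up

x = [1.0, 2.0]
separate = mlp_separate(x, w_gate, w_up)
fused = mlp_fused(x, w_gate_up)
print(separate)
print(fused)
```

The two printed vectors should match, which is why the weight loader has to know that the checkpoint's separate gate and up tensors map onto the two halves of the fused layer.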