xc-llm-ascend

Author	SHA1	Message	Date
Icey	c5fe179cef	[0.11.0] [Cherry-pick #4058 ] Fixes Qwen3-Next enable nz accuracy problem (#4056 ) ### What this PR does / why we need it? - Fixes Qwen3-Next enable nz accuracy problem --------- Signed-off-by: wxsIcey <1790571317@qq.com> Signed-off-by: Icey <1790571317@qq.com>	2025-11-10 20:56:39 +08:00
shaopeng-666	0c83eee9b1	fix vl float model not support NZ format weight error (#3533 ) ### What this PR does / why we need it? fix vl float model not support nz mm op ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: shaopeng666 <shaopeng666@noreply.gitcode.com> Co-authored-by: shaopeng666 <shaopeng666@noreply.gitcode.com>	2025-10-21 22:23:17 +08:00
whx	220df60c61	[Model][2/N] Remove deepseek_mtp modeling. (#3561 ) This PR is step 2 of deepseek model refactoring and removes deepseek_mtp. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-10-21 20:17:09 +08:00
whx	f8b52fe950	[Model][1/N] Delete deepseek v2/v3 modeling codes. (#3189 ) This PR deletes model codes of deepseek_v2 and deepseek_v3 to reuse the model file from vLLM. vLLM Ascend now uses custom ops register way instead of model file hard-coding. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-10-20 15:31:34 +08:00
zhaozx-cn	bf87606932	[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495 ) ### What this PR does / why we need it? shared expert dp for deepseek and deepseek_mtp, could be combined with sp to improve performance. ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: zhaozx-cn <zhaozx2116@163.com> Co-authored-by: realliujiaxu <realliujiaxu@163.com>	2025-10-17 15:06:37 +08:00
realliujiaxu	f69a83b7ba	[Feat] Flash comm allgather ep (#3334 ) Support flash comm v1 (Sequence Parallelism) for Allgather EP. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com> Co-authored-by: zhaozx-cn <zhaozx2116@163.com>	2025-10-15 19:36:32 +08:00
anon189Ty	07e39620ea	[Feat] Unquantized Linear to nz and control all nz-cast (#3356 ) ### What this PR does / why we need it? Currently, when executing to the Linear layer of models in vLLM-Ascend, the weights format is ND in unquantized case and skipped ascend case. This PR supplements the execution logic for Linear layer. We use a new global variable: VLLM_ASCEND_ENABLE_NZ. When VLLM_ASCEND_ENABLE_NZ=1 and CANN version is 8.3, the weights of the Linear layer will be converted to FRACTAL_NZ, in both unquantized case and skipped ascend case. We also use VLLM_ASCEND_ENABLE_NZ to control the existing NZ conversion, such as w8a8-quantized case. ### Does this PR introduce _any_ user-facing change? Add a new global variable VLLM_ASCEND_ENABLE_NZ. If you want to use NZ format, you should set VLLM_ASCEND_ENABLE_NZ=1. ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>	2025-10-14 17:39:26 +08:00
weichen	94dd832815	[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176 ) ### What this PR does / why we need it? 1. Move additional functionalities from fused_moe.py to common_fused_moe.py and remove fused_moe.py 2. Remove unnecessary custom classes from qwen3_moe.py, and it will be completely removed after we release vllm-ascend v0.11.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Qwen3-30B-A3B/Qwen3-30B-A3B-W8A8/DeepSeek-V3-W4A8-Pruing/deepseek-mtp/pangu-pro-moe-pruing: 1. Enable/Disable EP 3. Aclgraph & eager 4. SP - vLLM version: v0.11.0 --------- Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com> Co-authored-by: weijinqian0 <12153182+weijinqian0@users.noreply.github.com>	2025-10-09 14:12:46 +08:00
wangxiyuan	a055183821	[CI] Upgrade vLLM version (#3139 ) Upgrade vLLM version to the newest commit. - Fix the breaking change introduced by `969b4da3a6` - Add a patch as a quick fix for torchair `de94289a98` - Fix the UT error introduced by `de94289a98` Close: https://github.com/vllm-project/vllm-ascend/issues/3138 - vLLM version: v0.10.2 - vLLM main: `f225ea7dd9` --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Co-authored-by: MengqingCao <cmq0113@163.com>	2025-09-25 07:36:51 +08:00
Li Wang	02f89d166f	[CI] Update vllm version to 20250922(5aeb925) (#3091 ) ### What this PR does / why we need it? This PR bumps the vllm commit hash to `5aeb925452` and fixes the following issues: 1. https://github.com/vllm-project/vllm/pull/25345 removed v0 metadata 2. https://github.com/vllm-project/vllm/pull/25332 3. https://github.com/vllm-project/vllm/pull/25334 4. https://github.com/vllm-project/vllm/pull/23558, note that this vllm commit updates the model registration logic to check that every registered model is under the `vllm.model_executor.models` path, which breaks our custom registration of the deepseek_v3 model (it doesn't exist in the vllm model path), so the deepseek_v3 model registration is moved to deepseek_v2 as a temporary workaround ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: `9607d5eb44` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-09-22 22:18:13 +08:00
22dimensions	0942d9aaab	[3/N][Refactor][Quantization]remove packed_modules_mapping from models (#3021 ) ### What this PR does / why we need it? Some custom models in vllm-ascend define packed_modules_mapping, which prevent keeping same model class with vllm community. So move these custom packed_modules_mapping to quant utils.py. After this pr, some custom models can be removed. ### Does this PR introduce _any_ user-facing change? tested by CI ### How was this patch tested? tested by CI - vLLM version: v0.10.2 - vLLM main: `5089fd749c` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-09-19 20:50:14 +08:00
realliujiaxu	af2a886814	refactor linear (#2867 ) ### What this PR does / why we need it? The current linear.py has the following issues: - There is redundant conditional logic in the `comm_group` and `forward` selection for classes such as `AscendMergedColumnParallelLinear`. - Inconsistent comm_group selection logic exists among `AscendMergedColumnParallelLinear`, `AscendColumnParallelLinear`, and `AscendQKVParallelLinear`. To address these two issues, this PR encapsulates `comm_group` and `forward` into classes and extracts the classes selection logic into common functions. For future additions of custom communication groups or forward methods, it will only be necessary to extend `CustomColumnParallelOp` or `CustomRowParallelOp` and add new selection logic. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: `dd39baf717` --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com> Co-authored-by: weijinqian0 <weijinqian@huawei.com>	2025-09-18 14:09:19 +08:00
yiz-liu	88ca8a051c	[Feat][Graph] Support DeepSeek with ACL Graph (#2707 ) ### What this PR does / why we need it? In memory of #677 , a long overdue milestone. Now DeepSeek V3/R1 should be OK with ACL Graph. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? Working on it. - vLLM version: v0.10.2 - vLLM main: `68dbde5dbb` --------- Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-09-16 17:50:17 +08:00
linfeng-yuan	1c5900327b	[refactor] refactor deepseek-related files (#2849 ) ### What this PR does / why we need it? This PR deletes ~2K lines of code about deepseek modeling. It falls back CustomDeepseekV2 modules to original vllm implementations and adapts some modifications in vllm about deepseek and moe. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? E2E vllm serving with torchair graph mode and eager mode. - vLLM version: v0.10.2 - vLLM main: `759ef49b15` --------- Signed-off-by: linfeng-yuan <1102311262@qq.com> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Co-authored-by: yiz-liu <136800916+yiz-liu@users.noreply.github.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-09-16 14:13:07 +08:00
weichen	18ca7861f6	[Main] [Refactor] Enable MoECommMethod in Eager Mode (#2791 ) ### What this PR does / why we need it? 1. Replace prepare/finalize operation in fused_moe.py by moe_comm_method.prepare()/finalize() 2. Replace unified_fused_experts by moe_comm_method.fused_experts() in fused_moe.py/w8a8_dynamic.py/w4a8_dynamic.py 3. Add calling _select_moe_comm_method in spec-decode proposers. 4. Currently, w4a8_dynamic does not support gatherep, use all2allv instead. 5. Remove redundant code. ### Does this PR introduce _any_ user-facing change? AllgatherEP switch is disabled in aclgraph/eager mode, just follow the rules in modelrunner_v1._select_moe_comm_method() ### How was this patch tested? e2e & ut - vLLM version: v0.10.2 - vLLM main: `7f6f2c1182` Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com> Co-authored-by: weijinqian0 <12153182+weijinqian0@users.noreply.github.com>	2025-09-16 11:06:00 +08:00
6lazijiamo	bd3dedea61	support qwen25 vl w8a8 quantization (#2778 ) ### What this PR does / why we need it? support qwen25 vl w8a8 quantization ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `62f66be1f7` --------- Signed-off-by: lijiaojiao <lijiaojiao990304@163.com> Co-authored-by: lijiaojiao <lijiaojiao990304@163.com>	2025-09-11 16:40:51 +08:00
lidenghui1110	5a7181569c	[feat]: oproj tensor parallelism in pure DP and graph-mode scenarios. (#2167 ) ### What this PR does / why we need it? This PR introduces Oproj matrix tensor model parallelism to reduce memory consumption. It only supports graph mode in the pure DP scenario. In a deepseek r1 w8a8 PD disaggregated Decode instance using pure DP, with oproj_tensor_parallel_size = 8, TPOT increases by 1 ms while 5.8 GB of NPU memory is saved per rank. The best performance was obtained with oproj_tensor_parallel_size=4, with no TPOT increase. performance data: <img width="1442" height="442" alt="image" src="https://github.com/user-attachments/assets/83270fc5-868a-4387-b0a9-fac29b4a376d" /> ### Does this PR introduce _any_ user-facing change? This PR introduces one new config in `additional_config`. \| Name \| Effect \| Required \| Type \| Constraints \| \| :---------------------------- \| :--------------------------------------- \| :------- \| :--- \| :----------------- \| \| oproj_tensor_parallel_size \| Split the o_proj matrix along the row dimension (head num * head dim) into oproj_tensor_parallel_size pieces. \| No \| int \| default value is None, once this value is set, the feature will be enabled, head num * head dim must be divisible by this value. \| example `--additional_config={"oproj_tensor_parallel_size": 8}` ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `eddaafc1c7` --------- Signed-off-by: zzhx1 <zzh_201018@outlook.com> Co-authored-by: zzh <zzh_201018@outlook.com>	2025-09-07 10:31:32 +08:00
Li Wang	3584306387	[Bugfix] Fix qwen2.5-vl-without-padding (#2623 ) ### What this PR does / why we need it? Correct `AscendQwen2_5_VLForConditionalGeneration_Without_Padding` override methods ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `42dc59dbac` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-09-03 14:38:55 +08:00
lidenghui1110	600b08f754	[Feat]: Add custom lmhead tensor model parallel (#2309 ) ### What this PR does / why we need it? This PR introduces LMhead tensor model parallelism to reduce memory consumption and improve TPOT. It supports both eager mode and graph mode. In a deepseek r1 w8a8 PD disaggregated Decode instance using pure DP, with lmhead_tensor_parallel_size = 8, TPOT improves by 1 ms and 1.48 GB of NPU memory is saved per rank. performance data: <img width="1444" height="438" alt="image" src="https://github.com/user-attachments/assets/3c5ef0d3-a7c7-46fd-9797-4de728eb0cb0" /> ### Does this PR introduce _any_ user-facing change? This PR introduces one new config in `additional_config`. \| Name \| Effect \| Required \| Type \| Constraints \| \| :---------------------------- \| :--------------------------------------- \| :------- \| :--- \| :----------------- \| \| lmhead_tensor_parallel_size \| Split the lm_head matrix along the column dimension (vocab_size) into lmhead_tensor_parallel_size pieces \| No \| int \| default value is None, once this value is set, the feature will be enabled, vocab_size must be divisible by this value. \| example `--additional_config={"lmhead_tensor_parallel_size": 8}` ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `de533ab2a1` --------- Signed-off-by: zzhx1 <zzh_201018@outlook.com> Co-authored-by: zhangzihang <zzh_201018@outlook.com>	2025-08-29 11:41:21 +08:00
weichen	320edde2df	[main] [refactor] refactor fused_moe.py to enable token_dispatchers (#2570 ) ### What this PR does / why we need it? Enable token_dispatcher to replace fused_experts_with_xxx in eager mode ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? e2e & ut - vLLM version: v0.10.1.1 - vLLM main: `704432af3c` Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com> Co-authored-by: sherie <963372609@qq.com> Co-authored-by: weijinqian0 <12153182+weijinqian0@users.noreply.github.com> Co-authored-by: shiyuan680 <72335504+shiyuan680@users.noreply.github.com>	2025-08-28 10:13:35 +08:00
Nicholas Tao	7bec1a9b9c	qwen3_moe/qwen25 support torchair graph (#2403 ) ### What this PR does / why we need it? Added support for the TorchAir graph mode in qwen3_moe and qwen2.5 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? ```bash llm = LLM( model=model, tensor_parallel_size=GPUs_per_dp_rank, enforce_eager=False, enable_expert_parallel=True, max_model_len=4096, max_num_seqs=16, trust_remote_code=trust_remote_code, gpu_memory_utilization=0.4, additional_config={ "torchair_graph_config": { "enabled": True, "use_cached_graph": False, "graph_batch_sizes_init": False, "graph_batch_sizes": [16] }, "ascend_scheduler_config": { "enabled": True, "chunked_prefill_enabled":True, }, "refresh": True, }, ) ``` - vLLM version: v0.10.0 - vLLM main: `b87cb97a53` Signed-off-by: taoyuxiang <oui.nicholas.tao@gmail.com>	2025-08-20 11:23:50 +08:00
xuyexiong	26fc36b0e0	[V1] MTP supports torchair (#2145 ) ### What this PR does / why we need it? Support MTP with: - [x] V0 Scheduler - [x] TorchAir - [x] Single DP - [x] Multi DP - [x] Disaggregated PD Known issues: - [ ] V1 Scheduler (chunked prefill) is not supported yet; support is planned within a few weeks - [ ] vllm v0.10.0 does not support metrics with `DP > 1` right now, so lines 171-175 in file `vllm/vllm/v1/metrics/loggers.py` need to be commented out ``` if (len(self.engine_indexes) > 1 and vllm_config.speculative_config is not None): raise NotImplementedError("Prometheus metrics with Spec Decoding " "with >1 EngineCore per AsyncLLM is not " "supported yet.") ``` To start an online server with torchair enabled, here is an example: ``` python -m vllm.entrypoints.openai.api_server \ --model="/weights/DeepSeek-R1_w8a8/" \ --trust-remote-code \ --max-model-len 40000 \ --tensor-parallel-size 4 \ --data_parallel_size 4 \ --max-num-seqs 16 \ --no-enable-prefix-caching \ --enable_expert_parallel \ --served-model-name deepseekr1 \ --speculative-config '{"num_speculative_tokens": 1, "method":"deepseek_mtp"}' \ --quantization ascend \ --host 0.0.0.0 \ --port 1234 \ --additional-config '{"ascend_scheduler_config":{"enabled":true,"enable_chunked_prefill":false},"torchair_graph_config":{"enabled":true,"graph_batch_sizes":[16]},"enable_weight_nz_layout":true}' \ --gpu_memory_utilization 0.9 ``` Offline example with torchair enabled: ``` from vllm import LLM, SamplingParams prompts = [ "Hello, my name is", "The president of the United States is", "The capital of France is", "The future of AI is", ] # Create a sampling params object. sampling_params = SamplingParams(max_tokens=16, temperature=0) # Create an LLM.
llm = LLM( model="/home/data/DeepSeek-R1_w8a8/", tensor_parallel_size=16, max_num_seqs=16, gpu_memory_utilization=0.9, distributed_executor_backend="mp", enable_expert_parallel=True, speculative_config={ "method": "deepseek_mtp", "num_speculative_tokens": 1, }, trust_remote_code=True, enforce_eager=False, max_model_len=2000, additional_config = { 'torchair_graph_config': { 'enabled': True, "graph_batch_sizes": [16], 'enable_multistream_shared_expert': False, }, "ascend_scheduler_config": { "enabled": True }, # 'expert_tensor_parallel_size': 16, } ) # Generate texts from the prompts. # llm.start_profile() outputs = llm.generate(prompts, sampling_params) # llm.stop_profile() for output in outputs: prompt = output.prompt generated_text = output.outputs[0].text print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") ``` - vLLM version: v0.10.0 - vLLM main: `302962e806` --------- Signed-off-by: xuyexiong <xuyexiong@huawei.com>	2025-08-06 19:37:43 +08:00
Ronald1995	e8660d7978	ut:add ut for qwen2_5_vl (#2143 ) ### What this PR does / why we need it? add ut for qwen2_5_vl ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? not involved - vLLM version: v0.10.0 - vLLM main: `2836dd73f1` Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	2025-07-31 20:46:17 +08:00
CaranLic	7c90ba5fe8	[Test] add ut for decorator.py/deepseek_mtp.py (#2127 ) ### What this PR does / why we need it? add ut for decorator.py/deepseek_mtp.py ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed with new tests - vLLM version: v0.10.0 - vLLM main: `055bd3978e` --------- Signed-off-by: CaranLic <740821011@qq.com>	2025-07-31 15:21:15 +08:00
Joey Gao	6192bc95c0	[Bugfix] fix tensor not same device in qwen2_5_vl_without_padding (#2051 ) bugfix cherry-pick from v0.9.1-dev https://github.com/vllm-project/vllm-ascend/pull/2007 ### What this PR does / why we need it? Minimal reproducing code: ```python # test.py from vllm import LLM, SamplingParams prompts = [ "Hello, my name is", "The future of AI is", ] sampling_params = SamplingParams(temperature=0.8, top_p=0.95) llm = LLM(model="Qwen2.5-VL-7B-Instruct", max_model_len=26240) outputs = llm.generate(prompts, sampling_params) for output in outputs: prompt = output.prompt generated_text = output.outputs[0].text print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") ``` ```bash export USE_OPTIMIZED_MODEL=0 python test.py ``` The exception is as follows: ``` [rank0]: File "/home/xxx/vllm_ascend/models/qwen2_5_vl_without_padding.py", line 84, in forward [rank0]: q = torch_npu.npu_rotary_mul(q, cos, sin) [rank0]: File "/home/anaconda3/envs/xxx/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__ [rank0]: return self._op(*args, **(kwargs or {})) [rank0]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, npu:0 and cpu! (when checking argument for argument r1 in method wrapper__npu_rotary_mul) ``` In `AscendQwen2_5_VisionAttention_Without_Padding`, the call `torch_npu.npu_rotary_mul(q, cos, sin)` receives `cos`/`sin` on cpu but `q` on npu, so it raises an error. `qwen2_5_vl_without_padding.py` needs this bugfix because `AscendQwen2_5_VisionTransformer_Without_Padding.rot_pos_emb` in qwen2_5_vl_without_padding.py comes from vllm, where `inv_freq` is created on cpu. `40d86ee412/vllm/model_executor/models/qwen2_5_vl.py (L482)` ```python inv_freq = 1.0 / (theta**(torch.arange(0, dim, 2, dtype=torch.float, device='cpu') / dim)) ``` `qwen2_5_vl.py` does not need the fix, because `AscendQwen2_5_VisionRotaryEmbedding` in qwen2_5_vl.py rewrites `AscendQwen2_5_VisionRotaryEmbedding`, so `inv_freq` is created on the device.
```python inv_freq = 1.0 / (theta**(torch.arange(0, dim, 2, dtype=torch.float) / dim)) ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? CI passed with new added/existing test. - vLLM version: v0.10.0 - vLLM main: `18cc33dd60` Signed-off-by: pjgao <gaopengju3@huawei.com> Co-authored-by: pjgao <gaopengju3@huawei.com>	2025-07-31 15:18:54 +08:00
Ronald1995	3386e09a40	ut:add ut for qwen2_vl.py (#2096 ) ### What this PR does / why we need it? add ut for qwen2_vl.py ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? not involved - vLLM version: v0.10.0 - vLLM main: `555e7225bc` Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	2025-07-30 22:31:47 +08:00
huangxialu	1a25b0a2dd	[Test] add ut for qwen3_moe.py (#2055 ) ### What this PR does / why we need it? Add ut for qwen3_moe.py ### Does this PR introduce _any_ user-facing change? No. - vLLM version: v0.10.0 - vLLM main: `18cc33dd60` Signed-off-by: huangxialu <huangxialu1@huawei.com>	2025-07-28 17:37:13 +08:00
zzzzwwjj	ba3dfbd59e	[main][refactor] Refactoring forward_context and model_runner_v1 (#1979 ) ### What this PR does / why we need it? A refactoring of forward_context and model_runner_v1, add some context which is necessary in model inference into forward_context, and refactor dummy_run logic, make it more reasonable. Some details for this PR: Add `ascend_forward_context`; Update mc2_v2 op, and support `active_mask` param; Update scripts in examples dir; refactor `dummy_run` logic; Add soc_version for A2 and A3; ### Does this PR introduce _any_ user-facing change? No change at user-facing. ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `57c22e57f9` Signed-off-by: zzzzwwjj <1183291235@qq.com>	2025-07-28 14:06:20 +08:00
Ronald1995	e561a2c6ec	ut:add ut for qwen2_5_vl_without_padding.py (#1988 ) ### What this PR does / why we need it? this pr is to add ut for qwen2_5_vl_without_padding.py ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? this is only a ut test - vLLM version: v0.9.2 - vLLM main: `9c8b2c2a8a` Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	2025-07-25 14:12:44 +08:00
Zac	2ffe051859	[Test]add ut for deepseek_v2. (#1964 ) What this PR does / why we need it? Add uts for deepseek_v2 Does this PR introduce any user-facing change? No How was this patch tested? - vLLM version: v0.9.2 - vLLM main: `f3137cdd81` --------- Signed-off-by: 张帮政 <zhangbangzheng@huawei.com>	2025-07-24 10:27:50 +08:00

30 Commits