xc-llm-ascend

Author	SHA1	Message	Date
Li Wang	a63ef031af	[Doc] Upgrade some outdated doc (#5062 ) ### What this PR does / why we need it? Upgrade some outdated docs so that the documented steps run correctly. Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-16 11:48:19 +08:00
UnifiedCacheManager	195eac665b	[Core][Worker] Add UCMConnector for KV Cache Offloading (#4411 ) ### What this PR does / why we need it? This PR introduces the initial integration of UCM (Unified Cache Management) into the vllm-ascend distributed KV-cache system. Specifically, it adds: - A new `UCMConnector` implementation under the distributed KV-transfer framework. - Support for offloading KV-cache blocks to external UCM backends (DRAM / NFS / Localdisk, depending on UCM configuration). - Integration with vLLM V1 KV connector interface, including metadata handling and role registration. Why it is needed: - UCM provides a unified, high-performance storage layer for KV-cache externalization. - This enables vllm-ascend to support out-of-core KV-cache workloads, improve memory efficiency, and leverage hardware-accelerated storage paths (RDMA / NFS / hybrid modes). - This connector is a required component to allow future work on multi-node inference + UCM-based scaling. --- ### Does this PR introduce _any_ user-facing change? Yes, but limited: - A new `kv_connector=UCMConnector` option becomes available through the configuration interface. - When selected, vllm-ascend workers may initialize UCM and offload KV-cache blocks externally. - No default behaviors are changed. Users must explicitly enable this connector. This PR does not modify: - existing APIs, - default execution paths, - model runner behavior, - user workflow unless `UCMConnector` is configured. --- ### How was this patch tested? --- ### Prefix Caching Benchmark We provide preliminary measurements for TTFT (ms) under the vLLM benchmark. Tests run on 2 * Ascend 910B3, vllm-ascend 0.11.0, Tensor Parallel size 2, with UCM (Localdisk) enabled. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: UnifiedCacheManager <unifiedcachem@163.com>	2025-12-16 10:53:30 +08:00
Li Wang	6063853ead	[Misc] Upgrade vllm commit hash to 1215 (#5029 ) ### What this PR does / why we need it? Upgrade vllm commit hash to `4429d934de3c5cc327b0d7aec8e473aeba38db90` - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-16 09:23:02 +08:00
InSec	a5cb8e40f5	[doc]Modify quantization tutorials (#5026 ) ### What this PR does / why we need it? Modify the quantization tutorials Qwen3-32B-W4A4.md and Qwen3-8B-W4A8.md to correct a few mistakes. Qwen3-8B-W4A8: one idle NPU card must be set. Qwen3-32B-W4A4: two idle NPU cards must be set for the flatquant training, and the calib_file path, which did not match the ModeSlim version, is modified. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: IncSec <1790766300@qq.com>	2025-12-15 20:12:06 +08:00
Li Wang	8d2998d0e4	[Misc] Upgrade vllm hash to 12_14 (#5000 ) ### What this PR does / why we need it? ### Does this PR introduce _any_ user-facing change? 1. fix https://github.com/vllm-project/vllm/pull/27938 2. fix https://github.com/vllm-project/vllm/pull/27145 pooling models now support chunked prefill and prefix caching. 3. fix https://github.com/vllm-project/vllm/pull/30181 define the CPU fields in the field config where they really belong. 4. fix https://github.com/vllm-project/vllm/pull/28168 define the CPU fields in the field config where they really belong. 5. fix https://github.com/vllm-project/vllm/pull/30201 some module renames 6. fix https://github.com/vllm-project/vllm/pull/29067 fusedmoe module refactor 7. fix https://github.com/vllm-project/vllm/pull/29066 fusedmoe module refactor 8. fix https://github.com/vllm-project/vllm/pull/29624 ### How was this patch tested? - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-15 19:54:23 +08:00
fluctlux	6de4bedd04	update release note for suffix decoding (#5009 ) update release note for suffix decoding - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: fluctlux <38945811+fluctlux@users.noreply.github.com>	2025-12-15 17:22:19 +08:00
Chao Lei	b75bfc58f6	[Doc ] Supplement kvpool user guide (#5013 ) ### What this PR does / why we need it? Supplement detailed descriptions for `ASCEND_CONNECT_TIMEOUT` and `ASCEND_TRANSFER_TIMEOUT` in kvpool. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: LCAIZJ <leichao139636@163.com>	2025-12-15 14:24:39 +08:00
ming1212	98b9e2e18e	Add Qwen3-Next tutorials (#4607 ) ### What this PR does / why we need it? This PR provides an introduction to the Qwen3-Next model, details on the features supported by the model in the current version, the model deployment process, as well as methods for performance testing and accuracy testing. With this document, the deployment and testing of the Qwen3-Next model can be implemented more easily. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: ming1212 <2717180080@qq.com> Signed-off-by: ming1212 <104972349+ming1212@users.noreply.github.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-12-15 11:48:22 +08:00
Li Wang	2497bbbaf6	[Misc] Update pooling example (#5002 ) ### What this PR does / why we need it? Since the param `task` has been deprecated, we should use the latest unified standard parameters for pooling models, which is clearer. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-15 08:36:19 +08:00
wangxiyuan	8090914d69	[CI] CI refactor (#4928 ) 1. rename workflow to better name 2. fix lint error 3. remove accuracy report doc and test - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-14 11:09:56 +08:00
wangxiyuan	42ceaf08a1	add release note for 0.12.0 (#4995 ) Add release note for v0.12.0rc1 Update deepseek3.2 tutorial doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-13 22:09:59 +08:00
lilinsiman	31c94b7e7b	[doc][main] Correct more doc mistakes (#4958 ) ### What this PR does / why we need it? Correct more doc mistakes - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-12-13 18:36:58 +08:00
lilinsiman	fc818f1509	[doc][main] Correct mistakes in doc (#4945 ) ### What this PR does / why we need it? Correct mistakes in doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-12-12 19:17:10 +08:00
liziyu	716c4dacfe	update qwen2.5vl readme (#4938 ) ### What this PR does / why we need it? fix qwen2.5vl readme, del gen ranktable and add install mooncake - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: liziyu <liziyu16@huawei.com>	2025-12-12 15:40:07 +08:00
Li Wang	4ae7588c52	[Doc] Upgrade outdated doc (#4957 ) ### What this PR does / why we need it? Updated some issues that caused sleep mode document content to be unavailable due to changes/outdated environment variables. --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-12 15:38:29 +08:00
1092626063	62a9fea7af	【doc】Add model feature matrix (#4950 ) ### What this PR does / why we need it? doc tutorials add model feature matrix： DeepSeekR1 DeepSeekV3.1 Qwen3-Dense Qwen3-Moe Qwen3-Next Qwen2.5 Qwen2.5-VL Qwen3-VL ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: 1092626063 <1092626063@qq.com>	2025-12-12 15:37:39 +08:00
lidenghui1110	d65fb194d9	[Feat] Add custom Embedding tensor model parallel (#2616 ) Similar to #2309 , this PR introduces Embedding tensor model parallel to reduce memory consumption. It supports both eager mode and graph mode. This PR also refactors the module tensor parallel configurations supported in #2309, #2167, #2120, merging all configs into `finegrained_tp_config` in `additional_config`, including: `lmhead_tensor_parallel_size` `oproj_tensor_parallel_size` `embedding_tensor_parallel_size` `mlp_tensor_parallel_size` - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: zzhx1 <zzh_201018@outlook.com> Signed-off-by: zzhxx <zhangzihang23@mails.ucas.ac.cn> Co-authored-by: zzhx1 <zzh_201018@outlook.com> Co-authored-by: chenxiao <Jaychou1620@Gmail.com> Co-authored-by: zzhxx <zhangzihang23@mails.ucas.ac.cn> Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>	2025-12-12 14:41:20 +08:00
wangxiyuan	e538fa6f9c	[Doc] Update tutorial index (#4920 ) Update tutorial index and remove useless doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-11 20:53:13 +08:00
Shanshan Shen	551069e53a	[Doc] Update structured output doc with upstream link (#4015 ) ### What this PR does / why we need it? Currently, the usage of structured output feature in vllm-ascend is totally the same as that in vllm. Thus, IMO, it's better to remove this doc directly to avoid some case that there are some changes in the upstream doc and we don't update our doc in time, which can be misleading to users. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-12-11 19:14:29 +08:00
yangxiaoman8	e1bb6f47ec	[doc] Add Qwen2.5 tutorials (#4636 ) ### What this PR does / why we need it? Add Qwen2.5 tutorial - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: yangshihao6 <yangshihao6@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-11 17:30:05 +08:00
wangxiyuan	bb76f7962c	cleanup useless torchair logic (#4856 ) This PR clean up useless torchair logic in model runner. The moge doc is only for torchair, it can be removed as well. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-12-11 11:21:13 +08:00
zhangyiming	c95c271538	[E2E] Optimize nightly testcase. (#4886 ) ### What this PR does / why we need it? Optimize nightly testcase. Changes: - tests/e2e/nightly/multi_node/config/models/Qwen3-235B-A3B.yaml: Add accuracy and performance benchmark - tests/e2e/models/configs/Qwen3-8B-Base.yaml: Delete - tests/e2e/models/configs/internlm-7b.yaml: Change to internlm3-8b-instruct - tests/e2e/nightly/models/test_deepseek_r1_w8a8_eplb.py: Change to DeepSeek-R1-0528-W8A8 model - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: menogrey <1299267905@qq.com>	2025-12-11 10:15:39 +08:00
zhangyiming	66b0781840	[E2E] Refactor the e2e testcases. (#4789 ) ### What this PR does / why we need it? Refactor the e2e testcases. - tests/e2e/multicard/test_weight_loader.py: Remove the unused code. - tests/e2e/singlecard/multi-modal/test_internvl.py: Move to accuracy test. - tests/e2e/singlecard/test_aclgraph.py: Rename the file. - tests/e2e/singlecard/test_embedding_aclgraph.py : Combine with tests/e2e/singlecard/test_bge_model.py - tests/e2e/singlecard/test_completion_with_prompt_embeds.py: Delete eager mode and modify model to Qwen3-0.6B - tests/e2e/singlecard/test_quantization.py: Modify model to Qwen3-0.6B-W8A8 - tests/e2e/singlecard/test_vlm.py: Modify model to Qwen3-VL-8B - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: menogrey <1299267905@qq.com>	2025-12-11 10:15:00 +08:00
Nengjun Ma	0eefbe75b6	[Doc] Add local running multi-node nightly test case guide (#4884 ) ### What this PR does / why we need it? Add a guide for running the multi-node nightly test cases locally, to help developers run them in their own environment. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? Tested by running the multi-node tests locally. Following this document, the multi-node nightly e2e tests can be started successfully in a local environment. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-12-11 08:56:27 +08:00
SILONG ZENG	ff7d703192	[Doc]Add tutorial document for qwen-VL-Dense (#3516 ) ### What this PR does / why we need it? This document employs the qwen3-vl-8b model and qwen2.5-vl-32b to demonstrate the primary verification steps for the Qwen-VL series dense models, including supported features, feature configuration, environment preparation, NPU deployment, and accuracy and performance evaluation. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2025-12-11 08:55:23 +08:00
Leaf	89a8607b30	add DeepSeek-R1 tutorial. (#4666 ) ### What this PR does / why we need it? This PR adds tutorials for the DeepSeek-R1 series models, including the A2 and A3 series, and provides accuracy validation results. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Gongdayao <gongdayao@foxmail.com>	2025-12-11 08:52:27 +08:00
wangxiyuan	37db0844f5	Remove COMPILE_CUSTOM_KERNELS env (#4864 ) With more and more custom ops merged, disable `COMPILE_CUSTOM_KERNELS ` for vllm ascend seems useless now. Let's enable csrc compile by default. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-10 23:48:03 +08:00
wangxiyuan	c77dca54b2	[CI] fix lint (#4888 ) Fix lint CI error Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-10 16:57:24 +08:00
wind-all	1a443f2772	add multi_npu_qwen3_dense tutorials (#4543 ) ### What this PR does / why we need it? This PR adds tutorials for the Qwen3-Dense series models, including the A2 and A3 series, and provides accuracy validation results. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wind-all <anyuting@h-partners.com>	2025-12-10 16:09:56 +08:00
Ruri	ce5872705e	[Feat] Support Kimi-K2-Thinking native W4A16 quantized experts weights (#4516 ) ### What this PR does / why we need it? Adds W4A16 quantization method for the Kimi-K2-Thinking model and updates relevant modules to support the new quantization method. - Implements complete W4A16 quantization method including weight packing/unpacking, per-group quantization parameter generation, post-processing logic and MoE method application. - Adds parameters `use_int4_w4a16`, `w1_offset` and `w2_offset`, adjusts `with_quant` conditional logic to support W4A16 matrix multiplication. - Adds `packed_modules_model_mapping` for Kimi-K2-Thinking model and processing logic for `weight_packed` field. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: zhoux77899 <zhouxiang100@huawei.com> Signed-off-by: Ruri <33858552+zhoux77899@users.noreply.github.com> Signed-off-by: Ruri <zhouxiang100@huawei.com>	2025-12-10 15:58:52 +08:00
lianyibo	e32014ac1d	[Model] Support pooling models (#3122 ) ### What this PR does / why we need it? Support pooling models (like `bge-reranker-v2-m3`) in vllm-ascend, this pr covered the three model types of embed (cls_token, mean_token, lasttoken). After this [commit](`17373dcd93`), vllm has provided support for adapting pooling models on the v1 engine. This PR includes corresponding adaptations on the vllm-ascend side. Fixes #1960 - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: lianyibo <lianyibo1@kunlunit.com> Signed-off-by: MengqingCao <cmq0113@163.com> Co-authored-by: MengqingCao <cmq0113@163.com>	2025-12-10 11:37:57 +08:00
wangxiyuan	835b4c8f1d	Drop torchair (#4814 ) aclgraph is stable and fast now. Let's drop torchair graph mode now. TODO: some logic to adapt torchair should be cleaned up as well. We'll do it in the following PR. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-12-10 09:20:40 +08:00
wangxiaoteng888	a77045f355	[P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780 ) ### What this PR does / why we need it? As support for the mooncake connector is now available, the llmdatadist connector is no longer being maintained, so the llmdatadist-related files need to be retired. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-12-09 22:36:43 +08:00
linfeng-yuan	56f01820e8	[Docs]fix the configuration conflicts in documentation (#4823 ) ### What this PR does / why we need it? Fix configuration error in our documentations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? NA. Signed-off-by: linfeng-yuan <1102311262@qq.com>	2025-12-09 15:37:38 +08:00
xuyexiong	193dc1703f	[Doc] Add Qwen3-235B tutorial (#4358 ) ### What this PR does / why we need it? Add Qwen3-235B tutorial including the following examples - Single-node Online Deployment for 128k context inference - Multi-node Deployment with MP - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: xuyexiong <xuyexiong@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-08 20:06:46 +08:00
LuLina	2be0fe2691	[Feat] Add Euler xlite graph wrapper support (#4526 ) ### What this PR does / why we need it? This patch adds support for the xlite graph wrapper to vllm_ascend. Xlite provides operator implementations of the transformer network on Ascend hardware. For details about xlite, please refer to the following link: https://gitee.com/openeuler/GVirt/blob/master/xlite/README.md The latest performance comparison data between xlite and the default aclgraph mode is as follows: ## Qwen3 32B TPS 910B3(A2) Online Inference Performance Comparison - aclgraph: main(`c4a71fc6`) - xlite-full: main(`c4a71fc6`) + xlite-full - xlite-decode-only: main(`c4a71fc6`) + xlite-decode-only - diff1: Performance comparison between xlite-full and aclgraph - diff2: Performance comparison between xlite-decode-only and aclgraph ### Does this PR introduce _any_ user-facing change? Enable the xlite graph mode by setting xlite_graph_config: --additional-config='{"xlite_graph_config": {"enabled": true}}' # Enabled for decode only --additional-config='{"xlite_graph_config": {"enabled": true, "full_mode": true}}' # Enabled for prefill and decode - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: lulina <lina.lulina@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-08 08:27:46 +08:00
liziyu	688b1332da	[P/D] check kv extra config and del hccl backend (#4547 ) ### What this PR does / why we need it? check kv extra config & del hccl backend - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-07 15:19:42 +08:00
GuoRen868	4bd1030842	[Kernel] add custom op DispatchGmmCombineDecode (#4139 ) #### What this PR does / why we need it? Add custom opapi DispatchGmmCombineDecode for A3, including kernel impl, Python API, and pytest. vLLM version: v0.11.0 vLLM main: `24d6314718` - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangqiankun <wangqiankun13@huawei.com> Co-authored-by: wangqiankun <wangqiankun13@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-06 17:33:14 +08:00
mazhixin000	3740b3edfc	【main】[Doc]add 2P1D instruction for single node (#4716 ) ### What this PR does / why we need it? Add the description for 2P1D， keeping it consistent with the content in the dev branch. ### Does this PR introduce _any_ user-facing change? no - vLLM version: v0.12.0 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0 Signed-off-by: mazhixin000 <mazhixinkorea@163.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-05 18:35:18 +08:00
Chen Chen	7f33838e6e	Update comment doc (#4731 ) ### What this PR does / why we need it? Translate remaining Chinese comments in the `dispatch_ffn_combine` code to English and update the installation guide to remind users to initialize submodules when building from source. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: mojave2 <chenchen145@huawei.com> Signed-off-by: Chen Chen <0109chenchen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-05 15:07:31 +08:00
wangxiyuan	00b4fb80de	[Doc] Update vLLM version in doc (#4691 ) Correct vLLM version in doc - vLLM version: v0.12.0 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-12-05 08:59:41 +08:00
Li Wang	752a55473c	[Misc] Upgrade vllm commit to 2025_12_04 (#4690 ) ### What this PR does / why we need it? As the title shows, upgrade the vllm commit hash to `ad32e3e` - vLLM version: v0.12.0 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-04 22:31:45 +08:00
wangxiyuan	3f4c0ea0a0	upgrade vLLM to 0.12.0 tag (#4647 ) Upgrade vLLM to v0.12.0 tag - vLLM version: 86e178f7c4d8c3b0eaf3c8e3f810a83f63b90e24 - vLLM main: `86e178f7c4` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-03 23:43:05 +08:00
wangxiyuan	3f81c4bb25	fix typo (#4657 ) typo fix for release title - vLLM version: 86e178f7c4d8c3b0eaf3c8e3f810a83f63b90e24 - vLLM main: `86e178f7c4` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-03 11:56:47 +08:00
wangxiyuan	9a73c22b1c	[Doc] add release note for v0.11.0rc3 (#4646 ) Add release note for 0.11.0rc3. We'll release it today. - vLLM version: 86e178f7c4d8c3b0eaf3c8e3f810a83f63b90e24 - vLLM main: `86e178f7c4` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-03 11:49:44 +08:00
wangxiyuan	7f2673ea2d	upgrade vLLM to main (#4608 ) 1. fix https://github.com/vllm-project/vllm/pull/28542 The model structure modifications that involve us are: - Qwen2.5-VL (some patches still exist) - Qwen2-VL - Qwen2 - DeepSeek series - Qwen-moe series 2. fix https://github.com/vllm-project/vllm/pull/29121 the output token type has changed from np to `list[list[int]]` 3. fix https://github.com/vllm-project/vllm/pull/29262 the `xformers` backend for multimodal has now been deprecated 4. fix https://github.com/vllm-project/vllm/pull/29342 5. fix https://github.com/vllm-project/vllm/pull/28579 6. fix https://github.com/vllm-project/vllm/pull/28718 7. fix https://github.com/vllm-project/vllm/issues/28665 8. fix https://github.com/vllm-project/vllm/pull/26847 vllm introduced the `optimization-level`, some default configs have been changed, and the param `--enforce-eager` has been deprecated 9. fix http://github.com/vllm-project/vllm/pull/29223 it returns a tuple for the sampler. 10. fix https://github.com/vllm-project/vllm/pull/29471 we'll remove the related patch to avoid this kind of error. Co-authored-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: wangli <wangli858794774@gmail.com> - vLLM version: v0.11.2 --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: wangli <wangli858794774@gmail.com> Co-authored-by: hfadzxy <starmoon_zhang@163.com>	2025-12-02 22:10:52 +08:00
1092626063	b84c9afbf5	【doc fix】doc fix: deepseekv3.1 (#4645 ) ### What this PR does / why we need it? Fix the deepseekv3.1 doc to recommend that developers use Mooncake instead of LLMDatadist. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Signed-off-by: AiChiMomo <1092626063@qq.com>	2025-12-02 21:49:13 +08:00
1092626063	eabedf43aa	[Doc] Refactor the DeepSeek-V3.1 tutorial. (#4399 ) ### What this PR does / why we need it? Refactor the DeepSeek-V3.1 tutorial. - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 Signed-off-by: 1092626063 <1092626063@qq.com>	2025-12-02 18:46:30 +08:00
yeyifan	8907010815	[Doc] Add tutorial for Qwen3-Coder-30B-A3B (#4391 ) ### What this PR does / why we need it? Add tutorial for Qwen3-Coder-30B-A3B - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: nsdie <yeyifan@huawei.com> Signed-off-by: herizhen <you@example.com> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Signed-off-by: weijinqian_v1 <weijinqian@huawei.com> Signed-off-by: weijinqian0 <1184188277@qq.com> Co-authored-by: Li Wang <wangli858794774@gmail.com> Co-authored-by: herizhen <59841270+herizhen@users.noreply.github.com> Co-authored-by: herizhen <you@example.com> Co-authored-by: Yizhou <136800916+yiz-liu@users.noreply.github.com> Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: XiaoxinWang <963372609@qq.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: weijinqian0 <1184188277@qq.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>	2025-12-02 16:03:37 +08:00
wangxiyuan	cb33b09179	[Doc]clean up ascend scheduler config from doc (#4612 ) clean up ascend scheduler config from doc - vLLM version: v0.11.2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-02 14:22:56 +08:00


617 Commits