xc-llm-ascend

Author	SHA1	Message	Date
Canlin Guo	052cc4e61b	[Docs] Fix GLM-5 deploy command (#6711 ) This pull request refines the GLM-5 deployment documentation by updating the Docker run command to include a more comprehensive set of device mappings and by removing an extraneous quantization flag from the `vllm serve` commands. These changes aim to correct and clarify the deployment instructions, ensuring users can successfully set up and run the GLM-5 model as intended. - vLLM version: v0.15.0 - vLLM main: `9562912cea` Signed-off-by: Canlin Guo <961750412@qq.com>	2026-02-12 08:55:48 +08:00
rika	b86ea66b0a	[doc]add GLM5.md (#6709 ) ### What this PR does / why we need it? Add GLM5 doc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.15.0 - vLLM main: `9562912cea` Signed-off-by: nakairika <982275964@qq.com>	2026-02-12 04:00:40 +08:00
wangxiyuan	7d4833bce9	[Doc][Misc] Restructure tutorial documentation (#6501 ) ### What this PR does / why we need it? This PR refactors the tutorial documentation by restructuring it into three categories: Models, Features, and Hardware. This improves the organization and navigation of the tutorials, making it easier for users to find relevant information. - The single `tutorials/index.md` is split into three separate index files: - `docs/source/tutorials/models/index.md` - `docs/source/tutorials/features/index.md` - `docs/source/tutorials/hardwares/index.md` - Existing tutorial markdown files have been moved into their respective new subdirectories (`models/`, `features/`, `hardwares/`). - The main `index.md` has been updated to link to these new tutorial sections. This change makes the documentation structure more logical and scalable for future additions. ### Does this PR introduce _any_ user-facing change? Yes, this PR changes the structure and URLs of the tutorial documentation pages. Users following old links to tutorials will encounter broken links. It is recommended to set up redirects if the documentation framework supports them. ### How was this patch tested? These are documentation-only changes. The documentation should be built and reviewed locally to ensure all links are correct and the pages render as expected. - vLLM version: v0.15.0 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-02-10 15:03:35 +08:00
Cao Yi	1c7d1163f5	[main][Docs] Fix spelling errors across documentation (#6649 ) Fix various spelling mistakes in the project documentation to improve clarity and correctness. - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-02-10 11:14:57 +08:00
Li Wang	d018aeb5fa	[Image] Bump mooncake version to v0.3.8.post1 (#6428 ) ### What this PR does / why we need it? This patch bump the mooncake version to the latest [release](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.8.post1) ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? test is locally >>> from mooncake.engine import TransferEngine - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-02-06 10:54:03 +08:00
Nengjun Ma	78fad4e348	[Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (#6442 ) ### What this PR does / why we need it? Refactor MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage. Environments VLLM_ASCEND_ENABLE_PREFETCH_MLP, VLLM_ASCEND_MLP_DOWN_PREFETCH_SIZE and VLLM_ASCEND_MLP_GATE_UP_PREFETCH_SIZE is removed, usage as following: --additional-config '{"weight_prefetch_config": { "enabled": true, "prefetch_ratio": {"mlp": { "gate_up": 1.0, "down": 1.0} }}}' ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2026-02-04 09:08:18 +08:00
zhangguinan	be5b66de6d	[Doc] Contributing a Benchmark Tutorial for Suffix Speculative Decoding (#6323 ) ### What this PR does / why we need it? Suffix Decoding is a CPU-based speculative decoding optimization that accelerates inference by pattern matching and frequency-based prediction from both prompts and generated content. This document provides a step-by-step guide for deploying and evaluating Suffix Speculative Decoding on the Ascend platform. By analyzing performance gains across diverse datasets, it demonstrates the significant advantages of this technology in inference acceleration. Our goal is to empower developers to achieve high-efficiency model optimization using Ascend hardware. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` --------- Signed-off-by: zhangmuzhibangde <1037640609@qq.com>	2026-02-03 14:52:38 +08:00
meihanc	c08364f761	[Bugfix] Fix intermittent kv_port conflict with AscendDirectTransport (#6455 ) ### What this PR does / why we need it? When using Mooncake on Ascend NPU, AscendDirectTransport randomly allocates ports within range `[20000, 20000 + npu_per_node × 1000)`. Reference: [ascend_direct_transport.cpp#L554](https://github.com/kvcache-ai/Mooncake/blob/v0.3.7.post2/mooncake-transfer-engine/src/transport/ascend_transport/ascend_direct_transport/ascend_direct_transport.cpp#L475) If `kv_port` overlaps with this range, users may encounter intermittent startup failures: ```bash zmq.error.ZMQError: Address already in use (addr='tcp://x.x.x.x:30012') RuntimeError: KV Cache sending/receiving thread failed to start. ``` This pr fix intermittent kv_port conflict with AscendDirectTransport in `Qwen3-235B-W8A8-EPLB.yaml`, and add Added `kv_port Configuration Guide` section in `pd_disaggregation_mooncake_multi_node.md`. test Results(tests/e2e/nightly/multi_node/config/Qwen3-235B-W8A8-EPLB.yaml): https://github.com/vllm-project/vllm-ascend/actions/runs/21540138907/job/62073265259 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-02-02 17:31:21 +08:00
Nengjun Ma	597091be9f	[Doc] Reranker guide remove deprecated task option (#6385 ) ### What this PR does / why we need it? Reranker guide remove deprecated task option. - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` Signed-off-by: leo-pony <nengjunma@outlook.com>	2026-01-29 16:00:26 +08:00
Nengjun Ma	f910cebe04	[Doc] 310P Documents update (#6246 ) ### What this PR does / why we need it? 310P support guides updates, as currently has supported in main branch. --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2026-01-26 14:33:21 +08:00
Li Wang	c26ad78f86	[CI][lint] Add rule `codespell` back (#6236 ) ### What this PR does / why we need it? After removing codepsell a while, we discovered that typo had a problem correctly recognizing certain misspelled words, so I suggested adding it back. - vLLM version: v0.14.1 - vLLM main: `d68209402d` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-26 14:12:33 +08:00
Shanshan Shen	e3eefdecbd	[Doc] Update `max_tokens` to `max_completion_tokens` in all docs (#6248 ) ### What this PR does / why we need it? Fix: ``` DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field. ``` - vLLM version: v0.14.1 - vLLM main: `d68209402d` Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-26 11:57:40 +08:00
liziyu	14bef9af6f	[P/D] Remove restrictions on mooncake for IPv6 (#5946 ) ### What this PR does / why we need it? Remove restrictions on mooncake for IPv6 Dependencies: cann8.5、mooncake v0.3.8.post1 - vLLM version: v0.13.0 - vLLM main: `2c24bc6996` --------- Signed-off-by: liziyu <liziyu16@huawei.com>	2026-01-24 11:30:22 +08:00
zhangyiming	56d8f088dd	[Doc] Update DeepSeek-V3.2 tutorail, add single-node and multi-node deployment (#6196 ) ### What this PR does / why we need it? [Doc] Update DeepSeek-V3.2 tutorail, add single-node and multi-node deployment - vLLM version: v0.14.0 - vLLM main: `d68209402d` Signed-off-by: menogrey <1299267905@qq.com>	2026-01-24 11:29:07 +08:00
Angazenn	1e116829ac	[doc]update --max-num-seqs in Qwen3-235b tutorial (#6197 ) ### What this PR does / why we need it? This pr update --max-num-seqs in Qwen3-235b single-node-deployment tutorial to ensure running into graph mode correctly. - vLLM version: v0.14.0 - vLLM main: `d68209402d` Signed-off-by: Angazenn <supperccell@163.com>	2026-01-23 17:11:10 +08:00
Li Wang	4d780a8b01	[Misc] Revert "[Misc] Bump mooncake version to v0.3.8.post1 (#6110 )" (#6164 ) ### What this PR does / why we need it? The new version of moonkcake lead to the image build failure. see https://github.com/vllm-project/vllm-ascend/actions/runs/21236469259/job/61105443733, we should revert it first ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-23 09:53:32 +08:00
meihanc	e54d294df3	[CI]Install clang in dokerfile for triton ascend (#4409 ) ### What this PR does / why we need it? Install clang in dokerfile for triton ascend - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-01-22 19:01:28 +08:00
Li Wang	37a9cf818a	[Misc] Bump mooncake version to v0.3.8.post1 (#6110 ) ### What this PR does / why we need it? Since the mooncake has the newer [release](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.8.post1), we pin the tag to latest release ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-22 11:03:16 +08:00
wangxiyuan	69740039b7	[CI] Upgrade CANN to 8.5.0 (#6070 ) ### What this PR does / why we need it? 1. Upgrade CANN to 8.5.0 2. move triton-ascend 3.2.0 to requirements note: we skipped the two failed e2e test, see https://github.com/vllm-project/vllm-ascend/issues/6076 for more detail. We'll fix it soon. ### How was this patch tested? Closes: https://github.com/vllm-project/vllm-ascend/issues/5494 - vLLM version: v0.13.0 - vLLM main: `d68209402d` --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-22 09:29:50 +08:00
Nengjun Ma	ab676413e6	Default enable MLAPO (#5952 ) ### What this PR does / why we need it? 1) Default enable MLAPO for deepseek MLA Attention W8A8 models on PD disagregation D Instance, for example: DeepSeekV3-W8A8, DeepSeek-R1-W8A8. 2) Default enable MLAPO for DeepSeek SFA Attention W8A8 models, currently is DeepSeek-V3.2-W8A8. ### Does this PR introduce _any_ user-facing change? Don't need use manully to VLLM_ASCEND_ENABLE_MLAPO=1, to enable MLAPO feature for deepseek w8a8 model The effect of enabling MLAPO SFA model deployed on a single A3 Node: Test with:tests/e2e/nightly/single_node/models/test_deepseek_v3_2_exp_w8a8.py dataset: gsm8k-lite，without set MTP, FULL GRAPH, has 19% promote：未默认开启 MLAPO 时： ├─────────────────────────┤ │ TTFT │ 14055.8836 ms │ ├─────────────────────────┤ │ ITL │ 66.8171 ms. │ ├─────────────────────────┤ │ Output Token Throughput │ 104.9105 token/s │ ├─────────────────────────┤ 默认开启 MLAPO 时： ├─────────────────────────┤ │ TTFT │ 3753.1547 ms │ ├─────────────────────────┤ │ ITL. │ 61.4236 ms. │ ├─────────────────────────┤ │ Output Token Throughput │ 125.2075 token/s│ ├─────────────────────────┤ - vLLM version: v0.13.0 - vLLM main: `2c24bc6996` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2026-01-22 09:26:39 +08:00
MengLong Chen	a15a5f6aa5	[Doc] Supplement PD separation parameters of DeepSeek V3.1 (#6053 ) ### What this PR does / why we need it? Supplement PD separation parameters of DeepSeek V3.1 The recommended parameter configuration for DeepSeek V3.1 in the EP32 scenario after PD separation has been adjusted, and the core parameters have been described in detail. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: chenmenglong <chenmenglong1@huawei.com>	2026-01-22 08:53:44 +08:00
meihanc	53bfb38192	[CI]Update triton ascend version in 3.2.0 (#6067 ) ### What this PR does / why we need it? update triton ascend version in 3.2.0 - vLLM version: v0.13.0 - vLLM main: `d68209402d` Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-01-21 16:02:23 +08:00
Canlin Guo	afabb49f00	[Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (#6034 ) ### What this PR does / why we need it? Add docs for Qwen3-VL-Embedding & Qwen3-VL-Reranker. - vLLM version: v0.13.0 - vLLM main: `2c24bc6996` --------- Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2026-01-20 17:36:31 +08:00
starmountain1997	0664c6e67a	[Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (#5921 ) ### What this PR does / why we need it? #### Documentation Improvements New Configuration: Added the layer_sharding parameter to the DeepSeek-V3.2-W8A8 deployment tutorial. This guides users to include `["q_b_proj", "o_proj"]` in their prefill node setup for better resource utilization. #### CI and Testing Updates Test Config Update: Updated the multi-node E2E test configuration file: tests/e2e/nightly/multi_node/config/DeepSeek-V3_2-W8A8-A3-dual-nodes.yaml. including disable `FLASHCOMM` and enable `FULL_DECODE_ONLY` and update performance baseline. ### Does this PR introduce any user-facing change? Yes. The documentation now recommends a more optimized startup command for DeepSeek-V3.2-W8A8. Users following the updated tutorial will see improved performance in multi-node PD disaggregation environments. ### How was this patch tested? CI Validation: The updated E2E test configuration has been verified through the nightly CI pipeline. Environment: * vLLM version: v0.13.0 Base Commit: [11b6af5](`11b6af5280`) Hardware: Ascend A3/A2 multi-node cluster. --------- Signed-off-by: guozr <guozr1997@hotmail.com> Co-authored-by: guozr <guozr1997@hotmail.com>	2026-01-20 12:40:54 +08:00
meihanc	9cad1a8349	[Refactor] Migrate profiler config from env vars to explicit ProfilerConfig (#5928 ) ### What this PR does / why we need it? Migrate the torch profiler configuration from deprecated environment variables (`VLLM_TORCH_PROFILER_DIR`, `VLLM_TORCH_PROFILER_WITH_STACK`, `VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY`) to the explicit `ProfilerConfig` object, aligning with vLLM's configuration best practices. The profiler environment variable approach is deprecated in vLLM and will be removed in v0.14.0 or v1.0.0. ### Does this PR introduce _any_ user-facing change? yes, for deverlopers who want to fetch profiler, he should use `--profiler-config` instead of `VLLM_TORCH_PROFILER_DIR` ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `11b6af5280` Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-01-19 09:27:55 +08:00
Shanshan Shen	efa0f64f22	[Doc] Add tutorials for Qwen3-VL-30B-A3B-Instruct (#5331 ) ### What this PR does / why we need it? Add tutorials for `Qwen3-VL-30B-A3B-Instruct`. - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-15 10:56:19 +08:00
SILONG ZENG	4811ba62e0	[Lint]Style: reformat markdown files via markdownlint (#5884 ) ### What this PR does / why we need it? reformat markdown files via markdownlint - vLLM version: v0.13.0 - vLLM main: `bde38c11df` --------- Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain> Signed-off-by: MrZ20 <2609716663@qq.com> Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>	2026-01-15 09:06:01 +08:00
lty	295018ec0f	[Refactor]Refactor of vllm_ascend/distributed module (#5719 ) ### What this PR does / why we need it? Based on the RFC:https://github.com/vllm-project/vllm-ascend/issues/5604 This PR is a refactoring of vllm_ascend/distributed, moving all kv_transfer realtaed codes into a dedicated folder, which has already been done in vLLM ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` --------- Signed-off-by: lty <linhebiwen@gmail.com>	2026-01-15 08:57:40 +08:00
herizhen	d31170496b	[doc]index display by category (#5852 ) ### What this PR does / why we need it? upgrade tutorial doc index display by category ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` --------- Signed-off-by: herizhen <1270637059@qq.com> Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-14 16:50:49 +08:00
liziyu	451bbdc292	[Doc] add tls check to pd disaggregation readme (#5638 ) ### What this PR does / why we need it? update pd disaggregation multi_node readme, update the environment check command for A3, add tls check ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `8be6432bda` Signed-off-by: liziyu <liziyu16@huawei.com>	2026-01-12 15:49:18 +08:00
1092626063	3ba064f804	[Doc] Add GLM4.5 GLM4.6 doc (#5740 ) ### What this PR does / why we need it? Add GLM4.5 GLM4.6 doc - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: 1092626063 <1092626063@qq.com>	2026-01-09 16:40:49 +08:00
zyz111222	98c788a65a	[Doc] add PaddleOCR-VL tutorials guide (#5556 ) ### What this PR does / why we need it? 1. add PaddleOCR-VL.md in the `docs/source/tutorials/` 2. add PaddleOCR-VL index in `docs/source/tutorials/index.md` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by CI - vLLM version: v0.13.0 - vLLM main: `7157596103` Signed-off-by: zouyizhou <zouyizhou@huawei.com>	2026-01-09 11:01:25 +08:00
meihanc	503822c56c	[Doc] Add Qwen3-Omni-30B-A3B-Thinking Tutorials (#3991 ) ### What this PR does / why we need it? Add Qwen3-Omni-30B-A3B-Thinking Tutorials ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `5326c89803` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-01-08 16:57:20 +08:00
meihanc	c1dcddce3f	[CI]update bisheng version (#5621 ) ### What this PR does / why we need it? update bisheng version in 20260105 - vLLM version: v0.13.0 - vLLM main: `8be6432bda` Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-01-06 15:22:22 +08:00
huqi	2d22700d69	Docs: Add A3 Docker image guidance for Atlas A3 machines (#5256 ) Fixes #3386 - Update Qwen3-30B-A3B.md to use A3-specific image tag - Update Qwen3-Dense.md to provide both A2 and A3 image options - Update Qwen3-Next.md to use A3-specific image for Atlas A3 environments Previously, documentation only mentioned A2 images (vllm-ascend:version) but Atlas A3 machines require A3-specific images (vllm-ascend:version-a3). This change ensures users select the correct image for their hardware. 🤖 Generated with [Claude Code](https://claude.com/claude-code) - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` Signed-off-by: hu-qi <huqi1024@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-01-05 19:42:42 +08:00
zhangmuzhi_yuwan	6c1a685b30	[Doc] add new doc for mooncake: PD-Colocated cross-node multi-instance validation of Mooncake's KV Cache reuse and performance. (#5415 ) ### What this PR does / why we need it? This documentation provides a comprehensive technical guide for deploying vLLM-Ascend using a Prefill-Decode (PD) colocated architecture integrated with Mooncake, a high-performance distributed KV Cache transfer engine. As Large Language Model (LLM) serving scales, managing KV Cache efficiently across distributed nodes is essential for reducing latency and optimizing hardware utilization. The tutorial focuses on a multi-instance setup using Huawei Atlas 800T A2 nodes. By leveraging Mooncake’s distributed memory pooling, vLLM instances can achieve seamless cross-node KV Cache reuse. This capability allows an instance to retrieve precomputed cache from a remote node's DRAM via high-speed RoCE networks, effectively bypassing redundant prefill computations. ### Does this PR introduce _any_ user-facing change? No - vLLM version: release/v0.13.0 - vLLM main: `0bfd7484fd` --------- Signed-off-by: zhangmuzhibangde <1037640609@qq.com> Signed-off-by: zhangmuzhi_yuwan <1037640609@qq.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2026-01-05 14:19:57 +08:00
meihanc	fbb93ad8f2	[bugfix]update bishengir source envs (#5582 ) ### What this PR does / why we need it? Due to the update of the Bisheng version's installation path, the corresponding source path in the environment variables needs to be updated. - vLLM version: v0.13.0 - vLLM main: `7157596103` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-01-05 09:13:40 +08:00
Cao Yi	749c4a3deb	[Doc] Fix typo in ASCEND_RT_VISIBLE_DEVICES (#5581 ) Fixed a typo in the environment variable name. `ASCEBD_RT_VISIBLE_DEVICES` -> `ASCEND_RT_VISIBLE_DEVICES` Fixes #5580 Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-01-04 17:01:02 +08:00
TmacAaron	fd4b4fd06f	[Doc] Fix spelling mistake of environment variable name ASCEND_RT_VISIBLE_DEVICES in Doc (#5570 ) ### What this PR does / why we need it? Spelling mistake of Environment Variable "ASCEND_RT_VISIBLE_DEVICES" in [Doc](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/DeepSeek-V3.1.html#prefill-decode-disaggregation). - vLLM version: v0.13.0 - vLLM main: `7157596103` Signed-off-by: TmacAaron <yangyit139@gmail.com>	2026-01-04 11:52:58 +08:00
huqi	c85cc045f8	Docs: Remove deprecated --task parameter for embedding models (#5257 ) Fixes #3376 - Remove --task embed from vllm serve command in Qwen3_embedding.md - Remove task='embed' parameter from LLM constructor in Python example The --task parameter has been deprecated in recent vLLM versions in favor of automatic model type detection. - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: hu-qi <huqi1024@gmail.com>	2025-12-30 16:09:07 +08:00
meihanc	8c4e9bb76b	[CI]update triton ascend version (#5392 ) ### What this PR does / why we need it? update triton-ascend version to 1229 and bisheng version in 1225; - vLLM version: release/v0.13.0 - vLLM main: `254f6b9867` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2025-12-30 09:51:45 +08:00
weiguihua2	c30c3dc831	[Doc]modify pcp tutorial doc (#5440 ) ### What this PR does / why we need it? modify pcp tutorial doc Because some optimization points have been submitted as PRs and haven't been merged yet, I'll update the performance data now and refresh it again after the PRs are merged. - vLLM version: release/v0.13.0 - vLLM main: `81786c8774` Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-12-27 17:47:09 +08:00
MengLong Chen	b8b5521f5b	[Doc] Update DeepSeek V3.1/R1 2P1D doc (#5387 ) ### What this PR does / why we need it? The PR updates the documentation for DeepSeek-V3.1 and DeepSeek-R1 in the scenario of prefill-decode disaggregation. Updated some PD separation-related setting parameters and optimal configurations. This script has been verified. - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` Signed-off-by: chenmenglong <chenmenglong1@huawei.com>	2025-12-27 17:28:43 +08:00
cookieyyds	843751768e	[DOC]Fix model weight download links (#5436 ) Updated download links for DeepSeek-V3.2 model weights. - vLLM version: release/v0.13.0 - vLLM main: `81786c8774` Signed-off-by: cookieyyds <126683903+cookieyyds@users.noreply.github.com>	2025-12-27 17:14:31 +08:00
Zhu Yi Lin	04104031d0	[Doc] Modify DeepSeek-R1/V3.1 documentation (#5426 ) ### What this PR does / why we need it? Modify DeepSeek-R1/V3.1 documentation. Mainly update the mtp size and some other configs. Signed-off-by: GDzhu01 <809721801@qq.com>	2025-12-27 17:13:58 +08:00
Angazenn	eab306b09c	[doc] Update Qwen3-235B doc for reproducing latest performance (#5323 ) ### What this PR does / why we need it? This PR updates Qwen3-235B doc to give a simple recipe for repreducing our latest perfomance on Atlas A3 servers. - vLLM version: release/v0.13.0 - vLLM main: `5fbfa8d9ef` --------- Signed-off-by: Angazenn <supperccell@163.com>	2025-12-27 15:55:58 +08:00
Zhu Yi Lin	be2a947521	[Doc] delete environment variable HCCL_OP_EXPANSION_MODE in DeepSeekV3.1/R1 (#5419 ) ### What this PR does / why we need it? Currently, HCCL_OP_EXPANSION_MODE="AIV" is causing some freezing issues on A2.so we have temporarily removed it from the documentation. Signed-off-by: GDzhu01 <809721801@qq.com>	2025-12-27 12:44:50 +08:00
LookAround0301	ca31d6823e	[Doc] add long_sequence feature user guide (#5343 ) ### What this PR does / why we need it? add long_sequence feature user guide - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` --------- Signed-off-by: LookAround <lixushi@huawei.com>	2025-12-27 10:44:43 +08:00
weiguihua2	69f96950e1	[Doc] modify pcp tutorials (#5411 ) ### What this PR does / why we need it? modify pcp tutorials modify pcp perf statistics and add note: Context parallel feature currently is only supported on Atlas A3 device, and will be supported on Atlas A2 in the future. - vLLM version: release/v0.13.0 - vLLM main: `81786c8774` --------- Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-12-27 10:36:10 +08:00
weiguihua2	ce52e17bf3	[Doc]add long sequence tutorials (#5364 ) ### What this PR does / why we need it? Provide sample guidance for running long-sequence DeepSeek across multiple nodes To guide users on using the context parallel feature, a practical example is provided. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-12-27 09:52:11 +08:00

1 2 3 4

173 Commits