Commit Graph

214 Commits

Author SHA1 Message Date
Nengjun Ma
f910cebe04 [Doc] 310P Documents update (#6246)
### What this PR does / why we need it?
Update the 310P support guides to reflect what is currently supported in the main branch.

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2026-01-26 14:33:21 +08:00
Li Wang
c26ad78f86 [CI][lint] Add rule codespell back (#6236)
### What this PR does / why we need it?
After codespell had been removed for a while, we discovered that typo had
trouble correctly recognizing certain misspelled words, so I suggested adding
it back.

- vLLM version: v0.14.1
- vLLM main:
d68209402d

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-01-26 14:12:33 +08:00
Shanshan Shen
e3eefdecbd [Doc] Update max_tokens to max_completion_tokens in all docs (#6248)
### What this PR does / why we need it?

Fix:

```
DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field.
```
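The rename behind this warning can be sketched in a few lines of Python. This is a minimal sketch, not code from the PR: `migrate_max_tokens` is a hypothetical helper, and the payload shape (a plain dict with `model` and `messages` keys) is an assumption made for illustration.

```python
from typing import Any


def migrate_max_tokens(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of payload with max_tokens renamed to max_completion_tokens.

    Hypothetical helper for illustration; it is not part of vLLM or vllm-ascend.
    """
    migrated = dict(payload)
    if "max_tokens" in migrated:
        value = migrated.pop("max_tokens")
        # If both fields are present, keep the explicitly set new field.
        migrated.setdefault("max_completion_tokens", value)
    return migrated


# Example payload; the model name is a placeholder.
request = {"model": "example-model", "messages": [], "max_tokens": 64}
print(migrate_max_tokens(request))
```

The helper copies the payload rather than mutating it, so the caller's original request stays unchanged.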

- vLLM version: v0.14.1
- vLLM main:
d68209402d

Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-26 11:57:40 +08:00
liziyu
14bef9af6f [P/D] Remove restrictions on mooncake for IPv6 (#5946)
### What this PR does / why we need it?
Remove restrictions on mooncake for IPv6
Dependencies: CANN 8.5, mooncake v0.3.8.post1

- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2026-01-24 11:30:22 +08:00
zhangyiming
56d8f088dd [Doc] Update DeepSeek-V3.2 tutorial, add single-node and multi-node deployment (#6196)
### What this PR does / why we need it?
[Doc] Update DeepSeek-V3.2 tutorial, add single-node and multi-node
deployment

- vLLM version: v0.14.0
- vLLM main:
d68209402d

Signed-off-by: menogrey <1299267905@qq.com>
2026-01-24 11:29:07 +08:00
Angazenn
1e116829ac [doc]update --max-num-seqs in Qwen3-235b tutorial (#6197)
### What this PR does / why we need it?
This PR updates --max-num-seqs in the Qwen3-235b single-node-deployment
tutorial to ensure that graph mode is entered correctly.

- vLLM version: v0.14.0
- vLLM main:
d68209402d

Signed-off-by: Angazenn <supperccell@163.com>
2026-01-23 17:11:10 +08:00
Li Wang
4d780a8b01 [Misc] Revert "[Misc] Bump mooncake version to v0.3.8.post1 (#6110)" (#6164)
### What this PR does / why we need it?
The new version of mooncake leads to an image build failure, see
https://github.com/vllm-project/vllm-ascend/actions/runs/21236469259/job/61105443733,
so we should revert it first.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-01-23 09:53:32 +08:00
meihanc
e54d294df3 [CI]Install clang in Dockerfile for triton ascend (#4409)
### What this PR does / why we need it?
Install clang in the Dockerfile for triton ascend

- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-01-22 19:01:28 +08:00
Li Wang
37a9cf818a [Misc] Bump mooncake version to v0.3.8.post1 (#6110)
### What this PR does / why we need it?
Since mooncake has a newer
[release](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.8.post1),
we pin the tag to the latest release.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-01-22 11:03:16 +08:00
wangxiyuan
69740039b7 [CI] Upgrade CANN to 8.5.0 (#6070)
### What this PR does / why we need it?
1. Upgrade CANN to 8.5.0
2. move triton-ascend 3.2.0 to requirements

Note: we skipped the two failing e2e tests, see
https://github.com/vllm-project/vllm-ascend/issues/6076 for more detail.
We'll fix them soon.


### How was this patch tested?
Closes: https://github.com/vllm-project/vllm-ascend/issues/5494

- vLLM version: v0.13.0
- vLLM main:
d68209402d

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-22 09:29:50 +08:00
Nengjun Ma
ab676413e6 Default enable MLAPO (#5952)
### What this PR does / why we need it?
1) Enable MLAPO by default for DeepSeek MLA attention W8A8 models on the
D instance in PD disaggregation, for example DeepSeekV3-W8A8 and
DeepSeek-R1-W8A8.
2) Enable MLAPO by default for DeepSeek SFA attention W8A8 models,
currently DeepSeek-V3.2-W8A8.

### Does this PR introduce _any_ user-facing change?
Users no longer need to manually set VLLM_ASCEND_ENABLE_MLAPO=1 to enable
the MLAPO feature for DeepSeek W8A8 models.

The effect of enabling MLAPO for an SFA model deployed on a single A3 node,
tested with tests/e2e/nightly/single_node/models/test_deepseek_v3_2_exp_w8a8.py
(dataset: gsm8k-lite, MTP not set, FULL GRAPH), is about a 19% improvement in
output token throughput:

| Metric | MLAPO not enabled by default | MLAPO enabled by default |
| --- | --- | --- |
| TTFT | 14055.8836 ms | 3753.1547 ms |
| ITL | 66.8171 ms | 61.4236 ms |
| Output Token Throughput | 104.9105 token/s | 125.2075 token/s |
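The relative changes implied by the figures above can be checked with a few lines of Python. This is only an arithmetic sketch; `relative_change` is a hypothetical helper, and the numbers are copied from the benchmark results in this commit message.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a decrease)."""
    return (after - before) / before * 100.0


# Values copied from the benchmark figures in the commit message.
ttft = relative_change(14055.8836, 3753.1547)          # time to first token, ms
itl = relative_change(66.8171, 61.4236)                # inter-token latency, ms
throughput = relative_change(104.9105, 125.2075)       # output tokens per second

print(f"TTFT change: {ttft:+.1f}%")
print(f"ITL change: {itl:+.1f}%")
print(f"Output token throughput change: {throughput:+.1f}%")
```

Lower is better for TTFT and ITL, so negative values there are improvements, while a positive value is an improvement for throughput.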

- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2026-01-22 09:26:39 +08:00
MengLong Chen
a15a5f6aa5 [Doc] Supplement PD separation parameters of DeepSeek V3.1 (#6053)
### What this PR does / why we need it?
Supplement PD separation parameters of DeepSeek V3.1
The recommended parameter configuration for DeepSeek V3.1 in the EP32
scenario after PD separation has been adjusted, and the core parameters
have been described in detail.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: chenmenglong <chenmenglong1@huawei.com>
2026-01-22 08:53:44 +08:00
meihanc
53bfb38192 [CI]Update triton ascend version in 3.2.0 (#6067)
### What this PR does / why we need it?
Update the triton-ascend version to 3.2.0

- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-01-21 16:02:23 +08:00
Canlin Guo
afabb49f00 [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (#6034)
### What this PR does / why we need it?

Add docs for Qwen3-VL-Embedding & Qwen3-VL-Reranker.

- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

---------

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2026-01-20 17:36:31 +08:00
starmountain1997
0664c6e67a [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (#5921)
### What this PR does / why we need it?

#### Documentation Improvements

New Configuration: Added the layer_sharding parameter to the
DeepSeek-V3.2-W8A8 deployment tutorial. This guides users to include
`["q_b_proj", "o_proj"]` in their prefill node setup for better resource
utilization.
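As a rough illustration, the value described above could be serialized for a command line as follows. This is a sketch under assumptions: that `layer_sharding` is a top-level key of the additional config, and that the config is passed through vLLM's `--additional-config` flag as a JSON string. The tutorial itself remains the authoritative source for the exact command.

```python
import json
import shlex

# Assumption: layer_sharding sits at the top level of the additional config.
additional_config = {"layer_sharding": ["q_b_proj", "o_proj"]}

# Quote the JSON so that a shell passes it through as a single argument.
flag = "--additional-config " + shlex.quote(json.dumps(additional_config))
print(flag)
```

Building the JSON with `json.dumps` and quoting it with `shlex.quote` avoids hand-written quoting mistakes when the string is pasted into a startup script.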

#### CI and Testing Updates

Test Config Update: Updated the multi-node E2E test configuration file
tests/e2e/nightly/multi_node/config/DeepSeek-V3_2-W8A8-A3-dual-nodes.yaml.
The update disables `FLASHCOMM`, enables `FULL_DECODE_ONLY`, and updates the
performance baseline.

### Does this PR introduce any user-facing change?

Yes. The documentation now recommends a more optimized startup command
for DeepSeek-V3.2-W8A8. Users following the updated tutorial will see
improved performance in multi-node PD disaggregation environments.

### How was this patch tested?
CI Validation: The updated E2E test configuration has been verified
through the nightly CI pipeline.

Environment: vLLM version: v0.13.0

Base Commit:
11b6af5280

Hardware: Ascend A3/A2 multi-node cluster.

---------

Signed-off-by: guozr <guozr1997@hotmail.com>
Co-authored-by: guozr <guozr1997@hotmail.com>
2026-01-20 12:40:54 +08:00
meihanc
9cad1a8349 [Refactor] Migrate profiler config from env vars to explicit ProfilerConfig (#5928)
### What this PR does / why we need it?

Migrate the torch profiler configuration from deprecated environment
variables (`VLLM_TORCH_PROFILER_DIR`, `VLLM_TORCH_PROFILER_WITH_STACK`,
`VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY`) to the explicit
`ProfilerConfig` object, aligning with vLLM's configuration best
practices.
The profiler environment variable approach is deprecated in vLLM and
will be removed in v0.14.0 or v1.0.0.

### Does this PR introduce _any_ user-facing change?
Yes. Developers who want to collect profiler output should use `--profiler-config` instead of `VLLM_TORCH_PROFILER_DIR`.
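A launch script can check for leftovers of the old approach before starting a server. The sketch below only detects the deprecated variables named in this PR. `find_deprecated_profiler_vars` is a hypothetical helper, and the format of the `--profiler-config` value is deliberately not shown because it is not described here.

```python
import os
from collections.abc import Mapping

# Deprecated variable names, as listed in the PR description.
DEPRECATED_PROFILER_VARS = (
    "VLLM_TORCH_PROFILER_DIR",
    "VLLM_TORCH_PROFILER_WITH_STACK",
    "VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY",
)


def find_deprecated_profiler_vars(env: Mapping[str, str]) -> list[str]:
    """Return the deprecated profiler variables that are set to a non-empty value."""
    return [name for name in DEPRECATED_PROFILER_VARS if env.get(name)]


if __name__ == "__main__":
    for name in find_deprecated_profiler_vars(os.environ):
        print(f"{name} is set; pass this setting through --profiler-config instead")
```

Taking the environment as a parameter keeps the check easy to test with a plain dict.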
### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
11b6af5280

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-01-19 09:27:55 +08:00
Shanshan Shen
efa0f64f22 [Doc] Add tutorials for Qwen3-VL-30B-A3B-Instruct (#5331)
### What this PR does / why we need it?

Add tutorials for `Qwen3-VL-30B-A3B-Instruct`.

- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-15 10:56:19 +08:00
SILONG ZENG
4811ba62e0 [Lint]Style: reformat markdown files via markdownlint (#5884)
### What this PR does / why we need it?
reformat markdown files via markdownlint

- vLLM version: v0.13.0
- vLLM main:
bde38c11df

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
2026-01-15 09:06:01 +08:00
lty
295018ec0f [Refactor]Refactor of vllm_ascend/distributed module (#5719)
### What this PR does / why we need it?
Based on the RFC: https://github.com/vllm-project/vllm-ascend/issues/5604

This PR refactors vllm_ascend/distributed, moving all kv_transfer-related
code into a dedicated folder, as has already been done in vLLM.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?


- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

---------

Signed-off-by: lty <linhebiwen@gmail.com>
2026-01-15 08:57:40 +08:00
herizhen
d31170496b [doc]index display by category (#5852)
### What this PR does / why we need it?
Update the tutorial doc index to display entries by category

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-14 16:50:49 +08:00
liziyu
451bbdc292 [Doc] add tls check to pd disaggregation readme (#5638)
### What this PR does / why we need it?

Update the PD disaggregation multi_node readme: update the environment check
command for A3 and add a TLS check.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.13.0
- vLLM main:
8be6432bda

Signed-off-by: liziyu <liziyu16@huawei.com>
2026-01-12 15:49:18 +08:00
1092626063
3ba064f804 [Doc] Add GLM4.5 GLM4.6 doc (#5740)
### What this PR does / why we need it?
Add GLM4.5 GLM4.6 doc

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: 1092626063 <1092626063@qq.com>
2026-01-09 16:40:49 +08:00
zyz111222
98c788a65a [Doc] add PaddleOCR-VL tutorials guide (#5556)
### What this PR does / why we need it?
1. add PaddleOCR-VL.md in the `docs/source/tutorials/`
2. add PaddleOCR-VL index in  `docs/source/tutorials/index.md`

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by CI

- vLLM version: v0.13.0
- vLLM main:
7157596103

Signed-off-by: zouyizhou <zouyizhou@huawei.com>
2026-01-09 11:01:25 +08:00
meihanc
503822c56c [Doc] Add Qwen3-Omni-30B-A3B-Thinking Tutorials (#3991)
### What this PR does / why we need it?
Add Qwen3-Omni-30B-A3B-Thinking Tutorials 

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
5326c89803

---------

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-01-08 16:57:20 +08:00
meihanc
c1dcddce3f [CI]update bisheng version (#5621)
### What this PR does / why we need it?
Update the bisheng version to 20260105

- vLLM version: v0.13.0
- vLLM main:
8be6432bda

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-01-06 15:22:22 +08:00
huqi
2d22700d69 Docs: Add A3 Docker image guidance for Atlas A3 machines (#5256)
Fixes #3386

- Update Qwen3-30B-A3B.md to use A3-specific image tag
- Update Qwen3-Dense.md to provide both A2 and A3 image options  
- Update Qwen3-Next.md to use A3-specific image for Atlas A3
environments

Previously, documentation only mentioned A2 images (vllm-ascend:version)
but Atlas A3 machines require A3-specific images
(vllm-ascend:version-a3). This change ensures users select the correct
image for their hardware.
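The tag convention described above can be sketched as a small helper. This is an illustrative sketch only: `image_tag` is a hypothetical function, the registry prefix is omitted, and `VERSION` stands in for a real release tag.

```python
def image_tag(version: str, hardware: str) -> str:
    """Build the image name for the given hardware generation.

    Follows the convention in the commit message: A2 machines use
    vllm-ascend:<version>, Atlas A3 machines use vllm-ascend:<version>-a3.
    """
    generation = hardware.strip().upper()
    if generation == "A3":
        return f"vllm-ascend:{version}-a3"
    if generation == "A2":
        return f"vllm-ascend:{version}"
    raise ValueError(f"unsupported hardware generation: {hardware!r}")


# "VERSION" is a placeholder, not a real release tag.
print(image_tag("VERSION", "A2"))
print(image_tag("VERSION", "A3"))
```

Raising on unknown hardware, rather than silently falling back to the A2 image, matches the point of the change: picking the wrong image should be noticed early.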

🤖 Generated with [Claude Code](https://claude.com/claude-code)
- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

Signed-off-by: hu-qi <huqi1024@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-01-05 19:42:42 +08:00
zhangmuzhi_yuwan
6c1a685b30 [Doc] add new doc for mooncake: PD-Colocated cross-node multi-instance validation of Mooncake's KV Cache reuse and performance. (#5415)
### What this PR does / why we need it?
This documentation provides a comprehensive technical guide for
deploying **vLLM-Ascend** using a **Prefill-Decode (PD) colocated
architecture** integrated with **Mooncake**, a high-performance
distributed KV Cache transfer engine. As Large Language Model (LLM)
serving scales, managing KV Cache efficiently across distributed nodes
is essential for reducing latency and optimizing hardware utilization.

The tutorial focuses on a multi-instance setup using Huawei **Atlas 800T
A2** nodes. By leveraging Mooncake’s distributed memory pooling, vLLM
instances can achieve seamless **cross-node KV Cache reuse**. This
capability allows an instance to retrieve precomputed cache from a
remote node's DRAM via high-speed **RoCE** networks, effectively
bypassing redundant prefill computations.

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: release/v0.13.0
- vLLM main:
0bfd7484fd

---------

Signed-off-by: zhangmuzhibangde <1037640609@qq.com>
Signed-off-by: zhangmuzhi_yuwan <1037640609@qq.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2026-01-05 14:19:57 +08:00
meihanc
fbb93ad8f2 [bugfix]update bishengir source envs (#5582)
### What this PR does / why we need it?
Due to the update of the Bisheng version's installation path, the
corresponding source path in the environment variables needs to be
updated.

- vLLM version: v0.13.0
- vLLM main:
7157596103
---------
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-01-05 09:13:40 +08:00
Cao Yi
749c4a3deb [Doc] Fix typo in ASCEND_RT_VISIBLE_DEVICES (#5581)
Fixed a typo in the environment variable name.
`ASCEBD_RT_VISIBLE_DEVICES` -> `ASCEND_RT_VISIBLE_DEVICES`
Fixes #5580
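Typos like this one can be caught mechanically by looking for tokens that are close to, but not exactly, the expected name. The following is a minimal sketch using the standard library; `find_misspellings`, the token pattern, and the 0.9 similarity threshold are all assumptions chosen for illustration, not part of this fix.

```python
import re
from difflib import SequenceMatcher

EXPECTED = "ASCEND_RT_VISIBLE_DEVICES"
# Upper-case identifiers of at least six characters, e.g. environment variables.
TOKEN = re.compile(r"\b[A-Z][A-Z0-9_]{5,}\b")


def find_misspellings(text: str, expected: str = EXPECTED,
                      threshold: float = 0.9) -> list[str]:
    """Return tokens that closely resemble expected without matching it exactly."""
    suspects = []
    for token in TOKEN.findall(text):
        if token == expected:
            continue
        if SequenceMatcher(None, token, expected).ratio() >= threshold:
            suspects.append(token)
    return suspects


print(find_misspellings("export ASCEBD_RT_VISIBLE_DEVICES=0,1"))
```

A check like this could run over the docs directory in CI, though the threshold would need tuning to avoid flagging legitimately similar variable names.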

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
2026-01-04 17:01:02 +08:00
TmacAaron
fd4b4fd06f [Doc] Fix spelling mistake of environment variable name ASCEND_RT_VISIBLE_DEVICES in Doc (#5570)
### What this PR does / why we need it?
Spelling mistake of Environment Variable "ASCEND_RT_VISIBLE_DEVICES" in
[Doc](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/DeepSeek-V3.1.html#prefill-decode-disaggregation).


- vLLM version: v0.13.0
- vLLM main:
7157596103

Signed-off-by: TmacAaron <yangyit139@gmail.com>
2026-01-04 11:52:58 +08:00
huqi
c85cc045f8 Docs: Remove deprecated --task parameter for embedding models (#5257)
Fixes #3376

- Remove --task embed from vllm serve command in Qwen3_embedding.md
- Remove task='embed' parameter from LLM constructor in Python example

The --task parameter has been deprecated in recent vLLM versions 
in favor of automatic model type detection.
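For scripts that still assemble the old command line, the removal can be sketched as a filter over the argument list. `strip_task_flag` is a hypothetical helper written for illustration, and `MODEL` is a placeholder for a model name.

```python
def strip_task_flag(argv: list[str]) -> list[str]:
    """Return argv without the deprecated --task option and its value."""
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This is the value that followed a bare "--task".
            skip_next = False
            continue
        if arg == "--task":
            skip_next = True
            continue
        if arg.startswith("--task="):
            continue
        cleaned.append(arg)
    return cleaned


# "MODEL" is a placeholder for the model name or path.
print(strip_task_flag(["vllm", "serve", "MODEL", "--task", "embed"]))
```

Both the `--task embed` and `--task=embed` spellings are handled, and all other arguments keep their original order.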
- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: hu-qi <huqi1024@gmail.com>
2025-12-30 16:09:07 +08:00
meihanc
8c4e9bb76b [CI]update triton ascend version (#5392)
### What this PR does / why we need it?
Update the triton-ascend version to 1229 and the bisheng version to 1225.

- vLLM version: release/v0.13.0
- vLLM main:
254f6b9867
---------
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2025-12-30 09:51:45 +08:00
weiguihua2
c30c3dc831 [Doc]modify pcp tutorial doc (#5440)
### What this PR does / why we need it?
modify pcp tutorial doc

Because some optimization points have been submitted as PRs and haven't
been merged yet, I'll update the performance data now and refresh it
again after the PRs are merged.

- vLLM version: release/v0.13.0
- vLLM main:
81786c8774

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-12-27 17:47:09 +08:00
MengLong Chen
b8b5521f5b [Doc] Update DeepSeek V3.1/R1 2P1D doc (#5387)
### What this PR does / why we need it?
The PR updates the documentation for DeepSeek-V3.1 and DeepSeek-R1 in
the scenario of prefill-decode disaggregation.

Updated some PD separation-related setting parameters and optimal
configurations. This script has been verified.

- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

Signed-off-by: chenmenglong <chenmenglong1@huawei.com>
2025-12-27 17:28:43 +08:00
cookieyyds
843751768e [DOC]Fix model weight download links (#5436)
Updated download links for DeepSeek-V3.2 model weights.

- vLLM version: release/v0.13.0
- vLLM main:
81786c8774

Signed-off-by: cookieyyds <126683903+cookieyyds@users.noreply.github.com>
2025-12-27 17:14:31 +08:00
Zhu Yi Lin
04104031d0 [Doc] Modify DeepSeek-R1/V3.1 documentation (#5426)
### What this PR does / why we need it?
Modify DeepSeek-R1/V3.1 documentation. Mainly update the mtp size and some other configs.

Signed-off-by: GDzhu01 <809721801@qq.com>
2025-12-27 17:13:58 +08:00
Angazenn
eab306b09c [doc] Update Qwen3-235B doc for reproducing latest performance (#5323)
### What this PR does / why we need it?
This PR updates the Qwen3-235B doc to give a simple recipe for reproducing
our latest performance on Atlas A3 servers.

- vLLM version: release/v0.13.0
- vLLM main:
5fbfa8d9ef
---------
Signed-off-by: Angazenn <supperccell@163.com>
2025-12-27 15:55:58 +08:00
Zhu Yi Lin
be2a947521 [Doc] delete environment variable HCCL_OP_EXPANSION_MODE in DeepSeekV3.1/R1 (#5419)
### What this PR does / why we need it?
Currently, HCCL_OP_EXPANSION_MODE="AIV" is causing some freezing issues
on A2, so we have temporarily removed it from the documentation.

Signed-off-by: GDzhu01 <809721801@qq.com>
2025-12-27 12:44:50 +08:00
LookAround0301
ca31d6823e [Doc] add long_sequence feature user guide (#5343)
### What this PR does / why we need it?
add long_sequence feature user guide

- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

---------

Signed-off-by: LookAround <lixushi@huawei.com>
2025-12-27 10:44:43 +08:00
weiguihua2
69f96950e1 [Doc] modify pcp tutorials (#5411)
### What this PR does / why we need it?
modify pcp tutorials

Modify the pcp performance statistics and add a note: the context parallel
feature is currently supported only on Atlas A3 devices and will be supported
on Atlas A2 in the future.

- vLLM version: release/v0.13.0
- vLLM main:
81786c8774
---------
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-12-27 10:36:10 +08:00
weiguihua2
ce52e17bf3 [Doc]add long sequence tutorials (#5364)
### What this PR does / why we need it?
Provide sample guidance for running long-sequence DeepSeek across
multiple nodes

To guide users on using the context parallel feature, a practical
example is provided.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-12-27 09:52:11 +08:00
LeeWenquan
7685d0c239 rollback causal_conv1d_fn to torch ops & update qwen3Next doc (#5391)
### What this PR does / why we need it?
Roll back the causal_conv1d_fn ops from the triton version to the torch
version to fix hanging issues, and update the Qwen3Next doc.

- vLLM version: release/v0.13.0
- vLLM main:
254f6b9867
---------
Signed-off-by: SunnyLee219 <3294305115@qq.com>
2025-12-26 19:57:38 +08:00
Zhu Yi Lin
06732dbf5b [Doc] update R1/V3.1 doc (#5383)
### What this PR does / why we need it?
This PR updates the DeepSeek-R1/V3.1 doc to give a simple recipe for
reproducing our latest performance on Atlas A3/A2 servers.
### Does this PR introduce any user-facing change?
No.

Signed-off-by: GDzhu01 <809721801@qq.com>
2025-12-26 17:09:22 +08:00
cookieyyds
2da8038dd2 [doc] update using command (#5373)
### What this PR does / why we need it?
Update the configuration for optimal performance of deepseek v3.2 in the usage tutorial.

- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08
---------
Signed-off-by: cookieyyds <126683903+cookieyyds@users.noreply.github.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-12-25 22:28:35 +08:00
wangxiyuan
2ae0bad96d Remove VLLM_ASCEND_ENABLE_DENSE_OPTIMIZE (#5272)
`VLLM_ASCEND_ENABLE_DENSE_OPTIMIZE` is only used together with
`VLLM_ASCEND_ENABLE_PREFETCH_MLP`, which is totally useless. This PR
removes it.
- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-25 11:09:56 +08:00
ZYang6263
a3f65b938f [Doc] Add pa_shape_list description to qwen dense tutorial (#5225)
### What this PR does / why we need it?
Add pa_shape_list description to qwen dense tutorial.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

Signed-off-by: ZYang6263 <zy626375@gmail.com>
Co-authored-by: zzzzwwjj <34335947+zzzzwwjj@users.noreply.github.com>
2025-12-24 14:40:20 +08:00
rongfu.leng
c9b5881bcd [Doc] fix docs set rope_theta value is 10e6 in qwen3-235b model (#5258)
### What this PR does / why we need it?

Fixes https://github.com/vllm-project/vllm-ascend/issues/5201

### Does this PR introduce _any_ user-facing change?
No, doc only

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

Signed-off-by: rongfu.leng <lenronfu@gmail.com>
2025-12-23 10:21:46 +08:00
zhangyiming
f883a2edb9 [Doc] Update the weight download URL. (#5238)
### What this PR does / why we need it?
Update the weight download URL because the model was renamed.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: menogrey <1299267905@qq.com>
2025-12-23 08:53:30 +08:00
zhangyiming
dc047489c7 [Doc] Fix DeepSeek-V3.2 tutorial. (#5190)
### What this PR does / why we need it?
Fix DeepSeek-V3.2 tutorial.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: menogrey <1299267905@qq.com>
2025-12-22 11:30:17 +08:00
luluxiu520
bc05a81bf2 Add Qwen3-VL-235B-A22B-Instruct tutorials (#5167)
### What this PR does / why we need it?

This PR provides an introduction to the Qwen3-VL-235B-A22B-Instruct
model, details on the features supported by the model in the current
version, the model deployment process, as well as methods for
performance testing and accuracy testing.

With this document, the deployment and testing of the
Qwen3-VL-235B-A22B-Instruct model can be implemented more easily.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: luluxiu520 <l2625793@outlook.com>
2025-12-19 14:56:17 +08:00