Commit Graph

250 Commits

Author SHA1 Message Date
LI SHENGYONG
0cb4bca1ff [Doc] EPLB update the documentation (#8795)
### What this PR does / why we need it?
1. Update documentation:In the current version, we recommend using the
following: policy of swift balancer(2).

### Does this PR introduce _any_ user-facing change?
  --additional-config '{ "eplb_config": {
    "dynamic_eplb": true,
    "expert_heat_collection_interval": 600,
    "algorithm_execution_interval": 50,
    "eplb_policy_type": 2,
    "num_redundant_experts": {ep_size}
    }}'

### How was this patch tested?
Test in DSV3.1 EP32

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
2026-04-29 17:15:29 +08:00
SILONG ZENG
2e2aaa2fae [Doc][v0.18.0] Fix documentation formatting and improve code examples (#8701)
### What this PR does / why we need it?
This PR fixes various documentation issues and improves code examples
throughout the project.

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-04-28 09:01:25 +08:00
Lucky1
bd3774d601 [Doc][Misc] Improve documentation quality by revising specific content. (#8603)
### What this PR does / why we need it?

To improve the quality of certain docs by revising specific content.

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

- vLLM version: v0.19.0
- vLLM main:
6f786f2c50

---------

Signed-off-by: Lucky1 <144669645+verylucky01@users.noreply.github.com>
2026-04-24 15:40:41 +08:00
Wangbei25
571edc58fa [Doc]Update DeepSeekOCR2.md for releases/v0.18.0 (#8604)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->
Update DeepSeekOCR2.md for releases/v0.18.0
### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->
NO
### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->
vLLM version: v0.18.0
vLLM main:
bcf2be9612

---------

Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
2026-04-23 23:48:03 +08:00
zzzzwwjj
1fc7bc056d [0.18.0][Doc] Add NPU soft partitioning + cudagraph.piecewise limitation (#8595)
### What this PR does / why we need it?

Added NPU soft partitioning + cudagraph.piecewise limitation in graph
mode user guide doc.

### Does this PR introduce _any_ user-facing change?


### How was this patch tested?

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2026-04-23 19:09:55 +08:00
sunshine202600
786eaf8b07 [Doc][Misc] Improve readability and fix typos in documentation (#8633)
### What this PR does / why we need it?

This PR improves the readability of the documentation by fixing typos,
correcting command extensions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Documentation changes only.

Signed-off-by: sunshine202600 <sunshine202600@163.com>
2026-04-23 18:45:17 +08:00
Frank Chen
ce92be29d2 [Doc] Clarify irqbalance service management (#8614)
### What this PR does / why we need it?

This PR clarifies the CPU binding documentation for managing the
`irqbalance` service.

The previous wording only mentioned Ubuntu while the command shown is
specific to systemd-based Linux distributions. This update describes the
command as applicable to Ubuntu and other systemd-based distributions,
and adds a note for non-systemd systems to use the distribution-specific
service-management command.

### Does this PR introduce _any_ user-facing change?

No. This is a documentation-only update and does not change vLLM or
vllm-ascend runtime behavior.

### How was this patch tested?

Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
2026-04-23 16:10:07 +08:00
pz1116
30d08ced2d [Doc][0.18.0] Fix kv pool CLI flag typo and formatting (#8608)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
Fix kv pool CLI flag typo and formatting

### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
2026-04-23 12:07:47 +08:00
wangxiaoteng888
363febb6cb [BugFix][v0.18.0] Gate recompute/balance/fused_mc2 by PD mode (#8374)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
- Enforce recompute scheduler only in PD-disaggregated mode.
- Enforce balance scheduling only in PD-mixed mode.
- Enforce fused MC2 only on PD-disaggregated D-side (kv_consumer).
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->

### Does this PR introduce _any_ user-facing change?
No
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->

### How was this patch tested?
By ci
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
2026-04-18 18:06:42 +08:00
herizhen
76cc2204bd [Doc] Sensitive word modification (#8303)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
This PR updates the documentation to replace specific hardware terms
(e.g., HBM, 910B, 310P) with more generic or branded terms (e.g.,
on-chip memory, Atlas inference products) to comply with sensitive word
requirements.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
2026-04-17 16:30:00 +08:00
sunshine202600
1dd1de8153 [Doc][Misc] Improve readability and fix typos in documentation (#8340)
### What this PR does / why we need it?

This PR improves the readability of the documentation by fixing typos,
correcting command extensions, and fixing broken links in the Chinese
README.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Documentation changes only.

---------

Signed-off-by: sunshine202600 <sunshine202600@163.com>
2026-04-17 08:54:38 +08:00
herizhen
95726d20eb [Doc][Misc] Correcting the document and uploading the model deployment template (#8287)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
Correcting the document and uploading the model deployment template

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
2026-04-15 16:03:11 +08:00
Frank Chen
31186a3a9d [BugFix] Add async communication check for capturing mode (#8149)
### What this PR does / why we need it?
Introduce a check to not using asynchronous communication under
`enable_dsa_cp_with_layer_shard` branch on capturing mode. This change
prevents potential stream and event issues when operating in
graph/capturing mode, ensuring safer communication practices.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
E2E test with dsv32 + FC1 + FULL_DECODE_ONLY +
kv_transfer_config(kv_both)

---------

Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
2026-04-12 21:52:54 +08:00
SparrowMu
c1f323ee46 [Doc] Add new intro to MiniMax-M2.5/M2.7 (#8169)
### What this PR does / why we need it?

1. This PR cherry pick commit that contains current best performance at
3.5k/1.5k and 128k/1k from main to 0.18.0 branch.
2. This PR introduce MiniMax-M2.7 0day information to users.
3. To finish previous step we also changes MiniMax doc name from
MiniMax-M2.5.md to MiniMax-M2.md


---------

Signed-off-by: limuyuan <limuyuan3@huawei.com>
Co-authored-by: limuyuan <limuyuan3@huawei.com>
2026-04-12 21:45:07 +08:00
herizhen
0d1424d81a [Doc][Misc] Comprehensive documentation cleanup and grammatical fixes (#8073)
What this PR does / why we need it?
This pull request performs a comprehensive cleanup of the vLLM Ascend
documentation. It fixes numerous typos, grammatical errors, and phrasing
issues across community guidelines, developer documents, hardware
tutorials, and feature guides. Key improvements include correcting
hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code
examples (removing duplicate flags and trailing commas), and improving
the clarity of technical explanations. These changes are necessary to
ensure the documentation is professional, accurate, and easy for users
to follow.

Does this PR introduce any user-facing change?
No, this PR contains documentation-only updates.

How was this patch tested?
The changes were manually reviewed for accuracy and grammatical
correctness. No functional code changes were introduced.

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
2026-04-09 15:37:57 +08:00
yydyzr
8ce4cfdae7 [Doc][Misc][v0.18.0] Add GLM5 to supported model list and update deployment document for GLM5 (#7963)
### What this PR does / why we need it?
1. Add version notes for GLM5.
2. Add paramter modification for GLM5.
3. Add GLM5 to supported model list.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.18.0
- vLLM main:
35141a7eed

---------

Signed-off-by: yydyzr <liuyuncong1@huawei.com>
Signed-off-by: Zhu Jiyang <zhujiyang2@huawei.com>
Co-authored-by: Zhu Jiyang <zhujiyang2@huawei.com>
2026-04-03 10:15:39 +08:00
shaopeng-666
3218eb9fe1 [DOC]update Qwen3.5 user guide (#7934)
This pr cherry pick from #7866. Update the model user guide

---------
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
2026-04-02 22:09:00 +08:00
pz1116
14411e911e [Doc][0.18.0][KV Pool]add mooncake rdma timeout (#7784)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
- Add `default_kv_lease_ttl` to the `mooncake.json` example in the KV
Pool guide.
- Document `default_kv_lease_ttl` semantics and clarify that it should
be larger than `ASCEND_CONNECT_TIMEOUT` and `ASCEND_TRANSFER_TIMEOUT`.
- Add `HCCL_RDMA_TIMEOUT` explanation for Mooncake RDMA retransmission
timeout, including the recommended constraint note.
- Add `HCCL_RDMA_TIMEOUT=16` to relevant KV Pool environment setup
examples for consistency.

<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->

### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

---------

Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
2026-03-31 20:17:03 +08:00
zzzzwwjj
a40eee2ba1 [feat] support dispatch_v2/combine_v2 hierarchy communication (#7698)
### What this PR does / why we need it?

This PR adds support for hierarchical communication for `dispatch_v2`
and `combine_v2` MoE operations. This is achieved by introducing a new
configuration `enable_mc2_hierarchy_comm`. When enabled, the
communication algorithm is set to "hierarchy", which support mc2 op comm
between two super pod.

The changes include:
- Adding `enable_mc2_hierarchy_comm` to `AscendConfig`.
- Modifying `TokenDispatcherWithMC2` to pass `comm_alg: "hierarchy"` to
the underlying `torch_npu` ops when the new config is enabled.
- Adding validation to ensure that this feature is only used with
compatible PTA/CANN versions and is not used with the conflicting
`fused_mc2` op.
- Updating `is_hierarchical_communication_enabled` to respect the new
configuration flag.

### Does this PR introduce _any_ user-facing change?

Yes, this PR introduces a new user-facing configuration option
`enable_mc2_hierarchy_comm` in `additional_config` to enable
hierarchical communication for MoE.

### How was this patch tested?

- vLLM version: v0.18.0

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2026-03-27 09:20:16 +08:00
Marck
17da96658f [ModelLoader][Feature] Add rfork support for fast model loading (#7392)
### What this PR does / why we need it?
Support an new load format: RFORK

For implementation details of this feature, please refer to #7441


### Does this PR introduce _any_ user-facing change?

add an new options for load-format: rfork

e.g.
```bash
vllm serve /workspace/models/Qwen3-8B --load-format rfork
```

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main:
4034c3d32e

Signed-off-by: Marck <1412354149@qq.com>
2026-03-25 16:40:30 +08:00
rjg-lyh
d1a83a72f7 [doc] add enable_sparse_c8 option in configuration options (#7600)
### What this PR does / why we need it?
This PR adds enable_sparse_c8 option in configuration options

- vLLM version: v0.18.0
- vLLM main:
ed359c497a

Signed-off-by: rjg-lyh <1318825571@qq.com>
2026-03-24 19:36:34 +08:00
realliujiaxu
5d12446573 [Feat][SP] Suport SP for VL MoE models (#7044)
### What this PR does / why we need it?

2nd PR for https://github.com/vllm-project/vllm-ascend/issues/5712,
extend SP to VL MoE models.


### Does this PR introduce _any_ user-facing change?
remove `sp_threshold` in additional config and reuse `sp_min_token_num`
from vLLM.


### How was this patch tested?
- Model: Qwen3-VL-30B-A3B, 
- TP4 DP2
- 100 reqs
- max concurrency 1

| Seq length | Mean TTFT (ms) main | Mean TTFT (ms) this PR |
|------------|---------------------|------------------------|
| 4k         | 429.40               | 323.3                  |
| 16k        | 1297.01              | 911.74                |

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2026-03-24 17:16:00 +08:00
Zetong Li
a253235a59 [Doc] Add note for unsupported PCP + FULL (#7559)
### What this PR does / why we need it?
This PR aims to add note in doc that FULL mode is not supported in PCP
scenario.

Signed-off-by: Zetong Li <slippersss@126.com>
2026-03-23 17:34:51 +08:00
aipaes
5e65062973 [doc] Fix issues in the GLM4.7 documentation (#7457)
### What this PR does / why we need it?
Fix issues in the GLM4.7 documentation and add some missing
explanations.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

document test


- vLLM version: v0.17.0
- vLLM main:
8a680463fa

---------

Signed-off-by: zjks98 <zhangjiakang4@huawei.com>
Co-authored-by: zjks98 <zhangjiakang4@huawei.com>
2026-03-19 16:42:59 +08:00
pz1116
6fc190b44a [Doc][KV Pool]Revision KV Pool User Guide [2/2] (#7456)
### What this PR does / why we need it?
Revise the KV Pool user guide:
4. Revise parameters for Memcache for better clarity, at notification
that currently heterogeneous protocol setting is not supported (e.g.
enable `device_rdma` and `device_sdma` at the same time, a example
scenario would be data transfer by memcache across different super pods)
5. Modify the condition for Mooncakestore warmup, warmup is now needed
only when `ASCEND_BUFFER_POOL` is enabled.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main:
8a680463fa

---------

Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: Chao Lei <leichao139636@163.com>
2026-03-19 16:17:34 +08:00
pz1116
3effc4bc70 [Doc][KV Pool]Revision KV Pool User Guide (#7434)
### What this PR does / why we need it?
Revise the KV Pool user guide:
1. Revise Mooncake environment variables and kvconnector extra configs.
2. Delete `use_ascend_direct` in kv connector extra config as it is
deprecated
3. Delete `kv_buffer_device` and `kv_rank` in P2P mooncake config
4. Unifies default `max-model-len` and `max-num-batch-tokens` in
examples given.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main:
4497431df6

---------

Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: Chao Lei <leichao139636@163.com>
2026-03-19 10:13:13 +08:00
SILONG ZENG
adc57c5951 [release] Add GLM5 known issue for 2-node PD mixed deployment (#7436)
### What this PR does / why we need it?
Documented an issue in the 2-node PD mixed deployment scenario where
inference may hang when concurrency exceeds 8.(GLM5)

Noted that the issue has been fixed in PR:
- #7235 
- #7290.
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2026-03-18 10:03:18 +00:00
liuhy1213-cell
58725b8b24 [doc] add Prefill-Decode Disaggregation doc for GLM5.md (#7300)
### What this PR does / why we need it?
add Prefill-Decode Disaggregation doc for GLM5.md
w8a8  65k-1.5k 
Concurrency: 80 
prefixcache: 90%
tps: 2054

- vLLM version: v0.17.0

- vLLM main:
4034c3d32e
---------
Signed-off-by: liuhaiyang27 <liuhaiyang27@huawei.com>
Co-authored-by: liuhaiyang27 <liuhaiyang27@huawei.com>
2026-03-18 17:00:31 +08:00
Mengqing Cao
e20f0b1a0d [ReleaseNote] Add release note for v0.17.0rc1 (#7240)
### What this PR does / why we need it?
This pull request adds the release notes for `v0.17.0rc1`. It also
updates version numbers across various documentation files, including
`README.md`, `README.zh.md`,
`docs/source/community/versioning_policy.md`, and `docs/source/conf.py`
to reflect the new release.

- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
2026-03-15 22:47:47 +08:00
Junyuan
6852a2e267 [feat] add LMCacheAscendConnector (#6882)
### What this PR does / why we need it?

LMCache-Ascend is LMCache's solution on the Ascend platform and one of
the KVCache pooling solutions for Ascend. We hope to integrate
LMCache-Ascend into the vLLM-Ascend community as one of the official
KVCache pooling solutions for vLLM-Ascend.

We added a new LMCacheAscendConnector in vLLM-Ascend and registered it.

### Does this PR introduce _any_ user-facing change?

Users can specify the kvconnector using `--kv-transfer-config`, allowing
them to freely choose which kvconnector to use, without any user-facing
change.

### How was this patch tested?

Test by specifying `--kv-transfer-config
'{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'`

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: chloroethylene <jjysama@gmail.com>
2026-03-13 17:41:35 +08:00
shaopeng-666
592661e787 [Doc] EPD doc and load-balance proxy example (#6221)
Add EPD doc and load-balance proxy example

- vLLM version: v0.14.0
- vLLM main:
d68209402d

---------

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
2026-03-12 16:17:17 +08:00
Canlin Guo
a78a00e0b1 [Doc][ReleaseNote] Add release notes for v0.16.0rc1 (#7067)
Add release notes for v0.16.0rc1

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Canlin Guo <961750412@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2026-03-10 22:45:05 +08:00
Frank Chen
14c71b19e1 [Doc][CPU binding] Add user/developer guide for CPU binding (#7045)
### What this PR does / why we need it?
This PR adds comprehensive documentation for the CPU binding feature on
Ascend NPUs. It includes:

- A detailed developer guide
(`docs/source/developer_guide/feature_guide/cpu_binding.md`) covering
the design, internal logic, allocation examples, and troubleshooting for
the CPU binding mechanism.
- A concise user guide
(`docs/source/user_guide/feature_guide/cpu_binding.md`) explaining the
core concepts, usage, and common issues for end-users.
- An update to `additional_config.md` to use consistent terminology for
binding strategies (`global-slicing` and `topo-affinity`).

This documentation is needed to help both developers and users
understand, use, and debug the CPU binding feature, which is critical
for performance on ARM+Ascend platforms.

### Does this PR introduce _any_ user-facing change?
No. This is a documentation-only update.

### How was this patch tested?
The documentation has been reviewed for clarity and technical accuracy.
The examples and descriptions align with the implementation in
`vllm_ascend/cpu_binding.py`.

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Signed-off-by: c00818886 <chenchuwei@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
2026-03-10 15:59:31 +08:00
NJX
bb7ed759d4 [Doc] Fix broken chunked-prefill URL in supported features (#6963)
## What this PR does / why we need it?

Fixes the broken URL for chunked-prefill in the supported features
documentation page.

The chunked prefill documentation URL was moved from
`performance/optimization.html` to `configuration/optimization.html` in
upstream vLLM docs. This PR updates the link to point to the correct
location.

**Before**:
https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill
(404)
**After**:
https://docs.vllm.ai/en/stable/configuration/optimization.html#chunked-prefill
(working)

## Does this PR introduce _any_ user-facing change?

Yes - fixes a broken documentation link that users encounter when
clicking 'Chunked Prefill' in the supported features page.

## How was this patch tested?

- Verified the new URL resolves correctly
- Documentation change only

Closes #4217
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: NJX-njx <3771829673@qq.com>
2026-03-10 10:10:07 +08:00
pz1116
a7820d20f4 [Doc][KV Pool]Update Memcache local service config example: increase default world size to 256 and update description (#7025)
### What this PR does / why we need it?
Update Memcache local service config example: increase default world
size to 256 and update the description for better clarity.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
2026-03-06 10:23:55 +08:00
LI SHENGYONG
ccd00798f3 [EPLB] Display the expert hotness comparison before and after eplb. (#6877)
### What this PR does / why we need it?
To intuitively show the effect of the eplb algorithm, we print the
expert heat before and after eplb.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

![Snipaste_2026-02-28_17-23-42](https://github.com/user-attachments/assets/db1dadd1-cf96-44da-af34-57d41ccf412f)


- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
2026-03-06 09:53:29 +08:00
fems14
ae394767d4 【main】ADXL/HIXL supports FabricMem Mode (#6806)
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

---------

Signed-off-by: fems14 <1804143737@qq.com>
2026-03-05 21:04:11 +08:00
wangxiyuan
13777bf3f0 [Spec Decode]clean up spec decode interface (#6947)
This pull request refactors the speculative decoding proposer interface
to align with upstream vLLM, removing the local `Proposer` interface and
renaming methods to `propose`.

This is the first step. In the future we should remove the class
register and just add few Ascend specified method once the arch in vLLM
is ready.

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-03-05 14:30:10 +08:00
Ronald
77e009d9fc [Feature] Add docs of batch invariance and make some extra operators patch (#6910)
### What this PR does / why we need it?

This PR add docs of batch invariance and make some extra operators
according to validation result.
please see https://github.com/vllm-project/vllm-ascend/issues/5487 to
track progress.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
2026-03-05 09:12:40 +08:00
zzzzwwjj
f19f7b1fe2 [doc] fix supported_models (#6930)
### What this PR does / why we need it?

Add Experimental supported model/feature for supported_models.md

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2026-03-03 09:47:50 +08:00
whx
16c879cdf7 [Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)
### What this PR does / why we need it?
Add muls_add triton kernel with related fusion pass. What's more, this
PR refactors `AscendCompilationConfig` and delete `NpugraphExConfig`.

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?
CI passed with new added test.


- vLLM version: v0.13.0
- vLLM main:
45c1ca1ca1

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-03-02 17:54:25 +08:00
wangxiyuan
3d563292f3 clean 0.15.0 support (#6852)
Clean up vllm 0.15.0 related code

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-28 09:20:57 +08:00
wangxiyuan
9cd0d6c33d [Doc][Misc] Update release notes for v0.15.0rc1 (#6859)
### What this PR does / why we need it?

This PR updates the release notes for `v0.15.0rc1` to:
- Mark the `310P MoE and W8A8 Support` feature as experimental.
- Add a note for `Kimi-K2.5 Model Support` clarifying that it has known
issues in vLLM 0.15.0 and requires manual patching to work correctly.

### Does this PR introduce _any_ user-facing change?

No, this is a documentation-only update.

### How was this patch tested?

N/A (documentation change).

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-27 22:35:09 +08:00
wangxiyuan
3d43ed997e add release note for 0.15.0rc1 (#6839)
Add release note for 0.15.0rc1

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-27 11:55:55 +08:00
wangxiyuan
a95c0b8b82 [Doc] fix the nit in docs (#6826)
Refresh the doc, fix the nit in the docs

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-27 11:50:27 +08:00
realliujiaxu
5def28dcd3 [Feat]support sequence parallelism by pass for VL models (#5632) 2026-02-27 08:27:41 +08:00
Frank Chen
3da2ba22eb [Platform] Enable ARM-only CPU binding with NUMA-balanced A3 policy and update docs/tests (#6686)
### What this PR does / why we need it?

- Keeps enable_cpu_binding default on, but skips binding on non‑ARM CPUs
inside bind_cpus, with a clear log.
- Uses a table-driven binding policy: A3 uses NUMA‑balanced binding;
other device types use NUMA‑affinity binding.
- Updates docs to reflect the exact behavior and adds/updates unit tests
for the new logic.

### Does this PR introduce _any_ user-facing change?

- Yes. CPU binding is now enabled by default via additional_config, and
documented in the user guide.
- CPU binding behavior differs by device type (A3 vs. others).

### How was this patch tested?

Added/updated unit tests:

test_cpu_binding.py
1.   test_binding_mode_table covers A2 vs A3 binding mode mapping.
2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when
affinity info is missing.
3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch
detection.
4.   test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus.

test_worker_v1.py
1. Updated mocks for enable_cpu_binding default True to align with new
config default.

- vLLM version: v0.14.1
- vLLM main: d7de043

---------

Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
2026-02-25 11:15:14 +08:00
zzzzwwjj
5c8ab7af39 [main]update release note & support matrix (#6759)
### What this PR does / why we need it?

Update release note & support matrix to add experimental tag for
features and models.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

0.13.0 branch: https://github.com/vllm-project/vllm-ascend/pull/6751

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2026-02-24 17:39:35 +08:00
SILONG ZENG
e2237819a9 [CI]Fixed the spell check function in typos.toml (#6753)
### What this PR does / why we need it?
The incorrect regular expression syntax `.*[UE4M3|ue4m3].*` actually
ignores all words containing any of the following characters: `u, e, 4,
m, 3, |`

```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw",
    ".*UE8M0.*", ".*[UE4M3|ue4m3].*", ".*eles.*", ".*fo.*", ".*ba.*",
    ".*ot.*", ".*[Tt]h[rR].*"]
```
===fix===>
```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw",
    ".*UE8M0.*", ".*(UE4M3|ue4m3]).*", ".*eles.*", ".*fo.*", ".*ba.*",
    ".*ot.*", ".*[Tt]h[rR].*"]
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-02-14 11:57:26 +08:00
Cao Yi
6de207de88 [main][Docs] Fix typos across documentation (#6728)
## Summary

Fix typos and improve grammar consistency across 50 documentation files.
 
### Changes include:
- Spelling corrections (e.g., "Facotory" → "Factory", "certainty" →
"determinism")
- Grammar improvements (e.g., "multi-thread" → "multi-threaded",
"re-routed" → "re-run")
- Punctuation fixes (semicolon consistency in filter parameters)
- Code style fixes (correct flag name `--num-prompts` instead of
`--num-prompt`)
- Capitalization consistency (e.g., "python" → "Python", "ascend" →
"Ascend")
- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
2026-02-13 15:50:05 +08:00