### What this PR does / why we need it?
fix deepseekv3.1 doc to recomand developers to use Mooncake instead of LLMDatadist
### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->
### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->
Signed-off-by: AiChiMomo <1092626063@qq.com>
### What this PR does / why we need it?
add hyperlink
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.2
---------
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
Fix DeepSeek-V3.2-Exp doc, add docker command.
- vLLM version: v0.11.2
Signed-off-by: menogrey <1299267905@qq.com>
Ascend scheduler was added for non chunk prefill case before, since that
the npu ops didn't work well with chunked prefill.
Now the ops with chunked prefill work better, it's time to remove the
ascend scheduler to use vLLM default scheduler.
- vLLM version: v0.11.2
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
1.In short, we renamed the existing MooncakeStoreConnector to
AscendStoreConnector and extracted the storage engine interaction logic
into a new Backend class.
Associated RFC:https://github.com/vllm-project/vllm-ascend/issues/4329
2.Fixed the issue where the number of input parameters for the connector
was incorrect, introduced in vllm 0.11.2
### Does this PR introduce _any_ user-facing change?
change MooncakeStoreConnector to AscendStoreConnector
### How was this patch tested?
- vLLM version: v0.11.2
---------
Signed-off-by: fems14 <1804143737@qq.com>
### What this PR does / why we need it?
This PR introduces support for adding custom CANN `aclnn` ops to
`vllm-ascend`, allowing users to define and use their own custom
operators.
Key changes include:
- Building and installing custom ops into the `vllm-ascend`-specified
directory
- Binding the `aclnn` op interface to the `torch.ops._C_ascend` module
- Enabling invocation of these ops within `vllm-ascend`
This PR includes a sample custom op:
`aclnnGroupedMatmulSwigluQuantWeightNzTensorList`, which is adapted from
the CANN operator
[`aclnnGroupedMatmulSwigluQuantWeightNZ`](https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/API/aolapi/context/aclnnGroupedMatmulSwigluQuantWeightNZ.md).
Its input parameters `weight` and `weight_scale` now accept
`list[torch.Tensor]` (i.e., `at::TensorList`).
### Does this PR introduce _any_ user-facing change?
No.
- vLLM version: v0.11.2
---------
Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>
### What this PR does / why we need it?
Delete equals sign in doc
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
---------
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
Add readme for PD separation
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
### What this PR does / why we need it?
While using the LLM Compressor quantization tool from the VLLM community
to generate quantized weights, the VLLM Ascend engine needs to be
adapted to support the compressed tensors quantization format.
1. Add AscendCompressedTensorsConfig to replace CompressedTensorsConfig
in vllm.
2. Support CompressedTensorsW8A8 static weight.
- weight: per-channel, int8, symmetric; activation: per-tensor, int8,
symmetric.
4. Support CompressedTensorsW8A8Dynamic weight.
- weight: per-channel, int8, symmetric; activation: per-token, int8,
symmetric, dynamic.
5. Modify the override_quantization_method in AscendQuantConfig.
Co-authored-by: taoqun110 taoqun@huawei.com
Co-authored-by: chenxi-hh chen464822955@163.com
- vLLM version: v0.11.2
---------
Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: chenxi-hh <chen464822955@163.com>
Signed-off-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
Co-authored-by: chenxi-hh <chen464822955@163.com>
Co-authored-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
### What this PR does / why we need it?
Upgrade cann to 8.3rc2
### Does this PR introduce _any_ user-facing change?
Yes, docker image will use 8.3.RC2
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
When running 'python example.py',connection issues often occur.The
solution is to comment out the first line the code.
Complete the specific names of machines A2 and A3.
Standardize document format,a space should be added after the colon.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.2
---------
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
The "g" at the beginning of the current sentence is redundant and needs
to be deleted
"MindIE Turbo" is no longer required to be displayed.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM main:
2918c1b49c
---------
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
vllm-ascend need to dump data during model execution to debug some
precision problems, here msprobe provide the corresponding abilities, so
msprobe will join vllm-ascend to make debug easier
### Does this PR introduce _any_ user-facing change?
```
'dump_config': '/path/to/config.json'
```
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: Tjh-UKN <2559659915@qq.com>
### What this PR does / why we need it?
The first letter of the English title should be capitalized
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
When we are using `Ascend scheduler`, the param `max_num_batched_tokens`
should be larger than `max_model_len`, otherwise, will encountered the
follow error:
```shell
Value error, Ascend scheduler is enabled without chunked prefill feature. Argument max_num_batched_tokens (4096) is smaller than max_model_len (32768). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len. [type=value_error, input_value=ArgsKwargs((), {'model_co...g': {'enabled': True}}}), input_type=ArgsKwargs]
```
### Does this PR introduce _any_ user-facing change?
Users/Developers who running the model according to the
[tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/multi_node.html),
the parameters can be specified correctly.
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
add single node PD disaggregation instructions for Qwen 2.5VL model.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: mazhixin <mazhixin7@huawei.com>
Signed-off-by: mazhixin000 <mazhixinkorea@163.com>
Co-authored-by: mazhixin <mazhixin7@huawei.com>
### What this PR does / why we need it?
This PR adds a load-balance dp proxy server which can be used in
external DP scenario without Disaggregated-Prefill enabled. What's more,
add a doc of external dp and load-balance dp proxy server.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
See the new doc.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: whx-sjtu <2952154980@qq.com>
### What this PR does / why we need it?
Redundant experts bugfix
### Does this PR introduce _any_ user-facing change?
After configuring the path for experts_map, users do not need to
configure iinit_redundancy_expert.
### How was this patch tested?
The accuracy of EPLB was tested with and without the use of redundant
experts.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
### What this PR does / why we need it?
Add some of the pitfalls I ran into when using AISBench to test
multi-modal models.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
### What this PR does / why we need it?
Add the parameter "register_buffer" for PD Aggregated Scenario in the
given example.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
### What this PR does / why we need it?
pd proxy support ipv6, mooncake connector check whether the IPv6 address
is used and notify the user.
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: liziyu <liziyu16@huawei.com>
### What this PR does / why we need it?
Correct the mistake in information documents
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
### What this PR does / why we need it?
To support the data collection capabilities of the msServiceProfiler on
vLLM-ascned framework and enable customization of data collection points
via configuration file, a default profiling configuration has been added
to vllm-ascend, facilitating debugging and optimization for developers
and users.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: minghangc <29514143@qq.com>
Drop VLLM_USE_V1 usage. This env has been removed from vLLM already.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Use pip installation in installation doc and change related doctest to
validate.
### Does this PR introduce _any_ user-facing change?
No, doc only
### How was this patch tested?
Doctest related CI passed
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
v0.11.0rc1 will introduce w4a4 quantization feature, so add this
tutorial.
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: 22dimensions <waitingwind@foxmail.com>
### What this PR does / why we need it?
First-generation model:uses"LLama",subsequent models use"Llama"
The second"L"here should be lowercase.Other instances of "LLama"on
this page should be corrected accordingly
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
### What this PR does / why we need it?
Closes#3728, #3657.
The main branch is now aligned with the vllm `releases/v0.11.1` branch,
which no longer supports `Python 3.9`. Check it
[here](https://github.com/vllm-project/vllm/blob/releases/v0.11.1/pyproject.toml).
### Does this PR introduce _any_ user-facing change?
The newest version of vllm-ascend don't support Python 3.9.
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
### What this PR does / why we need it?
Remove extra MLAPO installation step for DeepSeek-V3.2.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
Corrected the errors in the information
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
### What this PR does / why we need it?
Add model feature matrix table.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
Since we have upgraded to CANN 8.3rc1, we will no longer use the
privately maintained Mooncake repository, but instead use the official
release released by Mooncake:
https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2 .
Next step: this is only a temporary solution. We will integrate mooncake
into the vllm-ascend base image later for easier use. see
https://github.com/vllm-project/vllm-ascend/pull/3989
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
- global_segment_size and local_buffer_size use constants for unified
management.
- Newly added support for input formats ending with GB, MB, KB, and B,
while being compatible with existing input methods.
### Does this PR introduce _any_ user-facing change?
- Users can use new input methods
- The documentation has also been modified
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: 李子琦 <liziqi_ing@163.com>
### What this PR does / why we need it?
Add adxl timeout parameter in kv pool user guide, avoiding timeout error
when initializing connections between devices.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>