### What this PR does / why we need it?
Add release note for v0.13.0rc1
- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
Instructions for using permissions added to docker
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
### What this PR does / why we need it?
As support for the mooncake connector is now available, the llmdatadist
connector is no longer being maintained, so the llmdatadist-related
files need to be retired.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
### What this PR does / why we need it?
Correct the mistake in information documents
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
2918c1b49c
---------
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
### What this PR does / why we need it?
Corrected the errors in the information
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
Added instructions for resolving 'invalid tar header' error on Kylin OS with an ARM64 architecture on Atlas300I hardware during docker
pull, including steps for offline loading of docker images.
---
### What this PR does / why we need it?
The primary motivation for this PR is to address a critical `docker
pull` failure that occurs on specific, yet important, enterprise
environments. Specifically, when operating on **Kylin OS (麒麟操作系统) with
an ARM64 architecture on Atlas300I hardware**, users frequently
encounter an `archive/tar: invalid tar header` error, which completely
blocks the setup process. This issue has been consistently reproduced,
with multiple retries failing with the same error, confirming that it is
a persistent environmental problem rather than a transient network
issue.
<img width="2060" height="525" alt="image"
src="https://github.com/user-attachments/assets/6c1c5728-de27-476f-8df4-723564fc290b"
/>
This guide provides a robust, step-by-step workaround using an
offline-loading method (`docker save` on a host machine and `docker
load` on the target machine). This solution is crucial for enabling
users on this platform to use vLLM.
This contribution does not directly fix an existing issue number, but it
proactively solves a significant environmental and usability problem for
a growing user base.
### Does this PR introduce _any_ user-facing change?
No.It does not alter any code, APIs, interfaces, or existing behavior of
the vLLM project.
### How was this patch tested?
The instructions and troubleshooting steps in this guide were validated
through a real-world, end-to-end test case on the my hardware and OS.
The testing process was as follows:
1. **Problem Reproduction**: An attempt was made to directly `docker
pull` the `vllm-ascend:v0.10.0rc1-310p` image on a target machine
running Kylin OS (ARM64). The `invalid tar header` failure was
successfully and consistently reproduced, confirming the existence of
the problem.
2. **Solution Implementation**: The workaround detailed in the guide was
executed:
* On a separate host machine (Ubuntu x86_64), the image was successfully
pulled using the `--platform linux/arm64` flag.
* The image was then saved to a `.tar` archive using `docker save`.
* The `.tar` archive was transferred to the target Kylin OS machine.
* The image was successfully loaded from the archive using `docker load
-i ...`.
3. **End-to-End Validation**: After loading the image, the vLLM
container was launched on the target machine following the instructions
in the guide. Both online inference (via `curl` to the API server) and
offline inference (via the Python script) were executed successfully,
confirming that the entire workflow described in the document is
accurate and effective.
Since this is a documentation-only change based on a validated workflow,
no new unit or integration tests were added to the codebase.
- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19
---------
Signed-off-by: Liwx <liweixuan1014@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Many FAQ content is out of date, this PR refresh it.
- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Pin version that can stable running 310I Duo to vllm-ascend v0.10.0rc1.
### What this PR does / why we need it?
Since PR #2614 310I Duo been broken. Although we are currently working
on fixing the issue with the 310I Duo being broken, there is no
confirmed timeline for a fix in the short term. To allow users to
quickly find a working version instead of going back and forth on trial
and error, this PR fixes the version in the 310I Duo guide.
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
NA
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
### What this PR does / why we need it?
add faqs:install vllm-ascend will overwrite existing torch-npu
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
### What this PR does / why we need it?
1. Solved the issue where sizes capture failed for the Qwen3-32b-int8
model when aclgraph, dp1, and tp4 were enabled.
2. Added the exception thrown when sizes capture fails and provided a
solution
3. Add this common problem to the FAQ doc
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
### What this PR does / why we need it?
Add Docker export/import guide for air-gapped environments
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
NA
- vLLM version: v0.10.0
- vLLM main:
d16aa3dae4
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
### What this PR does / why we need it?
- update determinitic calculation
- update support device
### Does this PR introduce _any_ user-facing change?
- Users should update ray and protobuf when using ray as distributed
backend
- Users should change to use `export HCCL_DETERMINISTIC=true` when
enabling determinitic calculation
### How was this patch tested?
N/A
- vLLM version: v0.10.0
- vLLM main:
ea1292ad3e
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
Release note of v0.10.0rc1
- vLLM version: v0.10.0
- vLLM main:
8e8e0b6af1
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
Add release note for v0.9.1rc2
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
- vLLM version: v0.10.0
- vLLM main:
c494f96fbc
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Add a new FAQ, if users re-install vllm-ascend with pip, the `build`
folder should be removed first
---------
Signed-off-by: rjg-lyh <1318825571@qq.com>
Signed-off-by: weiguihua <weiguihua2@huawei.com>
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
### What this PR does / why we need it?
Change not to no in faqs.md.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local Test
Signed-off-by: xleoken <xleoken@163.com>
### What this PR does / why we need it?
Improve assertion on Graph mode with MLA.
When running deepseek with graph mode, the fused MLA op only support
`numHeads / numKvHeads ∈ {32, 64, 128}`, thus we improve the assertion
info here to avoid users confused with this.
### Does this PR introduce _any_ user-facing change?
Adjusting tp size is required when running deepseek-v3/r1 with graph
mode. deepseek-v2-lite is not supported in graph mode.
### How was this patch tested?
Test locally as the CI machine could not run V3 due to the HBM limits.
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
1. Add `__init__.py` for vllm_ascend/compilation to make sure it's a
python module
2. Fix model runner bug to keep the same with vllm
3. Add release note for 0.9.0rc2
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1. Fix format check error to make format.sh work
2. Add codespell check CI
3. Add the missing required package for vllm-ascend.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
add notes for OOM in faqs.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Add 0.8.5rc1 release note and bump vllm version to v0.8.5.post1
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Sometimes, user install a dev/editable version of vllm. In this case, we
should make sure vllm-ascend works as well.
This PR add a new env `VLLM_VERSION`. It's used for developers who edit
vllm. In this case, developers should set thie env to make sure which
vllm version is installed and used.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This PR added patch module for vllm
1. platform patch: the patch will be registered when load the platform
2. worker patch: the patch will be registered when worker is started.
The detail is:
1. patch_common: patch for main and 0.8.4 version
4. patch_main: patch for main verison
5. patch_0_8_4: patch for 0.8.4 version