Currently, **ONLY** the Atlas A2 series (Ascend-cann-kernels-910b), Atlas A3 series (Atlas-A3-cann-kernels), and Atlas 300I series (Ascend-cann-kernels-310p) are supported:
From a technical perspective, vllm-ascend can support a device as long as torch-npu supports it; otherwise, support has to be implemented with custom ops. You are also welcome to join us and improve it together.
You can get our containers at `Quay.io`, e.g., [<u>vllm-ascend</u>](https://quay.io/repository/ascend/vllm-ascend?tab=tags) and [<u>cann</u>](https://quay.io/repository/ascend/cann?tab=tags).
If you want to use the container image in an offline environment (no internet connection), you need to download the container image in an environment with internet access first:
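A minimal sketch of the workflow, assuming the `quay.io/ascend/vllm-ascend` image with a placeholder tag:

```bash
# On a machine with internet access: pull the image and export it to a tarball
docker pull quay.io/ascend/vllm-ascend:<tag>
docker save -o vllm-ascend.tar quay.io/ascend/vllm-ascend:<tag>

# Copy vllm-ascend.tar to the offline machine, then load it there
docker load -i vllm-ascend.tar
```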
There are many channels through which you can communicate with our community developers and users:
- Submit a GitHub [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues?page=1).
- Join our [<u>weekly meeting</u>](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z) and share your ideas.
`vllm-ascend` is a hardware plugin for vLLM. The version of `vllm-ascend` matches the version of `vllm`: for example, if you use `vllm` 0.9.1, you should use `vllm-ascend` 0.9.1 as well. On the main branch, we ensure that `vllm-ascend` and `vllm` are compatible at every commit.
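For example, assuming both packages are installed from PyPI, pin them to the same release:

```bash
# Keep vllm and vllm-ascend at matching versions (0.9.1 is just an example)
pip install vllm==0.9.1 vllm-ascend==0.9.1
```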
Yes, vllm-ascend supports the Prefill Disaggregation feature with the Mooncake backend. See the [official tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/pd_disaggregation_mooncake_multi_node.html) for an example.
- **Functional test**: We run CI that includes part of vLLM's native unit tests and vllm-ascend's own unit tests. In vllm-ascend's tests, we cover basic functionality, popular model availability, and [supported features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html) through E2E tests.
- **Performance test**: We provide [benchmark](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks) tools for E2E performance benchmarking, which can easily be re-run locally. We will publish a performance website to show the test results for each pull request.
The problem is usually caused by the installation of a development or editable version of the vLLM package. In this case, we provide the environment variable `VLLM_VERSION` so that users can specify which version of the vLLM package to use. Please set `VLLM_VERSION` to the version of the vLLM package you have installed, in the format `X.Y.Z`.
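For example, assuming vLLM 0.9.1 is the version you have installed:

```bash
# Tell vllm-ascend which vLLM version to build/run against
export VLLM_VERSION=0.9.1
```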
OOM errors typically occur when the model exceeds the memory capacity of a single NPU. For general guidance, you can refer to [vLLM OOM troubleshooting documentation](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#out-of-memory).
In scenarios where NPUs have limited high bandwidth memory (HBM) capacity, dynamic memory allocation/deallocation during inference can exacerbate memory fragmentation, leading to OOM. To address this:
- **Adjust `--gpu-memory-utilization`**: If unspecified, the default value is `0.9`. You can decrease this value to leave more memory headroom and reduce fragmentation risks. See details in: [vLLM - Inference and Serving - Engine Arguments](https://docs.vllm.ai/en/latest/serving/engine_args.html#vllm.engine.arg_utils-_engine_args_parser-cacheconfig).
- **Configure `PYTORCH_NPU_ALLOC_CONF`**: Set this environment variable to optimize NPU memory management. For example, `export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True` enables the virtual memory feature to mitigate memory fragmentation caused by frequent dynamic memory size adjustments at runtime. See details in [PYTORCH_NPU_ALLOC_CONF](https://www.hiascend.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html). A combined example is sketched after this list.
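A minimal sketch combining both adjustments, assuming a `vllm serve` deployment; the model name and utilization value are placeholders:

```bash
# Enable the virtual memory feature to reduce fragmentation
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True

# Reserve extra headroom by lowering memory utilization (default is 0.9)
vllm serve <model> --gpu-memory-utilization 0.85
```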
Enabling NPU graph mode for DeepSeek may trigger an error. This is because when both MLA and NPU graph mode are active, the number of queries per KV head must be 32, 64, or 128. However, DeepSeek-V2-Lite has only 16 attention heads, which results in 16 queries per KV head, a value outside the supported range. Support for NPU graph mode on DeepSeek-V2-Lite will be added in a future update.
You may encounter C/C++ compilation failures when reinstalling vllm-ascend from source using pip. If the installation fails, use `python setup.py install` (recommended) to install, or run `python setup.py clean` to clear the build cache.
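For example, from the vllm-ascend source directory:

```bash
# Clear stale build artifacts from a previous failed build
python setup.py clean

# Reinstall from source (recommended fallback when the pip reinstall fails)
python setup.py install
```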
The `Qwen2.5-Omni` model requires the `librosa` package. To ensure all dependencies are met, install the `qwen-omni-utils` package by running `pip install qwen-omni-utils`.
This package will install `librosa` and its related dependencies, resolving the `ImportError: No module named 'librosa'` issue and ensuring that the audio processing functionality works correctly.
ERROR 09-26 10:48:07 [model_runner_v1.py:3029] ACLgraph has insufficient available streams to capture the configured number of sizes.Please verify both the availability of adequate streams and the appropriateness of the configured size count.
The current stream requirement calculation for size capture only accounts for measurable factors: data parallel size, tensor parallel size, expert parallel configuration, piece graph count, multistream-overlap shared expert settings, and HCCL communication mode (AIV/AICPU). However, many unquantifiable elements, such as operator characteristics and specific hardware features, consume additional streams outside of this calculation, which can exhaust stream resources during size capture.
torch-npu will be overridden when installing vllm-ascend. If you need a specific version of torch-npu, install it manually after vllm-ascend has been installed.
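A minimal sketch, with the version left as a placeholder to fill in:

```bash
# Reinstall the torch-npu version you need after vllm-ascend has been installed
pip install torch-npu==<desired-version> --force-reinstall
```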
When using `--shm-size`, you may need to add the `--privileged=true` flag to your `docker run` command to grant the container necessary permissions. Please be aware that using `--privileged=true` grants the container extensive privileges on the host system, which can be a security risk. Only use this option if you understand the implications and trust the container's source.
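An illustrative `docker run` invocation; the image tag, shared-memory size, and any device mappings are placeholders to adapt to your setup:

```bash
docker run --rm -it \
    --shm-size=<size> \
    --privileged=true \
    quay.io/ascend/vllm-ascend:<tag> bash
```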
The performance of `torch_npu.npu_fused_infer_attention_score` in small-batch scenarios is not satisfactory, mainly due to the lack of a flash decoding function. We offer an alternative operator in `tools/install_flash_infer_attention_score_ops_a2.sh` and `tools/install_flash_infer_attention_score_ops_a3.sh`; you can install it using the following instruction:
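A minimal sketch, assuming the script is run from the repository root (in the official image this is `/vllm-workspace`); pick the script matching your hardware generation:

```bash
# Atlas A2 series
bash tools/install_flash_infer_attention_score_ops_a2.sh

# Atlas A3 series
# bash tools/install_flash_infer_attention_score_ops_a3.sh
```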
**NOTE**: Don't set `additional_config.pa_shape_list` when using this method; otherwise, a different attention operator will be selected.
**Important**: Please make sure you're using the **official image** of `vllm-ascend`; otherwise, you **must change** the directory `/vllm-workspace` in `tools/install_flash_infer_attention_score_ops_a2.sh` or `tools/install_flash_infer_attention_score_ops_a3.sh` to your own, or create one. If you're not the root user, you need `sudo` **privileges** to run this script.
### 23. How to set `SOC_VERSION` when building from source on a CPU-only machine?
When building from source (e.g. `pip install -e .`), the build may try to infer the target chip via `npu-smi`. If `npu-smi` is not available (common in CPU-only build environments), you must set `SOC_VERSION` manually before installation.
You can use the defaults from `Dockerfile*` as a reference. For example:
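The value below is a placeholder; check the defaults in the repository's `Dockerfile*` files for your target hardware rather than treating it as authoritative:

```bash
# Set the target chip explicitly so the build does not need npu-smi
export SOC_VERSION=<your-soc-version>   # e.g. the default from the matching Dockerfile
pip install -e .
```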
### 24. Compilation error occasionally encountered with triton-ascend
As shown in [#7782](https://github.com/vllm-project/vllm-ascend/issues/7782), triton-ascend occasionally encounters compilation errors, which is a known issue in triton-ascend 3.2.0. To avoid this issue, please use the official docker images or install the specific triton-ascend version as follows: