
# FAQs

## Version Specific FAQs

- [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
- [[v0.7.3rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
- [[v0.7.3rc2] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/418)

## General FAQs

### 1. What devices are currently supported?

Currently, **ONLY the Atlas A2 series** (Ascend-cann-kernels-910b) is supported:

- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)

The following series are NOT supported yet:

- Atlas 300I Duo, Atlas 300I Pro (Ascend-cann-kernels-310p): may be supported in 2025 Q2
- Atlas 200I A2 (Ascend-cann-kernels-310b): not planned yet
- Ascend 910, Ascend 910 Pro B (Ascend-cann-kernels-910): not planned yet

From a technical point of view, vllm-ascend can support a device if torch-npu supports it; otherwise, the missing functionality has to be implemented with custom ops. You are welcome to join us and help improve it.

### 2. How to get our docker containers?

You can get our containers at `Quay.io`, e.g., [<u>vllm-ascend</u>](https://quay.io/repository/ascend/vllm-ascend?tab=tags) and [<u>cann</u>](https://quay.io/repository/ascend/cann?tab=tags).

If you are in China, you can use the `daocloud` mirror to speed up downloads:

1) Open `daemon.json`:

```bash
sudo vi /etc/docker/daemon.json
```

2) Add `https://docker.m.daocloud.io` to `registry-mirrors`:

```json
{
  "registry-mirrors": [
    "https://docker.m.daocloud.io"
  ]
}
```

3) Restart the Docker service:

```bash
sudo systemctl daemon-reload
sudo systemctl restart docker
```

After configuration, you can download our container from `m.daocloud.io/quay.io/ascend/vllm-ascend:v0.7.3rc2`.
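If `daemon.json` already contains other settings, pasting the snippet from step 2 as-is would overwrite them. As an alternative to editing by hand, here is a minimal Python sketch that merges the mirror into an existing config and skips it if it is already listed. The helper names `add_registry_mirror` and `update_daemon_json` are hypothetical (not part of Docker or vllm-ascend), and the sketch assumes the file holds a single JSON object.

```python
import json
from pathlib import Path

# Mirror URL from step 2 above.
DAOCLOUD_MIRROR = "https://docker.m.daocloud.io"


def add_registry_mirror(config: dict, mirror: str = DAOCLOUD_MIRROR) -> dict:
    """Return a copy of a daemon.json config with `mirror` in "registry-mirrors".

    Other keys and existing mirrors are kept; the mirror is never added twice.
    """
    updated = dict(config)
    mirrors = list(updated.get("registry-mirrors", []))
    if mirror not in mirrors:
        mirrors.append(mirror)
    updated["registry-mirrors"] = mirrors
    return updated


def update_daemon_json(path: Path, mirror: str = DAOCLOUD_MIRROR) -> dict:
    """Merge `mirror` into the daemon.json at `path` (created if missing or empty)."""
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    updated = add_registry_mirror(config, mirror)
    path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated
```

Run it as root, e.g. `update_daemon_json(Path("/etc/docker/daemon.json"))`, and then restart Docker as in step 3. Returning a copy from `add_registry_mirror` keeps the merge logic free of file I/O, so it can be checked without touching the real config.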

### 3. What models does vllm-ascend support?

Currently, `Qwen`, `Deepseek` (V0 only) and `Llama` models are fully tested and supported; other models we have tested are listed [<u>here</u>](https://vllm-ascend.readthedocs.io/en/latest/user_guide/supported_models.html). According to user feedback, `gemma3` and `glm4` are not supported yet, and more models still need testing.

### 4. How to get in touch with our community?

There are several channels through which you can communicate with our community developers and users:

- Submit a GitHub [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues?page=1).
- Join our [<u>weekly meeting</u>](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z) and share your ideas.
- Join our [<u>WeChat</u>](https://github.com/vllm-project/vllm-ascend/issues/227) group and ask your questions.
- Join our Ascend channel in the [<u>vLLM forums</u>](https://discuss.vllm.ai/c/hardware-support/vllm-ascend-support/6) and post your topics.

### 5. What features does vllm-ascend V1 support?

Find more details [<u>here</u>](https://github.com/vllm-project/vllm-ascend/issues/414).

### 6. How to solve the problem of "Failed to infer device type" or "libatb.so: cannot open shared object file"?

This error usually means that the NPU environment is not configured correctly. You can:
1. Run `source /usr/local/Ascend/nnal/atb/set_env.sh` to enable the NNAL package.
2. Run `source /usr/local/Ascend/ascend-toolkit/set_env.sh` to enable the CANN package.
3. Run `npu-smi info` to check whether the NPU is working.

If none of the above steps helps, run the following Python code to check whether any import raises an error:

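```python
import torch
import torch_npu  # registers the NPU backend (torch.npu) with PyTorch
import vllm

# Expected to print True if an NPU is visible to this process
print(torch.npu.is_available())
```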

If the problem persists, feel free to submit a GitHub issue.
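Sourcing `set_env.sh` only affects the current shell, so these checks have to be run in that same shell. The following minimal Python sketch bundles them into one report: it checks whether the two `set_env.sh` scripts exist at the default paths used above (assumption: a standard `/usr/local/Ascend` install) and imports each module separately, so one failure does not hide the others. The helper names are hypothetical and not part of vllm-ascend.

```python
import importlib
from pathlib import Path

# Default locations from the steps above (assumption: standard /usr/local/Ascend install).
ENV_SCRIPTS = [
    Path("/usr/local/Ascend/nnal/atb/set_env.sh"),
    Path("/usr/local/Ascend/ascend-toolkit/set_env.sh"),
]
MODULES = ["torch", "torch_npu", "vllm"]


def missing_scripts(scripts=ENV_SCRIPTS) -> list:
    """Return the set_env.sh scripts that do not exist on this machine."""
    return [str(p) for p in scripts if not p.is_file()]


def import_errors(modules=MODULES) -> dict:
    """Import each module separately; map name -> error text, or None on success."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ImportError/OSError for a missing libatb.so
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def main() -> None:
    for script in missing_scripts():
        print(f"missing: {script}")
    for name, err in import_errors().items():
        print(f"{name}: {'OK' if err is None else err}")


if __name__ == "__main__":
    main()
```

Save it under any name (e.g. a hypothetical `check_npu_env.py`) and run it with `python` after sourcing the scripts; any line that is not `OK` points to the package to investigate, and the error text is useful to include in a GitHub issue.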

### 7. Does vllm-ascend support Atlas 300I Duo?

No. vllm-ascend currently supports only the Atlas A2 series. We are working on it.

### 8. How does vllm-ascend perform?

Currently, performance has been optimized for only some models, such as `Qwen2 VL` and `Deepseek V3`; other models are not yet well optimized. In the future, we will support graph mode and custom ops to improve the performance of vllm-ascend. Once the official release of vllm-ascend is out, you can also install `mindie-turbo` alongside `vllm-ascend` to speed up inference.

### 9. How does vllm-ascend work with vllm?

vllm-ascend is a plugin for vllm. In general, the version of vllm-ascend matches the version of vllm: for example, if you use vllm 0.7.3, you should use vllm-ascend 0.7.3 as well. On the main branch, we make sure that `vllm-ascend` and `vllm` stay compatible at every commit.
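The version rule above can be checked programmatically. This is a minimal Python sketch, assuming the installed distributions are named `vllm` and `vllm-ascend`, and assuming pre-release suffixes such as `rc2` are ignored, so vllm-ascend `0.7.3rc2` is treated as matching vllm `0.7.3`. The helper names are hypothetical. It is not meant for main-branch builds, where compatibility is tracked per commit rather than by version number.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_prefix(ver: str) -> str:
    """Extract the leading 'X.Y.Z' release number, e.g. '0.7.3rc2' -> '0.7.3'."""
    match = re.match(r"\d+\.\d+\.\d+", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return match.group(0)


def versions_match(vllm_version: str, vllm_ascend_version: str) -> bool:
    """True if both packages share the same X.Y.Z release number."""
    return release_prefix(vllm_version) == release_prefix(vllm_ascend_version)


def check_installed() -> None:
    """Compare the installed vllm and vllm-ascend distributions, if present."""
    try:
        v, va = version("vllm"), version("vllm-ascend")
    except PackageNotFoundError as exc:
        print(f"not installed: {exc}")
        return
    status = "OK" if versions_match(v, va) else "MISMATCH"
    print(f"vllm {v} / vllm-ascend {va}: {status}")
```

Calling `check_installed()` in the environment where both packages are installed prints one line with either `OK` or `MISMATCH`.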

### 10. Does vllm-ascend support the Prefill Disaggregation feature?

Currently, vllm supports only 1P1D (one prefill instance and one decode instance). For vllm-ascend, support is being added in [this PR](https://github.com/vllm-project/vllm-ascend/pull/432). NPND is not yet stable or fully supported in vllm; we will make it stable and supported in vllm-ascend in the future.

### 11. Does vllm-ascend support quantization methods?

Currently, vllm-ascend does not natively support any quantization method. Quantization support is a work in progress, and w8a8 will be supported first.

### 12. How do I run a w8a8 DeepSeek model?

On v0.7.3, running w8a8 requires vllm + vllm-ascend + mindie-turbo. Once v0.8.X is released, only vllm + vllm-ascend will be needed. After installing these packages, follow the steps below to run w8a8 DeepSeek:

1. Quantize bf16 DeepSeek, e.g. [unsloth/DeepSeek-R1-BF16](https://modelscope.cn/models/unsloth/DeepSeek-R1-BF16), with msModelSlim to get w8a8 DeepSeek. Find more details in the [msModelSlim doc](https://gitee.com/ascend/msit/tree/master/msmodelslim/msmodelslim/pytorch/llm_ptq).
2. Copy the content of `quant_model_description_w8a8_dynamic.json` into the `quantization_config` field of the quantized model's `config.json`.
3. Run inference with the quantized DeepSeek model.
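Step 2 can be scripted. The sketch below is a minimal Python example under two assumptions: msModelSlim wrote `quant_model_description_w8a8_dynamic.json` into the same directory as the quantized weights and `config.json`, and "copy into" means replacing the value of `quantization_config` with the whole description. The function name is hypothetical; back up `config.json` before running it.

```python
import json
from pathlib import Path

# File name from step 2 above.
DESCRIPTION_FILE = "quant_model_description_w8a8_dynamic.json"


def merge_quant_description(model_dir: Path) -> dict:
    """Copy the quantization description into config.json's "quantization_config".

    Any previous "quantization_config" value is replaced; all other keys are kept.
    Returns the updated config that was written back to disk.
    """
    config_path = model_dir / "config.json"
    description = json.loads((model_dir / DESCRIPTION_FILE).read_text())
    config = json.loads(config_path.read_text())
    config["quantization_config"] = description
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n")
    return config
```

For example, `merge_quant_description(Path("/path/to/quantized-model"))` (placeholder path) updates the config in place. Replacing rather than merging avoids mixing in stale keys from any earlier quantization settings.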

### 13. There is no output in the log when loading models with vllm-ascend. How do I solve it?

If you're using vllm 0.7.3, this is a known progress bar display issue in vLLM that has been resolved in [this PR](https://github.com/vllm-project/vllm/pull/12428); please cherry-pick it locally. Otherwise, please file an issue.

### 14. How is vllm-ascend tested?

vllm-ascend is tested by functional tests, performance tests and accuracy tests.

- **Functional test**: we added CI that runs a portion of vllm's native unit tests as well as vllm-ascend's own unit tests. In vllm-ascend's tests, we check basic functional usability of popular models, including `Qwen2.5-7B-Instruct`, `Qwen2.5-VL-7B-Instruct`, `Qwen2.5-VL-32B-Instruct` and `QwQ-32B`.

- **Performance test**: we provide [benchmark](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks) tools for end-to-end performance benchmarking that can easily be re-run locally. We'll publish a performance website, as [vllm](https://simon-mo-workspace.observablehq.cloud/vllm-dashboard-v0/perf) does, to show the performance test results for each pull request.

- **Accuracy test**: we're working on adding accuracy tests to CI as well.

Finally, in the future we'll publish performance and accuracy test reports for each release.