[Doc][Misc] Restructure tutorial documentation (#6501)

### What this PR does / why we need it?

This PR refactors the tutorial documentation by restructuring it into three categories: Models, Features, and Hardware. This improves the organization and navigation of the tutorials, making it easier for users to find relevant information.

- The single `tutorials/index.md` is split into three separate index files:
  - `docs/source/tutorials/models/index.md`
  - `docs/source/tutorials/features/index.md`
  - `docs/source/tutorials/hardwares/index.md`
- Existing tutorial markdown files have been moved into their respective new subdirectories (`models/`, `features/`, `hardwares/`).
- The main `index.md` has been updated to link to these new tutorial sections.

This change makes the documentation structure more logical and scalable for future additions.

### Does this PR introduce _any_ user-facing change?

Yes, this PR changes the structure and URLs of the tutorial documentation pages. Users following old links to tutorials will encounter broken links. It is recommended to set up redirects if the documentation framework supports them.

### How was this patch tested?

These are documentation-only changes. The documentation should be built and reviewed locally to ensure all links are correct and the pages render as expected.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -35,7 +35,9 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-l
 :maxdepth: 1
 quick_start
 installation
-tutorials/index.md
+tutorials/models/index
+tutorials/features/index
+tutorials/hardwares/index
 faqs
 :::
@@ -136,7 +136,7 @@ pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/si
 ```bash
 # For torch-npu dev version or x86 machine
-pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"
+pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/"
 ```

 Then you can install `vllm` and `vllm-ascend` from **pre-built wheel**:
@@ -187,12 +187,12 @@ Supported images as following.
 | image name | Hardware | OS |
 |-|-|-|
-| image-tag | Atlas A2 | Ubuntu |
-| image-tag-openeuler | Atlas A2 | openEuler |
-| image-tag-a3 | Atlas A3 | Ubuntu |
-| image-tag-a3-openeuler | Atlas A3 | openEuler |
-| image-tag-310p | Atlas 300I | Ubuntu |
-| image-tag-310p-openeuler | Atlas 300I | openEuler |
+| vllm-ascend:<image-tag> | Atlas A2 | Ubuntu |
+| vllm-ascend:<image-tag>-openeuler | Atlas A2 | openEuler |
+| vllm-ascend:<image-tag>-a3 | Atlas A3 | Ubuntu |
+| vllm-ascend:<image-tag>-a3-openeuler | Atlas A3 | openEuler |
+| vllm-ascend:<image-tag>-310p | Atlas 300I | Ubuntu |
+| vllm-ascend:<image-tag>-310p-openeuler | Atlas 300I | openEuler |

 :::{dropdown} Click here to see "Build from Dockerfile"
 or build IMAGE from **source code**:
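The updated table encodes a simple naming rule: a base image tag, an optional hardware suffix (`-a3`, `-310p`; Atlas A2 has none), and an optional `-openeuler` OS suffix. A small sketch of that rule — the helper name and the example tag value are illustrative, not part of the docs:

```python
def ascend_image_name(image_tag: str, hardware: str = "A2",
                      os_name: str = "Ubuntu") -> str:
    """Build a vllm-ascend image name following the table's naming rule."""
    # Atlas A2 is the default and carries no hardware suffix.
    hw_suffix = {"A2": "", "A3": "-a3", "300I": "-310p"}[hardware]
    os_suffix = "-openeuler" if os_name == "openEuler" else ""
    return f"vllm-ascend:{image_tag}{hw_suffix}{os_suffix}"

print(ascend_image_name("v0.11.0rc2"))                     # -> vllm-ascend:v0.11.0rc2
print(ascend_image_name("v0.11.0rc2", "A3", "openEuler"))  # -> vllm-ascend:v0.11.0rc2-a3-openeuler
```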
@@ -258,7 +258,7 @@ prompts = [
 # Create a sampling params object.
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
 # Create an LLM.
-llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
+llm = LLM(model="Qwen/Qwen3-0.6B")

 # Generate texts from the prompts.
 outputs = llm.generate(prompts, sampling_params)
@@ -277,7 +277,7 @@ python example.py
 If you encounter a connection error with Hugging Face (e.g., `We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.`), run the following commands to use ModelScope as an alternative:

 ```bash
-export VLLM_USE_MODELSCOPE = true
+export VLLM_USE_MODELSCOPE=true
 pip install modelscope
 python example.py
 ```
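The change above is a real shell fix, not cosmetic: POSIX shells do not allow spaces around `=` in an assignment, so `export VLLM_USE_MODELSCOPE = true` never sets the variable. A minimal sketch demonstrating the corrected form (run through `sh` from Python just to make it checkable):

```python
import subprocess

# The corrected, space-free assignment sets the variable as intended.
ok = subprocess.run(
    ["sh", "-c", 'export VLLM_USE_MODELSCOPE=true; printf "%s" "$VLLM_USE_MODELSCOPE"'],
    capture_output=True, text=True,
)
print(ok.stdout)  # -> true

# The spaced form from the old docs is a usage error: the shell parses "=" and
# "true" as separate (invalid) arguments to `export` and reports an error
# instead of setting VLLM_USE_MODELSCOPE.
bad = subprocess.run(
    ["sh", "-c", "export VLLM_USE_MODELSCOPE = true"],
    capture_output=True, text=True,
)
print(bad.returncode)
```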
@@ -292,7 +292,7 @@ INFO 02-18 08:49:58 __init__.py:34] set environment variable VLLM_PLUGINS to con
 INFO 02-18 08:49:58 __init__.py:42] plugin ascend loaded.
 INFO 02-18 08:49:58 __init__.py:174] Platform plugin ascend is activated
 INFO 02-18 08:50:12 config.py:526] This model supports multiple tasks: {'embed', 'classify', 'generate', 'score', 'reward'}. Defaulting to 'generate'.
-INFO 02-18 08:50:12 llm_engine.py:232] Initializing a V0 LLM engine (v0.7.1) with config: model='./Qwen2.5-0.5B-Instruct', speculative_config=None, tokenizer='./Qwen2.5-0.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=./Qwen2.5-0.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
+INFO 02-18 08:50:12 llm_engine.py:232] Initializing a V0 LLM engine (v0.7.1) with config: model='./Qwen3-0.6B', speculative_config=None, tokenizer='./Qwen3-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=./Qwen3-0.6B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
 Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
-Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 5.86it/s]
+Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 5.85it/s]
@@ -114,7 +114,7 @@ prompts = [
 ]
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
 # The first run will take about 3-5 mins (10 MB/s) to download models
-llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
+llm = LLM(model="Qwen/Qwen3-0.6B")

 outputs = llm.generate(prompts, sampling_params)
@@ -130,13 +130,13 @@ for output in outputs:

 vLLM can also be deployed as a server that implements the OpenAI API protocol. Run
 the following command to start the vLLM server with the
-[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:
+[Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) model:

 <!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->

 ```bash
 # Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
-vllm serve Qwen/Qwen2.5-0.5B-Instruct &
+vllm serve Qwen/Qwen3-0.6B &
 ```

 If you see a log as below:
@@ -166,7 +166,7 @@ You can also query the model with input prompts:
 curl http://localhost:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
-        "model": "Qwen/Qwen2.5-0.5B-Instruct",
+        "model": "Qwen/Qwen3-0.6B",
         "prompt": "Beijing is a",
         "max_completion_tokens": 5,
         "temperature": 0
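For reference, the same request body can be built programmatically; this sketch only mirrors the fields shown in the curl command above, updated for the new default model:

```python
import json

# Completion request body for the OpenAI-compatible endpoint that
# `vllm serve` exposes.
payload = {
    "model": "Qwen/Qwen3-0.6B",
    "prompt": "Beijing is a",
    "max_completion_tokens": 5,
    "temperature": 0,
}
body = json.dumps(payload)
print(body)
```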
docs/source/tutorials/features/index.md (new file, 15 lines)
@@ -0,0 +1,15 @@
+# Feature Tutorials
+
+This section provides tutorials for different features of vLLM Ascend.
+
+:::{toctree}
+:caption: Feature Tutorials
+:maxdepth: 1
+pd_colocated_mooncake_multi_instance
+pd_disaggregation_mooncake_single_node
+pd_disaggregation_mooncake_multi_node
+long_sequence_context_parallel_single_node
+long_sequence_context_parallel_multi_node
+suffix_speculative_decoding
+ray
+:::
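Each toctree entry above is a document name relative to the index file, so Sphinx resolves it to a markdown source in the same directory. A quick sketch of where the entries are expected to live after the move (paths assumed from the PR description):

```python
from pathlib import PurePosixPath

# Sphinx resolves a toctree entry like "ray" to "ray.md" next to the index.
index = PurePosixPath("docs/source/tutorials/features/index.md")
entries = [
    "pd_colocated_mooncake_multi_instance",
    "pd_disaggregation_mooncake_single_node",
    "pd_disaggregation_mooncake_multi_node",
    "long_sequence_context_parallel_single_node",
    "long_sequence_context_parallel_multi_node",
    "suffix_speculative_decoding",
    "ray",
]
sources = [index.parent / f"{name}.md" for name in entries]
print(sources[-1])  # -> docs/source/tutorials/features/ray.md
```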
@@ -20,13 +20,13 @@ It is recommended to download the model weight to the shared directory of multip

 ### Verify Multi-node Communication

-Refer to [verify multi-node communication environment](../installation.md#verify-multi-node-communication) to verify multi-node communication.
+Refer to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication) to verify multi-node communication.

 ### Installation

 You can use our official docker image to run `DeepSeek-V3.1` directly.

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

 ```{code-block} bash
 :substitutions:
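The link fixes in this and the following hunks all follow one rule: each tutorial moved one directory deeper (`tutorials/` into `tutorials/models/`, `tutorials/features/`, or `tutorials/hardwares/`), so every relative link out of a tutorial needs one extra `../`. A quick check of the new depth (directory paths assumed from the PR description):

```python
import posixpath

# Before the restructure a tutorial lived in docs/source/tutorials/; now it
# lives one level deeper, so links out of it need one more "../".
target = "docs/source/installation.md"
old_dir = "docs/source/tutorials"
new_dir = "docs/source/tutorials/models"

print(posixpath.relpath(target, old_dir))  # -> ../installation.md
print(posixpath.relpath(target, new_dir))  # -> ../../installation.md
```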
@@ -331,7 +331,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 2. After execution, you can get the result, here is the result of `DeepSeek-V3.1-w8a8` for reference only.
@@ -343,7 +343,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 ### Using vLLM Benchmark
@@ -139,7 +139,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 2. After execution, you can get the result, here is the result of `Qwen3-235B-A22B-w8a8` for reference only.
@@ -151,7 +151,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 ### Using vLLM Benchmark
docs/source/tutorials/hardwares/index.md (new file, 9 lines)
@@ -0,0 +1,9 @@
+# Hardware Tutorials
+
+This section provides tutorials on different hardware of vLLM Ascend.
+
+:::{toctree}
+:caption: Hardware Tutorials
+:maxdepth: 1
+310p
+:::
@@ -7,9 +7,9 @@ This article takes the `DeepSeek-R1-W8A8` version as an example to introduce the

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -21,13 +21,13 @@ It is recommended to download the model weight to the shared directory of multip

 ### Verify Multi-node Communication(Optional)

-If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication).
+If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).

 ### Installation

 You can use our official docker image to run `DeepSeek-R1-W8A8` directly.

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

 ```{code-block} bash
 :substitutions:
@@ -254,7 +254,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 2. After execution, you can get the result, here is the result of `DeepSeek-R1-W8A8` in `vllm-ascend:0.11.0rc2` for reference only.
@@ -267,7 +267,7 @@ Here are two accuracy evaluation methods.

 As an example, take the `gsm8k` dataset as a test dataset, and run accuracy evaluation of `DeepSeek-R1-W8A8` in online mode.

-1. Refer to [Using lm_eval](../developer_guide/evaluation/using_lm_eval.md) for `lm_eval` installation.
+1. Refer to [Using lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for `lm_eval` installation.

 2. Run `lm_eval` to execute the accuracy evaluation.
@@ -285,7 +285,7 @@ lm_eval \

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 ### Using vLLM Benchmark
@@ -16,9 +16,9 @@ This document will show the main verification steps of the model, including supp

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -34,13 +34,13 @@ It is recommended to download the model weight to the shared directory of multip

 ### Verify Multi-node Communication(Optional)

-If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication).
+If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).

 ### Installation

 You can use our official docker image to run `DeepSeek-V3.1` directly.

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

 ```{code-block} bash
 :substitutions:
@@ -252,7 +252,7 @@ vllm serve /weights/DeepSeek-V3.1-w8a8-mtp-QuaRot \

 ### Prefill-Decode Disaggregation

-We recommend using Mooncake for deployment: [Mooncake](./pd_disaggregation_mooncake_multi_node.md).
+We recommend using Mooncake for deployment: [Mooncake](../features/pd_disaggregation_mooncake_multi_node.md).

 Take Atlas 800 A3 (64G × 16) for example, we recommend to deploy 2P1D (4 nodes) rather than 1P1D (2 nodes), because there is no enough NPU memory to serve high concurrency in 1P1D case.
@@ -672,7 +672,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 2. After execution, you can get the result, here is the result of `DeepSeek-V3.1-w8a8-mtp-QuaRot` in `vllm-ascend:0.11.0rc1` for reference only.
@@ -689,7 +689,7 @@ Not test yet.

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 The performance result is:
@@ -8,9 +8,9 @@ This document will show the main verification steps of the model, including supp

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -25,7 +25,7 @@ It is recommended to download the model weight to the shared directory of multip

 ### Verify Multi-node Communication(Optional)

-If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication).
+If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).

 ### Installation
@@ -116,7 +116,7 @@ docker run --rm \

 In addition, if you don't want to use the docker image as above, you can also build all from source:

-- Install `vllm-ascend` from source, refer to [installation](../installation.md).
+- Install `vllm-ascend` from source, refer to [installation](../../installation.md).

 If you want to deploy multi-node environment, you need to set up environment on each node.
@@ -851,7 +851,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 2. After execution, you can get the result.
@@ -859,7 +859,7 @@ Here are two accuracy evaluation methods.

 As an example, take the `gsm8k` dataset as a test dataset, and run accuracy evaluation of `DeepSeek-V3.2-W8A8` in online mode.

-1. Refer to [Using lm_eval](../developer_guide/evaluation/using_lm_eval.md) for `lm_eval` installation.
+1. Refer to [Using lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for `lm_eval` installation.

 2. Run `lm_eval` to execute the accuracy evaluation.
@@ -877,7 +877,7 @@ lm_eval \

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 The performance result is:
@@ -10,9 +10,9 @@ This document will show the main verification steps of the model, including supp

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -31,7 +31,7 @@ It is recommended to download the model weight to the shared directory of multip

 You can use our official docker image to run `GLM-4.x` directly.

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

 ```{code-block} bash
 :substitutions:
@@ -121,7 +121,7 @@ Here are two accuracy evaluation methods.

 ### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 2. After execution, you can get the result, here is the result of `GLM4.6` in `vllm-ascend:main` (after `vllm-ascend:0.13.0rc1`) for reference only.
@@ -138,7 +138,7 @@ Not test yet.

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 ### Using vLLM Benchmark
@@ -24,7 +24,7 @@ It is recommended to download the model weights to a local directory (e.g., `./P

 You can use our official docker image to run `PaddleOCR-VL` directly.

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

 ```{code-block} bash
 :substitutions:
@@ -10,9 +10,9 @@ This tutorial uses the vLLM-Ascend `v0.11.0rc3-a3` version for demonstration, sh

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -484,7 +484,7 @@ You can refer to the [monitoring configuration](https://github.com/vllm-project/

 As an example, take the `mmmu_val` dataset as a test dataset, and run accuracy evaluation of `Qwen3-VL-8B-Instruct` in offline mode.

-1. Refer to [Using lm_eval](../developer_guide/evaluation/using_lm_eval.md) for more details on `lm_eval` installation.
+1. Refer to [Using lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for more details on `lm_eval` installation.

 ```shell
 pip install lm_eval
@@ -515,7 +515,7 @@ lm_eval \

 As an example, take the `mmmu_val` dataset as a test dataset, and run accuracy evaluation of `Qwen2.5-VL-32B-Instruct` in offline mode.

-1. Refer to [Using lm_eval](../developer_guide/evaluation/using_lm_eval.md) for more details on `lm_eval` installation.
+1. Refer to [Using lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for more details on `lm_eval` installation.

 ```shell
 pip install lm_eval
@@ -10,9 +10,9 @@ The `Qwen2.5-7B-Instruct` model was supported since `vllm-ascend:v0.9.0`.

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -138,7 +138,7 @@ A valid response (e.g., `"Beijing is a vibrant and historic capital city"`) indi

 ### Using AISBench

-Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 Results and logs are saved to `benchmark/outputs/default/`. A sample accuracy report is shown below:
@@ -150,7 +150,7 @@ Results and logs are saved to `benchmark/outputs/default/`. A sample accuracy re

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 ### Using vLLM Benchmark
@@ -8,9 +8,9 @@ The `Qwen2.5-Omni` model was supported since `vllm-ascend:v0.11.0rc0`. This docu

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -25,7 +25,7 @@ Following examples use the 7B version by default.

 You can use our official docker image to run `Qwen2.5-Omni` directly.

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

 ```{code-block} bash
 :substitutions:
@@ -174,7 +174,7 @@ Qwen2.5-Omni on vllm-ascend has been test on AISBench.

 ### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

 2. After execution, you can get the result, here is the result of `Qwen2.5-Omni-7B` with `vllm-ascend:0.11.0rc0` for reference only.
@@ -187,7 +187,7 @@ Qwen2.5-Omni on vllm-ascend has been test on AISBench.

 ### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

 ### Using vLLM Benchmark
@@ -10,9 +10,9 @@ The `Qwen3-235B-A22B` model is first supported in `vllm-ascend:v0.8.4rc2`.

 ## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

 ## Environment Preparation
@@ -25,7 +25,7 @@ It is recommended to download the model weight to the shared directory of multip

 ### Verify Multi-node Communication(Optional)

-If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication).
+If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).

 ### Installation
@@ -34,7 +34,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm

 For example, using images `quay.io/ascend/vllm-ascend:v0.11.0rc2`(for Atlas 800 A2) and `quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(for Atlas 800 A3).

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

 ```{code-block} bash
 :substitutions:
@@ -76,7 +76,7 @@ Select an image based on your machine type and start the docker image on your no
|
||||
|
||||
You can build all from source.
|
||||
|
||||
- Install `vllm-ascend`, refer to [set up using python](../installation.md#set-up-using-python).
|
||||
- Install `vllm-ascend`, refer to [set up using python](../../installation.md#set-up-using-python).
|
||||
|
||||
::::
|
||||
:::::
|
||||
@@ -253,11 +253,11 @@ INFO: Application startup complete.

### Multi-node Deployment with Ray

-- refer to [Ray Distributed (Qwen/Qwen3-235B-A22B)](./ray.md).
+- refer to [Ray Distributed (Qwen/Qwen3-235B-A22B)](../features/ray.md).

### Prefill-Decode Disaggregation

-- refer to [Prefill-Decode Disaggregation Mooncake Verification (Qwen)](./pd_disaggregation_mooncake_multi_node.md)
+- refer to [Prefill-Decode Disaggregation Mooncake Verification (Qwen)](../features/pd_disaggregation_mooncake_multi_node.md)

## Functional Verification
@@ -280,7 +280,7 @@ Here are two accuracy evaluation methods.

### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result, here is the result of `Qwen3-235B-A22B-w8a8` in `vllm-ascend:0.11.0rc0` for reference only.
@@ -292,7 +292,7 @@ Here are two accuracy evaluation methods.

### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

### Using vLLM Benchmark
@@ -8,9 +8,9 @@ This document will show the main verification steps of the model, including supp

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

## Environment Preparation
@@ -52,7 +52,7 @@ docker run --rm \

In addition, if you don't want to use the docker image as above, you can also build all from source:

-- Install `vllm-ascend` from source, refer to [installation](../installation.md).
+- Install `vllm-ascend` from source, refer to [installation](../../installation.md).

## Deployment
@@ -90,7 +90,7 @@ curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/jso

### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result, here is the result of `Qwen3-Coder-30B-A3B-Instruct` in `vllm-ascend:0.11.0rc0` for reference only.
@@ -102,4 +102,4 @@ curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/jso

### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
@@ -16,9 +16,9 @@ This example requires version **v0.11.0rc2**. Earlier versions may lack certain

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

## Environment Preparation
@@ -38,7 +38,7 @@ It is recommended to download the model weight to the shared directory of multip

### Verify Multi-node Communication(Optional)

-If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication).
+If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).

### Installation
@@ -97,7 +97,7 @@ In the [Run docker container](./Qwen3-Dense.md#run-docker-container), detailed e

In addition, if you don't want to use the docker image as above, you can also build all from source:

-- Install `vllm-ascend` from source, refer to [installation](../installation.md).
+- Install `vllm-ascend` from source, refer to [installation](../../installation.md).

If you want to deploy multi-node environment, you need to set up environment on each node.
@@ -269,7 +269,7 @@ Here is one accuracy evaluation methods.

### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result, here is the result of `Qwen3-32B-W8A8` in `vllm-ascend:0.11.0rc2` for reference only.
@@ -283,7 +283,7 @@ Here is one accuracy evaluation methods.

### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

### Using vLLM Benchmark
@@ -10,9 +10,9 @@ The `Qwen3-Next` model is first supported in `vllm-ascend:v0.10.2rc1`.

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

## Weight Preparation
@@ -134,7 +134,7 @@ Prompt: 'Who are you?', Generated text: ' What do you know about me?\n\nHello! I

### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result, here is the result of `Qwen3-Next-80B-A3B-Instruct` in `vllm-ascend:0.13.0rc1` for reference only.
@@ -146,7 +146,7 @@ Prompt: 'Who are you?', Generated text: ' What do you know about me?\n\nHello! I

### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

### Using vLLM Benchmark
@@ -26,7 +26,7 @@ It is recommended to download the model weight to the shared directory of multip

You can use our official docker image to run Qwen3-Omni-30B-A3B-Thinking directly

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

```{code-block} bash
:substitutions:
@@ -63,7 +63,7 @@ docker run --rm \

You can build all from source.

-- Install `vllm-ascend`, refer to [set up using python](../installation.md#set-up-using-python).
+- Install `vllm-ascend`, refer to [set up using python](../../installation.md#set-up-using-python).

::::
:::::
@@ -10,9 +10,9 @@ This tutorial uses the vLLM-Ascend `v0.11.0rc2` version for demonstration, sho

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

-Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

## Environment Preparation
@@ -24,7 +24,7 @@ It is recommended to download the model weight to the shared directory of multip

### Verify Multi-node Communication(Optional)

-If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../installation.md#verify-multi-node-communication).
+If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).

### Installation
@@ -33,7 +33,7 @@ If you want to deploy multi-node environment, you need to verify multi-node comm

For example, using images `quay.io/ascend/vllm-ascend:v0.11.0rc2`(for Atlas 800 A2) and `quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(for Atlas 800 A3).

-Select an image based on your machine type and start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

```{code-block} bash
:substitutions:
@@ -76,7 +76,7 @@ Select an image based on your machine type and start the docker image on your no

You can build all from source.

-- Install `vllm-ascend`, refer to [set up using python](../installation.md#set-up-using-python).
+- Install `vllm-ascend`, refer to [set up using python](../../installation.md#set-up-using-python).

::::
:::::
@@ -209,11 +209,11 @@ INFO: Application startup complete.

### Multi-node Deployment with Ray

-- refer to [Ray Distributed (Qwen/Qwen3-235B-A22B)](./ray.md).
+- refer to [Ray Distributed (Qwen/Qwen3-235B-A22B)](../features/ray.md).

### Prefill-Decode Disaggregation

-- refer to [Prefill-Decode Disaggregation Mooncake Verification](./pd_disaggregation_mooncake_multi_node.md)
+- refer to [Prefill-Decode Disaggregation Mooncake Verification](../features/pd_disaggregation_mooncake_multi_node.md)

## Functional Verification
@@ -240,7 +240,7 @@ Here are two accuracy evaluation methods.

### Using AISBench

-1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.
+1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result, here is the result of `Qwen3-VL-235B-A22B-Instruct` in `vllm-ascend:0.11.0rc2` for reference only.
@@ -252,7 +252,7 @@ Here are two accuracy evaluation methods.

### Using AISBench

-Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
+Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

### Using vLLM Benchmark
@@ -8,8 +8,8 @@ This document will show the main verification steps of the `Qwen3-VL-30B-A3B-Ins

## Supported Features

-- Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
-- Refer to [feature guide](../user_guide/feature_guide/index.md) to get the feature's configuration.
+- Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+- Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

## Environment Preparation
@@ -6,7 +6,7 @@ The Qwen3-VL-Embedding and Qwen3-VL-Reranker model series are the latest additio

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

## Environment Preparation
@@ -21,11 +21,11 @@ It is recommended to download the model weight to the shared directory of multip

You can use our official docker image to run `Qwen3-VL-Embedding` series models.

-- Start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+- Start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

If you don't want to use the docker image as above, you can also build all from source:

-- Install `vllm-ascend` from source, refer to [installation](../installation.md).
+- Install `vllm-ascend` from source, refer to [installation](../../installation.md).

## Deployment
@@ -6,7 +6,7 @@ The Qwen3-VL-Embedding and Qwen3-VL-Reranker model series are the latest additio

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

## Environment Preparation
@@ -21,11 +21,11 @@ It is recommended to download the model weight to the shared directory of multip

You can use our official docker image to run `Qwen3-VL-Reranker` series models.

-- Start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+- Start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

If you don't want to use the docker image as above, you can also build all from source:

-- Install `vllm-ascend` from source, refer to [installation](../installation.md).
+- Install `vllm-ascend` from source, refer to [installation](../../installation.md).

## Deployment
@@ -6,7 +6,7 @@ The Qwen3 Embedding model series is the latest proprietary model of the Qwen fam

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

## Environment Preparation
@@ -22,11 +22,11 @@ It is recommended to download the model weight to the shared directory of multip

You can use our official docker image to run `Qwen3-Embedding` series models.

-- Start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+- Start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

if you don't want to use the docker image as above, you can also build all from source:

-- Install `vllm-ascend` from source, refer to [installation](../installation.md).
+- Install `vllm-ascend` from source, refer to [installation](../../installation.md).

## Deployment
@@ -6,7 +6,7 @@ The Qwen3 Reranker model series is the latest proprietary model of the Qwen fami

## Supported Features

-Refer to [supported features](../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
+Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

## Environment Preparation
@@ -22,11 +22,11 @@ It is recommended to download the model weight to the shared directory of multip

You can use our official docker image to run `Qwen3-Reranker` series models.

-- Start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker).
+- Start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

if you don't want to use the docker image as above, you can also build all from source:

-- Install `vllm-ascend` from source, refer to [installation](../installation.md).
+- Install `vllm-ascend` from source, refer to [installation](../../installation.md).

## Deployment
@@ -1,7 +1,9 @@
-# Tutorials
+# Model Tutorials

+This section provides tutorials for different models of vLLM Ascend.
+
:::{toctree}
-:caption: Models
+:caption: Model Tutorials
:maxdepth: 1
Qwen2.5-Omni.md
Qwen2.5-7B.md
@@ -27,21 +29,3 @@ GLM4.x.md
Kimi-K2-Thinking.md
PaddleOCR-VL.md
:::
-
-:::{toctree}
-:caption: Features
-:maxdepth: 1
-pd_colocated_mooncake_multi_instance.md
-pd_disaggregation_mooncake_single_node.md
-pd_disaggregation_mooncake_multi_node.md
-long_sequence_context_parallel_single_node.md
-long_sequence_context_parallel_multi_node.md
-suffix_speculative_decoding.md
-ray
-:::
-
-:::{toctree}
-:caption: Hardware
-:maxdepth: 1
-310p.md
-:::
@@ -155,8 +155,6 @@ python -m vllm.entrypoints.api_server \
--quantization ascend
```

-The above commands are for reference only. For more details, consult the [official guide](../../tutorials/index.md).
-
## References

- [ModelSlim Documentation](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/README.md)
@@ -16,16 +16,16 @@ Get the latest info here: <https://github.com/vllm-project/vllm-ascend/issues/16

| Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
|-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| DeepSeek V3/3.1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240k || [DeepSeek-V3.1](../../tutorials/DeepSeek-V3.1.md) |
-| DeepSeek V3.2 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 160k | ✅ | [DeepSeek-V3.2](../../tutorials/DeepSeek-V3.2.md) |
-| DeepSeek R1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 128k || [DeepSeek-R1](../../tutorials/DeepSeek-R1.md) |
-| Qwen3 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ ||| ✅ || ✅ | ✅ | 128k | ✅ | [Qwen3-Dense](../../tutorials/Qwen3-Dense.md) |
-| Qwen3-Coder | ✅ | | ✅ | A2/A3 ||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
-| Qwen3-Moe | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | 256k || [Qwen3-235B-A22B](../../tutorials/Qwen3-235B-A22B.md) |
-| Qwen3-Next | ✅ | | ✅ | A2/A3 | ✅ |||||| ✅ ||| ✅ || ✅ | ✅ ||| [Qwen3-Next](../../tutorials/Qwen3-Next.md) |
-| Qwen2.5 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ ||| ✅ |||||| [Qwen2.5-7B](../../tutorials/Qwen2.5-7B.md) |
-| GLM-4.x | ✅ | || A2/A3 |✅|✅|✅||✅|✅|✅|||✅||✅|✅|128k||[GLM-4.x](../../tutorials/GLM4.x.md)|
-| Kimi-K2-Thinking | ✅ | || A2/A3 |||||||||||||||| [Kimi-K2-Thinking](../../tutorials/Kimi-K2-Thinking.md) |
+| DeepSeek V3/3.1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240k || [DeepSeek-V3.1](../../tutorials/models/DeepSeek-V3.1.md) |
+| DeepSeek V3.2 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 160k | ✅ | [DeepSeek-V3.2](../../tutorials/models/DeepSeek-V3.2.md) |
+| DeepSeek R1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 128k || [DeepSeek-R1](../../tutorials/models/DeepSeek-R1.md) |
+| Qwen3 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ ||| ✅ || ✅ | ✅ | 128k | ✅ | [Qwen3-Dense](../../tutorials/models/Qwen3-Dense.md) |
+| Qwen3-Coder | ✅ | | ✅ | A2/A3 ||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/models/Qwen3-Coder-30B-A3B.md)|
+| Qwen3-Moe | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | 256k || [Qwen3-235B-A22B](../../tutorials/models/Qwen3-235B-A22B.md) |
+| Qwen3-Next | ✅ | | ✅ | A2/A3 | ✅ |||||| ✅ ||| ✅ || ✅ | ✅ ||| [Qwen3-Next](../../tutorials/models/Qwen3-Next.md) |
+| Qwen2.5 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ ||| ✅ |||||| [Qwen2.5-7B](../../tutorials/models/Qwen2.5-7B.md) |
+| GLM-4.x | ✅ | || A2/A3 |✅|✅|✅||✅|✅|✅|||✅||✅|✅|128k||[GLM-4.x](../../tutorials/models/GLM4.x.md)|
+| Kimi-K2-Thinking | ✅ | || A2/A3 |||||||||||||||| [Kimi-K2-Thinking](../../tutorials/models/Kimi-K2-Thinking.md) |

#### Extended Compatible Models
@@ -60,10 +60,10 @@ Get the latest info here: <https://github.com/vllm-project/vllm-ascend/issues/16

| Model | Support | Note | Supported Hardware | Doc |
|-------------------------------|-----------|----------------------------------------------------------------------|--------------------------|------|
-| Qwen3-Embedding | ✅ | | A2/A3 | [Qwen3_embedding](../../tutorials/Qwen3_embedding.md)|
-| Qwen3-VL-Embedding | ✅ | | A2/A3 | [Qwen3-VL-Embedding](../../tutorials/Qwen3-VL-Embedding.md)|
-| Qwen3-Reranker | ✅ | | A2/A3 | [Qwen3_reranker](../../tutorials/Qwen3_reranker.md)|
-| Qwen3-VL-Reranker | ✅ | | A2/A3 | [Qwen3-VL-Reranker](../../tutorials/Qwen3-VL-Reranker.md)|
+| Qwen3-Embedding | ✅ | | A2/A3 | [Qwen3_embedding](../../tutorials/models/Qwen3_embedding.md)|
+| Qwen3-VL-Embedding | ✅ | | A2/A3 | [Qwen3-VL-Embedding](../../tutorials/models/Qwen3-VL-Embedding.md)|
+| Qwen3-Reranker | ✅ | | A2/A3 | [Qwen3_reranker](../../tutorials/models/Qwen3_reranker.md)|
+| Qwen3-VL-Reranker | ✅ | | A2/A3 | [Qwen3-VL-Reranker](../../tutorials/models/Qwen3-VL-Reranker.md)|
| Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) | A2/A3 | |
| XLM-RoBERTa-based | ✅ | | A2/A3 | |
| Bert | ✅ | | A2/A3 | |
@@ -76,11 +76,11 @@ Get the latest info here: <https://github.com/vllm-project/vllm-ascend/issues/16

| Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
|--------------------------------|---------------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| Qwen2.5-VL | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ |||| ✅ | ✅ | ✅ | 30k || [Qwen-VL-Dense](../../tutorials/Qwen-VL-Dense.md) |
-| Qwen3-VL | ✅ | ||A2/A3|||||||✅|||||✅|✅||| [Qwen-VL-Dense](../../tutorials/Qwen-VL-Dense.md) |
-| Qwen3-VL-MOE | ✅ | | ✅ | A2/A3||✅|✅|||✅|✅|✅|✅|✅|✅|✅|✅|256k||[Qwen3-VL-MOE](../../tutorials/Qwen3-VL-235B-A22B-Instruct.md)|
-| Qwen3-Omni-30B-A3B-Thinking | ✅ | ||A2/A3|||||||✅||✅|||||||[Qwen3-Omni-30B-A3B-Thinking](../../tutorials/Qwen3-Omni-30B-A3B-Thinking.md)|
-| Qwen2.5-Omni | ✅ | || A2/A3 |||||||||||||||| [Qwen2.5-Omni](../../tutorials/Qwen2.5-Omni.md) |
+| Qwen2.5-VL | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ |||| ✅ | ✅ | ✅ | 30k || [Qwen-VL-Dense](../../tutorials/models/Qwen-VL-Dense.md) |
+| Qwen3-VL | ✅ | ||A2/A3|||||||✅|||||✅|✅||| [Qwen-VL-Dense](../../tutorials/models/Qwen-VL-Dense.md) |
+| Qwen3-VL-MOE | ✅ | | ✅ | A2/A3||✅|✅|||✅|✅|✅|✅|✅|✅|✅|✅|256k||[Qwen3-VL-MOE](../../tutorials/models/Qwen3-VL-235B-A22B-Instruct.md)|
+| Qwen3-Omni-30B-A3B-Thinking | ✅ | ||A2/A3|||||||✅||✅|||||||[Qwen3-Omni-30B-A3B-Thinking](../../tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md)|
+| Qwen2.5-Omni | ✅ | || A2/A3 |||||||||||||||| [Qwen2.5-Omni](../../tutorials/models/Qwen2.5-Omni.md) |

#### Extended Compatible Models
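Every hunk in this diff applies one rule: the tutorial pages moved one directory deeper, so each relative link gains an extra `../`. The PR description recommends building the docs locally to confirm no link was missed; as a quicker sanity check, here is a minimal link-checker sketch (hypothetical helper, not part of this PR — the mini docs tree below is made up to mirror the new `tutorials/models/` layout):

```python
import re
import tempfile
from pathlib import Path

# Matches relative markdown link targets, skipping http(s) URLs and pure anchors.
LINK_RE = re.compile(r"\]\((?!https?://)([^)#]+)")

def broken_links(root: Path) -> list[tuple[str, str]]:
    """Return (file, target) pairs whose relative link targets do not exist."""
    bad = []
    for md in root.rglob("*.md"):
        for target in LINK_RE.findall(md.read_text()):
            # Resolve the target relative to the file that contains the link.
            if not (md.parent / target).resolve().exists():
                bad.append((str(md.relative_to(root)), target))
    return bad

# Tiny demo tree mirroring the restructure: tutorials now one level deeper.
root = Path(tempfile.mkdtemp())
(root / "tutorials" / "models").mkdir(parents=True)
(root / "installation.md").write_text("# Install\n")
(root / "tutorials" / "models" / "Qwen3-Next.md").write_text(
    "See [installation](../../installation.md) and [stale](../installation.md).\n"
)

# Only the stale one-level-up link is reported as broken.
print(broken_links(root))
```

Running this over the real `docs/source` tree after the move would surface any tutorial still pointing at the old `../` depth.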