diff --git a/docs/source/tutorials/models/DeepSeek-V3.1.md b/docs/source/tutorials/models/DeepSeek-V3.1.md
index dc844f67..d787f061 100644
--- a/docs/source/tutorials/models/DeepSeek-V3.1.md
+++ b/docs/source/tutorials/models/DeepSeek-V3.1.md
@@ -25,8 +25,7 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ### Model Weight
 
 - `DeepSeek-V3.1`(BF16 version): [Download model weight](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3.1).
-- `DeepSeek-V3.1-w8a8`(Quantized version without mtp): [Download model weight](https://www.modelscope.cn/models/vllm-ascend/DeepSeek-V3.1-w8a8).
-- `DeepSeek-V3.1_w8a8mix_mtp`(Quantized version with mix mtp): [Download model weight](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8). Please modify `torch_dtype` from `float16` to `bfloat16` in `config.json`.
+- `DeepSeek-V3.1-w8a8-mtp-QuaRot`(Quantized version with mix mtp): [Download model weight](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8-mtp-QuaRot).
 - `DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot`(Quantized version with mix mtp): [Download model weight](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot).
 - `Method of Quantify`: [msmodelslim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96). You can use these methods to quantify the model.
diff --git a/docs/source/tutorials/models/DeepSeek-V3.2.md b/docs/source/tutorials/models/DeepSeek-V3.2.md
index 65563e30..18817647 100644
--- a/docs/source/tutorials/models/DeepSeek-V3.2.md
+++ b/docs/source/tutorials/models/DeepSeek-V3.2.md
@@ -17,7 +17,7 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ### Model Weight
 
 - `DeepSeek-V3.2-Exp`(BF16 version): require 2 Atlas 800 A3 (64G × 16) nodes or 4 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://modelers.cn/models/Modelers_Park/DeepSeek-V3.2-Exp-BF16)
-- `DeepSeek-V3.2-Exp-w8a8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 2 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://modelers.cn/models/Modelers_Park/DeepSeek-V3.2-Exp-w8a8)
+- `DeepSeek-V3.2-Exp-W8A8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 2 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://www.modelscope.cn/models/vllm-ascend/DeepSeek-V3.2-Exp-W8A8)
 - `DeepSeek-V3.2`(BF16 version): require 2 Atlas 800 A3 (64G × 16) nodes or 4 Atlas 800 A2 (64G × 8) nodes. Model weight in BF16 not found now.
 - `DeepSeek-V3.2-w8a8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 2 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://www.modelscope.cn/models/vllm-ascend/DeepSeek-V3.2-W8A8/)
diff --git a/docs/source/tutorials/models/GLM5.md b/docs/source/tutorials/models/GLM5.md
index 86981206..8654d07d 100644
--- a/docs/source/tutorials/models/GLM5.md
+++ b/docs/source/tutorials/models/GLM5.md
@@ -18,7 +18,7 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 - `GLM-5`(BF16 version): [Download model weight](https://www.modelscope.cn/models/ZhipuAI/GLM-5).
 - `GLM-5-w4a8`: [Download model weight](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8).
-- `GLM-5-w8a8`: [Download model weight](https://ai.gitcode.com/Eco-Tech/GLM-5-w8a8/tree/main).
+- `GLM-5-w8a8`: [Download model weight](https://www.modelscope.cn/models/Eco-Tech/GLM-5-w8a8).
 - You can use [msmodelslim](https://gitcode.com/Ascend/msmodelslim) to quantify the model naively.
 
 It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/`
@@ -1308,6 +1308,21 @@ python load_balance_proxy_server_example.py \
     6721 6722 6723 6724
 ```
 
+## Functional Verification
+
+Once your server is started, you can query the model with input prompts:
+
+```shell
+curl http://<host>:<port>/v1/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "glm-5",
+        "prompt": "The future of AI is",
+        "max_tokens": 50,
+        "temperature": 0
+    }'
+```
+
 ## Accuracy Evaluation
 
 Here are two accuracy evaluation methods.
diff --git a/docs/source/tutorials/models/Kimi-K2-Thinking.md b/docs/source/tutorials/models/Kimi-K2-Thinking.md
index af7f1525..95d64a21 100644
--- a/docs/source/tutorials/models/Kimi-K2-Thinking.md
+++ b/docs/source/tutorials/models/Kimi-K2-Thinking.md
@@ -113,12 +113,24 @@ Run the following script to start the vLLM server on Multi-NPU:
 For an Atlas 800 A3 (64G*16) node, tensor-parallel-size should be at least 16.
 
 ```bash
-vllm serve Kimi-K2-Thinking \
---served-model-name kimi-k2-thinking \
---tensor-parallel-size 16 \
---enable-expert-parallel \
---trust-remote-code \
---no-enable-prefix-caching
+#!/bin/bash
+export VLLM_USE_MODELSCOPE=True
+export HCCL_BUFFSIZE=1024
+export TASK_QUEUE_ENABLE=1
+export OMP_PROC_BIND=false
+export HCCL_OP_EXPANSION_MODE=AIV
+export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
+
+vllm serve "moonshotai/Kimi-K2-Thinking" \
+    --tensor-parallel-size 16 \
+    --port 8000 \
+    --max-model-len 8192 \
+    --max-num-batched-tokens 8192 \
+    --max-num-seqs 12 \
+    --gpu-memory-utilization 0.9 \
+    --trust-remote-code \
+    --enable-expert-parallel \
+    --no-enable-prefix-caching
 ```
 
 Once your server is started, you can query the model with input prompts.
diff --git a/docs/source/tutorials/models/MiniMax-M2.5.md b/docs/source/tutorials/models/MiniMax-M2.5.md
index a1fa7749..a3509d45 100644
--- a/docs/source/tutorials/models/MiniMax-M2.5.md
+++ b/docs/source/tutorials/models/MiniMax-M2.5.md
@@ -299,7 +299,7 @@ print(resp.choices[0].message.content)
 Or send a request using curl:
 
 ```{code-block} bash
-curl http://127.0.0.1:8000/v1/chat/completions \
+curl http://localhost:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
         "model": "MiniMax-M2.5",
diff --git a/docs/source/tutorials/models/PaddleOCR-VL.md b/docs/source/tutorials/models/PaddleOCR-VL.md
index 8c6b2945..7a47f13a 100644
--- a/docs/source/tutorials/models/PaddleOCR-VL.md
+++ b/docs/source/tutorials/models/PaddleOCR-VL.md
@@ -74,7 +74,7 @@ Run the following script to start the vLLM server on single 910B4:
 #!/bin/sh
 export VLLM_USE_MODELSCOPE=true
 export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
-export TASK_QUEUE_ENABLE=2
+export TASK_QUEUE_ENABLE=1
 export CPU_AFFINITY_CONF=1
 export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
diff --git a/docs/source/tutorials/models/Qwen-VL-Dense.md b/docs/source/tutorials/models/Qwen-VL-Dense.md
index fb55dd3a..dcee9432 100644
--- a/docs/source/tutorials/models/Qwen-VL-Dense.md
+++ b/docs/source/tutorials/models/Qwen-VL-Dense.md
@@ -142,7 +142,7 @@ llm = LLM(
 )
 
 sampling_params = SamplingParams(
-    max_completion_tokens=512
+    max_tokens=512
 )
 
 image_messages = [
diff --git a/docs/source/tutorials/models/Qwen2.5-7B.md b/docs/source/tutorials/models/Qwen2.5-7B.md
index 7c8c52f4..c3052128 100644
--- a/docs/source/tutorials/models/Qwen2.5-7B.md
+++ b/docs/source/tutorials/models/Qwen2.5-7B.md
@@ -122,7 +122,7 @@ Not supported yet.
 After starting the service, verify functionality using a `curl` request:
 
 ```shell
-curl http://<host>:<port>/v1/completions \
+curl http://localhost:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
         "model": "qwen-2.5-7b-instruct",
diff --git a/docs/source/tutorials/models/Qwen2.5-Omni.md b/docs/source/tutorials/models/Qwen2.5-Omni.md
index 5757091a..2f4857b4 100644
--- a/docs/source/tutorials/models/Qwen2.5-Omni.md
+++ b/docs/source/tutorials/models/Qwen2.5-Omni.md
@@ -16,8 +16,8 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ### Model Weight
 
-- `Qwen2.5-Omni-3B`(BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-3B)
-- `Qwen2.5-Omni-7B`(BF16): [Download model weight](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)
+- `Qwen2.5-Omni-3B`(BF16): [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-3B)
+- `Qwen2.5-Omni-7B`(BF16): [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-7B)
 
 Following examples use the 7B version by default.
 
@@ -71,6 +71,8 @@ docker run --rm \
 
 :::{note}
 The env `LOCAL_MEDIA_PATH` which allowing API requests to read local images or videos from directories specified by the server file system. Please note this is a security risk. Should only be enabled in trusted environments.
+:::
+
 ```bash
 export VLLM_USE_MODELSCOPE=true
 export MODEL_PATH="Qwen/Qwen2.5-Omni-7B"
@@ -104,10 +106,10 @@ VLLM_TARGET_DEVICE=empty pip install -v ".[audio]"
 ```bash
 export VLLM_USE_MODELSCOPE=true
 export MODEL_PATH=Qwen/Qwen2.5-Omni-7B
-export LOCAL_MEDIA_PATH=/local_path/to_media/
+export LOCAL_MEDIA_PATH=$HOME/.cache/vllm/assets/vllm_public_assets/
 export DP_SIZE=8
 
-vllm serve ${MODEL_PATH}\
+vllm serve ${MODEL_PATH} \
 --host 0.0.0.0 \
 --port 8000 \
 --served-model-name Qwen-Omni \
@@ -137,7 +139,7 @@ INFO: Application startup complete.
 Once your server is started, you can query the model with input prompts:
 
 ```bash
-curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer EMPTY" -d '{
+curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer EMPTY" -d '{
     "model": "Qwen-Omni",
     "messages": [
         {
diff --git a/docs/source/tutorials/models/Qwen3-235B-A22B.md b/docs/source/tutorials/models/Qwen3-235B-A22B.md
index a41733d2..b35e7124 100644
--- a/docs/source/tutorials/models/Qwen3-235B-A22B.md
+++ b/docs/source/tutorials/models/Qwen3-235B-A22B.md
@@ -18,7 +18,7 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ### Model Weight
 
-- `Qwen3-235B-A22B`(BF16 version): require 1 Atlas 800 A3 (64G × 16) node, 1 Atlas 800 A2 (64G × 8) node or 2 Atlas 800 A2(32G * 8)nodes. [Download model weight](https://modelers.cn/models/Modelers_Park/Qwen3-235B-A22B)
+- `Qwen3-235B-A22B`(BF16 version): require 1 Atlas 800 A3 (64G × 16) node, 1 Atlas 800 A2 (64G × 8) node or 2 Atlas 800 A2(32G * 8)nodes. [Download model weight](https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B)
 - `Qwen3-235B-A22B-w8a8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 1 Atlas 800 A2 (64G × 8) node or 2 Atlas 800 A2(32G * 8)nodes. [Download model weight](https://modelscope.cn/models/vllm-ascend/Qwen3-235B-A22B-W8A8)
 
 It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/`.
@@ -174,7 +174,7 @@ export OMP_NUM_THREADS=1
 export HCCL_BUFFSIZE=1024
 export TASK_QUEUE_ENABLE=1
 
-vllm serve vllm-ascend/Qwen3-235B-A22B \
+vllm serve Qwen/Qwen3-235B-A22B \
 --host 0.0.0.0 \
 --port 8000 \
 --data-parallel-size 2 \
@@ -219,7 +219,7 @@ export OMP_NUM_THREADS=1
 export HCCL_BUFFSIZE=1024
 export TASK_QUEUE_ENABLE=1
 
-vllm serve vllm-ascend/Qwen3-235B-A22B \
+vllm serve Qwen/Qwen3-235B-A22B \
 --host 0.0.0.0 \
 --port 8000 \
 --headless \
diff --git a/docs/source/tutorials/models/Qwen3-Coder-30B-A3B.md b/docs/source/tutorials/models/Qwen3-Coder-30B-A3B.md
index 8a627f58..3e1bb5d0 100644
--- a/docs/source/tutorials/models/Qwen3-Coder-30B-A3B.md
+++ b/docs/source/tutorials/models/Qwen3-Coder-30B-A3B.md
@@ -16,7 +16,7 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ### Model Weight
 
-`Qwen3-Coder-30B-A3B-Instruct`(BF16 version): requires 1 Atlas 800 A3 node (with 16x 64G NPUs) or 1 Atlas 800 A2 node (with 8x 64G/32G NPUs). [Download model weight](https://modelers.cn/models/Modelers_Park/Qwen3-Coder-30B-A3B-Instruct)
+`Qwen3-Coder-30B-A3B-Instruct`(BF16 version): requires 1 Atlas 800 A3 node (with 16x 64G NPUs) or 1 Atlas 800 A2 node (with 8x 64G/32G NPUs). [Download model weight](https://modelscope.cn/models/Qwen/Qwen3-Coder-30B-A3B-Instruct)
 
 It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/`
diff --git a/docs/source/tutorials/models/Qwen3-Dense.md b/docs/source/tutorials/models/Qwen3-Dense.md
index 9f929a9a..2b331a1d 100644
--- a/docs/source/tutorials/models/Qwen3-Dense.md
+++ b/docs/source/tutorials/models/Qwen3-Dense.md
@@ -8,11 +8,7 @@ Welcome to the tutorial on optimizing Qwen Dense models in the vLLM-Ascend envir
 This document will show the main verification steps of the model, including supported features, feature configuration, environment preparation, accuracy and performance evaluation.
 
-The Qwen3 Dense models is first supported in [v0.8.4rc2](https://github.com/vllm-project/vllm-ascend/blob/main/docs/source/user_guide/release_notes.md#v084rc2---20250429)
-
-## **Node**
-
-This example requires version **v0.11.0rc2**. Earlier versions may lack certain features.
+The Qwen3 Dense models were first supported in [v0.8.4rc2](https://github.com/vllm-project/vllm-ascend/blob/main/docs/source/user_guide/release_notes.md#v084rc2---20250429). This example requires version **v0.11.0rc2**. Earlier versions may lack certain features.
 
 ## Supported Features
 
@@ -115,12 +111,13 @@ The specific example scenario is as follows:
 ### Run docker container
 
-#### **Node**
+:::{note}
 
-- /model/Qwen3-32B-W8A8 is the model path, replace this with your actual path.
+- vllm-ascend/Qwen3-32B-W8A8 is the default model path, replace this with your actual path.
 - v0.11.0rc2-a3 is image tag, replace this with your actual tag.
 - replace this with your actual port: '-p 8113:8113'.
 - replace this with your actual card: '--device /dev/davinci0'.
+:::
 
 ```{code-block} bash
 :substitutions:
 docker run --rm \
 -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
 -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
 -v /etc/ascend_install.info:/etc/ascend_install.info \
--v /model/Qwen3-32B-W8A8:/model/Qwen3-32B-W8A8 \
+-v /root/.cache:/root/.cache \
 -p 8113:8113 \
 -it $IMAGE bash
 ```
@@ -174,7 +171,7 @@ export HCCL_OP_EXPANSION_MODE="AIV"
 # Enable FlashComm_v1 optimization when tensor parallel is enabled.
 export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
 
-vllm serve /model/Qwen3-32B-W8A8 \
+vllm serve vllm-ascend/Qwen3-32B-W8A8 \
 --served-model-name qwen3 \
 --trust-remote-code \
 --async-scheduling \
@@ -190,15 +187,16 @@ vllm serve /model/Qwen3-32B-W8A8 \
 --gpu-memory-utilization 0.9
 ```
 
-#### **Node**
+:::{note}
 
-- /model/Qwen3-32B-W8A8 is the model path, replace this with your actual path.
+- vllm-ascend/Qwen3-32B-W8A8 is the default model path, replace this with your actual path.
 - If the model is not a quantized model, remove the `--quantization ascend` parameter.
 - **[Optional]** `--additional-config '{"pa_shape_list":[48,64,72,80]}'`: `pa_shape_list` specifies the batch sizes where you want to switch to the PA operator. This is a temporary tuning knob. Currently, the attention operator dispatch defaults to the FIA operator. In some batch-size (concurrency) settings, FIA may have suboptimal performance. By setting `pa_shape_list`, when the runtime batch size matches one of the listed values, vLLM-Ascend will replace FIA with the PA operator to prevent performance degradation. In the future, FIA will be optimized for these scenarios and this parameter will be removed.
 - If the ultimate performance is desired, the cudagraph_capture_sizes parameter can be enabled, reference: [key-optimization-points](./Qwen3-Dense.md#key-optimization-points)、[optimization-highlights](./Qwen3-Dense.md#optimization-highlights). Here is an example of batchsize of 72: `--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes":[1,8,24,48,60,64,72,76]}'`.
+:::
 
 Once your server is started, you can query the model with input prompts
 
 ```bash
 curl http://localhost:8113/v1/chat/completions -H "Content-Type: application/json" -d '{
@@ -219,11 +217,12 @@ Run the following script to execute offline inference on multi-NPU.
 
-#### **Node**
+:::{note}
 
-- /model/Qwen3-32B-W8A8 is the model path, replace this with your actual path.
+- vllm-ascend/Qwen3-32B-W8A8 is the default model path, replace this with your actual path.
 - If the model is not a quantized model,remove the `quantization="ascend"` parameter.
+:::
 
 ```python
 import gc
 import torch
 from vllm import LLM, SamplingParams
 from vllm.distributed.parallel_state import (destroy_distributed_environment, destroy_model_parallel)
 prompts = [
     "The future of AI is",
 ]
 sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40)
-llm = LLM(model="/model/Qwen3-32B-W8A8",
+llm = LLM(model="vllm-ascend/Qwen3-32B-W8A8",
           tensor_parallel_size=4,
           trust_remote_code=True,
           distributed_executor_backend="mp",
@@ -299,12 +298,13 @@ There are three `vllm bench` subcommands:
 Take the `serve` as an example. Run the code as follows.
 
-#### **Node**
+:::{note}
 
-- /model/Qwen3-32B-W8A8 is the model path, replace this with your actual path.
+- vllm-ascend/Qwen3-32B-W8A8 is the default model path, replace this with your actual path.
+:::
 
 ```shell
-vllm bench serve --model /model/Qwen3-32B-W8A8 --served-model-name qwen3 --port 8113 --dataset-name random --random-input 200 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
+vllm bench serve --model vllm-ascend/Qwen3-32B-W8A8 --served-model-name qwen3 --port 8113 --dataset-name random --random-input 200 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
 ```
 
 After about several minutes, you can get the performance evaluation result.
diff --git a/docs/source/tutorials/models/Qwen3-Next.md b/docs/source/tutorials/models/Qwen3-Next.md
index fa279af8..160bda10 100644
--- a/docs/source/tutorials/models/Qwen3-Next.md
+++ b/docs/source/tutorials/models/Qwen3-Next.md
@@ -16,7 +16,7 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ## Weight Preparation
 
- Download Link for the `Qwen3-Next-80B-A3B-Instruct` Model Weights: [Download model weight](https://modelers.cn/models/Modelers_Park/Qwen3-Next-80B-A3B-Instruct/tree/main)
+ Download Link for the `Qwen3-Next-80B-A3B-Instruct` Model Weights: [Download model weight](https://modelscope.cn/models/Qwen/Qwen3-Next-80B-A3B-Instruct)
 
 ## Deployment
 
@@ -103,7 +103,7 @@ if __name__ == '__main__':
     prompts = [
         "Who are you?",
     ]
-    sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40, max_completion_tokens=32)
+    sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40, max_tokens=32)
     llm = LLM(model="Qwen/Qwen3-Next-80B-A3B-Instruct",
               tensor_parallel_size=4,
               enforce_eager=True,
diff --git a/docs/source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md b/docs/source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
index 0a8b3704..c45cb77d 100644
--- a/docs/source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
+++ b/docs/source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
@@ -109,7 +109,7 @@ def clean_up():
 
 def main():
-    MODEL_PATH = "Qwen3/Qwen3-Omni-30B-A3B-Thinking"
+    MODEL_PATH = "Qwen/Qwen3-Omni-30B-A3B-Thinking"
     llm = LLM(
         model=MODEL_PATH,
         tensor_parallel_size=2,
@@ -123,7 +123,7 @@ def main():
         temperature=0.6,
         top_p=0.95,
         top_k=20,
-        max_completion_tokens=16384,
+        max_tokens=16384,
     )
 
     processor = Qwen3OmniMoeProcessor.from_pretrained(MODEL_PATH)
@@ -176,6 +176,11 @@ if __name__ == "__main__":
 Run the following script to start the vLLM server on Multi-NPU:
 For an Atlas A2 with 64 GB of NPU card memory, tensor-parallel-size should be at least 1, and for 32 GB of memory, tensor-parallel-size should be at least 2.
 
+```bash
+export HCCL_BUFFSIZE=512
+export HCCL_OP_EXPANSION_MODE=AIV
+```
+
 ```bash
 vllm serve Qwen/Qwen3-Omni-30B-A3B-Thinking --tensor-parallel-size 2 --enable_expert_parallel
 ```
diff --git a/docs/source/tutorials/models/Qwen3-VL-Embedding.md b/docs/source/tutorials/models/Qwen3-VL-Embedding.md
index a6694fc9..8ca909c5 100644
--- a/docs/source/tutorials/models/Qwen3-VL-Embedding.md
+++ b/docs/source/tutorials/models/Qwen3-VL-Embedding.md
@@ -40,7 +40,7 @@ vllm serve Qwen/Qwen3-VL-Embedding-8B --runner pooling
 Once your server is started, you can query the model with input prompts.
 
 ```bash
-curl http://127.0.0.1:8000/v1/embeddings -H "Content-Type: application/json" -d '{
+curl http://localhost:8000/v1/embeddings -H "Content-Type: application/json" -d '{
     "input": [
         "The capital of China is Beijing.",
         "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
diff --git a/docs/source/tutorials/models/Qwen3.5-27B.md b/docs/source/tutorials/models/Qwen3.5-27B.md
index 00e05a2e..69485d6d 100644
--- a/docs/source/tutorials/models/Qwen3.5-27B.md
+++ b/docs/source/tutorials/models/Qwen3.5-27B.md
@@ -18,8 +18,8 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ### Model Weight
 
-- `Qwen3.5-27B`(BF16 version): require 1 Atlas 800 A3 (64G × 16) nodes or 1 Atlas 800 A2 (64G × 8) node. [Download model weight](https://huggingface.co/Qwen/Qwen3.5-27B/tree/main)
-- `Qwen3.5-27B-w8a8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 1 Atlas 800 A2 (64G × 8) node. [Download model weight](https://www.modelscope.cn/models/Eco-Tech/Qwen3.5-27B-w8a8-mtp/files)
+- `Qwen3.5-27B`(BF16 version): require 1 Atlas 800 A3 (64G × 16) node or 1 Atlas 800 A2 (64G × 8) node. [Download model weight](https://modelscope.cn/models/Qwen/Qwen3.5-27B)
+- `Qwen3.5-27B-w8a8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 1 Atlas 800 A2 (64G × 8) node. [Download model weight](https://www.modelscope.cn/models/Eco-Tech/Qwen3.5-27B-w8a8-mtp)
 
 It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/`.
@@ -145,7 +145,7 @@ The parameters are explained as follows:
 Once your server is started, you can query the model with input prompts:
 
 ```shell
-curl http://<host>:<port>/v1/completions \
+curl http://localhost:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
         "model": "qwen3.5",
diff --git a/docs/source/tutorials/models/Qwen3.5-397B-A17B.md b/docs/source/tutorials/models/Qwen3.5-397B-A17B.md
index a433b5bc..bd96de16 100644
--- a/docs/source/tutorials/models/Qwen3.5-397B-A17B.md
+++ b/docs/source/tutorials/models/Qwen3.5-397B-A17B.md
@@ -18,8 +18,8 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 ### Model Weight
 
-- `Qwen3.5-397B-A17B`(BF16 version): require 2 Atlas 800 A3 (64G × 16) nodes or 2 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://huggingface.co/Qwen/Qwen3.5-397B-A17B/tree/main)
-- `Qwen3.5-397B-A17B-w8a8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 2 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://www.modelscope.cn/models/Eco-Tech/Qwen3.5-397B-A17B-w8a8-mtp/files)
+- `Qwen3.5-397B-A17B`(BF16 version): require 2 Atlas 800 A3 (64G × 16) nodes or 2 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://www.modelscope.cn/models/Qwen/Qwen3.5-397B-A17B)
+- `Qwen3.5-397B-A17B-w8a8`(Quantized version): require 1 Atlas 800 A3 (64G × 16) node or 2 Atlas 800 A2 (64G × 8) nodes. [Download model weight](https://www.modelscope.cn/models/Eco-Tech/Qwen3.5-397B-A17B-w8a8-mtp)
 
 It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/`.
diff --git a/docs/source/tutorials/models/Qwen3_embedding.md b/docs/source/tutorials/models/Qwen3_embedding.md
index 7e490e7a..a7c497d8 100644
--- a/docs/source/tutorials/models/Qwen3_embedding.md
+++ b/docs/source/tutorials/models/Qwen3_embedding.md
@@ -41,7 +41,7 @@ vllm serve Qwen/Qwen3-Embedding-8B --runner pooling --host 127.0.0.1 --port 8888
 Once your server is started, you can query the model with input prompts.
 
 ```bash
-curl http://127.0.0.1:8888/v1/embeddings -H "Content-Type: application/json" -d '{
+curl http://localhost:8888/v1/embeddings -H "Content-Type: application/json" -d '{
     "input": [
         "The capital of China is Beijing.",
         "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
diff --git a/docs/source/tutorials/models/Qwen3_reranker.md b/docs/source/tutorials/models/Qwen3_reranker.md
index 94c1c8b6..2eef1ab1 100644
--- a/docs/source/tutorials/models/Qwen3_reranker.md
+++ b/docs/source/tutorials/models/Qwen3_reranker.md
@@ -111,7 +111,7 @@ model_name = "Qwen/Qwen3-Reranker-8B"
 
 model = LLM(
     model=model_name,
-    task="score",
+    runner="pooling",
     hf_overrides={
         "architectures": ["Qwen3ForSequenceClassification"],
         "classifier_from_token": ["no", "yes"],
@@ -154,7 +154,7 @@ if __name__ == "__main__":
     outputs = model.score(query_template.format(prefix=prefix, instruction=instruction, query=query), documents)
 
-    print([output.outputs[0].score for output in outputs])
+    print([output.outputs.score for output in outputs])
 ```
 
 If you run this script successfully, you will see a list of scores printed to the console, similar to this: