Commit Graph

58 Commits

Author SHA1 Message Date
zhangxinyuehfad
75de3fa172 [v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-10-29 11:32:12 +08:00
liziyu
3164cb663c [Bugfix] mooncake connector support external dp & update readme (#3579)
### What this PR does / why we need it?

Make the mooncake connector support external DP and update the README.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2025-10-21 20:15:24 +08:00
likeful
6b6857929d [Doc] Add --shm-size option to Docker command for qwen3 vl 235B (#3519)
### What this PR does / why we need it?
Added a shared memory size option to the Docker run command. If `--shm-size`
is not specified, Docker defaults to 64 MB. In this case, the
vLLM EngineCore process may core dump when the workload is high.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Done

Closes: https://github.com/vllm-project/vllm-ascend/issues/3513

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: likeful <irayki@gmail.com>
Signed-off-by: leijie2015 <irayki@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-20 23:37:35 +08:00
Li Wang
4c4a8458a5 [CI] Refactor multi-node CI (#3487)
### What this PR does / why we need it?
Refactor the multi-node CI cases. The purpose of this PR is to make
multi-node CI cases easier to add, allowing developers to add
multi-node cluster model test cases (including PD disaggregation)
by simply adding a new YAML configuration file.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-17 09:04:31 +08:00
leo-pony
291c00a224 [Doc] Pin the version that runs stably on 310I Duo to vllm-ascend v0.10.0rc1 (#3455)
Pin the version that runs stably on the 310I Duo to vllm-ascend v0.10.0rc1.

### What this PR does / why we need it?
Since PR #2614 the 310I Duo has been broken. Although we are currently
working on fixing the issue, there is no confirmed timeline for a fix in
the short term. To allow users to quickly find a working version instead
of going back and forth on trial and error, this PR pins the version in
the 310I Duo guide.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-10-16 08:54:09 +08:00
leo-pony
ff91904ee2 [Doc] Clearer corresponding relationship between configurations for multi-node guides (#3441)
Optimize the multi-node guide: clarify the correspondence between
configuration items and nodes.

### What this PR does / why we need it?
Some issues were caused by misunderstandings due to unclear guidance
content, for example #3367.

### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
NA

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-10-16 08:54:03 +08:00
zxr2333
c2c1db78a7 [Bugfix] fix ZeroDivisionError when prefill_tp_size > num_kv_head and fix tp_resharding README (#3437)
### What this PR does / why we need it?
Fix a ZeroDivisionError when prefill_tp_size > num_kv_head: in this
situation num_head_replica can be 0 while being used to divide another
value, so this PR clamps it to a minimum of 1. This PR also fixes the
tp_resharding README.
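A minimal sketch of the kind of clamp involved (names are illustrative
assumptions, not the actual vllm-ascend code):

```python
def num_kv_heads_per_rank(num_kv_heads: int, prefill_tp_size: int) -> int:
    # When prefill_tp_size > num_kv_heads, integer division yields 0, and
    # the result is later used as a divisor, raising ZeroDivisionError.
    # Clamping to 1 reflects that each rank holds at least one
    # (possibly replicated) KV head.
    return max(1, num_kv_heads // prefill_tp_size)
```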

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-10-15 08:45:44 +08:00
wangxiaoteng888
19b85ef1bc [Bugfix] multi_node_pd_disaggregation_mooncake.md update (#3400)
### What this PR does / why we need it?
multi_node_pd_disaggregation_mooncake.md update. Fix issues encountered
during service startup.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangxiaoteng@huawei.com <wangxiaoteng@huawei.com>
2025-10-14 09:29:35 +08:00
wangxiaoteng888
ca05f7d632 [Bugfix] TP size larger than KV cache head causes accuracy issues (#3366)
### What this PR does / why we need it?
Resolve the issue where, with unequal TP (tensor parallelism), the TP
size is larger than the number of the model's attention KV-cache heads,
causing the KV cache to be duplicated across ranks, which led to
transmission errors in the original code.
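For illustration, a hedged sketch of the rank-to-head mapping such a fix
has to get right (names and logic are assumptions, not the patch itself):

```python
def owned_kv_head(rank: int, tp_size: int, num_kv_heads: int) -> int:
    # When tp_size > num_kv_heads, each KV head is replicated across a
    # group of ranks; every rank in a group holds the same head, so the
    # transfer path must deduplicate instead of treating each rank's
    # copy as a distinct head.
    replicas_per_head = max(1, tp_size // num_kv_heads)
    return rank // replicas_per_head

# e.g. tp_size=8, num_kv_heads=2: ranks 0-3 share head 0, ranks 4-7 head 1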
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>
2025-10-11 11:22:23 +08:00
Li Wang
60b7c936c5 [Doc] Update deepseek-v3.2 doc (#3319)
### What this PR does / why we need it?
Upgrade deepseek-v3.2 doc for A2
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-10 08:55:39 +08:00
Yikun Jiang
2dde1268c7 Fix doc for A2 series and cleanup note (#3307)
### What this PR does / why we need it?
Fix doc for A2 series and cleanup note

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-10-01 14:39:48 +08:00
wangxiyuan
b8c58d68e1 [Doc] Add deepseek v3.2 tutorial (#3275)
Add deepseek v3.2 tutorial

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-09-30 17:54:31 +08:00
Peipei
cf445c41f9 [Doc]Add qwen3_vl series guide (#3227)
### What this PR does / why we need it?
This PR provides user guide documents for Qwen3-VL 4B and
Qwen3-VL-235B-A22B.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

---------

Signed-off-by: booker123456 <945658361@qq.com>
2025-09-28 21:35:52 +08:00
Jianwei Mao
d586255678 fix wrong --num-gpus parameter requirements, and avoid ambiguity (#3116)
Fix the problem reported in
https://github.com/vllm-project/vllm-ascend/issues/3114
- vLLM version: v0.10.2
- vLLM main:
5aeb925452

Signed-off-by: Jianwei Mao <maojianwei2012@126.com>
2025-09-23 11:58:44 +08:00
Li Wang
4267f5d55f [Doc] Add multi-node ray backend tutorial (#2376)
### What this PR does / why we need it?
Add multi-node ray backend tutorial for Qwen235B-A3B
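As a rough sketch of what the tutorial covers (the model id and parallel
sizes are assumptions, and a Ray cluster must already be running across
the nodes):

```python
from vllm import LLM

# Hedged sketch: shard a large MoE model across two nodes via the Ray
# executor. Parallel sizes below are illustrative, not prescriptive.
llm = LLM(
    model="Qwen/Qwen3-235B-A22B",        # assumption: the tutorial's checkpoint
    tensor_parallel_size=8,              # e.g. 8 NPUs per node
    pipeline_parallel_size=2,            # span two nodes
    distributed_executor_backend="ray",  # multi-node execution via Ray
)
```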

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
f4cd80f944

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-18 15:30:18 +08:00
Yikun Jiang
0aba644633 Update max_tokens and prompt in qwen3 online doc (#2945)
### What this PR does / why we need it?
Update max_tokens and prompt in qwen3 online doc
Before:
```
"'max_tokens' or 'max_completion_tokens' is too large: 4096. This model's maximum context length is 4096 tokens and your request has 18 input tokens (4096 > 4096 - 18). None"
```

After:
```
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "/root/.cache/modelscope/hub/models/Qwen-SGlang/Qwen3-Next-80B-A3B-Instruct",
  "messages": [
    {"role": "user", "content": "Who are you?"}
  ],
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "max_tokens": 32
}'
.{"id":"chatcmpl-8ddbd65c9ddc405397219a6792feb9a0","object":"chat.completion","created":1757985049,"model":"/root/.cache/modelscope/hub/models/Qwen-SGlang/Qwen3-Next-80B-A3B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! I am Qwen, a large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I am designed to assist you in generating various","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":12,"total_tokens":44,"completion_tokens":32,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Manually test on my local env
- CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-09-16 09:27:50 +08:00
Yikun Jiang
b5ccef6115 [Doc] Add doc for Qwen3 Next (#2916)
### What this PR does / why we need it?
Add doc for Qwen3 Next

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Doc CI passed

Related: https://github.com/vllm-project/vllm-ascend/issues/2884


- vLLM version: v0.10.2
- vLLM main:
01413e0cf5

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-09-16 01:16:06 +08:00
yupeng
a746f8274f [DOC] Qwen3 PD disaggregation user guide (#2751)
### What this PR does / why we need it?
This PR adds the prefiller & decoder disaggregation deployment
guide.

The scenario of the guide is:
- Use 3 nodes in total, with 2 NPUs on each node
- Qwen3-30B-A3B
- 1P2D
- Expert Parallel

The deployment can be used to verify the PD disaggregation / expert
parallel features with relatively modest resources.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No.


- vLLM version: v0.10.1.1
- vLLM main:
e599e2c65e

---------

Signed-off-by: paulyu12 <507435917@qq.com>
2025-09-07 10:35:37 +08:00
Li Wang
516e14ae6a [Doc] Upgrade the multi-node tutorial model to deepseek-v3.1-w8a8 (#2553)
### What this PR does / why we need it?
Upgrade the multi-node tutorial model to deepseek-v3.1-w8a8
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.1.1
- vLLM main:
de02b07db4

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-08-27 14:16:44 +08:00
Li Wang
042605f4b2 [Doc] Add stable modelslim branch (#2545)
### What this PR does / why we need it?
The branch `br_release_MindStudio_8.1.RC2_TR5_20260624` is the commercial
delivery version of modelslim for Q3 and has been verified to be available.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.1.1
- vLLM main:
7d67a9d9f9

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-08-27 09:05:46 +08:00
Li Wang
2ad7e1251e [Doc] Fix quant documentation to make it reproducible (#2277)
### What this PR does / why we need it?
Fixed the wording of the msit code clone instructions.

- vLLM version: v0.10.0
- vLLM main:
afa5b7ca0b

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-08-14 17:19:47 +08:00
Mengqing Cao
4604882a3e [ReleaseNote] Release note of v0.10.0rc1 (#2225)
### What this PR does / why we need it?
Release note of v0.10.0rc1

- vLLM version: v0.10.0
- vLLM main:
8e8e0b6af1

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-07 14:46:49 +08:00
22dimensions
440d28a138 [Tutorial] Add qwen3 8b w4a8 tutorial (#2249)
### What this PR does / why we need it?

Add a new single-NPU quantization tutorial, using the latest Qwen3
model.

- vLLM version: v0.10.0
- vLLM main:
8e8e0b6af1

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-08-07 14:39:38 +08:00
zhangxinyuehfad
dbba3cabb0 [Doc] Update tutorials for single_npu_audio and single_npu_multimodal (#2252)
### What this PR does / why we need it?
Update tutorials for single_npu_audio and single_npu_multimodal

- vLLM version: v0.10.0
- vLLM main:
6b47ef24de

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-08-07 14:08:14 +08:00
Li Wang
bf84f2dbfa [Doc] Support kimi-k2-w8a8 (#2162)
### What this PR does / why we need it?
In fact, the kimi-k2 model is similar to the deepseek model, and we only
need to make a few changes to support it. What this PR does:
1. Add kimi-k2-w8a8 deployment doc
2. Update quantization doc
3. Upgrade torchair support list
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main:
9edd1db02b

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-08-06 19:28:47 +08:00
leo-pony
f0c1f0c828 [Doc] Add qwen vl example in tutorials for 310I series (#2160)
### What this PR does / why we need it?
Add qwen vl example in tutorials for 310I series. 

Model: Qwen2.5-VL-3B-Instruct
Accuracy test results, dataset MMMU-val:
| Category | 910B3 | 310P3 |
| --- | --- | --- |
|Summary|0.455 | 0.46 |
|--art_and_design| 0.558 | 0.566 |
|--business| 0.373 | 0.366 |
|--health_and_medicine|0.513 | 0.52 |
|--science|0.333 | 0.333 |
|--tech_and_engineering|0.362 | 0.380 |
|--humanities_and_social_science|0.691 | 0.691 |

Function test results:

1. Online:
![image](https://github.com/user-attachments/assets/d81bba61-df28-4676-a246-c5d094815ac7)
![image](https://github.com/user-attachments/assets/0be81628-9999-4ef2-93c1-898b3043e09e)

2. Offline:
![image](https://github.com/user-attachments/assets/603275c1-6ed6-4cfc-a6e2-7726156de087)

- vLLM version: v0.10.0
- vLLM main:
ad57f23f6a

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-08-02 08:58:56 +08:00
zhangxinyuehfad
d1c640841b [Bugfix] Fix num_hidden_layers when Qwen2-Audio 7B (#1803)
### What this PR does / why we need it?
Fix num_hidden_layers for Qwen2-Audio 7B (see also #1760):
```
INFO 07-15 04:38:53 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
Traceback (most recent call last):
  File "/workspace/test1.py", line 58, in <module>
    main(audio_count)
  File "/workspace/test1.py", line 38, in main
    llm = LLM(model="Qwen/Qwen2-Audio-7B-Instruct",
  File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 271, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 494, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/vllm-workspace/vllm/vllm/engine/arg_utils.py", line 1286, in create_engine_config
    config = VllmConfig(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/vllm-workspace/vllm/vllm/config.py", line 4624, in __post_init__
    current_platform.check_and_update_config(self)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/platform.py", line 180, in check_and_update_config
    update_aclgraph_sizes(vllm_config)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/utils.py", line 307, in update_aclgraph_sizes
    num_hidden_layers = vllm_config.model_config.hf_config.num_hidden_layers
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/configuration_utils.py", line 211, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'Qwen2AudioConfig' object has no attribute 'num_hidden_layers'
```
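A minimal sketch of the kind of fallback that resolves this (an
assumption, not the exact vllm-ascend patch): multimodal configs such as
Qwen2AudioConfig keep num_hidden_layers on a sub-config rather than at
the top level.

```python
def get_num_hidden_layers(hf_config) -> int:
    # Prefer the top-level attribute when present (plain text models).
    if hasattr(hf_config, "num_hidden_layers"):
        return hf_config.num_hidden_layers
    # Multimodal configs often nest it under the text backbone's config.
    text_config = getattr(hf_config, "text_config", None)
    if text_config is not None and hasattr(text_config, "num_hidden_layers"):
        return text_config.num_hidden_layers
    raise AttributeError("num_hidden_layers not found on config")
```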

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes: https://github.com/vllm-project/vllm-ascend/issues/1780
https://github.com/vllm-project/vllm-ascend/issues/1760
https://github.com/vllm-project/vllm-ascend/issues/1276
https://github.com/vllm-project/vllm-ascend/issues/359

- vLLM version: v0.10.0
- vLLM main:
7728dd77bb

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-07-26 20:13:00 +08:00
Li Wang
bdfb065b5d [1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Make clean code

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
29c6fbe58c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-25 22:16:10 +08:00
Mengqing Cao
8cfd257992 [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681)
### What this PR does / why we need it?
Remove the ETP/EP maintained in branch main. We drop this as there are no
relevant scenarios for using ETP now, and we may subsequently advocate
implementing expert tensor parallelism in vLLM to support scenarios
where the experts need to be sliced.

This is part of the #1422 backport.

Fixes https://github.com/vllm-project/vllm-ascend/issues/1396
https://github.com/vllm-project/vllm-ascend/issues/1154

### Does this PR introduce _any_ user-facing change?
We will no longer maintain ETP/EP in vllm-ascend; use the TP/EP in
vLLM instead.

### How was this patch tested?
CI passed with newly added and existing tests.


- vLLM version: v0.9.2
- vLLM main:
fe8a2c544a

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-07-21 09:08:04 +08:00
Zhu Yi Lin
538dd357e6 Add graph mode and improve on multi_npu_moge.md (#1849)
### What this PR does / why we need it?
Add graph mode and improve on multi_npu_moge.md
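A hedged sketch of enabling vllm-ascend's TorchAir graph mode (the
additional_config key is assumed from the vllm-ascend docs; the model
path is a placeholder):

```python
from vllm import LLM

llm = LLM(
    model="/path/to/pangu-pro-moe",  # placeholder checkpoint path
    tensor_parallel_size=4,
    # Assumption: vllm-ascend reads TorchAir graph-mode settings from here.
    additional_config={"torchair_graph_config": {"enabled": True}},
)
```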

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
CI passed with the existing tests.


- vLLM version: v0.9.2
- vLLM main:
5a7fb3ab9e

Signed-off-by: GDzhu01 <809721801@qq.com>
2025-07-17 17:53:37 +08:00
wangxiyuan
eb921d2b6f [Doc] Fix 404 error (#1797)
Fix URL 404 errors in the doc
- vLLM version: v0.9.2
- vLLM main:
9ad0a4588b

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 11:52:38 +08:00
Li Wang
afcfe91dfa [Doc] Fix multi node doc (#1783)
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?
Pin docker image to latest release
### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
1e9438e0b0

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-14 17:56:57 +08:00
wangxiyuan
b5b7e0ecc7 [Doc] Add qwen3 embedding 8b guide (#1734)
1. Add the tutorial for qwen3-embedding-8b; a minimal offline sketch follows below
2. Remove VLLM_USE_V1=1 from the docs; it is no longer needed since 0.9.2
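A minimal offline sketch of the embedding flow the tutorial documents
(assuming vLLM's pooling task API; output handling simplified):

```python
from vllm import LLM

llm = LLM(model="Qwen/Qwen3-Embedding-8B", task="embed")
outputs = llm.embed(["vLLM on Ascend makes embedding serving easy."])
print(len(outputs[0].outputs.embedding))  # embedding vector dimension
```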


- vLLM version: v0.9.2
- vLLM main:
5923ab9524

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:40:17 +08:00
wangxiyuan
3d1e6a5929 [Doc] Update user doc index (#1581)
Add a user doc index to make the user guide clearer
- vLLM version: v0.9.1
- vLLM main:
49e8c7ea25

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-10 14:26:59 +08:00
Li Wang
0c4aa2b4f1 [Doc] Add multi node data parallel doc (#1685)
### What this PR does / why we need it?
Add the multi-node data parallel doc
### Does this PR introduce _any_ user-facing change?
Add the multi-node data parallel doc
### How was this patch tested?

- vLLM version: v0.9.1
- vLLM main:
805d62ca88

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-10 09:36:37 +08:00
leo-pony
b4b19ea588 [Doc] Add multi-npu qwen3-MoE-32B Tutorials (#1419)
### What this PR does / why we need it?
Add multi-NPU qwen3-MoE-32B tutorials.
Related RFC: https://github.com/vllm-project/vllm-ascend/issues/1248
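A rough offline sketch of the multi-NPU setup the tutorial walks through
(the model id and TP size are assumptions):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-30B-A3B",  # assumption: the Qwen3 MoE checkpoint used
    tensor_parallel_size=4,      # shard across 4 NPUs
)
out = llm.generate(["What is vLLM?"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```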
- vLLM version: v0.9.1
- vLLM main:
5358cce5ff

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-07-10 09:06:51 +08:00
leo-pony
53ec583bbb [Docs] Update Atlas 300I series doc and fix CI lint (#1537)
### What this PR does / why we need it?
- Update Atlas 300I series doc: clean up unused parameters and enable
optimized ops
- Fix codespell CI

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-30 23:34:00 +08:00
Yikun Jiang
e4df0a4395 Add Pangu MoE Pro for 300I series docs (#1516)
### What this PR does / why we need it?
Add Pangu MoE Pro for 300I series docs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-30 13:37:22 +08:00
Yikun Jiang
cad4c693c6 Add Pangu MoE Pro docs (#1512)
### What this PR does / why we need it?
This PR adds the Pangu MoE Pro 72B docs

[1] https://gitcode.com/ascend-tribe/pangu-pro-moe-model

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-30 12:15:33 +08:00
Shanshan Shen
99e685532d [Doc] Add Qwen2.5-VL eager mode doc (#1394)
### What this PR does / why we need it?
Add Qwen2.5-VL eager mode doc.

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-06-28 09:08:51 +08:00
Shanshan Shen
4e2daf5ab7 [Doc] Add qwen2-audio eager mode tutorial (#1371)
### What this PR does / why we need it?
Add qwen2-audio eager mode tutorial.
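A minimal eager-mode sketch (audio preprocessing omitted; the tutorial
covers the full multimodal input path):

```python
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-Audio-7B-Instruct",
    enforce_eager=True,  # skip graph capture; run ops eagerly on the NPU
)
```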


Signed-off-by: shen-shanshan <467638484@qq.com>
2025-06-26 16:56:05 +08:00
leo-pony
1025344912 Doc Enhancement: Single NPU (Qwen3-8B) aclgraph mode + eager mode (#1374)
### What this PR does / why we need it?
Doc enhancement: Single NPU (Qwen3-8B) aclgraph mode + eager mode.
Related RFC: https://github.com/vllm-project/vllm-ascend/issues/1248

### Does this PR introduce _any_ user-facing change?
No changes.


### How was this patch tested?
Preview

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-06-26 16:52:54 +08:00
Yikun Jiang
2e5f312530 Cleanup unused doc (#1352)
### What this PR does / why we need it?
Clean up unused doc for the MoGE model; we will add this back when the
MoGE model is ready.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-22 15:05:30 +08:00
Yikun Jiang
c30ddb8331 Bump v0.9.1rc1 release (#1349)
### What this PR does / why we need it?
Bump v0.9.1rc1 release

Closes: https://github.com/vllm-project/vllm-ascend/pull/1341
Closes: https://github.com/vllm-project/vllm-ascend/pull/1334

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed


---------

Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: shen-shanshan <467638484@qq.com>
2025-06-22 13:15:36 +08:00
22dimensions
c464c32b81 add doc for offline quantization inference (#1009)
Add an example for offline inference with a quantized model
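A hedged sketch of offline inference with an Ascend-quantized checkpoint
(the model path is a placeholder; quantization="ascend" mirrors the
"--quantization ascend" flag noted elsewhere in the docs):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/QwQ-32B-W8A8",  # placeholder: a modelslim-quantized model
    quantization="ascend",          # select the Ascend quantization backend
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```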

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-05-29 17:32:42 +08:00
22dimensions
d5401a08be [DOC] update modelslim version (#908)
1. Update the modelslim version to fix deepseek-related issues
2. Add a note for "--quantization ascend"

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-05-21 09:12:02 +08:00
22dimensions
a8730e7a3c [Doc] update quantization docs with QwQ-32B-W8A8 example (#835)
1. Replace the deepseek-v2-lite model with the more practical QwQ 32B model
2. Fix some incorrect commands
3. Replace the modelslim version with a more formal tag

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-05-17 15:25:17 +08:00
wangxiyuan
6193ba679b [CI] add codespell CI and fix format.sh (#827)
1. Fix format check error to make format.sh work
2. Add codespell check CI 
3. Add the missing required package for vllm-ascend.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-05-12 22:04:48 +08:00
Yikun Jiang
d39855b075 Update installation and tutorial doc (#711)
### What this PR does / why we need it?
Update installation and tutorial doc

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-28 21:52:17 +08:00
Li Wang
d0a0c81ced [Doc] Add deepseek-v2-lite w8a8 quantization tutorial (#630)
### What this PR does / why we need it?
Add deepseek-v2-lite w8a8 quantization tutorial

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-28 17:14:26 +08:00