Commit Graph

60 Commits

Author SHA1 Message Date
Shanshan Shen
985b0548b0 [Doc] Update v0.8.4 release note, add contents for structured output feature (#576)
### What this PR does / why we need it?
Update v0.8.4 release note:

- Add contents for structured output feature.
- Remove redundant `(` in spec decoding.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Preview

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-18 17:44:16 +08:00
Mengqing Cao
2c903bc7ac [Doc] Update doc for custom ops build (#570)
- update doc about custom ops compile

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-18 15:35:10 +08:00
Mengqing Cao
b91f9a5afd [Doc][Build] Update build doc and faq (#568)
Update build doc and faq about deepseek w8a8

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-18 14:16:41 +08:00
wangxiyuan
e66ded5679 [Doc] Add release note for 0.8.4rc1 (#557)
Add the release note for 0.8.4rc1. We'll release 0.8.4rc1 now.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-18 13:24:36 +08:00
Shanshan Shen
7eeff60715 [Doc] Update FAQ doc (#561)
### What this PR does / why we need it?
Update the FAQ doc to make the `docker pull` instructions clearer.


Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-18 13:13:13 +08:00
Mengqing Cao
b71f193cb0 [Model][Doc] Update model support list (#552)
Update model support list
cc @Yikun plz help review, thanks!

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-17 19:32:20 +08:00
hfadzxy
9935d45728 [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460)
### What this PR does / why we need it?
Add model basic accuracy test(Qwen2.5-0.5B-Instruct)

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-17 14:59:56 +08:00
Li Wang
64fdf4cbef [Doc]Update faq (#536)
### What this PR does / why we need it?
update performance and accuracy faq

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-17 14:56:51 +08:00
hfadzxy
00de2ee6ad [Doc] update faq about progress bar display issue (#538)
### What this PR does / why we need it?
update faq about progress bar display issue

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-16 16:07:08 +08:00
Mengqing Cao
fe13cd9ea5 [Doc] update faq about w8a8 (#534)
update faq about w8a8

---------

Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-04-16 09:37:21 +08:00
wangxiyuan
bbe7ccd366 [MISC] Add patch module (#526)
This PR adds a patch module for vLLM:
1. platform patch: registered when the platform is loaded.
2. worker patch: registered when the worker is started.

The details are:
1. patch_common: patches for both main and 0.8.4
2. patch_main: patches for main only
3. patch_0_8_4: patches for 0.8.4 only
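The version split above can be pictured with a small sketch. `select_patch_modules`, the `STAGES` tuple and the `<stage>.<module>` naming are hypothetical, chosen only to show how the common patch set combines with one version-specific set; the real module layout in vllm-ascend may differ.

```python
from typing import List

# Hypothetical stage names mirroring "platform patch" and "worker patch" above.
STAGES = ("platform", "worker")


def select_patch_modules(stage: str, vllm_version: str) -> List[str]:
    """Return the patch modules to register for a stage and a vLLM version."""
    if stage not in STAGES:
        raise ValueError(f"unknown patch stage: {stage!r}")
    # patch_common applies to both vLLM main and 0.8.4.
    modules = [f"{stage}.patch_common"]
    if vllm_version.startswith("0.8.4"):
        modules.append(f"{stage}.patch_0_8_4")
    else:
        # Assumption: any version other than 0.8.4 is treated as vLLM main.
        modules.append(f"{stage}.patch_main")
    return modules
```

In this sketch, the platform list would be resolved when the platform loads and the worker list when a worker starts.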
2025-04-16 09:28:58 +08:00
Shanshan Shen
bcbc04f92b [Doc] Add environment variables doc (#519)
### What this PR does / why we need it?
Add environment variables doc.

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-15 16:09:36 +08:00
wangxiyuan
5c6d79687c [Doc] Update FAQ (#518)
Update FAQ

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-15 10:17:56 +08:00
wangxiyuan
5fa70b6393 [Build] Update doc (#509)
1. Install torch-npu before vllm-ascend to ensure the custom ops build
succeeds.
2. Set `COMPILE_CUSTOM_KERNELS=0` if users want to disable the custom ops
build.
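A minimal sketch of the install order and the opt-out flag. The helper names and the exact pip commands (`pip install torch-npu`, then `pip install -e .` from a vllm-ascend checkout) are assumptions for illustration; only the ordering and the `COMPILE_CUSTOM_KERNELS=0` switch come from this change.

```python
import os
from typing import Dict, List, Mapping, Tuple


def custom_kernels_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Assumption: "0" disables the custom ops build; unset means enabled."""
    return env.get("COMPILE_CUSTOM_KERNELS", "1").strip() != "0"


def install_steps(disable_custom_ops: bool = False) -> List[Tuple[Dict[str, str], List[str]]]:
    """Return (env overrides, command) pairs in the order they should run."""
    build_env = {"COMPILE_CUSTOM_KERNELS": "0"} if disable_custom_ops else {}
    return [
        # 1. torch-npu must be present before the custom ops are compiled.
        ({}, ["pip", "install", "torch-npu"]),
        # 2. Hypothetical source install of vllm-ascend from a local checkout.
        (build_env, ["pip", "install", "-e", "."]),
    ]
```

Each step's overrides would be merged into the current environment before running it, e.g. `subprocess.run(cmd, env={**os.environ, **overrides}, check=True)`.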

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-14 14:38:50 +08:00
Shanshan Shen
11ecbfdb31 [Doc] Update FAQ doc (#504)
### What this PR does / why we need it?
Update FAQ doc.

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-14 11:11:40 +08:00
wangxiyuan
9c7428b3d5 [CI] enable custom ops build (#466)
### What this PR does / why we need it?
This PR enables the custom ops build by default.

### Does this PR introduce _any_ user-facing change?

Yes, installing vllm-ascend from source now triggers the custom ops
build step.

### How was this patch tested?
By image build and e2e CI

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-12 10:24:53 +08:00
Icey
d05ea17427 Add openEuler based container image for vLLM Ascend (#489)
### What this PR does / why we need it?

Provide users with openEuler-based vLLM images and update the quick
start README accordingly.

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

No testing is needed.

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-04-10 14:30:49 +08:00
jinyuxin
5d6239306b [DOC] Update multi_node.md (#468)
### What this PR does / why we need it?
- Added instructions for verifying multi-node communication environment.
- Included explanations of Ray-related environment variables for
configuration.
- Provided detailed steps for launching services in a multi-node
environment.
### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
manually tested.

Signed-off-by: jinyuxin <jinyuxin2@huawei.com>
2025-04-08 14:19:57 +08:00
hfadzxy
94bf9c379e [Doc]Add developer guide for using lm-eval (#456)
### What this PR does / why we need it?
Add developer guide for using lm-eval

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
test manually

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-01 23:43:51 +08:00
Yikun Jiang
c42e21a5aa [Docs] Add install system dependencies in install doc (#438)
### What this PR does / why we need it?
Add steps for installing system dependencies to the install doc.

Resolve:
```
$ pip install vllm==v0.7.3
CMake Error at CMakeLists.txt:14 (project):
  No CMAKE_CXX_COMPILER could be found.
  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.
// ... ...
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for vllm
Failed to build vllm
ERROR: Failed to build installable wheels for some pyproject.toml based projects (vllm)
```

Closes: https://github.com/vllm-project/vllm-ascend/issues/439 


### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-31 14:17:55 +08:00
hfadzxy
7beb4339dc [Doc]Add developer guide for using OpenCompass (#368)
### What this PR does / why we need it?
Add developer guide for using OpenCompass

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?

test manually

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-31 00:24:25 +08:00
wangxiyuan
ca8b1c3e47 [Doc] Add 0.7.3rc2 release note (#419)
Add 0.7.3rc2 release note. We'll release 0.7.3rc2 right now.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-29 09:02:08 +08:00
Tony
b1557abab6 fix multistep bug,remove uselesscodes (#355)
1. Remove useless code in attention.py.
2. Multistep now uses StatefulModelInputForNPU instead of
StatefulModelInput.

Signed-off-by: new-TonyWang <wangtonyyu222@gmail.com>
2025-03-28 09:55:35 +08:00
Zhenyu Zheng
0b5a9643fd Add an example for user stories (#399)
Add an example for user stories and fix some typos.

Add a new user story section to the docs to collect user stories of
vllm-ascend; also add an example and an issue template for collecting user
stories.

Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>
2025-03-26 16:25:57 +08:00
Mengqing Cao
d4accf4ec2 [Doc][Model] update LLaVA 1.6 support (#373)
update LLaVA 1.6 support

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-26 09:07:55 +08:00
Mengqing Cao
6295d2e9bc [CI/Build][Doc] upgrade torch-npu to 0320 (#392)
### What this PR does / why we need it?
This PR upgrades torch-npu to 0320 so that #321 and
https://github.com/vllm-project/vllm-ascend/issues/267#issuecomment-2745045743
can be fixed; #372 should be reverted after this PR.

### Does this PR introduce _any_ user-facing change?
upgrade torch-npu to 0320

### How was this patch tested?
Tested locally with long-sequence inference.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-26 09:04:12 +08:00
Shanshan Shen
3fb3b5cf75 [Doc] Update model support doc (add QwQ-32B) (#388)
### What this PR does / why we need it?

Update model support doc (add QwQ-32B)


Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
2025-03-25 11:40:50 +08:00
Shanshan Shen
c06af8b2e0 [V1][Core] Add support for V1 Engine (#295)
### What this PR does / why we need it?
Add support for V1 Engine.

Please note that this is just the initial version, and some places may
need to be fixed or optimized in the future. Feel free to leave comments.

### Does this PR introduce _any_ user-facing change?

To use the V1 Engine on an NPU device, you need to set the environment
variables shown below:

```bash
export VLLM_USE_V1=1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
```

If you are using vLLM for offline inference, you must add a `__main__`
guard like:

```python
import vllm

if __name__ == '__main__':
    llm = vllm.LLM(...)
```

Find more details
[here](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing).
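The `__main__` guard is needed because the `spawn` start method launches fresh interpreter processes that re-import the main module. Without the guard, top-level code would try to build the engine again in every child. Below is a minimal sketch of setting both variables from Python and running an offline script in a child process. `v1_engine_env` and `run_offline_script` are hypothetical helpers, and the script is assumed to contain the guard shown above.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def v1_engine_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the two V1 variables set."""
    env = dict(os.environ if base is None else base)
    env["VLLM_USE_V1"] = "1"
    # Workers are started with 'spawn', which re-imports the main module.
    env["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    return env


def run_offline_script(script_path: str) -> subprocess.CompletedProcess:
    """Run a user script (which must contain the __main__ guard) with V1 enabled."""
    return subprocess.run([sys.executable, script_path], env=v1_engine_env(), check=True)
```

Copying the environment instead of mutating `os.environ` keeps the parent process unaffected.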

### How was this patch tested?
I have tested the online serving with `Qwen2.5-7B-Instruct` using this
command:

```bash
vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
```

Query the model with input prompts:

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 7,
        "temperature": 0
    }'
```
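The same request can also be sent from Python using only the standard library. This is a sketch: the helper names are hypothetical, and it assumes the server started above is listening on `localhost:8000` and returns an OpenAI-style completions body with the generated text in `choices[0]["text"]`.

```python
import json
import urllib.request


def build_completion_request(prompt: str,
                             model: str = "Qwen/Qwen2.5-7B-Instruct",
                             base_url: str = "http://localhost:8000",
                             max_tokens: int = 7,
                             temperature: float = 0.0) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl command."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the first choice's generated text."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `complete("The future of AI is")` would return the 7-token continuation.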

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: didongli182 <didongli@huawei.com>
2025-03-20 19:34:44 +08:00
Shanshan Shen
441a62e937 [Doc] Fix bugs of installation doc and format tool (#330)
### What this PR does / why we need it?
Fix bugs of installation doc and format tool.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-03-14 10:21:35 +08:00
wangxiyuan
c25631ec7b [Doc] Add the release note for 0.7.3rc1 (#285)
Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-13 17:57:06 +08:00
Li Wang
41aba1cfc1 [Doc]Fix tutorial doc expression (#319)
Fix tutorial doc expression

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-03-13 15:24:05 +08:00
xiemingda
59ea23d0d3 [Doc] Add Single NPU (Qwen2.5-VL-7B) tutorial (#311)
Run vllm-ascend on Single NPU

### What this PR does / why we need it?
Add a vllm-ascend tutorial doc for inference and serving with the
Qwen/Qwen2.5-VL-7B-Instruct model.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
no

Signed-off-by: xiemingda <xiemingda1002@gmail.com>
2025-03-12 20:37:12 +08:00
Yikun Jiang
007aeaa48b [Doc] Change distributed_executor_backend to mp (#287)
### What this PR does / why we need it?
Fix `ValueError: Unrecognized distributed executor backend tp. Supported
values are 'ray', 'mp' 'uni', 'external_launcher' or custom ExecutorBase
subclass.`
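The error is raised when the backend name is validated. Below is a minimal sketch of that kind of check, limited to the string values quoted in the message; the custom `ExecutorBase` subclass case is left out. `check_executor_backend` is a hypothetical helper and is not vLLM's own code.

```python
# String values accepted according to the error message above.
SUPPORTED_BACKENDS = ("ray", "mp", "uni", "external_launcher")


def check_executor_backend(backend: str) -> str:
    """Return the backend name if it is a supported string value, else raise."""
    if backend not in SUPPORTED_BACKENDS:
        supported = ", ".join(repr(b) for b in SUPPORTED_BACKENDS)
        raise ValueError(
            f"Unrecognized distributed executor backend {backend}. "
            f"Supported values are {supported}."
        )
    return backend
```

With this check, `"mp"` passes and the old doc value `"tp"` raises `ValueError`.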

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test on my local node

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-10 11:27:26 +08:00
Yikun Jiang
38334f5daa [Docs] Re-arch on doc and make QwQ doc work (#271)
### What this PR does / why we need it?
Re-arch tutorials: move single NPU / multi NPU / multi node to index.
- Unify docker run cmd
- Use dropdown to hide build from source installation doc
- Re-arch tutorials to include Qwen/QwQ/DeepSeek
- Make QwQ doc work

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI test



Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-10 09:27:48 +08:00
Yikun Jiang
18bb8d1f52 Adapt vLLM requirements changes to fix main CI (#279)
### What this PR does / why we need it?
Adapt vLLM requirements changes:
206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 16:07:45 +08:00
Yikun Jiang
be58d5f3d8 Bump torch_npu version to dev20250308.3 (#276)
### What this PR does / why we need it?
Bump torch_npu version to dev20250308.3 to fix a performance regression in
the multi-stream case: e04c580d07.


### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 15:59:15 +08:00
Mengqing Cao
91f7d8115d [CI/Build] Bump torch_npu to dev20250307.3 (#265)
Update the torch-npu version to fix the accuracy of torch_npu's `exponential_`.
With this update, the precision issue when setting `temperature > 0` is
fixed.

---------

Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-07 20:34:07 +08:00
Yikun Jiang
cff08f9df8 [Doc] Add initial FAQs (#247)
### What this PR does / why we need it?
Add initial FAQs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-06 10:42:42 +08:00
wangxiyuan
ae49bfd13a [Core] Support pooling (#229)
This PR added pooling support for vllm-ascend

Tested with `bge-base-en-v1.5` via `encode`:
```python
from vllm import LLM

# Sample prompts.
prompts = [
  "Hello, my name is",
  "The president of the United States is",
  "The capital of France is",
  "The future of AI is",
]
# Create an LLM.
model = LLM(model="./bge-base-en-v1.5", enforce_eager=True)
# Generate embedding. The output is a list of EmbeddingRequestOutputs.
outputs = model.encode(prompts)
# Print the outputs.
for output in outputs:
    print(output.outputs.embedding)  # list of 768 floats for bge-base-en-v1.5
```

Tested via `embed`:
```python
from vllm import LLM

llm = LLM(model="./bge-base-en-v1.5", task="embed")
(output,) = llm.embed("Hello, my name is")

embeds = output.outputs.embedding
print(f"Embeddings: {embeds!r} (size={len(embeds)})")
```

Related: https://github.com/vllm-project/vllm-ascend/issues/200

## Known issue
The accuracy is not yet correct because this feature relies on `enc-dec`
support, which will be added in a follow-up PR by @MengqingCao.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-04 15:59:34 +08:00
Shanshan Shen
8fda31cafe [Doc] Update Feature Support doc (#234)
### What this PR does / why we need it?
Update Feature Support doc.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

---------

Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-03-04 14:18:32 +08:00
Yikun Jiang
ebe14f20cf Recover vllm-ascend dev image (#209)
### What this PR does / why we need it?
Recover vllm-ascend dev image

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-03 09:08:41 +08:00
Yikun Jiang
6e358c4bef Add Document Branch Policy (#217)
### What this PR does / why we need it?
Add Document Branch Policy

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Related: https://github.com/vllm-project/vllm-ascend/issues/214

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-03 09:07:39 +08:00
Mengqing Cao
03dc5c01fd [Doc] update multinode doc (#181)
Update multinode doc
Fixes #167 and #168.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-27 19:29:49 +08:00
wangxiyuan
6042c210bc [CI] upgrade to newest pta (#187)
Upgrade to newest torch-npu

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
2025-02-27 16:40:23 +08:00
Shanshan Shen
ee43179767 [ModelRunner] Fix cuda hard code in model runner (#155)
### What this PR does / why we need it?
1. Fix hard-coded CUDA in the model runner.
2. Fix tutorials doc rendering error.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-02-27 14:16:46 +08:00
wangxiyuan
51ae37b22a [Doc] update readme (#147)
Fix doc issue in README

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-25 11:00:58 +08:00
Yikun Jiang
d21b3be685 Mark v0.7.1 as unmaintained and v0.7.3 as maintained (#139)
### What this PR does / why we need it?
Mark v0.7.1 as unmaintained and v0.7.3 as maintained.
vLLM has released v0.7.3
(https://github.com/vllm-project/vllm/releases/tag/v0.7.3), which includes
several commits:
- https://github.com/vllm-project/vllm/pull/12874
- https://github.com/vllm-project/vllm/pull/12432
- https://github.com/vllm-project/vllm/pull/13208

We should bump the versions to v0.7.3.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-21 22:41:44 +08:00
HongtaoYang
fd2cc1b883 [Docs] Add Tutorials for Online Serving on Multi Machine (#120)
Add Tutorials for Online Serving on Multi Machine

---------

Signed-off-by: SidaoY <1024863041@qq.com>
Co-authored-by: yx0716 <jinyx1007@foxmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-02-21 11:03:00 +08:00
Yikun Jiang
3a4ce2aa15 [Docs] Fix vllm and vllm-ascend version (#107)
### What this PR does / why we need it?

Fix vllm and vllm-ascend version 

| branch/tag | vllm_version | vllm_ascend_version | pip_vllm_ascend_version | pip_vllm_version |
|----|----|----|----|----|
| main | main | main | v0.7.1rc1 | v0.7.1 |
| v0.7.1-dev | v0.7.1 | v0.7.1rc1 | v0.7.1rc1 | v0.7.1 |
| v0.7.1rc1 | v0.7.1 | v0.7.1rc1 | v0.7.1rc1 | v0.7.1 |

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-20 11:05:35 +08:00
wangxiyuan
cff03a4913 [CI] change to quay.io (#102)
Change the Docker registry to quay.io.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-19 17:04:46 +08:00