### What this PR does / why we need it?
bugfix for mtp in fullgraph
### Does this PR introduce _any_ user-facing change?
no
---------
Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
### What this PR does / why we need it?
we notice that `patch_main` is never used. Usually the patch is for all
version. And if it's for specified version, we can use `vllm_version_is`
instead. So let's remove the useless sub folder in patch module to make
it clear.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Fix the wrong title of the modelrunner_prepare_inputs docs
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
pass CI
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
### What this PR does / why we need it?
To help more developers quickly get started with vLLM, we need to write
clear and easy-to-understand code documentation and technical
interpretations. This will effectively lower the learning curve, attract
more excellent contributors, and collectively build a better developer
community.
Add ModelRunner_prepare_inputs doc
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Pass CI
- vLLM version: v0.10.0
- vLLM main:
4be02a3776
---------
Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
### What this PR does / why we need it?
Update user guide for using lm-eval
1. add using lm-eval on online server
2. add using offline datasets
- vLLM version: v0.10.0
- vLLM main:
9edd1db02b
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
- Upgrade to v0.10.0
- Drop v0.9.2 version compatibility
- Add patch for
`vllm_ascend/patch/worker/patch_common/patch_sampler_gather_logprobs.py`
as workaround of
f3a683b7c9
for v0.10.0 and also add e2e test `test_models_prompt_logprobs`
- Pin transformers<4.54.0 as workaround of
https://github.com/vllm-project/vllm-ascend/issues/2034
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Test locally:
`VLLM_USE_MODELSCOPE=true pytest -sv
tests/e2e/singlecard/test_offline_inference.py::test_models_prompt_logprobs`
- CI passed
- vLLM version: v0.9.2
- vLLM main:
7728dd77bb
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Make clean code
### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main:
29c6fbe58c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
1. Add the tutorials for qwen3-embedding-8b
2. Remove VLLM_USE_V1=1 in docs, it's useless any more from 0.9.2
- vLLM version: v0.9.2
- vLLM main:
5923ab9524
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Follow vllm-project/vllm lint way:
https://github.com/vllm-project/vllm/blob/main/.pre-commit-config.yaml
Enable pre-commit to avoid some low level error AMAP.
This pr is one step of #1241, The purpose is make linting system more
clear and convenient, on this step, Mainly did the following things:
yapf, actionlint, ruff, typos, isort, mypy, png-lint, signoff-commit,
enforce-import-regex-instead-of-re.
TODO:
- clang-format(check for csrc with google style)
need clean code, disable for now
- pymarkdown
need clean code, disable for now
- shellcheck
need clean code, disable for now
### Does this PR introduce _any_ user-facing change?
Only developer UX change:
https://vllm-ascend--1256.org.readthedocs.build/en/1256/developer_guide/contributing.html#run-lint-locally
```
pip install -r requirements-lint.txt && pre-commit install
bash format.sh
```
### How was this patch tested?
CI passed with new added/existing test.
Co-authored-by: Yikun [yikunkero@gmail.com](mailto:yikunkero@gmail.com)
Co-authored-by: wangli
[wangli858794774@gmail.com](mailto:wangli858794774@gmail.com)
- vLLM version: v0.9.1
- vLLM main:
5358cce5ff
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
vllm has released 0.9.2. This PR drop 0.9.1 support.
- vLLM version: v0.9.1
- vLLM main:
b942c094e3
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Use Base test and cleanup all manaul patch code
- Cleanup EPLB config to avoid tmp test file
- Use BaseTest with global cache
- Add license
- Add a doc to setup unit test in local env
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Add guidance on how to implement and register new models.
Modified based on PR
https://github.com/vllm-project/vllm-ascend/pull/1126, thanks for the
contribution of @linfeng-yuan.
---------
Signed-off-by: shen-shanshan <467638484@qq.com>
1. Format the developer guide content to make it more clear
2. Add the patch doc for developer guide
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
- Fix
[doctest](https://github.com/vllm-project/vllm-ascend/actions/workflows/vllm_ascend_doctest.yaml?query=event%3Aschedule)
- add system package installation
- Add doc for run doctests
- Cleanup all extra steps in .github/workflows/vllm_ascend_doctest.yaml
- Change schedule job from 4 ---> 12 hours
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- doctest CI passed
- Local test with
`/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh`.
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
1. Add `__init__.py` for vllm_ascend/compilation to make sure it's a
python module
2. Fix model runner bug to keep the same with vllm
3. Add release note for 0.9.0rc2
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Make accuarcy CI and report work
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manaully review
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
1. Update 0.9.0rc1 release date
2. Update feature and model support list
3. Add DP known issue to release note
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
We need to **observe the time consumed in each stage of inference
(including pre-processing, model forward, etc.), without any performance
loss**.
Therefore, we use the event timestamp mechanism of the NPU to mark any
stage during the execution of the NPU device (this marking operation is
executed asynchronously, with no performance loss).
Additionally, we provide a blocking synchronization API
`pop_captured_sync` to be called at an appropriate time, to print the
time consumed in all observed stages.
**model_runner_v1.py file only changed 5 lines, all of which were
`ProfileExecuteDuration()` calls, and nothing else was changed, while
more changes were showed due to the alignment issue.**
### Does this PR introduce _any_ user-facing change?
Use env `VLLM_MODEL_EXECUTE_TIME_OBSERVE `to enable this feature
### How was this patch tested?
Tested in deepseek model,Print like this:
```
5691:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.17ms [prepare input and forward]:9.57ms [forward]:4.14ms
5695:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.29ms [prepare input and forward]:10.19ms [forward]:4.14ms
5697:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.81ms [prepare input and forward]:10.29ms [forward]:3.99ms
5701:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.10ms [prepare input and forward]:10.62ms [forward]:4.33ms
5705:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.65ms [prepare input and forward]:9.58ms [forward]:4.20ms
5709:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.43ms [prepare input and forward]:9.88ms [forward]:4.20ms
5711:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.89ms [prepare input and forward]:10.49ms [forward]:4.19ms
5715:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.14ms [prepare input and forward]:11.21ms [forward]:4.18ms
5719:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.71ms [prepare input and forward]:10.15ms [forward]:4.42ms
5723:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.62ms [prepare input and forward]:10.31ms [forward]:4.25ms
5725:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.12ms [prepare input and forward]:10.33ms [forward]:4.24ms
5729:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.58ms [prepare input and forward]:10.85ms [forward]:4.32ms
5733:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.32ms [prepare input and forward]:9.79ms [forward]:4.28ms
5737:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:15.06ms [prepare input and forward]:9.89ms [forward]:4.32ms
5739:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.62ms [prepare input and forward]:10.48ms [forward]:4.27ms
5743:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.60ms [prepare input and forward]:10.71ms [forward]:4.61ms
5747:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.21ms [prepare input and forward]:10.10ms [forward]:4.52ms
5751:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:15.03ms [prepare input and forward]:10.00ms [forward]:4.42ms
```
---------
Signed-off-by: depeng1994 <depengzhang@foxmail.com>
1. Fix format check error to make format.sh work
2. Add codespell check CI
3. Add the missing required package for vllm-ascend.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
- Move Release Compatibility Matrix to top
- Remove v0.7.x rc info because v0.7.3 final release alread published
- Rename vllm-ascend to vLLM Ascend
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Preview
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Add 0.8.5rc1 release note and bump vllm version to v0.8.5.post1
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
1. Provide accuracy test report for development branch release.
2. Models and datasets for accuracy test:
| Model | datasets |
|---------------------------- | --------------------------- |
| Qwen2.5-7B-Instruct | ceval-val, gsm8k, mmlu |
| Qwen3-8B | ceval-val, gsm8k, mmlu |
| Llama-3.1-8B-Instruct | ceval-val, gsm8k, mmlu |
| Qwen2.5-VL-7B-Instruct | mmmu_val |
### Does this PR introduce _any_ user-facing change?
This PR will display the accuracy test report of the release versionin
docs/source/developer_guide/accuracy_report。
Qwen2.5-7B-Instruct.md
Qwen3-8B.md
Llama-3.1-8B-Instruct.md
Qwen2.5-VL-7B-Instruct .md
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Sometimes, user install a dev/editable version of vllm. In this case, we
should make sure vllm-ascend works as well.
This PR add a new env `VLLM_VERSION`. It's used for developers who edit
vllm. In this case, developers should set thie env to make sure which
vllm version is installed and used.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
The torch-npu 2.5.1 are published:
https://pypi.org/project/torch-npu/2.5.1/
It's time to remove all torch-npu dev version from vllm-ascend code base
### Does this PR introduce _any_ user-facing change?
Yes, using torch-npu 2.5.1
### How was this patch tested?
- [ ] CI passed
- [ ] Manually test
- [ ] Grep all `dev2025`
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
1. remove Chinese doc. The content is out of data and we don't have
enough time to maintain it.
2. Update feature support matrix. Refresh the content and add V1 status.
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
1. install torch-npu before vllm-ascend to ensure custom ops build
success.
2. set `COMPILE_CUSTOM_KERNELS=0` if users want to disable custom ops
build.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Add developer guide for using lm-eval
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
test manually
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Add developer guide for using OpenCompass
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
test manually
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
This pr upgrades torch-npu to 0320, so that #321,
https://github.com/vllm-project/vllm-ascend/issues/267#issuecomment-2745045743
could be fixed, and #372 should be reverted after this pr
### Does this PR introduce _any_ user-facing change?
upgrade torch-npu to 0320
### How was this patch tested?
tested locally with long seq inferencing.
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
Re-arch on tutorials, move singe npu / multi npu / multi node to index.
- Unifiy docker run cmd
- Use dropdown to hide build from source installation doc
- Re-arch tutorials to include Qwen/QwQ/DeepSeek
- Make QwQ doc works
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI test
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>