Commit Graph

89 Commits

Author SHA1 Message Date
lilinsiman
52863c4165 [Refactor][EAGLE] 2/N: load model and generate token (#5437)
### What this PR does / why we need it?
1. Refactor eagle and mtp function: load_model and generate_token_ids
2. Remove redundant code in mtp and eagle file
3. Refactor the UT of file

2/N of Refactor and merge mtp and eagle
Relational RFC: https://github.com/vllm-project/vllm-ascend/issues/5467

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut and tests

- vLLM version: release/v0.13.0
- vLLM main:
81786c8774

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2026-01-05 14:07:54 +08:00
L4
c23cf30709 [Doc] eval-type not support service but server (#2920)
### What this PR does / why we need it?
fix wrong eval-type in accuracy doc

- vLLM version: v0.10.2
- vLLM main:
fec347dee1

Signed-off-by: root <root@liaolile-laptop.localdomain>
Co-authored-by: root <root@liaolile-laptop.localdomain>
2026-01-05 11:17:39 +08:00
InSec
7cf65d0581 [Doc]modify the quantization user guide and add a quantization adaptation developer guide (#5554)
### What this PR does / why we need it?
This PR makes the following modifications:
1.delete the `user_guide/feature_guide/quantization-llm-compressor.md`
and merge it into `user_guide/feature_guide/quantization.md`.
2.update the content of `user_guide/feature_guide/quantization.md`.
3.add guidance `developer_guide/feature_guide/quantization.md' on the
adaptation of quantization algorithms and quantized models.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
7157596103

---------

Signed-off-by: IncSec <1790766300@qq.com>
Signed-off-by: InSec <1790766300@qq.com>
2026-01-05 09:12:11 +08:00
Li Wang
2ee17e50a1 [2/N] Upgrade nightly doc (#5534)
### What this PR does / why we need it?
Follow up https://github.com/vllm-project/vllm-ascend/pull/5479, upgrade
the corresponding doc for developers

- vLLM version: v0.13.0
- vLLM main:
45c1ca1ca1

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-31 09:11:42 +08:00
zhangsicheng5
8ed87dfa84 [doc] Add context parallel user guide (#5358)
1. Add context parallel user guide
2. Add context parallel related message in supported features/models
- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
2025-12-26 17:03:47 +08:00
Qiu
da0b113cf5 [doc]<PCP&DCP> add developer guide for PCP&DCP (#5372)
### What this PR does / why we need it?
add developer guide for PCP&DCP
- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-12-26 16:17:38 +08:00
wangxiyuan
29d2fe653d cleanup ascend config (#5296)
1. refresh additional config doc
2. move kv config logic to platform.
3. improve `dump_config` init logic and rename it to `dump_config_path`
this change is user impacted. dump_config is changed from dict to
string.
4. correct `enable_async_exponential` type
5. remove useless `chunked_prefill_for_mla`

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-26 14:07:37 +08:00
Li Wang
5ab6d124e5 [Doc] Add a perf tune section (#5127)
### What this PR does / why we need it?
This patch purpose to 
1. add a  section on os point of perf tune doc
2. Set some default env in the image for performance

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-19 14:52:52 +08:00
Li Wang
7d32371b7e [Doc] Refact benchmark doc (#5173)
### What this PR does / why we need it?
Refactor some outdated doc

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-18 22:26:13 +08:00
Ronald
b69b04d3a9 implement model runner v2 basic framework (#5051)
### What this PR does / why we need it?
This PR aim to implement model runner v2 basic framework in vllm-ascend,
the e2e function is not guaranteed by this pr.
 
### Does this PR introduce _any_ user-facing change?
use envs.VLLM_USE_V2_MODEL_RUNNER to decide if choose model_runenr_v2.

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
2025-12-18 15:51:54 +08:00
wangxiyuan
8090914d69 [CI] CI refactor (#4928)
1. rename workflow to better name
2. fix lint error
3. remove accuracy report doc and test

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-14 11:09:56 +08:00
wangxiyuan
e538fa6f9c [Doc] Update tutorial index (#4920)
Update tutorial index and remove useless doc

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-11 20:53:13 +08:00
zhangyiming
c95c271538 [E2E] Optimize nightly testcase. (#4886)
### What this PR does / why we need it?
Optimize nightly testcase.
Changes:
- tests/e2e/nightly/multi_node/config/models/Qwen3-235B-A3B.yaml: Add
accuracy and performance benchmark
- tests/e2e/models/configs/Qwen3-8B-Base.yaml: Delete
- tests/e2e/models/configs/internlm-7b.yaml: Change to
internlm3-8b-instruct
- tests/e2e/nightly/models/test_deepseek_r1_w8a8_eplb.py: Change to
DeepSeek-R1-0528-W8A8 model

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: menogrey <1299267905@qq.com>
2025-12-11 10:15:39 +08:00
zhangyiming
66b0781840 [E2E] Refactor the e2e testcases. (#4789)
### What this PR does / why we need it?
Refactor the e2e testcases.
- tests/e2e/multicard/test_weight_loader.py: Remove the unused code.
- tests/e2e/singlecard/multi-modal/test_internvl.py: Move to accuracy
test.
- tests/e2e/singlecard/test_aclgraph.py: Rename the file.
- tests/e2e/singlecard/test_embedding_aclgraph.py : Combine with
tests/e2e/singlecard/test_bge_model.py
- tests/e2e/singlecard/test_completion_with_prompt_embeds.py: Delete
eager mode and modify model to Qwen3-0.6B
- tests/e2e/singlecard/test_quantization.py: Modify model to
Qwen3-0.6B-W8A8
- tests/e2e/singlecard/test_vlm.py: Modify model to Qwen3-VL-8B

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: menogrey <1299267905@qq.com>
2025-12-11 10:15:00 +08:00
Nengjun Ma
0eefbe75b6 [Doc] Add local running multi-node nightly test case guide (#4884)
### What this PR does / why we need it?
Add local running multi-node nightly test case guide, help running
locally at developer env.
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
Test with local running multi-node test.
Using this document can successfully start multi-node night e2e in
locall

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-12-11 08:56:27 +08:00
wangxiyuan
835b4c8f1d Drop torchair (#4814)
aclgraph is stable and fast now. Let's drop torchair graph mode now.

TODO: some logic to adapt torchair should be cleaned up as well. We'll
do it in the following PR.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-12-10 09:20:40 +08:00
herizhen
bb1610dc25 add hyperlink (#4588)
### What this PR does / why we need it?
add hyperlink

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: v0.11.2

---------

Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
2025-12-02 14:09:03 +08:00
Chenxi Qian
554f16ae1f [Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)
### What this PR does / why we need it?

This PR introduces support for adding custom CANN `aclnn` ops to
`vllm-ascend`, allowing users to define and use their own custom
operators.

Key changes include:
- Building and installing custom ops into the `vllm-ascend`-specified
directory
- Binding the `aclnn` op interface to the `torch.ops._C_ascend` module
- Enabling invocation of these ops within `vllm-ascend`

This PR includes a sample custom op:
`aclnnGroupedMatmulSwigluQuantWeightNzTensorList`, which is adapted from
the CANN operator
[`aclnnGroupedMatmulSwigluQuantWeightNZ`](https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/API/aolapi/context/aclnnGroupedMatmulSwigluQuantWeightNZ.md).
Its input parameters `weight` and `weight_scale` now accept
`list[torch.Tensor]` (i.e., `at::TensorList`).

### Does this PR introduce _any_ user-facing change?

No.


- vLLM version: v0.11.2

---------

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>
2025-11-28 18:06:39 +08:00
herizhen
3199fe8350 [Doc]Delete equals sign (#4537)
### What this PR does / why we need it?
Delete equals sign in doc
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut


- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
2025-11-28 17:09:26 +08:00
SILONG ZENG
ab37a7d5ae [main]Upgrade cann to 8.3rc2 (#4350)
### What this PR does / why we need it?
Upgrade cann to 8.3rc2

### Does this PR introduce _any_ user-facing change?
Yes, docker image will use 8.3.RC2


- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2025-11-28 14:06:01 +08:00
herizhen
e945e91933 Document error correction (#4422)
### What this PR does / why we need it?
The "g" at the beginning of the current sentence is redundant and needs
to be deleted
"MindIE Turbo" is no longer required to be displayed.

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ut

- vLLM main:
2918c1b49c

---------

Signed-off-by: herizhen <you@example.com>
Co-authored-by: herizhen <you@example.com>
2025-11-25 14:21:13 +08:00
Tjh-UKN
00ea61ec88 [feature] vllm-ascend support msprobe (eager mode dump) (#4241)
### What this PR does / why we need it?
vllm-ascend need to dump data during model execution to debug some
precision problems, here msprobe provide the corresponding abilities, so
msprobe will join vllm-ascend to make debug easier

### Does this PR introduce _any_ user-facing change?
```
'dump_config': '/path/to/config.json'
```



- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: Tjh-UKN <2559659915@qq.com>
2025-11-24 21:58:31 +08:00
Canlin Guo
d5fef22149 [Docs] Improve the AISBench multi-modal testing docs (#4255)
### What this PR does / why we need it?

Add some of the pitfalls I ran into when using AISBench to test
multi-modal models.

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-11-19 16:00:39 +08:00
thonean
e38fe92f40 [Misc][Doc] Add service profiling feature with user guide (#3756)
### What this PR does / why we need it?
To support the data collection capabilities of the msServiceProfiler on
vLLM-ascned framework and enable customization of data collection points
via configuration file, a default profiling configuration has been added
to vllm-ascend, facilitating debugging and optimization for developers
and users.

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: minghangc <29514143@qq.com>
2025-11-12 09:07:14 +08:00
wangxiaoteng888
b1a00e0512 [docs] [P/D] add feature guide for disaggregated-prefill (#3950)
### What this PR does / why we need it?
add feature guide for disaggregated-prefill
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
by ci

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-11-10 09:31:30 +08:00
lilinsiman
a3ff765c65 [Info][main] Corrected the errors in the information (#4055)
### What this PR does / why we need it?
Corrected the errors in the information

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-11-08 18:48:59 +08:00
offline893
5cff3069f4 [Doc]Add developer guide of eplb. (#3759)
### What this PR does / why we need it?
Add developer guide of eplb



- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
2025-11-05 18:35:41 +08:00
pz1116
e0c23cb011 [docs] Add kv pool developer guide (#3752)
### What this PR does / why we need it?
Add kv pool developer guide

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
Signed-off-by: pz1116 <zpbzpb123123@gmail.com>
2025-11-05 18:03:36 +08:00
zouyida2052
1ba158567c [Doc] add mtp doc (#3770)
### What this PR does / why we need it?
add mtp develop doc

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-11-05 16:38:35 +08:00
zzzzwwjj
46d5a77688 [docs] add aclgraph developer guide (#3683)
### What this PR does / why we need it?
Add aclgraph developer guide.


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2025-11-05 10:34:28 +08:00
zhangyiming
5f08e07208 [Doc] Refactor the DeepSeek-V3.2-Exp tutorial. (#3871)
### What this PR does / why we need it?
Refactor the DeepSeek-V3.2-Exp tutorial.

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: menogrey <1299267905@qq.com>
2025-11-04 18:58:33 +08:00
wangxiyuan
cc2cd42ad3 Upgrade CANN to 8.3.rc1 (#3945)
### What this PR does / why we need it?
This PR upgrade CANN from 8.2rc1 to 8.3rc1 and remove the CANN version
check logic.

TODO: we notice that UT runs failed with CANN 8.3 image. So the base
image for UT is still 8.2. We'll fix it later.


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-03 20:21:07 +08:00
wangxiyuan
ff47524b88 [Doc] Remove modeling doc (#3789)
Remove `modeling` doc, it's useless now 

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-30 15:53:02 +08:00
zhangxinyuehfad
789ba4c5c2 [Doc] Update doc (#3836)
### What this PR does / why we need it?

Update doc

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-10-29 11:03:39 +08:00
Li Wang
7f73c28a24 [CI][Doc] Optimize multi-node CI (#3565)
### What this PR does / why we need it?
This pull request mainly do the following things:
1. Add a doc for multi-node CI, The main content is the mechanism
principle and how to contribute
2. Simplify the config yaml for more developer-friendly
3. Optimized the mooncake installation script to prevent accidental
failures during installation
4. Fix the workflow to ensure the kubernetes can be apply correctly
5. Add Qwen3-235B-W8A8 disaggregated_prefill test
6. Add GLM-4.5 multi dp test
7. Add 2p1d 4nodes disaggregated_prefill test
8. Refactor nightly tests
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-25 09:23:47 +08:00
Li Wang
ca104ce6f0 [Doc] Upgrade docker run command (#3645)
### What this PR does / why we need it?
Update the docker run command, specifically: add --shm-size=1g
### Does this PR introduce _any_ user-facing change?
users/developers using docker to pull vllm-ascend, the shared memory of
the container will be increased from the default 64MB to 1G

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-23 11:17:26 +08:00
wangxiyuan
13e8e75143 [Refactor] refactor patch module (#3555)
### What this PR does / why we need it?
we notice that `patch_main` is never used. Usually the patch is for all
version. And if it's for specified version, we can use `vllm_version_is`
instead. So let's remove the useless sub folder in patch module to make
it clear.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-21 20:19:46 +08:00
TaoYu Chen
5fe883fa43 fix the title of modelrunner's prepare inputs docs (#3457)
### What this PR does / why we need it?
Fix the wrong title of the modelrunner_prepare_inputs docs

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
pass CI

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-10-14 20:35:58 +08:00
Li Wang
02f89d166f [CI] Update vllm version to 20250922(5aeb925) (#3091)
### What this PR does / why we need it?
This pr bump vllm commit hash to
5aeb925452
fix issues:  
1. https://github.com/vllm-project/vllm/pull/25345 has remove v0
metadata
2. https://github.com/vllm-project/vllm/pull/25332
3. https://github.com/vllm-project/vllm/pull/25334
4. https://github.com/vllm-project/vllm/pull/23558, note that this vllm
commit update the model register logic, which will check all the model
registered have the `vllm.model_executor.models` path , which breaks our
custom registration of the deepseek_v3 model (it doesn't exist in the
vllm model path). so I move deepseek_v3 model registy to deepseek_v2 to
solve temporary

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
9607d5eb44

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-22 22:18:13 +08:00
vllm-ascend-ci
3a2a7d88db [Doc] Update accuracy reports for v0.10.1rc1 (#2755)
The accuracy results running on NPU Altlas A2 have changed, updating
reports for: All models (Qwen3-30B-A3B, Qwen2.5-VL-7B-Instruct,
Qwen3-8B-Base, DeepSeek-V2-Lite)

  - [Workflow run][1]
  
[1]:
https://github.com/vllm-project/vllm-ascend/actions/runs/17459225764
- vLLM version: v0.10.1.1
- vLLM main:
2b30afa442

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
2025-09-04 22:17:17 +08:00
TaoYu Chen
9e7c168d99 Add ModelRunner_prepare_inputs doc (#1493)
### What this PR does / why we need it?
To help more developers quickly get started with vLLM, we need to write
clear and easy-to-understand code documentation and technical
interpretations. This will effectively lower the learning curve, attract
more excellent contributors, and collectively build a better developer
community.

Add ModelRunner_prepare_inputs doc

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Pass CI


- vLLM version: v0.10.0
- vLLM main:
4be02a3776

---------

Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-08-18 15:41:24 +08:00
zhangxinyuehfad
bcd0b532f5 [Doc] Update user guide for using lm-eval (#1325)
### What this PR does / why we need it?
Update user guide for using lm-eval
1. add using lm-eval on online server
2. add using offline datasets

- vLLM version: v0.10.0
- vLLM main:
9edd1db02b

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-08-07 14:15:49 +08:00
Shanshan Shen
61fc35184b [Doc] Add performance tuning doc to main (#1392)
### What this PR does / why we need it?
Add performance tuning doc to main.

Closes: https://github.com/vllm-project/vllm-ascend/issues/1387


- vLLM version: v0.9.1
- vLLM main:
923147b5e8

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
2025-07-29 19:36:34 +08:00
Yikun Jiang
17a430f7b8 Upgrade vLLM to v0.10.0 (#1927)
### What this PR does / why we need it?
- Upgrade to v0.10.0
- Drop v0.9.2 version compatibility
- Add patch for
`vllm_ascend/patch/worker/patch_common/patch_sampler_gather_logprobs.py`
as workaround of
f3a683b7c9
for v0.10.0 and also add e2e test `test_models_prompt_logprobs`
- Pin transformers<4.54.0 as workaround of
https://github.com/vllm-project/vllm-ascend/issues/2034

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Test locally:
`VLLM_USE_MODELSCOPE=true pytest -sv
tests/e2e/singlecard/test_offline_inference.py::test_models_prompt_logprobs`
- CI passed

- vLLM version: v0.9.2
- vLLM main:
7728dd77bb

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-26 15:43:29 +08:00
Li Wang
bdfb065b5d [1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Make clean code

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
29c6fbe58c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-25 22:16:10 +08:00
wangxiyuan
b5b7e0ecc7 [Doc] Add qwen3 embedding 8b guide (#1734)
1. Add the tutorials for qwen3-embedding-8b
2. Remove VLLM_USE_V1=1  in docs, it's useless any more from 0.9.2


- vLLM version: v0.9.2
- vLLM main:
5923ab9524

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:40:17 +08:00
Li Wang
c7446438a9 [1/N][CI] Move linting system to pre-commits hooks (#1256)
### What this PR does / why we need it?

Follow vllm-project/vllm lint way:
https://github.com/vllm-project/vllm/blob/main/.pre-commit-config.yaml

Enable pre-commit to avoid some low level error  AMAP.

This pr is one step of #1241, The purpose is make linting system more
clear and convenient, on this step, Mainly did the following things:
yapf, actionlint, ruff, typos, isort, mypy, png-lint, signoff-commit,
enforce-import-regex-instead-of-re.

TODO: 
- clang-format(check for csrc with google style)
need clean code, disable for now 
- pymarkdown
need clean code, disable for now 
- shellcheck
need clean code, disable for now 

### Does this PR introduce _any_ user-facing change?

Only developer UX change:

https://vllm-ascend--1256.org.readthedocs.build/en/1256/developer_guide/contributing.html#run-lint-locally

```
pip install -r requirements-lint.txt && pre-commit install
bash format.sh
```

### How was this patch tested?

CI passed with new added/existing test.

Co-authored-by: Yikun [yikunkero@gmail.com](mailto:yikunkero@gmail.com)
Co-authored-by: wangli
[wangli858794774@gmail.com](mailto:wangli858794774@gmail.com)
- vLLM version: v0.9.1
- vLLM main:
5358cce5ff

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-10 14:17:15 +08:00
wangxiyuan
830332ebfc Clean up v0.9.1 code (#1672)
vllm has released 0.9.2. This PR drop 0.9.1 support.

- vLLM version: v0.9.1
- vLLM main:
b942c094e3

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-09 08:52:24 +08:00
Yikun Jiang
0c1d239df4 Add unit test local cpu guide and enable base testcase (#1566)
### What this PR does / why we need it?
Use Base test and cleanup all manaul patch code
- Cleanup EPLB config to avoid tmp test file
- Use BaseTest with global cache
- Add license
- Add a doc to setup unit test in local env 

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-06 10:42:27 +08:00
Shanshan Shen
3687676fa7 [Doc] Add guidance on how to implement and register new models (#1426)
### What this PR does / why we need it?
Add guidance on how to implement and register new models.

Modified based on PR
https://github.com/vllm-project/vllm-ascend/pull/1126, thanks for the
contribution of @linfeng-yuan.

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-06-27 16:46:49 +08:00