Commit Graph

6 Commits

Author SHA1 Message Date
ttanzhiqiang
2498d297ae add custom ascendc kernel vocabparallelembedding (#796)
This PR add custom ascendc kernel vocabparallelembedding support in
vllm-ascend, related CMakeLists and setuptools is also added in this PR.

pytest -s benchmarks/ops/ben_vocabparallelembedding.py
pytest -s tests/ops/test_vocabparallelembedding.py

---------

Signed-off-by: ttanzhiqiang <389825161@qq.com>
2025-06-12 10:44:33 +08:00
Wan_Danfeng
5cf9ff18e9 [Performance]: Custom AscendC Kernel of Multi-Step Prepare Input (#814)
### What this PR does / why we need it?

- According to https://github.com/vllm-project/vllm-ascend/issues/807,
we pull request for customer ascendc kernel of multi-step.
- also a bug we found in multi_step_runner.py is fixed when we use
multi-step on V0 Engine.


### Does this PR introduce _any_ user-facing change?

no user-facing change


### How was this patch tested?
we add Unit Test file and offline inference file to test the custom
ascendc kernel. See test/ops/test_multi_step.py and
examples/offline_multi_step.py

---------

Signed-off-by: wan_danfeng <wonderful199082@126.com>
2025-05-20 09:31:30 +08:00
Yikun Jiang
2e20797934 [BUILD] Upgrade torch-npu to 2.5.1 (#661)
### What this PR does / why we need it?
The torch-npu 2.5.1 are published:
https://pypi.org/project/torch-npu/2.5.1/
It's time to remove all torch-npu dev version from vllm-ascend code base

### Does this PR introduce _any_ user-facing change?
Yes, using torch-npu 2.5.1

### How was this patch tested?
- [ ] CI passed
- [ ] Manually test
- [ ] Grep all `dev2025`

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-27 17:28:29 +08:00
Shuqiao Li
84563fc65d Add sleep mode feature for Ascend NPU (#513)
### What this PR does / why we need it?
This PR adds sleep mode feature for vllm-ascend, when sleeps, we do
mainly two things:

- offload model weights
- discard kv cache

RLHF tools(such as https://github.com/volcengine/verl and
https://github.com/OpenRLHF/OpenRLHF) have a strong need of sleep mode
to accelerate the training process.

This PR may solve #375 and #320 .

### Does this PR introduce _any_ user-facing change?
No existing user interfaces changed.
Users will have two new methods(`sleep()` and `wake_up()`) to use.

### How was this patch tested?
This PR is tested with Qwen/Qwen2.5-0.5B-Instruct.

At first, we have free NPU memory M1.

After `llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)`
executed, we have free NPU memory M2. M2 < M1.

Then we call `llm.sleep(level=1)`, we have free NPU memory M3.

We have M3 > M2, M3 is very close to M1.

Plus, we have the same output tokens before sleep and after wake up,
with the config of `SamplingParams(temperature=0, max_tokens=10)` and
with the same input tokens of course.


This PR is utilizing the CMake procedure of #371 , thanks a lot.

Signed-off-by: Shuqiao Li <celestialli@outlook.com>
2025-04-18 13:11:39 +08:00
wangxiyuan
9c7428b3d5 [CI] enable custom ops build (#466)
### What this PR does / why we need it?
This PR enable custom ops build  by default. 

### Does this PR introduce _any_ user-facing change?

Yes, users now install vllm-ascend from source will trigger custom ops
build step.

### How was this patch tested?
By image build and e2e CI

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-12 10:24:53 +08:00
Pleaplusone
ce8259975e [core] Support custom ascendc kernels in vllm-ascend (#233)
This PR add custom ascendc kernel rotary_embedding support in
vllm-ascend, related CMakeLists and setuptools is also added in this PR.

Related: https://github.com/vllm-project/vllm-ascend/issues/156

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-04-03 14:52:34 +08:00