7 Commits

Author SHA1 Message Date
wangxiyuan
4a008c4dac [Misc]Clean up useless import from vllm (#2049)
Clean up useless imports from vllm to make the code clearer.

- vLLM version: v0.10.0
- vLLM main: 18cc33dd60

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-28 16:01:59 +08:00
wangxiyuan
5968dff4e0 [Build] Add build info (#1386)
Add a static build_info py file that shows SoC and sleep mode info. It
helps keep the code clean and makes error messages friendlier for
users.

This PR also adds unit tests for vllm_ascend/utils.py.

This PR also adds the base test class for all unit tests in tests/ut/base.py.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-27 09:14:43 +08:00
Li Wang
f8029945c3 [Bugfix] Remove cuda related lines and add additional pip mirror (#1252)
### What this PR does / why we need it?
- For NPU environments, we should use `PYTORCH_NPU_ALLOC_CONF` rather
than `PYTORCH_CUDA_ALLOC_CONF`
- Add `PIP_EXTRA_INDEX_URL` to make nightly_benchmarks happy
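
A minimal sketch of the first point, assuming the setting lives in an environment mapping. The helper name `use_npu_alloc_conf` and the sample value are hypothetical and not taken from this PR; the value is passed through unchanged.

```python
CUDA_KEY = "PYTORCH_CUDA_ALLOC_CONF"
NPU_KEY = "PYTORCH_NPU_ALLOC_CONF"


def use_npu_alloc_conf(env):
    """Return a copy of env with the CUDA allocator setting moved to the NPU key.

    Hypothetical helper for illustration only. If the NPU key is already
    set, its value is kept and the CUDA key is simply dropped.
    """
    new_env = dict(env)
    cuda_value = new_env.pop(CUDA_KEY, None)
    if cuda_value is not None:
        new_env.setdefault(NPU_KEY, cuda_value)
    return new_env


if __name__ == "__main__":
    # Placeholder value, used only to show the rename.
    print(use_npu_alloc_conf({CUDA_KEY: "max_split_size_mb:256"}))
```

The helper returns a new mapping rather than mutating `os.environ`, so it can be checked without touching the real process environment.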


---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-06-17 21:25:40 +08:00
yangpuPKU
46df67a5e9 [bugfix] Improve log level and info for custom ops build (#937)
### What this PR does / why we need it?
Fix the bug reported in #703, where vllm wrongly raised the error
`Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'`.
Import failures of vllm_ascend_C are now reported in one unified format,
as a warning: `warning("Failed to import vllm_ascend_C:%s", e)`.
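
A minimal sketch of this pattern, assuming a module-level logger. The function name `try_import_custom_ops` and its `module_name` parameter are hypothetical (the parameter exists only so the sketch can be exercised with other modules); only the warning format string comes from this PR.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def try_import_custom_ops(module_name="vllm_ascend.vllm_ascend_C"):
    """Return True if the compiled extension imports, otherwise warn and return False."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError as e:
        # A missing custom-op build is reported as a warning, not an error.
        logger.warning("Failed to import vllm_ascend_C:%s", e)
        return False
```

Callers can branch on the return value and fall back to a code path that does not need the custom ops.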

### Does this PR introduce _any_ user-facing change?
No

---------

Signed-off-by: yangpuPKU <604425840@qq.com>
2025-05-23 10:05:57 +08:00
wangxiyuan
b917361ca5 [MISC] Clean up torch_npu (#688)
torch_npu 2.5.1 supports autoload now. This patch:
1. removes useless torch_npu imports
2. replaces `torch_npu.npu` with `torch.npu`.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-29 18:03:38 +08:00
Shuqiao Li
a127cc83f8 catch ImportError when C code not compiled (#575)
### What this PR does / why we need it?
Found a problem when an ImportError is raised instead of a ModuleNotFoundError.
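
For context, `ModuleNotFoundError` is a subclass of `ImportError` in Python, so a handler written only for `ModuleNotFoundError` does not catch a plain `ImportError`. The sketch below illustrates that language behavior; the function name is hypothetical and the code is not taken from this patch.

```python
def caught_by_narrow_handler(exc):
    """Report whether a handler for ModuleNotFoundError alone would catch exc."""
    try:
        raise exc
    except ModuleNotFoundError:
        return True
    except ImportError:
        # A plain ImportError slips past the narrow handler above.
        return False


if __name__ == "__main__":
    print(issubclass(ModuleNotFoundError, ImportError))
    print(caught_by_narrow_handler(ModuleNotFoundError("missing module")))
    print(caught_by_narrow_handler(ImportError("module found but failed to load")))
```

Catching `ImportError` covers both cases.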


### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
CI passed

Signed-off-by: Shuqiao Li <celestialli@outlook.com>
2025-04-18 18:11:49 +08:00
Shuqiao Li
84563fc65d Add sleep mode feature for Ascend NPU (#513)
### What this PR does / why we need it?
This PR adds a sleep mode feature for vllm-ascend. When the engine
sleeps, it mainly does two things:

- offload model weights
- discard kv cache

RLHF tools (such as https://github.com/volcengine/verl and
https://github.com/OpenRLHF/OpenRLHF) have a strong need for sleep mode
to accelerate the training process.

This PR may solve #375 and #320.

### Does this PR introduce _any_ user-facing change?
No existing user interfaces are changed.
Users get two new methods to use: `sleep()` and `wake_up()`.

### How was this patch tested?
This PR is tested with Qwen/Qwen2.5-0.5B-Instruct.

At first, free NPU memory is M1.

After `llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)`
is executed, free NPU memory is M2, with M2 < M1.

Then we call `llm.sleep(level=1)`, after which free NPU memory is M3.

We have M3 > M2, and M3 is very close to M1.

In addition, the output tokens are the same before sleep and after wake
up, given the same input tokens and the config
`SamplingParams(temperature=0, max_tokens=10)`.
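
The memory relations above can be written as a small check. This is a sketch under stated assumptions: the function name and the `rel_tol` threshold for "very close" are hypothetical, since the PR does not quantify it, and how the free-memory readings are obtained is left to the caller.

```python
def sleep_mode_memory_ok(m1, m2, m3, rel_tol=0.05):
    """Check the three free-memory readings described in the test notes.

    m1: free NPU memory before the model is loaded
    m2: free NPU memory after the model is loaded
    m3: free NPU memory after llm.sleep(level=1)
    rel_tol: assumed threshold for "very close"; not specified by the PR
    """
    loading_uses_memory = m2 < m1
    sleep_frees_memory = m3 > m2
    back_near_start = abs(m1 - m3) <= rel_tol * m1
    return loading_uses_memory and sleep_frees_memory and back_near_start
```

All three readings must be in the same unit (for example bytes); the check only compares them with each other.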


This PR uses the CMake procedure from #371, thanks a lot.

Signed-off-by: Shuqiao Li <celestialli@outlook.com>
2025-04-18 13:11:39 +08:00