Commit Graph

8 Commits

Author SHA1 Message Date
Yikun Jiang
097e7149f7 [Platform] Add initial experimental support for Atlas 300I series (#1333)
### What this PR does / why we need it?
Add initial experimental support for Ascend 310P. This patch squashes the
PRs below into one to ease validation:

- https://github.com/vllm-project/vllm-ascend/pull/914
- https://github.com/vllm-project/vllm-ascend/pull/1318
- https://github.com/vllm-project/vllm-ascend/pull/1327


### Does this PR introduce _any_ user-facing change?
Users can run vLLM on the Atlas 300I DUO series.

### How was this patch tested?
CI passed with:
- E2E image build for 310P
- CI test on A2 with e2e test and long-term test
- Unit tests are missing because they need a real 310P image; they will
  be added in a separate PR later.
- Manual e2e test:
  - Qwen2.5-7b-instruct, Qwen2.5-0.5b, Qwen3-0.6B, Qwen3-4B, Qwen3-8B:
    https://github.com/vllm-project/vllm-ascend/pull/914#issuecomment-2942989322
  - Pangu MGoE 72B


The patch has been tested locally on Ascend 310P hardware to ensure that
the changes do not break existing functionality and that the new
features work as intended.

#### ENV information

CANN, NNAL version: 8.1.RC1
> [!IMPORTANT]  
> The PTA 2.5.1 build must be >= torch_npu-2.5.1.post1.dev20250528 to
> support the NZ format and to call NNAL operators on 310P.

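The version floor above can be checked before starting a run. The following is a minimal sketch, not part of this patch: `version_key` and `meets_minimum` are hypothetical helpers, and the parser only handles the `X.Y.Z[.postN][.devM]` shape used by the version named in the note (a simplified subset of PEP 440 ordering, with no support for pre-releases or local version suffixes).

```python
import re

# Minimum build named in the note above.
MIN_TORCH_NPU = "2.5.1.post1.dev20250528"

# Hypothetical, simplified pattern: release, optional .postN, optional .devM.
_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:\.post(?P<post>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?$"
)


def version_key(version: str) -> tuple:
    """Return a sort key for 'X.Y.Z[.postN][.devM]' version strings."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match["release"].split("."))
    # A missing post segment sorts before any post release.
    post = int(match["post"]) if match["post"] is not None else -1
    # A dev build sorts before the matching non-dev build.
    dev = int(match["dev"]) if match["dev"] is not None else float("inf")
    return (release, post, dev)


def meets_minimum(installed: str, minimum: str = MIN_TORCH_NPU) -> bool:
    """True if the installed version is at least the minimum version."""
    return version_key(installed) >= version_key(minimum)
```

In practice the installed version string would come from the environment (for example from package metadata); here it is passed in as a plain string so the check stays independent of the hardware stack.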
#### Code example

##### Build vllm-ascend from source code

```shell
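# Download the source code into ./vllm-ascend first
cd vllm-ascend
# Target the Ascend 310P SoC when building
export SOC_VERSION=Ascend310P3
# Editable install with verbose build output
pip install -v -e .
cd ..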
```

##### Run offline inference

```python
from vllm import LLM, SamplingParams
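# The prompts are in Chinese. In English they read:
#   "Is the boiling point of water 100 degrees Celsius? Please answer yes or no."
#   "If the underarm temperature is 38 degrees Celsius, does this person have a fever? Please answer yes or no."
prompts = ["水的沸点是100摄氏度吗?请回答是或者否。", "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。",
           "水的沸点是100摄氏度吗?请回答是或者否。", "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。"]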

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=10)
# Create an LLM.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
    dtype="float16",  # IMPORTANT: some ATB ops do not support bf16 on 310P
    disable_custom_all_reduce=True,
    trust_remote_code=True,
    tensor_parallel_size=2,
    compilation_config={"custom_ops": ["none", "+rms_norm", "+rotary_embedding"]},
)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

```
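The `custom_ops` list in the example mixes a global switch (`"none"`) with per-op entries prefixed by `+`. The sketch below illustrates one plausible reading of that list, namely that `"none"` turns every custom op off by default and `+name` turns a single op back on. This reading is an assumption and is not confirmed by this PR; `is_custom_op_enabled` is a hypothetical helper, not a vLLM API.

```python
def is_custom_op_enabled(name: str, custom_ops: list[str]) -> bool:
    """Hypothetical resolver for a custom_ops list such as
    ["none", "+rms_norm", "+rotary_embedding"]."""
    # Assumption: ops are on by default unless "none" is listed
    # (and "all" overrides "none").
    default_on = "none" not in custom_ops or "all" in custom_ops
    explicitly_on = f"+{name}" in custom_ops
    explicitly_off = f"-{name}" in custom_ops
    return (default_on or explicitly_on) and not explicitly_off


ops = ["none", "+rms_norm", "+rotary_embedding"]
enabled = [op for op in ("rms_norm", "rotary_embedding", "silu_and_mul")
           if is_custom_op_enabled(op, ops)]
```

Under this reading, only `rms_norm` and `rotary_embedding` use their custom implementations in the example; `silu_and_mul` is just an illustrative op name.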

---------

Signed-off-by: Vincent Yuan <farawayboat@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: Vincent Yuan <farawayboat@gmail.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: shen-shanshan <467638484@qq.com>
2025-06-21 09:00:16 +08:00
Mengqing Cao
96fa7ff63b [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235)
### What this PR does / why we need it?
1. Fix the rank set in the DP scenario. The new PoC version of torch-npu supports
setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, so we can use the
rank set in `DPEngineCoreProc` directly instead of calculating the local
rank across DP by hand in the patched `_init_data_parallel`.

Closes: https://github.com/vllm-project/vllm-ascend/issues/1170

2. Bump torch-npu version to 2.5.1.post1.dev20250528

Closes: https://github.com/vllm-project/vllm-ascend/pull/1242
Closes: https://github.com/vllm-project/vllm-ascend/issues/1232

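To illustrate item 1, the sketch below contrasts the two approaches. It is not the code from this PR: both function names are hypothetical, and it assumes each DP replica owns a contiguous block of `world_size` devices. If the device-visibility variable can be set per process, each replica sees only its own block and can use its rank as-is; otherwise the global device index has to be computed by hand.

```python
def visible_devices_for_dp_rank(local_dp_rank: int, world_size: int) -> str:
    """Value for ASCEND_RT_VISIBLE_DEVICES for one DP replica,
    assuming a contiguous block of devices per replica."""
    start = local_dp_rank * world_size
    return ",".join(str(device) for device in range(start, start + world_size))


def global_device_index(local_dp_rank: int, world_size: int, local_rank: int) -> int:
    """Hand-computed device index, needed when visibility cannot be
    restricted per process."""
    return local_dp_rank * world_size + local_rank


# Replica 1 with two workers per replica sees devices "2,3",
# so its workers can address them as local ranks 0 and 1.
replica_devices = visible_devices_for_dp_rank(local_dp_rank=1, world_size=2)
```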

### How was this patch tested?
CI passed with the newly added test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
2025-06-16 23:09:53 +08:00
Yikun Jiang
966557a2a3 [Build] Speedup image build (#1216)
### What this PR does / why we need it?
1. Rename the workflow to show OS info
2. Speed up the image build:
   - PR: build only arm64 on openEuler arm64 and only amd64 on Ubuntu
     amd64
   - Push/Tag: keep the original logic, using QEMU on amd64

This PR drops the e2e image build per PR, but I think that is fine
because the build is stable enough. If we still hit problems, we can
revert this PR.

Build time: 43-44 mins ---> about 8-10 mins

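The PR/push split described above amounts to a small decision rule. The sketch below is a hypothetical restatement in Python, not the workflow definition itself: `build_platforms` is an invented name, and it assumes the original push/tag logic builds both architectures under emulation.

```python
def build_platforms(event: str, base_os: str) -> list[str]:
    """Pick target platforms for an image build (illustrative only)."""
    if event == "pull_request":
        # PRs build a single native architecture per OS image.
        if base_os == "openeuler":
            return ["linux/arm64"]
        return ["linux/amd64"]
    # Assumption: push/tag builds keep the multi-arch build via QEMU.
    return ["linux/amd64", "linux/arm64"]
```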
### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-16 09:02:53 +08:00
Mengqing Cao
399b03830d [Build][Bugfix] Fix source code path to avoid reference error (#726)
### What this PR does / why we need it?
Fix the source code path to avoid a reference error in the Docker image.
Fixes https://github.com/vllm-project/vllm-ascend/issues/725

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-30 17:38:13 +08:00
Icey
ee7a0e2cd4 Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1 (#689)
### What this PR does / why we need it?
Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1

### Does this PR introduce _any_ user-facing change?
No

Signed-off-by: Icey <1790571317@qq.com>
2025-04-28 11:45:46 +08:00
Yikun Jiang
96d6fa7c90 [Docker] Fix openEuler image suffix (#586)
### What this PR does / why we need it?
There was a bug when we released v0.8.4rc1 (the openEuler image tag was wrongly
set to 0.8.4rc1). According to the docker-meta-action docs, a suffix should be
appended:
```
tags: |
  type=pep440,enable=true,priority=900,prefix=,suffix=,pattern=,value=
```

This patch fixes the openEuler image suffix so that the pep440 tag rule works.

This patch also removes the cache step, because exporting the cache takes more
than 10 minutes but saves less time than that on the next trigger.

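The effect of the suffix on the pep440 tag rule can be sketched as follows. `image_version` is a hypothetical helper used only for illustration; the tag and suffix values are the ones quoted in this message, and treating the default (non-openEuler) image as suffix-free is an assumption.

```python
def image_version(git_tag: str, suffix: str = "") -> str:
    """Append an OS-specific suffix to a release tag to form the image version."""
    return f"{git_tag}{suffix}"


# Without a suffix, the openEuler image would reuse the plain release tag.
default_version = image_version("v0.7.88rc1")
# With the suffix, the openEuler image gets its own distinct version.
openeuler_version = image_version("v0.7.88rc1", suffix="-openeuler")
```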
### Does this PR introduce _any_ user-facing change?
Yes, the Docker image tag is now set correctly.

### How was this patch tested?
I tested in my fork repo by setting the default branch:
- Released a tag: v0.7.88rc1 (pep440 tag)
- The log shows `--label org.opencontainers.image.version=v0.7.88rc1-openeuler`, which follows the right rule


https://github.com/Yikun/vllm-ascend/actions/runs/14560411481/job/40842950165#step:9:205

Related: https://github.com/vllm-project/vllm-ascend/pull/489

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-21 08:55:26 +08:00
wangxiyuan
9c7428b3d5 [CI] enable custom ops build (#466)
### What this PR does / why we need it?
This PR enables the custom ops build by default.

### Does this PR introduce _any_ user-facing change?

Yes, installing vllm-ascend from source now triggers the custom ops
build step.

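A default-on build switch of this kind is commonly driven by an environment variable. The sketch below assumes the toggle is `COMPILE_CUSTOM_KERNELS` (the variable named in the later openEuler dockerfile commit in this list) and that an unset variable means "enabled"; neither detail is stated in this PR, and `custom_kernels_enabled` is a hypothetical helper.

```python
import os


def custom_kernels_enabled(env=None) -> bool:
    """Return True unless COMPILE_CUSTOM_KERNELS is explicitly set to 0."""
    env = os.environ if env is None else env
    # Assumption: default-on, with "0" or "1" as the expected values.
    return bool(int(env.get("COMPILE_CUSTOM_KERNELS", "1")))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test, for example `custom_kernels_enabled({"COMPILE_CUSTOM_KERNELS": "0"})`.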
### How was this patch tested?
By image build and e2e CI

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-12 10:24:53 +08:00
Icey
d05ea17427 Add openEuler based container image for vLLM Ascend (#489)
### What this PR does / why we need it?

Provide users with openEuler-based vLLM images, and update the quick
start README accordingly.

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

No testing is needed.

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-04-10 14:30:49 +08:00