xc-llm-ascend

Author	SHA1	Message	Date
Yikun Jiang	097e7149f7	[Platform] Add initial experimental support for Atlas 300I series (#1333)

### What this PR does / why we need it?
Add initial experimental support for Ascend 310P. This patch squashes the PRs below into one to help validation:
- https://github.com/vllm-project/vllm-ascend/pull/914
- https://github.com/vllm-project/vllm-ascend/pull/1318
- https://github.com/vllm-project/vllm-ascend/pull/1327

### Does this PR introduce _any_ user-facing change?
Users can run vLLM on the Atlas 300I DUO series.

### How was this patch tested?
CI passed with:
- E2E image build for 310P
- CI test on A2 with e2e test and long-term test
- Unit tests are missing because they need a real 310P image; they will be added in a separate PR later.
- Manual e2e test:
  - Qwen2.5-7b-instruct, Qwen2.5-0.5b, Qwen3-0.6B, Qwen3-4B, Qwen3-8B: https://github.com/vllm-project/vllm-ascend/pull/914#issuecomment-2942989322
  - Pangu MGoE 72B

The patch has been tested locally on Ascend 310P hardware to ensure that the changes do not break existing functionality and that the new features work as intended.

#### ENV information
CANN, NNAL version: 8.1.RC1

> [!IMPORTANT]
> PTA 2.5.1 version >= torch_npu-2.5.1.post1.dev20250528 to support NZ format and calling NNAL operators on 310P

#### Code example
##### Build vllm-ascend from source code
```shell
# download source code as vllm-ascend
cd vllm-ascend
export SOC_VERSION=Ascend310P3
pip install -v -e .
cd ..
```
##### Run offline inference
```python
from vllm import LLM, SamplingParams

# The two prompts (each repeated twice) ask, in Chinese:
#   "Is the boiling point of water 100 degrees Celsius? Please answer yes or no."
#   "If the armpit temperature is 38 degrees Celsius, does this person have a fever? Please answer yes or no."
prompts = ["水的沸点是100摄氏度吗？请回答是或者否。",
           "若腋下体温为38摄氏度，请问这人是否发烧？请回答是或者否。",
           "水的沸点是100摄氏度吗？请回答是或者否。",
           "若腋下体温为38摄氏度，请问这人是否发烧？请回答是或者否。"]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=10)

# Create an LLM.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
    dtype="float16",  # IMPORTANT: some ATB ops do not support bf16 on 310P
    disable_custom_all_reduce=True,
    trust_remote_code=True,
    tensor_parallel_size=2,
    compilation_config={"custom_ops": ["none", "+rms_norm", "+rotary_embedding"]},
)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
---------
Signed-off-by: Vincent Yuan <farawayboat@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: Vincent Yuan <farawayboat@gmail.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: shen-shanshan <467638484@qq.com>	2025-06-21 09:00:16 +08:00
Mengqing Cao	96fa7ff63b	[DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235) ### What this PR does / why we need it? 1. Fix rank set in DP scenario. The new PoC version of torch-npu supports setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, so we can use the rank set in `DPEngineCoreProc` directly instead of calculating the local rank across DP by hand in the patched `_init_data_parallel`. Closes: https://github.com/vllm-project/vllm-ascend/issues/1170 2. Bump torch-npu version to 2.5.1.post1.dev20250528. Closes: https://github.com/vllm-project/vllm-ascend/pull/1242 Closes: https://github.com/vllm-project/vllm-ascend/issues/1232 ### How was this patch tested? CI passed with the newly added test. --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: Icey <1790571317@qq.com>	2025-06-16 23:09:53 +08:00
Yikun Jiang	966557a2a3	[Build] Speed up image build (#1216) ### What this PR does / why we need it? 1. Rename the workflow to show OS info 2. Speed up image build: - PR: only the arm64 build on openEuler arm64, only the amd64 build on Ubuntu amd64 - Push/Tag: keep the original logic, using QEMU on amd64. This PR drops the e2e image build per PR, which should be fine because the build is stable enough; if problems appear, this PR can be reverted. Build time: 43-44 mins ---> about 8-10 mins ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-16 09:02:53 +08:00
Mengqing Cao	399b03830d	[Build][Bugfix] Fix source code path to avoid reference error (#726) ### What this PR does / why we need it? Fix the source code path to avoid a reference error in the docker image. Fixes https://github.com/vllm-project/vllm-ascend/issues/725 Signed-off-by: MengqingCao <cmq0113@163.com>	2025-04-30 17:38:13 +08:00
Icey	ee7a0e2cd4	Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1 (#689) ### What this PR does / why we need it? Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1 ### Does this PR introduce _any_ user-facing change? No Signed-off-by: Icey <1790571317@qq.com>	2025-04-28 11:45:46 +08:00
Yikun Jiang	96d6fa7c90	[Docker] Fix openEuler image suffix (#586) ### What this PR does / why we need it? There was a bug when we released v0.8.4rc1: the openEuler image tag was wrongly set to 0.8.4rc1. According to the docker-meta-action docs, a suffix should be appended: ``` tags: | type=pep440,enable=true,priority=900,prefix=,suffix=,pattern=,value= ``` This patch fixes the openEuler image suffix so that the pep440 tag rule works. It also removes the cache step, because exporting the cache takes more than 10 minutes but saves less time than that on the next trigger. ### Does this PR introduce _any_ user-facing change? Yes, the docker image tag is now set correctly ### How was this patch tested? I tested in my fork repo by setting the default branch: - Released a tag: v0.7.88rc1 (pep440 tag) - The log shows `--label org.opencontainers.image.version=v0.7.88rc1-openeuler`, which follows the right rule https://github.com/Yikun/vllm-ascend/actions/runs/14560411481/job/40842950165#step:9:205 Related: https://github.com/vllm-project/vllm-ascend/pull/489 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-21 08:55:26 +08:00
wangxiyuan	9c7428b3d5	[CI] enable custom ops build (#466) ### What this PR does / why we need it? This PR enables the custom ops build by default. ### Does this PR introduce _any_ user-facing change? Yes, users who install vllm-ascend from source now trigger the custom ops build step. ### How was this patch tested? By image build and e2e CI --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-12 10:24:53 +08:00
Icey	d05ea17427	Add openEuler based container image for vLLM Ascend (#489) ### What this PR does / why we need it? Provide users with openEuler-based vLLM images, and update the quick start readme accordingly ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? No testing is needed. --------- Signed-off-by: Icey <1790571317@qq.com>	2025-04-10 14:30:49 +08:00

8 Commits
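
The tag rule described in the "[Docker] Fix openEuler image suffix (#586)" entry can be illustrated with a minimal sketch: a git tag that looks like a PEP 440 version gets a suffix appended, so `v0.7.88rc1` becomes `v0.7.88rc1-openeuler`. This is not the implementation used by the GitHub action; the function name `image_tag` and the simplified regular expression are assumptions made purely for illustration.

```python
import re

# Simplified PEP 440 pattern (an assumption for illustration only): it accepts
# release, pre-release, post-release and dev segments, but not epochs or
# local version labels.
_PEP440_RE = re.compile(
    r"^v?\d+(\.\d+)*"   # release segment, with an optional leading "v"
    r"((a|b|rc)\d+)?"   # optional pre-release segment, e.g. rc1
    r"(\.post\d+)?"     # optional post-release segment
    r"(\.dev\d+)?$"     # optional dev-release segment
)


def image_tag(git_tag: str, suffix: str = "") -> str:
    """Return git_tag with suffix appended if git_tag looks like a PEP 440 version.

    Hypothetical helper: raises ValueError for tags that do not match.
    """
    if not _PEP440_RE.match(git_tag):
        raise ValueError(f"not a PEP 440 style tag: {git_tag!r}")
    return f"{git_tag}{suffix}"


print(image_tag("v0.7.88rc1", suffix="-openeuler"))  # v0.7.88rc1-openeuler
```

With an empty suffix the tag is returned unchanged, which mirrors the bug described in that entry: the openEuler image ended up with the same tag as the default image.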