xc-llm-ascend

Author	SHA1	Message	Date
wangxiyuan	c556038ef0	[New model] Qwen3-next support (#2917 ) ### What this PR does / why we need it? Add Qwen3-next support. ### Does this PR introduce _any_ user-facing change? Yes, users can use Qwen3 next. Related doc: https://github.com/vllm-project/vllm-ascend/pull/2916 the tutorial will be ready in [here](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_qwen3_next.html) ### How was this patch tested? Doc CI passed Related: https://github.com/vllm-project/vllm-ascend/issues/2884 Co-Authored-By: Angazenn <supperccell@163.com> Co-Authored-By: zzzzwwjj <1183291235@qq.com> Co-Authored-By: MengqingCao <cmq0113@163.com> Co-Authored-By: linfeng-yuan <1102311262@qq.com> Co-Authored-By: hust17yixuan <303660421@qq.com> Co-Authored-By: SunnyLee219 <3294305115@qq.com> Co-Authored-By: maoxx241 <maoxx241@umn.edu> - vLLM version: v0.10.2 - vLLM main: `b834b4cbf1` --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: Your Name <you@example.com> Signed-off-by: zzzzwwjj <1183291235@qq.com> Signed-off-by: linfeng-yuan <1102311262@qq.com> Signed-off-by: hust17yixuan <303660421@qq.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: Angazenn <supperccell@163.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: zzzzwwjj <1183291235@qq.com> Co-authored-by: linfeng-yuan <1102311262@qq.com> Co-authored-by: hust17yixuan <303660421@qq.com>	2025-09-16 01:17:42 +08:00
wangxiyuan	7e494e94a9	[CI] Fix broken ci (#2530 ) vLLM commit https://github.com/vllm-project/vllm/pull/22711 changed the encode cache entries logic, this PR adapt the same change for vllm ascend to make CI happy. Co-Authored-By: zhoux77899 <zhouxiang100@huawei.com> - vLLM version: v0.10.1.1 - vLLM main: `0ff902f3b4` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-26 07:42:24 +08:00
wangxiyuan	eccfb715f6	[CI] Fix UT (#2452 ) Make UT CI happy - vLLM version: v0.10.0 - vLLM main: `d983769c41` --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Co-authored-by: MengqingCao <cmq0113@163.com>	2025-08-20 16:26:07 +08:00
Mengqing Cao	1327f9be1c	Fix some ci issue and refactor modelrunner (#2445 ) ### What this PR does / why we need it? Fix some ci issue and refactor modelrunner ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing test. - vLLM version: v0.10.0 - vLLM main: `4d9c61993a` --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: weiguihua2 <weiguihua2@huawei.com> Co-authored-by: wangli <wangli858794774@gmail.com> Co-authored-by: weiguihua2 <weiguihua2@huawei.com>	2025-08-20 09:01:04 +08:00
Mengqing Cao	61866b8ac6	[Quickfix] update CachedRequestState as NewRequestData changed (#2367 ) ### What this PR does / why we need it? 1. update `CachedRequestState` as `NewRequestData` changed in https://github.com/vllm-project/vllm/pull/22570 2. drop maintenance of vllm v0.10.0 in the branch main ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing test. - vLLM version: v0.10.0 - vLLM main: `92ff41abea` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-08-15 07:35:27 +08:00
Yikun Jiang	5f0b42e414	[FOLLOWUP] Use base test to avoid patch everwhere (#1634 ) ### What this PR does / why we need it? Use base test to avoid patch everwhere. Followup here: https://github.com/vllm-project/vllm-ascend/pull/1566 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? ut ci passed - vLLM version: v0.9.2 - vLLM main: `8d0a01a5f2` Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-07-22 09:03:40 +08:00
Li Wang	5f8241c25c	[V1][ModelRunner] Support pooling model for v1 engine (#1359 ) ### What this PR does / why we need it? Change as little existing code as possible to add v1 pooling task's support, notice that i move down the `vllm.v1.worker.gpu_input_batch` to vllm-ascend, Considering the frequent changes in upstream interfaces, in order to decouple, so i move it here ### How was this patch tested? CI passed with new added/existing test, and I have a simple test was first conducted locally which is adapted from https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B, just like bellow： ```python import os import torch from vllm import LLM os.environ["VLLM_USE_MODELSCOPE"]="True" def get_detailed_instruct(task_description: str, query: str) -> str: return f'Instruct: {task_description}\nQuery:{query}' # Each query must come with a one-sentence instruction that describes the task task = 'Given a web search query, retrieve relevant passages that answer the query' queries = [ get_detailed_instruct(task, 'What is the capital of China?'), get_detailed_instruct(task, 'Explain gravity') ] # No need to add instruction for retrieval documents documents = [ "The capital of China is Beijing.", "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun." ] input_texts = queries + documents model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed") outputs = model.embed(input_texts) embeddings = torch.tensor([o.outputs.embedding for o in outputs]) scores = (embeddings[:2] @ embeddings[2:].T) print(scores.tolist()) # [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]] ``` --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: wangli <858794774@qq.com> Co-authored-by: wangli <858794774@qq.com>	2025-06-30 16:31:12 +08:00

7 Commits