xc-llm-ascend

Author	SHA1	Message	Date
Yikun Jiang	007aeaa48b	[Doc] Change distributed_executor_backend to mp (#287 ) ### What this PR does / why we need it? Fix `ValueError: Unrecognized distributed executor backend tp. Supported values are 'ray', 'mp' 'uni', 'external_launcher' or custom ExecutorBase subclass.` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test on my local node Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-10 11:27:26 +08:00
Yikun Jiang	38334f5daa	[Docs] Re-arch on doc and make QwQ doc work (#271 ) ### What this PR does / why we need it? Re-arch on tutorials, move single npu / multi npu / multi node to index. - Unify docker run cmd - Use dropdown to hide build from source installation doc - Re-arch tutorials to include Qwen/QwQ/DeepSeek - Make QwQ doc work ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI test Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-10 09:27:48 +08:00
Yikun Jiang	18bb8d1f52	Adapt vLLM requirements changes to fix main CI (#279 ) ### What this PR does / why we need it? Adapt vLLM requirements changes: `206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-09 16:07:45 +08:00
Yikun Jiang	be58d5f3d8	Bump torch_npu version to dev20250308.3 (#276 ) ### What this PR does / why we need it? Bump torch_npu version to dev20250308.3 to fix performance regression on multi-stream case: `e04c580d07` . ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-09 15:59:15 +08:00
Mengqing Cao	91f7d8115d	[CI/Build] Bump torch_npu to dev20250307.3 (#265 ) Update torch-npu version to fix torch npu exponential_ accuracy With this update, the precision issue when setting `temperature > 0` is fixed. --------- Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-03-07 20:34:07 +08:00
Yikun Jiang	cff08f9df8	[Doc] Add initial FAQs (#247 ) ### What this PR does / why we need it? Add initial FAQs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-06 10:42:42 +08:00
wangxiyuan	ae49bfd13a	[Core] Support pooling (#229 ) This PR added pooling support for vllm-ascend Tested with `bge-base-en-v1.5` by encode: ``` from vllm import LLM # Sample prompts. prompts = [ "Hello, my name is", "The president of the United States is", "The capital of France is", "The future of AI is", ] # Create an LLM. model = LLM(model="./bge-base-en-v1.5", enforce_eager=True) # Generate embedding. The output is a list of EmbeddingRequestOutputs. outputs = model.encode(prompts) # Print the outputs. for output in outputs: print(output.outputs.embedding) # list of floats ``` Tested by embedding: ``` from vllm import LLM, SamplingParams llm = LLM(model="./bge-base-en-v1.5", task="embed") (output,) = llm.embed("Hello, my name is") embeds = output.outputs.embedding print(f"Embeddings: {embeds!r} (size={len(embeds)})") ``` Related: https://github.com/vllm-project/vllm-ascend/issues/200 ## Known issue The accuracy is not correct since this feature relies on `enc-dec` support. It'll be done in the following PR by @MengqingCao Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-04 15:59:34 +08:00
Shanshan Shen	8fda31cafe	[Doc] Update Feature Support doc (#234 ) ### What this PR does / why we need it? Update Feature Support doc. ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? no. --------- Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-03-04 14:18:32 +08:00
Yikun Jiang	ebe14f20cf	Recover vllm-ascend dev image (#209 ) ### What this PR does / why we need it? Recover vllm-ascend dev image ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-03 09:08:41 +08:00
Yikun Jiang	6e358c4bef	Add Document Branch Policy (#217 ) ### What this PR does / why we need it? Add Document Branch Policy ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Related: https://github.com/vllm-project/vllm-ascend/issues/214 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-03 09:07:39 +08:00
Mengqing Cao	03dc5c01fd	[Doc] update multinode doc (#181 ) Update multinode doc fix #167 #168 Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-27 19:29:49 +08:00
wangxiyuan	6042c210bc	[CI] upgrade to newest pta (#187 ) Upgrade to newest torch-npu Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: angazenn <zengyanjia@huawei.com>	2025-02-27 16:40:23 +08:00
Shanshan Shen	ee43179767	[ModelRunner] Fix cuda hard code in model runner (#155 ) ### What this PR does / why we need it? 1. Fix cuda hard code in model runner. 2. Fix tutorials doc rendering error. ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? no. Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-02-27 14:16:46 +08:00
wangxiyuan	51ae37b22a	[Doc] update readme (#147 ) Fix doc issue in README --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-25 11:00:58 +08:00
Yikun Jiang	d21b3be685	Mark v0.7.1 as unmaintained and v0.7.3 as maintained (#139 ) ### What this PR does / why we need it? Mark v0.7.1 as unmaintained and v0.7.3 as maintained: vLLM released the v0.7.3 version: https://github.com/vllm-project/vllm/releases/tag/v0.7.3 which includes several commits: - https://github.com/vllm-project/vllm/pull/12874 - https://github.com/vllm-project/vllm/pull/12432 - https://github.com/vllm-project/vllm/pull/13208 We'd better bump the versions to v0.7.3. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-21 22:41:44 +08:00
HongtaoYang	fd2cc1b883	[Docs] Add Tutorials for Online Serving on Multi Machine (#120 ) Add Tutorials for Online Serving on Multi Machine --------- Signed-off-by: SidaoY <1024863041@qq.com> Co-authored-by: yx0716 <jinyx1007@foxmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-02-21 11:03:00 +08:00
Yikun Jiang	3a4ce2aa15	[Docs] Fix vllm and vllm-ascend version (#107 ) ### What this PR does / why we need it? Fix vllm and vllm-ascend version \| branch/tag \| vllm_version \| vllm_ascend_version\|pip_vllm_ascend_version\|pip_vllm_version\| \|----\|----\|----\|----\|----\| \| main \| main \| main \| v0.7.1rc1 \| v0.7.1 \| \| v0.7.1-dev \| v0.7.1 \| v0.7.1rc1 \| v0.7.1rc1 \| v0.7.1 \| \| v0.7.1rc1 \| v0.7.1 \| v0.7.1rc1 \| v0.7.1rc1 \| v0.7.1 \| ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-20 11:05:35 +08:00
wangxiyuan	cff03a4913	[CI] change to quay.io (#102 ) change docker registry to quay Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-19 17:04:46 +08:00
wangxiyuan	fafd70e91c	[Doc] Update doc to work with release (#85 ) 1. Update CANN image name 2. Add pta install step 3. update vllm-ascend docker image name to ghcr 4. update quick_start to use vllm-ascend image directly. 5. fix `note` style Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-19 09:51:43 +08:00
Yikun Jiang	17de078d83	[Docs] Add dynamic version in docs (#90 ) ### What this PR does / why we need it? Add dynamic version in docs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview: https://vllm-ascend--90.org.readthedocs.build/en/90/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-19 08:57:27 +08:00
wangxiyuan	7606977739	[Doc] Add release note (#59 ) Add release note template and init the first release note content Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-18 11:20:06 +08:00
Yikun Jiang	7cc024a2d3	[Docs] Refactor installation doc (#78 ) ### What this PR does / why we need it? Refactor installation doc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI, preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-17 22:12:07 +08:00
Shanshan Shen	7c8bdc3a18	[Doc] Update tutorials (#79 ) ### What this PR does / why we need it? Update tutorials. ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? no. --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>	2025-02-17 22:11:04 +08:00
Shanshan Shen	2a678141d4	[Doc] Add vllm-ascend usage doc & fix doc format (#53 ) ### What this PR does / why we need it? 1. Add vllm-ascend tutorial doc for Qwen/Qwen2.5-7B-Instruct model serving doc 2. fix format of files in `docs` dir, e.g. format tables, add underline for links, add line feed... ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? doc CI passed --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>	2025-02-17 18:37:29 +08:00
Mengqing Cao	c935b7006c	[doc] fix feature support (#70 ) Check and update the feature support table. - both multi-step and speculative decoding require adaptation of corresponding workers - prompt adapter (finetune method) requires adaptation in worker.py and model_runner.py Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-17 15:43:37 +08:00
Yikun Jiang	a6f91f70b7	[Doc] Add versioning_policy doc (#62 ) ### What this PR does / why we need it? This patch adds the versioning policy doc for vllm-ascend Reference: - https://spark.apache.org/versioning-policy.html - https://docs.openstack.org/project-team-guide/stable-branches.html - https://github.com/pytorch/pytorch/blob/main/RELEASE.md ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? preview: https://vllm-ascend--62.org.readthedocs.build/en/62/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-17 14:13:28 +08:00
wangxiyuan	e264987af2	[Doc] Add install doc (#49 ) Add official install guide. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-14 10:22:15 +08:00
Yikun Jiang	46977f9f06	[Doc] Add sphinx build for vllm-ascend (#55 ) ### What this PR does / why we need it? This patch enables the doc build for vllm-ascend - Add sphinx build for vllm-ascend - Enable readthedocs for vllm-ascend - Fix CI: - exclude vllm-empty/tests/mistral_tool_use to skip `You need to agree to share your contact information to access this model` which was introduced in `314cfade02` - Install test req to fix https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770: ``` vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module> import pytest_asyncio E ModuleNotFoundError: No module named 'pytest_asyncio' ``` - exclude docs PR ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. test locally: ```bash # Install dependencies. pip install -r requirements-docs.txt # Build the docs and preview make clean; make html; python -m http.server -d build/html/ ``` Launch browser and open http://localhost:8000/. 2. CI passed with preview: https://vllm-ascend--55.org.readthedocs.build/en/55/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-13 18:44:17 +08:00
Yikun Jiang	63b11ec7e9	[Doc] Add Quickstart doc (#44 ) ### What this PR does / why we need it? This PR adds the quickstart doc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-13 16:29:36 +08:00
Yikun Jiang	eb189aac81	Followup fix on official doc update (#34 ) ### What this PR does / why we need it? - Fix typos: vllm-ascned --> vllm-ascend - For version info ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-11 14:28:26 +08:00
wangxiyuan	51eadc68b9	[Docs] Add official doc index (#29 ) Add official doc index. Move the release content to the right place. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-11 12:00:27 +08:00
Li Wang	8cb5615fb0	[Doc]Add chinese doc (#10 ) ### What this PR does / why we need it? This PR adds Chinese documents for vllm-ascend for Chinese-speaking developers ### Does this PR introduce _any_ user-facing change? Change as follows - add README.zh.md - add environment.zh.md - add CONTRIBUTING.zh.md ### How was this patch tested? By CI --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-02-06 14:49:43 +08:00
wangxiyuan	a48b9addef	[Doc] Update Readme (#11 ) ### What this PR does / why we need it? Add feature and model support matrix ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI test is enough Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-06 14:08:44 +08:00
Yikun Jiang	d5e7756028	[Core] Init vllm-ascend (#3 ) ### What this PR does / why we need it? vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on the Ascend NPU. This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [RFC]: Hardware pluggable, providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM. This patch also includes changes to make CI work and use cache to speed up the e2e test, including: 1. Change push (post merge ci) and pull_request (pr ci) trigger branch to main 2. Make mypy work by ignoring base_communicator and clearing unused deps 3. Several improvements for vllm_ascend_test: - use cache (pip, ms, hf) to speed up e2e test (25mins --> 5mins) - switch `git clone` command to `action/checkout` to speed up checkout and - Enable sv for pytest for better info dump - Remove network host to resolve `docker: conflicting options: cannot attach both user-defined and non-user-defined network-modes`, which is a problem on docker 1.45 but not on 1.39. 4. Adapt MLA decode optimizations: `cabaf4eff3` ### Does this PR introduce _any_ user-facing change? Yes, init the PR. ### How was this patch tested? - This is the first PR to make ascend NPU work on vLLM. All code is tested on ascend with vLLM V0 Engine. - CI passed --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: wangshuai09 <391746016@qq.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-02-05 10:53:12 +08:00

34 Commits
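
The backend fix in 007aeaa48b (#287) comes down to passing a string from the supported set quoted in that commit's error message, which is why `tp` was rejected and `mp` is accepted. A minimal sketch of that kind of check follows. It assumes a hypothetical helper, `check_backend`, that is not part of vLLM; the real check also accepts a custom ExecutorBase subclass, which this sketch ignores, and its error wording is only an approximation.

```python
# Hypothetical helper, NOT vLLM's API: it only mirrors the string check
# implied by the error message quoted in commit 007aeaa48b.
SUPPORTED_BACKENDS = ("ray", "mp", "uni", "external_launcher")


def check_backend(name: str) -> str:
    """Return `name` if it is a supported backend string, else raise ValueError."""
    if name not in SUPPORTED_BACKENDS:
        supported = ", ".join(repr(b) for b in SUPPORTED_BACKENDS)
        raise ValueError(
            f"Unrecognized distributed executor backend {name}. "
            f"Supported values are {supported}."
        )
    return name
```

With this sketch, `check_backend("mp")` returns the string unchanged, while `check_backend("tp")` raises a `ValueError` that names the rejected value.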