xc-llm-ascend

Author	SHA1	Message	Date
Yikun Jiang	007aeaa48b	[Doc] Change distributed_executor_backend to mp (#287) ### What this PR does / why we need it? Fix `ValueError: Unrecognized distributed executor backend tp. Supported values are 'ray', 'mp' 'uni', 'external_launcher' or custom ExecutorBase subclass.` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested on my local node Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-10 11:27:26 +08:00
Yikun Jiang	38334f5daa	[Docs] Re-arch docs and make QwQ doc work (#271) ### What this PR does / why we need it? Re-arch tutorials, move single NPU / multi NPU / multi node to index. - Unify docker run cmd - Use dropdown to hide build-from-source installation doc - Re-arch tutorials to include Qwen/QwQ/DeepSeek - Make QwQ doc work ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI test Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-10 09:27:48 +08:00
Yikun Jiang	18bb8d1f52	Adapt vLLM requirements changes to fix main CI (#279) ### What this PR does / why we need it? Adapt vLLM requirements changes: `206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-09 16:07:45 +08:00
Yikun Jiang	be58d5f3d8	Bump torch_npu version to dev20250308.3 (#276) ### What this PR does / why we need it? Bump torch_npu version to dev20250308.3 to fix a performance regression in the multi-stream case: `e04c580d07`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-09 15:59:15 +08:00
Mengqing Cao	91f7d8115d	[CI/Build] Bump torch_npu to dev20250307.3 (#265) Update torch-npu version to fix the accuracy of torch_npu `exponential_`. With this update, the precision issue when setting `temperature > 0` is fixed. --------- Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-03-07 20:34:07 +08:00
Yikun Jiang	cff08f9df8	[Doc] Add initial FAQs (#247) ### What this PR does / why we need it? Add initial FAQs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-06 10:42:42 +08:00
wangxiyuan	ae49bfd13a	[Core] Support pooling (#229) This PR adds pooling support for vllm-ascend. Tested with `bge-base-en-v1.5` by encode: ``` from vllm import LLM # Sample prompts. prompts = [ "Hello, my name is", "The president of the United States is", "The capital of France is", "The future of AI is", ] # Create an LLM. model = LLM(model="./bge-base-en-v1.5", enforce_eager=True) # Generate embedding. The output is a list of EmbeddingRequestOutputs. outputs = model.encode(prompts) # Print the outputs. for output in outputs: print(output.outputs.embedding) # list of 4096 floats ``` Tested by embedding: ``` from vllm import LLM, SamplingParams llm = LLM(model="./bge-base-en-v1.5", task="embed") (output,) = llm.embed("Hello, my name is") embeds = output.outputs.embedding print(f"Embeddings: {embeds!r} (size={len(embeds)})") ``` Related: https://github.com/vllm-project/vllm-ascend/issues/200 ## Known issue The accuracy is not yet correct because this feature relies on `enc-dec` support. That will be done in a follow-up PR by @MengqingCao Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-04 15:59:34 +08:00
Shanshan Shen	8fda31cafe	[Doc] Update Feature Support doc (#234) ### What this PR does / why we need it? Update Feature Support doc. ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? no. --------- Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-03-04 14:18:32 +08:00
Yikun Jiang	ebe14f20cf	Recover vllm-ascend dev image (#209) ### What this PR does / why we need it? Recover vllm-ascend dev image ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-03 09:08:41 +08:00
Yikun Jiang	6e358c4bef	Add Document Branch Policy (#217) ### What this PR does / why we need it? Add Document Branch Policy ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Related: https://github.com/vllm-project/vllm-ascend/issues/214 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-03 09:07:39 +08:00
Mengqing Cao	03dc5c01fd	[Doc] update multinode doc (#181) Update multinode doc, fix #167 #168 Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-27 19:29:49 +08:00
wangxiyuan	6042c210bc	[CI] upgrade to the newest pta (#187) Upgrade to the newest torch-npu Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: angazenn <zengyanjia@huawei.com>	2025-02-27 16:40:23 +08:00
Shanshan Shen	ee43179767	[ModelRunner] Fix hard-coded cuda in model runner (#155) ### What this PR does / why we need it? 1. Fix hard-coded cuda in model runner. 2. Fix tutorials doc rendering error. ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? no. Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-02-27 14:16:46 +08:00
wangxiyuan	51ae37b22a	[Doc] update readme (#147) Fix doc issue in README --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-25 11:00:58 +08:00
Yikun Jiang	d21b3be685	Mark v0.7.1 as unmaintained and v0.7.3 as maintained (#139) ### What this PR does / why we need it? Mark v0.7.1 as unmaintained and v0.7.3 as maintained: vLLM released the v0.7.3 version: https://github.com/vllm-project/vllm/releases/tag/v0.7.3 which includes several commits: - https://github.com/vllm-project/vllm/pull/12874 - https://github.com/vllm-project/vllm/pull/12432 - https://github.com/vllm-project/vllm/pull/13208 We'd better bump the versions to v0.7.3. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-21 22:41:44 +08:00
HongtaoYang	fd2cc1b883	[Docs] Add Tutorials for Online Serving on Multi Machine (#120) Add Tutorials for Online Serving on Multi Machine --------- Signed-off-by: SidaoY <1024863041@qq.com> Co-authored-by: yx0716 <jinyx1007@foxmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-02-21 11:03:00 +08:00
Yikun Jiang	3a4ce2aa15	[Docs] Fix vllm and vllm-ascend version (#107) ### What this PR does / why we need it? Fix vllm and vllm-ascend version | branch/tag | vllm_version | vllm_ascend_version | pip_vllm_ascend_version | pip_vllm_version | |----|----|----|----|----| | main | main | main | v0.7.1rc1 | v0.7.1 | | v0.7.1-dev | v0.7.1 | v0.7.1rc1 | v0.7.1rc1 | v0.7.1 | | v0.7.1rc1 | v0.7.1 | v0.7.1rc1 | v0.7.1rc1 | v0.7.1 | ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-20 11:05:35 +08:00
wangxiyuan	cff03a4913	[CI] change to quay.io (#102) change docker registry to quay Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-19 17:04:46 +08:00
wangxiyuan	fafd70e91c	[Doc] Update doc to work with release (#85) 1. Update CANN image name 2. Add pta install step 3. update vllm-ascend docker image name to ghcr 4. update quick_start to use vllm-ascend image directly. 5. fix `note` style Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-19 09:51:43 +08:00
Yikun Jiang	17de078d83	[Docs] Add dynamic version in docs (#90) ### What this PR does / why we need it? Add dynamic version in docs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview: https://vllm-ascend--90.org.readthedocs.build/en/90/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-19 08:57:27 +08:00
wangxiyuan	7606977739	[Doc] Add release note (#59) Add release note template and init the first release note content Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-18 11:20:06 +08:00
Yikun Jiang	7cc024a2d3	[Docs] Refactor installation doc (#78) ### What this PR does / why we need it? Refactor installation doc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI, preview Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-17 22:12:07 +08:00
Shanshan Shen	7c8bdc3a18	[Doc] Update tutorials (#79) ### What this PR does / why we need it? Update tutorials. ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? no. --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>	2025-02-17 22:11:04 +08:00
Shanshan Shen	2a678141d4	[Doc] Add vllm-ascend usage doc & fix doc format (#53) ### What this PR does / why we need it? 1. Add vllm-ascend tutorial doc for Qwen/Qwen2.5-7B-Instruct model serving 2. fix format of files in `docs` dir, e.g. format tables, add underline for links, add line feed... ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? doc CI passed --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>	2025-02-17 18:37:29 +08:00
Mengqing Cao	c935b7006c	[doc] fix feature support (#70) Check and update the feature support table. - both multi-step and speculative decoding require adaptation of the corresponding workers - prompt adapter (finetune method) requires adaptation in worker.py and model_runner.py Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-17 15:43:37 +08:00
Yikun Jiang	a6f91f70b7	[Doc] Add versioning_policy doc (#62) ### What this PR does / why we need it? This patch adds the versioning policy doc for vllm-ascend Reference: - https://spark.apache.org/versioning-policy.html - https://docs.openstack.org/project-team-guide/stable-branches.html - https://github.com/pytorch/pytorch/blob/main/RELEASE.md ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? preview: https://vllm-ascend--62.org.readthedocs.build/en/62/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-17 14:13:28 +08:00
wangxiyuan	e264987af2	[Doc] Add install doc (#49) Add official install guide. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-14 10:22:15 +08:00
Yikun Jiang	46977f9f06	[Doc] Add sphinx build for vllm-ascend (#55) ### What this PR does / why we need it? This patch enables the doc build for vllm-ascend - Add sphinx build for vllm-ascend - Enable readthedocs for vllm-ascend - Fix CI: - exclude vllm-empty/tests/mistral_tool_use to skip `You need to agree to share your contact information to access this model`, which was introduced in `314cfade02` - Install test req to fix https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770: ``` vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module> import pytest_asyncio E ModuleNotFoundError: No module named 'pytest_asyncio' ``` - exclude docs PR ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. test locally: ```bash # Install dependencies. pip install -r requirements-docs.txt # Build the docs and preview make clean; make html; python -m http.server -d build/html/ ``` Launch a browser and open http://localhost:8000/. 2. CI passed with preview: https://vllm-ascend--55.org.readthedocs.build/en/55/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-13 18:44:17 +08:00

28 Commits