xc-llm-ascend

Author	SHA1	Message	Date
Zhu Yi Lin	538dd357e6	Add graph mode and improve on multi_npu_moge.md (#1849 ) ### What this PR does / why we need it? Add graph mode and improve on multi_npu_moge.md ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? CI passed with new existing test. - vLLM version: v0.9.2 - vLLM main: `5a7fb3ab9e` Signed-off-by: GDzhu01 <809721801@qq.com>	2025-07-17 17:53:37 +08:00
wangxiyuan	eb921d2b6f	[Doc] Fix 404 error (#1797 ) Fix url 404 error in doc - vLLM version: v0.9.2 - vLLM main: `9ad0a4588b` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-15 11:52:38 +08:00
Li Wang	afcfe91dfa	[Doc] Fix multi node doc (#1783 ) ### What this PR does / why we need it? ### Does this PR introduce _any_ user-facing change? Pin docker image to latest release ### How was this patch tested? - vLLM version: v0.9.2 - vLLM main: `1e9438e0b0` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-14 17:56:57 +08:00
wangxiyuan	3c404de1b1	[Release]Update release note (#1753 ) There are still issues with pp in some cases, such as aclgraph and ray. Remove the related doc from the release note. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-11 17:58:26 +08:00
wangxiyuan	b5b7e0ecc7	[Doc] Add qwen3 embedding 8b guide (#1734 ) 1. Add the tutorials for qwen3-embedding-8b 2. Remove VLLM_USE_V1=1 from docs; it is no longer of any use as of 0.9.2 - vLLM version: v0.9.2 - vLLM main: `5923ab9524` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-11 17:40:17 +08:00
wangxiyuan	9c560b009a	[Release] Add 0.9.2rc1 release note (#1725 ) Add release note for 0.9.2rc1, we'll release soon - vLLM version: v0.9.2 - vLLM main: `7bd4c37ae7` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-11 17:36:05 +08:00
wangxiyuan	3d1e6a5929	[Doc] Update user doc index (#1581 ) Add user doc index to make the user guide more clear - vLLM version: v0.9.1 - vLLM main: `49e8c7ea25` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-10 14:26:59 +08:00
Li Wang	c7446438a9	[1/N][CI] Move linting system to pre-commits hooks (#1256 ) ### What this PR does / why we need it? Follow vllm-project/vllm lint way: https://github.com/vllm-project/vllm/blob/main/.pre-commit-config.yaml Enable pre-commit to avoid some low level error AMAP. This pr is one step of #1241, The purpose is make linting system more clear and convenient, on this step, Mainly did the following things: yapf, actionlint, ruff, typos, isort, mypy, png-lint, signoff-commit, enforce-import-regex-instead-of-re. TODO: - clang-format(check for csrc with google style) need clean code, disable for now - pymarkdown need clean code, disable for now - shellcheck need clean code, disable for now ### Does this PR introduce _any_ user-facing change? Only developer UX change: https://vllm-ascend--1256.org.readthedocs.build/en/1256/developer_guide/contributing.html#run-lint-locally ``` pip install -r requirements-lint.txt && pre-commit install bash format.sh ``` ### How was this patch tested? CI passed with new added/existing test. Co-authored-by: Yikun [yikunkero@gmail.com](mailto:yikunkero@gmail.com) Co-authored-by: wangli [wangli858794774@gmail.com](mailto:wangli858794774@gmail.com) - vLLM version: v0.9.1 - vLLM main: `5358cce5ff` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-10 14:17:15 +08:00
Yikun Jiang	997f156a51	Use ci_vllm_version when recording vLLM commit (#1689 ) ### What this PR does / why we need it? Use ci_vllm_version when recording vllm commit Followup on https://github.com/vllm-project/vllm-ascend/pull/1623 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Test manually. $ python3 docs/source/conf.py | jq .ci_vllm_version | tr -d '"' v0.9.2 - Test on my local repo: https://github.com/Yikun/vllm-ascend/pull/35 - vLLM version: v0.9.1 - vLLM main: `49e8c7ea25` Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-07-10 11:07:27 +08:00
Li Wang	0c4aa2b4f1	[Doc] Add multi node data parallel doc (#1685 ) ### What this PR does / why we need it? add multi node data parallel doc ### Does this PR introduce _any_ user-facing change? add multi node data parallel doc ### How was this patch tested? - vLLM version: v0.9.1 - vLLM main: `805d62ca88` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-10 09:36:37 +08:00
leo-pony	b4b19ea588	[Doc] Add multi-npu qwen3-MoE-32B Tutorials (#1419 ) Signed-off-by: leo-pony <nengjunma@outlook.com> ### What this PR does / why we need it? Add multi-npu qwen3-MoE-32B Tutorials Relate RFC: https://github.com/vllm-project/vllm-ascend/issues/1248 - vLLM version: v0.9.1 - vLLM main: `5358cce5ff` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-07-10 09:06:51 +08:00
wangxiyuan	830332ebfc	Clean up v0.9.1 code (#1672 ) vllm has released 0.9.2. This PR drop 0.9.1 support. - vLLM version: v0.9.1 - vLLM main: `b942c094e3` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-09 08:52:24 +08:00
Yikun Jiang	e4e9ea02ab	Upgrade vLLM version to v0.9.2 (#1652 ) ### What this PR does / why we need it? This patch upgrade vLLM version to v0.9.2, this patch didn't remove the v0.9.1 compatible code to easy review. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.9.1 - vLLM main: `14601f5fba` - Accuracy test with 0.9.2: https://github.com/vllm-project/vllm-ascend/actions/runs/16121612087 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-07-08 14:18:17 +08:00
Yikun Jiang	0c1d239df4	Add unit test local cpu guide and enable base testcase (#1566 ) ### What this PR does / why we need it? Use Base test and cleanup all manual patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-07-06 10:42:27 +08:00
Angazenn	a5f33590d3	[CORE]initial support for torchair with non-mla backend (#1506 ) ### What this PR does / why we need it? This PR supports torchair graph mode with non-mla backend on both 800IA2 and 300I Duo platforms. The main change is to add `attention_v1_torchair.py` to support specific attention related operations that are required by torchair. ### Does this PR introduce _any_ user-facing change? Before this PR, vLLM-Ascend only allowed deepseek to use torchair. Now we can also use it with pangu. Besides, we add a supported model list to control which types of models can use torchair. ### How was this patch tested? We have tested it with PanguProMoE on both 800IA2 and 300I Duo platforms, and the model generates answers normally. --------- Signed-off-by: angazenn <zengyanjia@huawei.com> Signed-off-by: tianyitang <tangtianyi4@huawei.com> Co-authored-by: angazenn <zengyanjia@huawei.com> Co-authored-by: tianyitang <tangtianyi4@huawei.com>	2025-07-03 22:21:42 +08:00
yupeng	d96da1f00c	[DOC] Fix word spelling (#1595 ) ### What this PR does / why we need it? Fix word spelling in DOC. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No. Signed-off-by: paulyu12 <507435917@qq.com>	2025-07-02 21:42:39 +08:00
yupeng	c3c8c9317c	[DOC] add LoRA user guide (#1265 ) ### What this PR does / why we need it? Add LoRA user guide to DOC. The content refers to [LoRA Adapters](https://docs.vllm.ai/en/latest/features/lora.html). ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No --------- Signed-off-by: paulyu12 <507435917@qq.com>	2025-07-02 14:41:31 +08:00
leo-pony	53ec583bbb	[Docs] Update Atlas 300I series doc and fix CI lint (#1537 ) ### What this PR does / why we need it? - Update Atlas 300I series doc: cleanup unused parameters and enable optimized ops - Fix code spell CI ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: leo-pony <nengjunma@outlook.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-30 23:34:00 +08:00
Shanshan Shen	ba577dfc52	[Doc] Add Structured Output guide (#1499 ) ### What this PR does / why we need it? Add Structured Output guide. Signed-off-by: shen-shanshan <467638484@qq.com>	2025-06-30 17:21:44 +08:00
Yikun Jiang	e4df0a4395	Add Pangu MoE Pro for 300I series docs (#1516 ) ### What this PR does / why we need it? Add Pangu MoE Pro for 300I series docs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-30 13:37:22 +08:00
Yikun Jiang	cad4c693c6	Add Pangu MoE Pro docs (#1512 ) ### What this PR does / why we need it? This PR add Pangu MoE Pro 72B docs [1] https://gitcode.com/ascend-tribe/pangu-pro-moe-model ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-30 12:15:33 +08:00
Zhu Yi Lin	b308a7a258	support pangumoe w8a8c8 and docs (#1477 ) ### What this PR does / why we need it? support pangu moe w8a8c8 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed with new added test. Signed-off-by: zhuyilin <809721801@qq.com>	2025-06-28 18:51:07 +08:00
Shanshan Shen	99e685532d	[Doc] Add Qwen2.5-VL eager mode doc (#1394 ) ### What this PR does / why we need it? Add Qwen2.5-VL eager mode doc. --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-06-28 09:08:51 +08:00
Shanshan Shen	3687676fa7	[Doc] Add guidance on how to implement and register new models (#1426 ) ### What this PR does / why we need it? Add guidance on how to implement and register new models. Modified based on PR https://github.com/vllm-project/vllm-ascend/pull/1126, thanks for the contribution of @linfeng-yuan. --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-06-27 16:46:49 +08:00
Zesheng Zong	192dbbcc6e	Optimize Patch developer guide (#1452 ) ### What this PR does / why we need it? Fix some terms in the user guide. Signed-off-by: zeshengzong <zesheng.zong@outlook.com>	2025-06-26 19:10:16 +08:00
Shanshan Shen	4e2daf5ab7	[Doc] Add qwen2-audio eager mode tutorial (#1371 ) ### What this PR does / why we need it? Add qwen2-audio eager mode tutorial. Signed-off-by: shen-shanshan <467638484@qq.com>	2025-06-26 16:56:05 +08:00
leo-pony	1025344912	Doc Enhancement: Single NPU(Qwen3-8B) aclgraph mode + eager mode (#1374 ) ### What this PR does / why we need it? Doc Enhancement: Single NPU(Qwen3-8B) aclgraph mode + eager mode. Relate RFC: https://github.com/vllm-project/vllm-ascend/issues/1248 ### Does this PR introduce _any_ user-facing change? No changes. ### How was this patch tested? Preview Signed-off-by: leo-pony <nengjunma@outlook.com> Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-06-26 16:52:54 +08:00
wangxiyuan	205cb85a1e	[Doc] Fix doc typo (#1424 ) 1. Fix the typo 2. Fix 404 url 3. update graph mode and additional config user guide Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-25 19:28:26 +08:00
Li Wang	15df8be937	[Doc] Add sleep mode doc (#1295 ) ### What this PR does / why we need it? Add sleep related doc and example --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-06-25 14:07:14 +08:00
wangxiyuan	e4e0b7af05	[Doc] Add patch doc (#1414 ) 1. Format the developer guide content to make it more clear 2. Add the patch doc for developer guide Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-25 12:00:45 +08:00
Mengqing Cao	c1c5d56255	[Doc] Update FAQ and add test guidance (#1360 ) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on deterministic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-25 09:59:23 +08:00
Yikun Jiang	917c6b71af	[TEST][DOC] Fix doctest and add system package installation (#1375 ) ### What this PR does / why we need it? - Fix [doctest](https://github.com/vllm-project/vllm-ascend/actions/workflows/vllm_ascend_doctest.yaml?query=event%3Aschedule) - add system package installation - Add doc for run doctests - Cleanup all extra steps in .github/workflows/vllm_ascend_doctest.yaml - Change schedule job from 4 ---> 12 hours ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - doctest CI passed - Local test with `/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh`. Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-23 20:50:33 +08:00
Icey	08cfc7cb4b	Modify installation.md for adding pip extra index of torch-npu (#1272 ) ### What this PR does / why we need it? Modify installation.md for adding pip extra index of torch-npu ### How was this patch tested? No need --------- Signed-off-by: Icey <1790571317@qq.com>	2025-06-23 15:37:50 +08:00
weiguihua2	e1123172d1	[Doc] Add reinstall instructions doc (#1303 ) Add a new FAQ, if users re-install vllm-ascend with pip, the `build` folder should be removed first --------- Signed-off-by: rjg-lyh <1318825571@qq.com> Signed-off-by: weiguihua <weiguihua2@huawei.com> Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-06-23 14:06:27 +08:00
Pleaplusone	7e6efbf2a9	update torch-npu to 2.5.1.post1.dev20250619 (#1347 ) ### What this PR does / why we need it? This PR update the torch_npu to newest release version 2.5.1.post1.dev20250619 . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI tested will guarantee the update Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-06-23 09:02:09 +08:00
xleoken	4447e53d7a	[Doc] Change not to no in faqs.md (#1357 ) ### What this PR does / why we need it? Change not to no in faqs.md. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Local Test Signed-off-by: xleoken <xleoken@163.com>	2025-06-23 09:01:00 +08:00
Yikun Jiang	2e5f312530	Cleanup unused doc (#1352 ) ### What this PR does / why we need it? Cleanup unused doc for MoGE model, we will add this back when the MoGE model is ready. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-22 15:05:30 +08:00
Yikun Jiang	c30ddb8331	Bump v0.9.1rc1 release (#1349 ) ### What this PR does / why we need it? Bump v0.9.1rc1 release Closes: https://github.com/vllm-project/vllm-ascend/pull/1341 Closes: https://github.com/vllm-project/vllm-ascend/pull/1334 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: shen-shanshan <467638484@qq.com>	2025-06-22 13:15:36 +08:00
wangxiyuan	45be1aac0c	[CI] Add codespell check for doc (#1314 ) Add codespell check test for doc only PR Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-20 16:48:14 +08:00
22dimensions	761bd3d9d7	Add user guide for quantization (#1206 ) ### What this PR does / why we need it? Add user guide for quantization ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-06-20 15:53:25 +08:00
Yikun Jiang	05dec7eda9	[Doc] Refactor and init user story page (#1224 ) ### What this PR does / why we need it? This PR refactor the user stories page: - Move it to community - Add initial info of LLaMA-Factory, Huggingface/trl, MindIE Turbo, GPUStack, verl - Add a new page for LLaMA-Factory ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Preview locally Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-17 09:36:35 +08:00
Yikun Jiang	9d3cbc0953	[Doctest] add installation doctest (#1179 ) ### What this PR does / why we need it? Install doctest ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Related: https://github.com/vllm-project/vllm-ascend/pull/983 Co-authored-by: wangli <wangli858794774@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-06-17 08:52:26 +08:00
Mengqing Cao	96fa7ff63b	[DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235 ) ### What this PR does / why we need it? 1. Fix rank set in DP scenario. The new poc version of torch-npu support setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, thus we could use the rank set in `DPEngineCoreProc` directly instead of calculating local rank across dp by hand in the patched `_init_data_parallel` Closes: https://github.com/vllm-project/vllm-ascend/issues/1170 2. Bump torch-npu version to 2.5.1.post1.dev20250528 Closes: https://github.com/vllm-project/vllm-ascend/pull/1242 Closes: https://github.com/vllm-project/vllm-ascend/issues/1232 ### How was this patch tested? CI passed with new added test. --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: Icey <1790571317@qq.com>	2025-06-16 23:09:53 +08:00
22dimensions	0d2074a1ec	[Doc] fix VLLM_USE_V1 value in graph mode docs (#1226 ) os.environ["VLLM_USE_V1"] must be assigned with str, not other type. ![image](https://github.com/user-attachments/assets/9d337ae5-00e5-4179-832e-c6c917dd5798) Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-06-15 15:41:11 +08:00
fems14	ab5d110fcc	vllm-ascend support chunked prefill (#1172 ) ### What this PR does / why we need it? vllm-ascend support chunked prefill for MLA --------- Signed-off-by: fems14 <1804143737@qq.com>	2025-06-14 22:31:16 +08:00
Mengqing Cao	a3b5af8307	[CI/UT][Graph] Add ut for torchair graph mode (#1103 ) ### What this PR does / why we need it? Add ut for torchair graph mode on DeepSeekV3 ### How was this patch tested? CI passed with new added test. --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-06-14 16:59:00 +08:00
Yikun Jiang	94a52cf577	Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (#1203 ) ### What this PR does / why we need it? Add @jianzs as vLLM Ascend maintainer @jianzs ---- I would like to nominate Shoujian Zheng (@jianzs <https://github.com/jianzs>) as a maintainer, starting with my +1. - He focuses on code quality and good design, with 30+ solid, high-quality reviews in the P/D disaggregation and DeepSeek improvement areas, such as #issuecomment-2811764833, #discussion_r2069927605 and #pullrequestreview-2820996674. This is the most important reason why I nominated him, because helping community developers complete PRs with high quality and continuously ensuring the quality of the codebase is one of the important responsibilities of a maintainer. We believe he is a great addition. - Shoujian's main expertise is distributed inference. He has a lot of production experience in AI infra. He has very good habits: he explains all changes in great detail (#issue-3023082580) and shares results openly (#issuecomment-2853140443). High quality PRs: #706, #774, #852. - Community Involvement: Actively involved in community discussion, he is collaborative and helps users solve problems, and has been involved in 30+ PRs and issues, such as #issuecomment-2911934292 and #issuecomment-2833523571. Reference: [1] https://vllm-ascend.readthedocs.io/en/latest/community/contributors.html [2] https://vllm-ascend.readthedocs.io/en/latest/community/governance.html Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-13 18:25:50 +08:00
sdmyzlp	e72f94e38f	Support multistream of MLA vector operations (#1135 ) ### What this PR does / why we need it? Move all vector operations to a secondary stream, with the expected overlapping being: ``` | q_rmsnorm | | kv_norm_rope_cache | | q_rope | | matmul W_DQ | matmul W_DKV | index | index | matmul W_UQ | split | matmul W_KV_T | ``` Currently, the `IndexByTensor` operators introduced by computation of `cos` and `sin` can't be offloaded to the secondary stream due to a known bug in the graph fusion optimization pass. So we instead keep them in the main stream, only requiring that they be computed before `matmul W_UQ` to avoid hindering later overlapping. The problem may be solved by a later optimization (#993), which hoists the computation of `cos` and `sin` up to the first layer. ### Does this PR introduce _any_ user-facing change? Controlled by `torchair_graph_config.enable_multistream_mla`, defaulted to False. ### How was this patch tested? Tested on 1x16 910 node, with tailored 2 layer DSKv2. Signed-off-by: sdmyzlp <lrwei2@petalmail.com>	2025-06-12 21:42:09 +08:00
Wan_Danfeng	55c0e68883	[Doc] Add Referer header for CANN package download url. (#1192 ) ### What this PR does / why we need it? fix the CANN download url ### Does this PR introduce _any_ user-facing change? no, do not have any user-facing change ### How was this patch tested? run the wget command and cann package is rightly downloaded. --------- Signed-off-by: wan_danfeng <wonderful199082@126.com>	2025-06-12 21:22:23 +08:00
chenwaner	e46dc142bf	Enable kvcache_nz for the decode process in torchair graph mode (#1098 ) What this PR does / why we need it? Enable kvcache_nz for the decode process in torchair graph mode, which reduces the time consumed by FA in long sequences. Does this PR introduce any user-facing change? If need to enable kvcache_nz, should set the additional_config.torchair_graph_config.enable_kv_nz=True How was this patch tested? 1. Tested in deepseek model: with batchsize 64 and seq_len 1k+3k, 61 layers FA total time improves 20.80ms -> 19.76ms 2. operator precision test: [aclnnFusedInferAttentionScoreV3_result.csv](https://github.com/user-attachments/files/20664138/aclnnFusedInferAttentionScoreV3_result.csv) 3. tpot test from @ttanzhiqiang, and curl one result is normal https://github.com/vllm-project/vllm-ascend/pull/1098#issuecomment-2948542159 https://github.com/vllm-project/vllm-ascend/pull/1098#issuecomment-2954496588 --------- Signed-off-by: chenwaner <861645847@qq.com>	2025-06-11 14:09:28 +08:00

156 Commits