xc-llm-ascend

Author	SHA1	Message	Date
sunshine202600	1dd1de8153	[Doc][Misc] Improve readability and fix typos in documentation (#8340 ) ### What this PR does / why we need it? This PR improves the readability of the documentation by fixing typos, correcting command extensions, and fixing broken links in the Chinese README. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Documentation changes only. --------- Signed-off-by: sunshine202600 <sunshine202600@163.com>	2026-04-17 08:54:38 +08:00
herizhen	95726d20eb	[Doc][Misc] Correcting the document and uploading the model deployment template (#8287 ) ### What this PR does / why we need it? Correcting the document and uploading the model deployment template ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? --------- Signed-off-by: herizhen <1270637059@qq.com> Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>	2026-04-15 16:03:11 +08:00
Angazenn	9e8da00f95	[V0.18.0][Doc] add preemption in FAQs (#8136 ) ### What this PR does / why we need it? This PR adds a description of preemption to the FAQs in vLLM-Ascend. This FAQ states: - how preemption affects the performance of a vLLM server. - how to reduce the negative impacts of preemption. We add this FAQ because the original description of preemption in vLLM is not very straightforward. If preemption causes a performance drop, users might not be aware that preemption is the cause. ### Does this PR introduce _any_ user-facing change? No. Signed-off-by: Angazenn <supperccell@163.com>	2026-04-10 17:36:45 +08:00
herizhen	0d1424d81a	[Doc][Misc] Comprehensive documentation cleanup and grammatical fixes (#8073 ) What this PR does / why we need it? This pull request performs a comprehensive cleanup of the vLLM Ascend documentation. It fixes numerous typos, grammatical errors, and phrasing issues across community guidelines, developer documents, hardware tutorials, and feature guides. Key improvements include correcting hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code examples (removing duplicate flags and trailing commas), and improving the clarity of technical explanations. These changes are necessary to ensure the documentation is professional, accurate, and easy for users to follow. Does this PR introduce any user-facing change? No, this PR contains documentation-only updates. How was this patch tested? The changes were manually reviewed for accuracy and grammatical correctness. No functional code changes were introduced. --------- Signed-off-by: herizhen <1270637059@qq.com> Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>	2026-04-09 15:37:57 +08:00
Mengqing Cao	fdd0726ae4	[v0.18.0][Triton] Fix triton-ascend version in Dockerfile (#7766 ) ### What this PR does / why we need it? Triton-ascend occasionally encounters compilation errors, which is a known issue in triton-ascend 3.2.0. However, we want to use the official version rather than the development version, so we only changed the triton-ascend version in the Dockerfile and added a FAQ to explain this issue. --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2026-03-30 14:43:16 +08:00
Mengqing Cao	e20f0b1a0d	[ReleaseNote] Add release note for v0.17.0rc1 (#7240 ) ### What this PR does / why we need it? This pull request adds the release notes for `v0.17.0rc1`. It also updates version numbers across various documentation files, including `README.md`, `README.zh.md`, `docs/source/community/versioning_policy.md`, and `docs/source/conf.py` to reflect the new release. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e`	2026-03-15 22:47:47 +08:00
NJX	bb506a1c99	[Doc][Installation] Clarify SOC_VERSION for CPU-only source builds (#7278 ) ### What this PR does / why we need it? - Clarify that `SOC_VERSION` must be set when building from source in a CPU-only environment where `npu-smi` is unavailable. - Add concrete `SOC_VERSION` examples (A2/A3/300I/A5) and point users to `Dockerfile*` defaults. - Improve the `setup.py` error message so users get actionable guidance when `SOC_VERSION` is missing. Fixes #6816. ### Does this PR introduce _any_ user-facing change? - Yes. Documentation is updated and the build-time error message is more informative. ### How was this patch tested? - (Local) Syntax check: `python -m compileall setup.py`. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: NJX-njx <3771829673@qq.com>	2026-03-14 22:38:25 +08:00
Canlin Guo	a78a00e0b1	[Doc][ReleaseNote] Add release notes for v0.16.0rc1 (#7067 ) Add release notes for v0.16.0rc1 - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Canlin Guo <961750412@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2026-03-10 22:45:05 +08:00
wangxiyuan	3d43ed997e	add release note for 0.15.0rc1 (#6839 ) Add release note for 0.15.0rc1 - vLLM version: v0.15.0 - vLLM main: `83b47f67b1` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-02-27 11:55:55 +08:00
wangxiyuan	a95c0b8b82	[Doc] fix the nit in docs (#6826 ) Refresh the doc, fix the nit in the docs - vLLM version: v0.15.0 - vLLM main: `83b47f67b1` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-02-27 11:50:27 +08:00
Cao Yi	6de207de88	[main][Docs] Fix typos across documentation (#6728 ) ## Summary Fix typos and improve grammar consistency across 50 documentation files. ### Changes include: - Spelling corrections (e.g., "Facotory" → "Factory", "certainty" → "determinism") - Grammar improvements (e.g., "multi-thread" → "multi-threaded", "re-routed" → "re-run") - Punctuation fixes (semicolon consistency in filter parameters) - Code style fixes (correct flag name `--num-prompts` instead of `--num-prompt`) - Capitalization consistency (e.g., "python" → "Python", "ascend" → "Ascend") - vLLM version: v0.15.0 - vLLM main: `9562912cea` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-02-13 15:50:05 +08:00
Cao Yi	1c7d1163f5	[main][Docs] Fix spelling errors across documentation (#6649 ) Fix various spelling mistakes in the project documentation to improve clarity and correctness. - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-02-10 11:14:57 +08:00
wangxiyuan	c38166eefa	[Doc] backport 0.13.0 release note (#6584 ) ### What this PR does / why we need it? Backport 0.13.0 release note to main branch and update related doc link ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? by doc CI - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-02-06 10:29:15 +08:00
wangxiyuan	52d4acfa51	[Doc] add release note for v0.14.0rc1 (#6225 ) Add release note for v0.14.0rc1 - vLLM version: v0.14.0 - vLLM main: `d68209402d` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-26 14:22:40 +08:00
wangxiyuan	21833a4321	[Doc] Add release note for 0.13.0rc2 (#6207 ) Add release note for 0.13.0rc2 - vLLM version: v0.14.0 - vLLM main: `d68209402d` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-24 12:51:47 +08:00
SILONG ZENG	4811ba62e0	[Lint]Style: reformat markdown files via markdownlint (#5884 ) ### What this PR does / why we need it? reformat markdown files via markdownlint - vLLM version: v0.13.0 - vLLM main: `bde38c11df` --------- Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain> Signed-off-by: MrZ20 <2609716663@qq.com> Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>	2026-01-15 09:06:01 +08:00
wangxiyuan	354ee3b330	[Doc] Update doc url link (#5781 ) Drop `dev` suffix for doc url. Rename url to `https://docs.vllm.ai/projects/ascend` - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-12 11:21:31 +08:00
Mengqing Cao	1b5d5abf86	[ReleaseNote] Add release note for v0.13.0rc1 (#5334 ) ### What this PR does / why we need it? Add release note for v0.13.0rc1 - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-12-27 18:46:57 +08:00
lilinsiman	3f7a2fba70	[main][doc] Instructions for using permissions added to docker (#5092 ) ### What this PR does / why we need it? Add instructions for using permissions with Docker ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-12-17 15:26:09 +08:00
wangxiyuan	d11b74a571	Add release note for v0.11.0 (#4918 ) Add release note for v0.11.0. We'll release soon. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-16 17:31:45 +08:00
wangxiyuan	42ceaf08a1	add release note for 0.12.0 (#4995 ) Add release note for v0.12.0rc1 Update deepseek3.2 tutorial doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-13 22:09:59 +08:00
wangxiyuan	e538fa6f9c	[Doc] Update tutorial index (#4920 ) Update tutorial index and remove useless doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-11 20:53:13 +08:00
wangxiaoteng888	a77045f355	[P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780 ) ### What this PR does / why we need it? As support for the mooncake connector is now available, the llmdatadist connector is no longer being maintained, so the llmdatadist-related files need to be retired. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-12-09 22:36:43 +08:00
wangxiyuan	9a73c22b1c	[Doc] add release note for v0.11.0rc3 (#4646 ) Add release note for 0.11.0rc3. We'll release it today. - vLLM version: 86e178f7c4d8c3b0eaf3c8e3f810a83f63b90e24 - vLLM main: `86e178f7c4` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-03 11:49:44 +08:00
wangxiyuan	fff258bce1	[Doc] add release note for v0.11.0rc2 (#4348 ) add release note for v0.11.0rc2 - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-21 23:03:32 +08:00
lilinsiman	adee9dd3b1	[Info][main] Correct the mistake in information documents (#4157 ) ### What this PR does / why we need it? Correct the mistake in information documents ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-13 15:53:58 +08:00
wangxiyuan	64220c68c5	[Doc] Add release note for v0.11.0rc1 (#3931 ) Add release note for v0.11.0rc1. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-10 21:01:50 +08:00
lilinsiman	a3ff765c65	[Info][main] Corrected the errors in the information (#4055 ) ### What this PR does / why we need it? Corrected the errors in the information ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-08 18:48:59 +08:00
Liwx	eed1957f03	Add FAQ for docker pull error on Kylin OS (#3870 ) Added instructions for resolving 'invalid tar header' error on Kylin OS with an ARM64 architecture on Atlas300I hardware during docker pull, including steps for offline loading of docker images. --- ### What this PR does / why we need it? The primary motivation for this PR is to address a critical `docker pull` failure that occurs in specific, yet important, enterprise environments. Specifically, when operating on Kylin OS (麒麟操作系统) with an ARM64 architecture on Atlas300I hardware, users frequently encounter an `archive/tar: invalid tar header` error, which completely blocks the setup process. This issue has been consistently reproduced, with multiple retries failing with the same error, confirming that it is a persistent environmental problem rather than a transient network issue. This guide provides a robust, step-by-step workaround using an offline-loading method (`docker save` on a host machine and `docker load` on the target machine). This solution is crucial for enabling users on this platform to use vLLM. This contribution does not directly fix an existing issue number, but it proactively solves a significant environmental and usability problem for a growing user base. ### Does this PR introduce _any_ user-facing change? No. It does not alter any code, APIs, interfaces, or existing behavior of the vLLM project. ### How was this patch tested? The instructions and troubleshooting steps in this guide were validated through a real-world, end-to-end test case on my hardware and OS. The testing process was as follows: 1. Problem Reproduction: An attempt was made to directly `docker pull` the `vllm-ascend:v0.10.0rc1-310p` image on a target machine running Kylin OS (ARM64). The `invalid tar header` failure was successfully and consistently reproduced, confirming the existence of the problem. 2. Solution Implementation: The workaround detailed in the guide was executed: * On a separate host machine (Ubuntu x86_64), the image was successfully pulled using the `--platform linux/arm64` flag. * The image was then saved to a `.tar` archive using `docker save`. * The `.tar` archive was transferred to the target Kylin OS machine. * The image was successfully loaded from the archive using `docker load -i ...`. 3. End-to-End Validation: After loading the image, the vLLM container was launched on the target machine following the instructions in the guide. Both online inference (via `curl` to the API server) and offline inference (via the Python script) were executed successfully, confirming that the entire workflow described in the document is accurate and effective. Since this is a documentation-only change based on a validated workflow, no new unit or integration tests were added to the codebase. - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` --------- Signed-off-by: Liwx <liweixuan1014@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-30 14:10:52 +08:00
zhangxinyuehfad	789ba4c5c2	[Doc] Update doc (#3836 ) ### What this PR does / why we need it? Update doc ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-10-29 11:03:39 +08:00
wangxiyuan	da5f2cc1e3	[Doc] Update FAQ (#3792 ) Much of the FAQ content is out of date; this PR refreshes it. - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-27 20:32:17 +08:00
leo-pony	291c00a224	[Doc] pin version that can stable running 310I Duo to vllm-ascend v0.10.0rc1 (#3455 ) Pin the version that runs stably on 310I Duo to vllm-ascend v0.10.0rc1. ### What this PR does / why we need it? 310I Duo has been broken since PR #2614. Although we are working on a fix, there is no confirmed short-term timeline for it. To let users quickly find a working version instead of resorting to trial and error, this PR pins the version in the 310I Duo guide. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-10-16 08:54:09 +08:00
wangxiyuan	00ba071022	[Doc] Release note for v0.11.0rc0 (#3224 ) ### What this PR does / why we need it? Add release note for v0.11.0rc0 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-30 03:26:18 +08:00
weiguihua2	065486820b	[Doc] add faqs:install vllm-ascend will overwrite existing torch-npu (#3245 ) ### What this PR does / why we need it? Add an FAQ: installing vllm-ascend will overwrite the existing torch-npu ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-09-29 12:02:23 +08:00
lilinsiman	1705501ae2	[BugFix] Fix ACLgraph bug in Qwen3_32b_int8 case (#3204 ) ### What this PR does / why we need it? 1. Fixed the issue where sizes capture failed for the Qwen3-32b-int8 model when aclgraph, dp1, and tp4 were enabled. 2. Added an exception that is raised when sizes capture fails, along with a suggested solution. 3. Added this common problem to the FAQ doc. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.10.2 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-09-28 17:44:04 +08:00
wangxiyuan	048bfd5553	[Release] Add release note for v0.10.2rc1 (#2921 ) Add release note for v0.10.2rc1 - vLLM version: v0.10.2 - vLLM main: `b834b4cbf1` --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-16 01:20:05 +08:00
Mengqing Cao	7e16b4a7cd	[ReleaseNote] Add Release Note for v0.10.1rc1 (#2635 ) Add Release Note for v0.10.1rc1 - vLLM version: v0.10.1.1 - vLLM main: `b5ee1e3261` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-09-04 11:26:47 +08:00
wangxiyuan	41b028aa5f	[Doc] add v0.9.1 release note (#2646 ) Add release note for 0.9.1 - vLLM version: v0.10.1.1 - vLLM main: `8bd5844989` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-03 18:04:27 +08:00
Shanshan Shen	98c68220c1	[Doc] Update `v0.9.1rc3` doc (#2512 ) ### What this PR does / why we need it? Update `v0.9.1rc3` doc, which are supplements to https://github.com/vllm-project/vllm-ascend/pull/2488. - vLLM version: v0.10.0 - vLLM main: `170e8ea9ea` Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>	2025-08-25 11:39:29 +08:00
jack	8bfd16a145	[Doc] Add container image save/load FAQ for offline environments (#2347 ) ### What this PR does / why we need it? Add Docker export/import guide for air-gapped environments ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? NA - vLLM version: v0.10.0 - vLLM main: `d16aa3dae4` Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>	2025-08-13 16:00:43 +08:00
Mengqing Cao	49ec6c98b7	[Doc] Update faq (#2334 ) ### What this PR does / why we need it? - update deterministic calculation - update supported devices ### Does this PR introduce _any_ user-facing change? - Users should update ray and protobuf when using ray as the distributed backend - Users should use `export HCCL_DETERMINISTIC=true` when enabling deterministic calculation ### How was this patch tested? N/A - vLLM version: v0.10.0 - vLLM main: `ea1292ad3e` Signed-off-by: MengqingCao <cmq0113@163.com>	2025-08-12 14:12:53 +08:00
Mengqing Cao	4604882a3e	[ReleaseNote] Release note of v0.10.0rc1 (#2225 ) ### What this PR does / why we need it? Release note of v0.10.0rc1 - vLLM version: v0.10.0 - vLLM main: `8e8e0b6af1` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-08-07 14:46:49 +08:00
Yikun Jiang	54ace9e12b	Add release note for v0.9.1rc2 (#2188 ) ### What this PR does / why we need it? Add release note for v0.9.1rc2 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed - vLLM version: v0.10.0 - vLLM main: `c494f96fbc` Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-08-06 09:04:46 +08:00
JohnJan	54f2b31184	[Doc] Add a doc for qwen omni (#1867 ) Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com> ### What this PR does / why we need it? Add FAQ note for qwen omni Fixes https://github.com/vllm-project/vllm-ascend/issues/1760 issue1 - vLLM version: v0.9.2 - vLLM main: `b9a21e9173`	2025-07-20 09:05:41 +08:00
wangxiyuan	9c560b009a	[Release] Add 0.9.2rc1 release note (#1725 ) Add release note for 0.9.2rc1, we'll release soon - vLLM version: v0.9.2 - vLLM main: `7bd4c37ae7` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-11 17:36:05 +08:00
Mengqing Cao	c1c5d56255	[Doc] Update FAQ and add test guidance (#1360 ) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on deterministic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-25 09:59:23 +08:00
weiguihua2	e1123172d1	[Doc] Add reinstall instructions doc (#1303 ) Add a new FAQ, if users re-install vllm-ascend with pip, the `build` folder should be removed first --------- Signed-off-by: rjg-lyh <1318825571@qq.com> Signed-off-by: weiguihua <weiguihua2@huawei.com> Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-06-23 14:06:27 +08:00
xleoken	4447e53d7a	[Doc] Change not to no in faqs.md (#1357 ) ### What this PR does / why we need it? Change not to no in faqs.md. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Local Test Signed-off-by: xleoken <xleoken@163.com>	2025-06-23 09:01:00 +08:00
Yikun Jiang	c30ddb8331	Bump v0.9.1rc1 release (#1349 ) ### What this PR does / why we need it? Bump v0.9.1rc1 release Closes: https://github.com/vllm-project/vllm-ascend/pull/1341 Closes: https://github.com/vllm-project/vllm-ascend/pull/1334 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: leo-pony <nengjunma@outlook.com> Co-authored-by: shen-shanshan <467638484@qq.com>	2025-06-22 13:15:36 +08:00
Mengqing Cao	8dd686dfa2	[MLA][Graph] Improve assertion on Graph mode with MLA (#933 ) ### What this PR does / why we need it? Improve assertion on Graph mode with MLA. When running deepseek with graph mode, the fused MLA op only supports `numHeads / numKvHeads ∈ {32, 64, 128}`, so we improve the assertion message here to avoid confusing users. ### Does this PR introduce _any_ user-facing change? Adjusting tp size is required when running deepseek-v3/r1 with graph mode. deepseek-v2-lite is not supported in graph mode. ### How was this patch tested? Tested locally, as the CI machine could not run V3 due to the HBM limits. --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-06-10 22:26:53 +08:00


72 Commits