xc-llm-ascend

Author	SHA1	Message	Date
Pleaplusone	df0ec55162	Disaggregate prefill for kv cache register style (#950 ) ### What this PR does / why we need it? This PR adopts `LLMDataDist` for kv cache register and a `pull_blocks` style disaggregate prefill implementation. The interface implementation mainly follows the design of NIXL PR https://github.com/vllm-project/vllm/pull/17751/files#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953 . This PR can be tested with the following steps: - Generate the rank table for all machines. - Execute `toy_proxy.py` to launch the disaggregate prefill proxy server, specifying the prefill ip and port and the decode ip and port. - Run the prefill server and decode server. - Send the request to the disaggregate prefill proxy. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.9.2 - vLLM main: `8d0a01a5f2` --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com> Signed-off-by: machenglong <machenglong_yewu@cmss.chinamobile.com> Signed-off-by: liziyu179 <3475441767@qq.com> Signed-off-by: underfitc <hucong24@huawei.com> Signed-off-by: zouyida2052 <zouyida@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: underfituu <hzhucong@163.com> Co-authored-by: machenglong <machenglong_yewu@cmss.chinamobile.com> Co-authored-by: liziyu179 <3475441767@qq.com> Co-authored-by: underfitc <hucong24@huawei.com> Co-authored-by: zouyida2052 <zouyida@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com> Co-authored-by: underfituu <hzhucong@163.com>	2025-07-26 17:15:47 +08:00
li chaoran	ff97740b8d	Use mirror images (#1912 ) ### What this PR does / why we need it? More discussion can be found [here](https://github.com/ascend-gha-runners/docs/issues/23). The infra team deployed an internal registry because both `m.daocloud.io` and `quay.io` suffered from unstable connection quality. By switching to the internal registry, CI will benefit in both connection quality and download speed. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tested locally - vLLM version: v0.9.2 - vLLM main: `6b46c4b653` --------- Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>	2025-07-24 10:47:05 +08:00
li chaoran	3e39d7234c	[CI] Switching to infra cache server to reduce network pressure (#1792 ) ### What this PR does / why we need it? This PR introduces the infra cache server to speed up apt/pip package installation ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? Tested locally; with this config, network bandwidth usage dropped from 100% to 5% when a new PR was submitted. <img width="807" height="334" alt="image" src="https://github.com/user-attachments/assets/16f03bce-4531-4c71-ab6e-8308dc2c022c" /> - vLLM version: v0.9.2 - vLLM main: `8dfb45ca33` --------- Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>	2025-07-18 18:39:25 +08:00
zhangxinyuehfad	4e910186de	[CI/UT] Unify model usage via ModelScope in CI (#1207 ) ### What this PR does / why we need it? Unify Model Usage via ModelScope ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-07-04 10:52:17 +08:00
Mengqing Cao	20767a043c	[CI/UT] Fix disaggregated prefill ci (#1313 ) ### What this PR does / why we need it? Use eager mode to run disaggregated prefill ci ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with new existing test. --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-06-24 17:11:00 +08:00
Mengqing Cao	96fa7ff63b	[DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235 ) ### What this PR does / why we need it? 1. Fix rank set in DP scenario. The new poc version of torch-npu supports setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, so we can use the rank set in `DPEngineCoreProc` directly instead of calculating the local rank across dp by hand in the patched `_init_data_parallel` Closes: https://github.com/vllm-project/vllm-ascend/issues/1170 2. Bump torch-npu version to 2.5.1.post1.dev20250528 Closes: https://github.com/vllm-project/vllm-ascend/pull/1242 Closes: https://github.com/vllm-project/vllm-ascend/issues/1232 ### How was this patch tested? CI passed with newly added test. --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: Icey <1790571317@qq.com>	2025-06-16 23:09:53 +08:00
wangxiyuan	4f5964420e	[CI] Upgrade vllm to 0.9.1 (#1165 ) 1. upgrade vllm to 0.9.1. 0.9.0 is not supported for main branch now. keep doc to 0.9.0 until we release the first 0.9.1 release. 2. disable V0 test for PR 3. move actionlint check to lint job Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-11 16:33:11 +08:00
Yikun Jiang	9e855b70be	Adjust concurrency group for each npu workflow (#1068 ) ### What this PR does / why we need it? Adjust concurrency group for each npu workflow - pd and benchmarks share the static-08-01, so only one job can run on it at a time - for other jobs, each PR/schedule should have only 1 job running ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-05 09:17:04 +08:00
Mengqing Cao	6eddbd2521	[CI/UT][PD Disaggregate] Initialize PD Disaggregate UT (#889 ) Initialize PD Disaggregate UT --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-05-29 10:17:12 +08:00
wangxiyuan	f6e5decc10	[CI] upgrade to vllm 0.9.0 (#959 ) Upgrade to vllm 0.9.0. 0.8.5 will not be supported any more. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-28 21:18:41 +08:00
wangxiyuan	e2a0c19cea	[CI] Refactor CI (#952 ) 1. remove some useless test funcs and files 2. fix format.sh problem 3. enable full test for singlecard and multicard 4. move long term tests to the long_term folder. This kind of test runs only for labeled PRs and the daily test. It includes: spec decode, accuracy test ## After refactor: There are 4 test modules - `singlecard`: contains the tests running on one NPU. It'll be run for each PR and daily test. - `multicard`: contains the tests running on multiple NPUs. It'll be run for each PR and daily test. - `long_term`: contains the tests that take a long time (now includes `spec decode` and `accuracy` tests). It'll be run for PRs labeled `long-term-test` and daily test. - `e2e`: contains the tests for doc and pd feature. It'll be run for PRs labeled `pd-test` and daily test. ## Todo: 1. some tests are skipped, they should be fixed and re-enabled in the future. 2. pyhccl test for multicard doesn't work at all. It should be enabled as well. 3. ensure long-term-test passes in the daily test. ### Known issue Now, the `ready` label is required to start the pd test or long term test. And when `long-term-test` or `pd-test` is labeled after the other one, the previously labeled test will be re-run. So the labeled tests should be run in the following steps: 1. decide which test needs to run, then label it: `long-term-test` or `pd-test` or both. 2. add the `ready-for-test` label, then the test will be run. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-28 06:31:35 +08:00
Yikun Jiang	508242425c	[CI][1/N] Add basic ci for PD disaggregation (#830 ) ### What this PR does / why we need it? Add basic CI for PD disaggregation, and enable it on schedule and when labeled with `module:pd` - Updated `.github/actionlint.yaml` to add a new self-hosted runner configuration: `linux-arm64-npu-static-8`. - Introduced a new GitHub Actions workflow `.github/workflows/vllm_ascend_test_pd.yaml` for PD disaggregation testing: - Scheduled to run daily at 23:00 UTC and triggered by pull request label `module:pd`. - Added steps for basic installation; other steps will be added in a follow-up PR Related: https://github.com/vllm-project/vllm-ascend/issues/841 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - CI passed - No trigger by default <img width="847" alt="image" src="https://github.com/user-attachments/assets/23aa128f-526d-447f-91c8-8ebf6be8400f" /> - Trigger only if we tag with pd <img width="930" alt="image" src="https://github.com/user-attachments/assets/aef1caca-2029-48e8-a6e6-860136adcd37" /> Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-14 18:04:16 +08:00

12 Commits