Commit Graph

4 Commits

Author SHA1 Message Date
Mengqing Cao
b64ee7d346 [Dist] Set device as rank (#202)
### What this PR does / why we need it?
The rank returned by `torch.distributed.get_rank(device_group)` is the
local rank, but rank (or rank in process group (PG)) is expected.
Thus we change to use `torch.npu.current_device()` to set device

```python
    # difference between `local_rank` and `rank_in_group`:
    # if we have a group of size 4 across two nodes:
    # Process | Node | Rank | Local Rank | Rank in Group
    #   0     |   0  |  0   |     0      |       0
    #   1     |   0  |  1   |     1      |       1
    #   2     |   1  |  2   |     0      |       2
    #   3     |   1  |  3   |     1      |       3
```
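
A minimal sketch of the resulting device binding (assumes `torch_npu` is installed; the helper name is hypothetical):

```python
import torch
import torch_npu  # noqa: F401  # registers the `torch.npu` namespace

def bind_npu_device() -> torch.device:
    # `torch.npu.current_device()` returns the device index already bound
    # to this process, independent of how global vs. in-group ranks are
    # numbered across nodes, so it is a safe input for `set_device`.
    device = torch.device(f"npu:{torch.npu.current_device()}")
    torch.npu.set_device(device)
    return device
```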

Tested by @wwfu109 with
`vllm/tests/distributed/test_customops::test_multi_process_tensor_parallel_pipeline_parallel`

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-03 09:23:13 +08:00
Mengqing Cao
4544e99d88 [dist] revert communicator patch (#66)
### What this PR does / why we need it?
Revert the communicator patch, since
https://github.com/vllm-project/vllm/pull/13208 has been merged.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested locally via
https://github.com/vllm-project/vllm-ascend/pull/30#issuecomment-2650251266

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-17 11:42:33 +08:00
wangxiyuan
f762ee89cc [Communicator] Add monkey patch (#30)
Some PRs for plugin support have not been merged into vLLM yet. This PR
adds a monkey patch to vllm-ascend so that vllm-ascend works with vLLM
directly.

The patch code should be removed once the related functionality is
supported natively by vLLM.
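
A minimal, self-contained sketch of the pattern (the patched symbol below is a stand-in, not the actual vLLM communicator hook):

```python
# Monkey patching: rebind an upstream attribute at import time so that
# every later caller picks up the plugin's implementation. vllm-ascend
# applies this to communicator-related symbols and drops the patch once
# vLLM merges native support.
import math

def _patched_sqrt(x: float) -> float:
    # Stand-in for a plugin-provided replacement of an upstream function.
    return x ** 0.5

math.sqrt = _patched_sqrt     # install the patch
assert math.sqrt(9.0) == 3.0  # callers now hit the replacement
```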

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 19:15:35 +08:00
Yikun Jiang
d5e7756028 [Core] Init vllm-ascend (#3)
### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the Ascend NPU integration from vLLM.
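
As a sketch of what this pluggability looks like in practice (the entry-point group name and `register` callable are assumptions for this example, not verified against the tree):

```python
# setup.py (excerpt): a hardware plugin advertises itself through a Python
# entry point, so vLLM can discover and load it at startup without any
# in-tree changes.
from setuptools import setup

setup(
    name="vllm-ascend",
    packages=["vllm_ascend"],
    entry_points={
        # vLLM scans this group and invokes the callable, which returns
        # the dotted path of the Platform class for this backend.
        "vllm.platform_plugins": ["ascend = vllm_ascend:register"],
    },
)
```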

This patch also includes changes to make CI work and to use caching to
speed up the e2e tests, including:
1. Change the push (post-merge CI) and pull_request (PR CI) trigger
branch to main.
2. Make mypy work by ignoring base_communicator and clearing unused deps.
3. Several improvements for vllm_ascend_test:
   - Use caches (pip, ms, hf) to speed up the e2e test (25 mins --> 5 mins).
   - Switch the `git clone` command to `actions/checkout` to speed up checkout.
   - Enable `-sv` for pytest for a better info dump.
   - Remove network host to resolve `docker: conflicting options: cannot
attach both user-defined and non-user-defined network-modes`, which is a
problem on Docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
cabaf4eff3

### Does this PR introduce _any_ user-facing change?
Yes, this is the initial PR of the plugin.

### How was this patch tested?
- This is the first PR to make the Ascend NPU work with vLLM. All code is
tested on Ascend hardware with the vLLM V0 engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
2025-02-05 10:53:12 +08:00