### What this PR does / why we need it?
Add V1Engine LoRA support.
Add LoRA e2e test on single card and multiple cards.
### Does this PR introduce _any_ user-facing change?
support lora for V1
### How was this patch tested?
CI passed with new added test
---------
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: paulyu <paulyu0307@gmail.com>
Signed-off-by: paulyu12 <507435917@qq.com>
Co-authored-by: jesse <szxfml@gmail.com>
Co-authored-by: paulyu <paulyu0307@gmail.com>
### What this PR does / why we need it?
Part of #499
Add qwen2.5-vl test on single npu, v1 engine is excluded because
qwen2.5-vl has some problems with v1 now, at the same time, this test
can also make #639 more credible
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
- Add a new runner to the continuous integration system and keep the
original CI runner until the new runner runs stably
- Add distributed test cases
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.
This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.
This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
2. Make mypy work by ignore base_communicator and clear unused deps
3. Several improvements for vllm_ascend_test:
- use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
- Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
cabaf4eff3
### Does this PR introduce _any_ user-facing change?
Yes, init the PR.
### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>