[Bugfix][LoRA] Fix the issue when enable LoRA + tp + fully_sharded_loras (#6650)
### What this PR does / why we need it?
Fixes issue #6143.
### Does this PR introduce _any_ user-facing change?
Allows starting the server with `--enable-lora`, `--fully-sharded-loras`, and
`--tensor_parallel_size 2` together.
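A minimal launch sketch of the now-supported flag combination. The model name and LoRA adapter path below are placeholders, not taken from this PR; substitute your own:

```shell
# Hypothetical example: serve a base model with a fully sharded LoRA adapter
# across 2 devices. Model name and adapter path are placeholders.
vllm serve meta-llama/Llama-3.2-1B-Instruct \
    --enable-lora \
    --fully-sharded-loras \
    --tensor-parallel-size 2 \
    --lora-modules my-lora=/path/to/lora_adapter
```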
### How was this patch tested?
pytest -sv tests/e2e/multicard/2-cards/test_llama32_lora_tp2.py
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd
---------
Signed-off-by: paulyu12 <507435917@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Changed files:
- .github/workflows/scripts/config.yaml (vendored, 2 lines changed)
@@ -97,6 +97,8 @@ e2e-multicard-2-cards:
       estimated_time: 400
     - name: tests/e2e/multicard/2-cards/test_ilama_lora_tp2.py
       estimated_time: 60
+    - name: tests/e2e/multicard/2-cards/test_llama32_lora_tp2.py
+      estimated_time: 223
     # Run the test in a separate step to avoid oom
     - name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek_multistream_moe_tp2
       estimated_time: 100