xc-llm-ascend/tests/e2e/multicard/2-cards/test_ilama_lora_tp2.py

import pytest

from tests.e2e.conftest import VllmRunner
from tests.e2e.singlecard.test_ilama_lora import EXPECTED_LORA_OUTPUT, MODEL_PATH, do_sample


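@pytest.mark.parametrize("distributed_executor_backend", ["mp"])
def test_ilama_lora_tp2(distributed_executor_backend, ilama_lora_files):
    """Run the ilama LoRA e2e check with tensor parallelism across two cards.

    The generated text is compared against the expected output shared with
    the single-card test.
    """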
    with VllmRunner(
        MODEL_PATH,
        enable_lora=True,
        max_loras=4,
        dtype="half",
        max_model_len=1024,
        max_num_seqs=16,
        tensor_parallel_size=2,
        cudagraph_capture_sizes=[1, 2, 4, 8],
        distributed_executor_backend=distributed_executor_backend,
        enforce_eager=True,
    ) as vllm_model:
        output = do_sample(vllm_model.model, ilama_lora_files, lora_id=2)

    # Check each output against the expected LoRA output, position by position.
    for i, expected in enumerate(EXPECTED_LORA_OUTPUT):
        assert output[i] == expected