6 Commits

Author SHA1 Message Date
yupeng
40f7d93f1a [bugfix][LoRA] Fix the LoRA accuracy issue introduced by an upstream vLLM change. (#6958)
### What this PR does / why we need it?
Fix the LoRA e2e test accuracy issue introduced by the upstream PR
https://github.com/vllm-project/vllm/pull/32005

### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_llama32_lora.py

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
---------
Signed-off-by: paulyu12 <507435917@qq.com>
Signed-off-by: yupeng <507435917@qq.com>
2026-03-10 10:43:18 +08:00
SILONG ZENG
62ea664aa7 [Lint] Style: Convert tests/ to ruff format (Batch #5) (#6747)
### What this PR does / why we need it?
| File Path |
| :--- |
| `tests/e2e/singlecard/compile/backend.py` |
| `tests/e2e/singlecard/compile/test_graphex_norm_quant_fusion.py` |
| `tests/e2e/singlecard/compile/test_graphex_qknorm_rope_fusion.py` |
| `tests/e2e/singlecard/compile/test_norm_quant_fusion.py` |
| `tests/e2e/singlecard/model_runner_v2/test_basic.py` |
| `tests/e2e/singlecard/test_aclgraph_accuracy.py` |
| `tests/e2e/singlecard/test_aclgraph_batch_invariant.py` |
| `tests/e2e/singlecard/test_aclgraph_mem.py` |
| `tests/e2e/singlecard/test_async_scheduling.py` |
| `tests/e2e/singlecard/test_auto_fit_max_mode_len.py` |
| `tests/e2e/singlecard/test_batch_invariant.py` |
| `tests/e2e/singlecard/test_camem.py` |
| `tests/e2e/singlecard/test_completion_with_prompt_embeds.py` |
| `tests/e2e/singlecard/test_cpu_offloading.py` |
| `tests/e2e/singlecard/test_guided_decoding.py` |
| `tests/e2e/singlecard/test_ilama_lora.py` |
| `tests/e2e/singlecard/test_llama32_lora.py` |
| `tests/e2e/singlecard/test_models.py` |
| `tests/e2e/singlecard/test_multistream_overlap_shared_expert.py` |
| `tests/e2e/singlecard/test_quantization.py` |
| `tests/e2e/singlecard/test_qwen3_multi_loras.py` |
| `tests/e2e/singlecard/test_sampler.py` |
| `tests/e2e/singlecard/test_vlm.py` |
| `tests/e2e/singlecard/test_xlite.py` |
| `tests/e2e/singlecard/utils.py` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: 9562912cea

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-02-24 15:50:00 +08:00
wangxiyuan
2a826b5fad [Misc] upgrade to vllm main (#6646)
### What this PR does / why we need it?
This PR upgrades the core vLLM dependency to a newer version from the
main branch (`13397841ab469cecf1ed425c3f52a9ffc38139b5`). This is
necessary to keep our project up-to-date with the latest features and
fixes from upstream vLLM.

1. ac32e66cf9: the pass file was moved.

- vLLM version: v0.15.0
- vLLM main: d7e17aaacd

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wxsIcey <1790571317@qq.com>
2026-02-10 14:08:59 +08:00
yupeng
8d44ddacb0 [Test][LoRA] Add e2e test for base model inference (#6624)
### What this PR does / why we need it?

This PR adds an end-to-end test case that verifies the correctness of base
model inference when LoRA is enabled, ensuring that the earlier fix for the
LoRA base-model request issue remains correct and does not regress. The new
test case calls `do_sample` with `lora_id=0` to target the base model and
asserts the output against the expected SQL queries.
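
As a rough illustration of the pattern described above (not the PR's actual
test: the model path, adapter path, prompt, and assertion are placeholders),
passing no `LoRARequest` when `lora_id` is 0 routes the request to the base
model even though LoRA is enabled on the engine:

```python
# Hedged sketch of a base-model-with-LoRA-enabled check; paths, prompt and
# the final assertion are illustrative, not the values used in the PR.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

MODEL_PATH = "meta-llama/Llama-3.2-1B-Instruct"  # assumed base model
LORA_PATH = "/path/to/sql-lora-adapter"          # hypothetical adapter path


def do_sample(llm: LLM, lora_path: str, lora_id: int) -> list[str]:
    """Generate one completion; lora_id=0 means 'use the base model'."""
    prompts = ["Write a SQL query that selects all users older than 30."]
    sampling_params = SamplingParams(temperature=0, max_tokens=64)
    outputs = llm.generate(
        prompts,
        sampling_params,
        # Passing None instead of a LoRARequest targets the base model.
        lora_request=LoRARequest(str(lora_id), lora_id, lora_path)
        if lora_id else None,
    )
    return [out.outputs[0].text for out in outputs]


def test_base_model_inference_with_lora_enabled():
    llm = LLM(model=MODEL_PATH, enable_lora=True, max_loras=4, max_lora_rank=8)
    generated = do_sample(llm, LORA_PATH, lora_id=0)
    # The real test compares against known-good SQL strings; here we only
    # check that the base model produced non-empty output.
    assert all(text.strip() for text in generated)
```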

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI passed with the new test case. The test can be run with:
```bash
pytest -sv tests/e2e/singlecard/test_llama32_lora.py
```

Signed-off-by: paulyu12 <507435917@qq.com>
2026-02-09 21:06:49 +08:00
Li Wang
ca297eb57f [CI] Migrate e2e test runner to hk (#5344)
### What this PR does / why we need it?
This patch adds new runner labels for the HK region and migrates the e2e
single-card tests to the new runner.

- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-01-26 09:00:51 +08:00
yupeng
5b95c6b03a [Test][e2e][LoRA] Add more e2e tests to cover more LoRA scenarios (#4075)
### What this PR does / why we need it?

This PR depends on https://github.com/vllm-project/vllm-ascend/pull/4046 and
will only work once that PR is merged.

This PR aims to solve the issue
https://github.com/vllm-project/vllm-ascend/issues/3240.

The newly added Llama-2-7b-hf and Qwen3-0.6B test cases cover scenarios where
the LoRA weights are added to the q_proj, v_proj, k_proj, o_proj, gate_proj,
up_proj, down_proj, embed_tokens and lm_head modules.
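
For readers unfamiliar with what "LoRA weights added to those modules" looks
like, here is a hedged sketch of a PEFT config that would produce such an
adapter; the base model id, rank and alpha are illustrative, and the adapters
used by the tests were prepared separately:

```python
# Illustrative only: a LoRA adapter that targets the attention, MLP,
# embedding and output-head projections exercised by the new test cases.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")  # assumed base model
config = LoraConfig(
    r=8,            # rank chosen for illustration
    lora_alpha=16,  # scaling chosen for illustration
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, config)
peft_model.print_trainable_parameters()  # only the LoRA weights are trainable
```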

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_llama2_lora.py
pytest -sv tests/e2e/singlecard/test_qwen3_multi_loras.py


- vLLM version: v0.11.0
- vLLM main: 83f478bb19

---------

Signed-off-by: paulyu12 <507435917@qq.com>
2026-01-13 16:32:28 +08:00