xc-llm-ascend

Author	SHA1	Message	Date
SILONG ZENG	62ea664aa7	[Lint]Style: Convert `test/` to ruff format(Batch #5 ) (#6747 ) ### What this PR does / why we need it? File paths: `tests/e2e/singlecard/compile/backend.py`, `tests/e2e/singlecard/compile/test_graphex_norm_quant_fusion.py`, `tests/e2e/singlecard/compile/test_graphex_qknorm_rope_fusion.py`, `tests/e2e/singlecard/compile/test_norm_quant_fusion.py`, `tests/e2e/singlecard/model_runner_v2/test_basic.py`, `tests/e2e/singlecard/test_aclgraph_accuracy.py`, `tests/e2e/singlecard/test_aclgraph_batch_invariant.py`, `tests/e2e/singlecard/test_aclgraph_mem.py`, `tests/e2e/singlecard/test_async_scheduling.py`, `tests/e2e/singlecard/test_auto_fit_max_mode_len.py`, `tests/e2e/singlecard/test_batch_invariant.py`, `tests/e2e/singlecard/test_camem.py`, `tests/e2e/singlecard/test_completion_with_prompt_embeds.py`, `tests/e2e/singlecard/test_cpu_offloading.py`, `tests/e2e/singlecard/test_guided_decoding.py`, `tests/e2e/singlecard/test_ilama_lora.py`, `tests/e2e/singlecard/test_llama32_lora.py`, `tests/e2e/singlecard/test_models.py`, `tests/e2e/singlecard/test_multistream_overlap_shared_expert.py`, `tests/e2e/singlecard/test_quantization.py`, `tests/e2e/singlecard/test_qwen3_multi_loras.py`, `tests/e2e/singlecard/test_sampler.py`, `tests/e2e/singlecard/test_vlm.py`, `tests/e2e/singlecard/test_xlite.py`, `tests/e2e/singlecard/utils.py` ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.15.0 - vLLM main: `9562912cea` --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2026-02-24 15:50:00 +08:00
zhangyiming	45c5bcd962	[E2E] Optimize the E2E test time. (#5294 ) ### What this PR does / why we need it? Add cudagraph_capture_sizes for E2E CI test. - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` Signed-off-by: menogrey <1299267905@qq.com>	2025-12-26 14:17:50 +08:00
wangxiyuan	29d2fe653d	cleanup ascend config (#5296 ) 1. refresh the additional config doc 2. move the kv config logic to platform 3. improve the `dump_config` init logic and rename it to `dump_config_path`; this change is user-facing, because `dump_config` changes from a dict to a string 4. correct the `enable_async_exponential` type 5. remove the unused `chunked_prefill_for_mla` - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-26 14:07:37 +08:00
zhangxinyuehfad	8ae7fca947	[CI] refactor e2e ci test (#5246 ) ### What this PR does / why we need it? Refactor the e2e CI tests: 1. tests/e2e/singlecard/pooling/test_embedding.py: remove the eager parameter and rename the test case 2. tests/e2e/singlecard/pooling/test_scoring.py: rename test cases 3. tests/e2e/singlecard/pooling/test_classification.py: rename the test case 4. tests/e2e/singlecard/test_quantization.py: remove the eager parameter, change the model to vllm-ascend/Qwen2.5-0.6B-W8A8, and rename the test case 5. tests/e2e/multicard/test_shared_expert_dp.py: rename test cases 6. tests/e2e/singlecard/test_sampler.py: rename test cases 7. tests/e2e/singlecard/test_aclgraph_accuracy.py: rename test cases 8. tests/e2e/multicard/test_offline_inference_distributed.py: rename test cases and remove the eager parameter 9. tests/e2e/multicard/long_sequence/test_accuracy.py: rename test cases and remove the eager parameter 10. tests/e2e/multicard/long_sequence/test_basic.py: rename test cases and remove the eager parameter 11. tests/e2e/multicard/test_expert_parallel.py: remove the eager parameter 12. tests/e2e/multicard/test_full_graph_mode.py: remove the eager parameter 13. tests/e2e/multicard/test_ilama_lora_tp2.py: remove the eager parameter 14. tests/e2e/singlecard/spec_decode_v1/test_v1_mtp_correctness.py: remove the eager parameter 15. tests/e2e/singlecard/spec_decode_v1/test_v1_spec_decode.py: remove the eager parameter 16. tests/e2e/singlecard/test_aclgraph_accuracy.py: remove the eager parameter 17. tests/e2e/singlecard/test_camem.py: remove the eager parameter 18. tests/e2e/singlecard/test_ilama_lora.py: remove the eager parameter 19. tests/e2e/singlecard/test_multistream_overlap_shared_expert.py: remove the eager parameter 20. tests/e2e/singlecard/test_vlm.py: remove the eager parameter 21. tests/e2e/singlecard/test_xli: remove the eager parameter ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-12-23 18:42:35 +08:00
YuhanBai	5d02eed16f	[Performance] Add async exponential while model executing (#4501 ) ### What this PR does / why we need it? Add a control that lets the exponential distribution operator overlap with model execution (OFF by default, because this feature might not perform well on MoE models, e.g. Qwen3-30B). Enabling async exponential overlapping provides a performance improvement. Overlapping the exponential operator with model execution can also cover the performance drop introduced by the AI-CPU version of the exponential operator. UPDATE (12/12): The overlap now uses the same stream that was introduced in PR #4908 . We moved `do_async_exponential` from `model_runner_v1.py` to `sampler.py`. Async exponential is now enabled through `additional_config`: add `"enable_async_exponential": 1` to `additional_config`. Only the default exponential and the AI-CPU exponential are supported now; the old `"enable_async_exponential": 2` option has been removed for consistency. ### Does this PR introduce _any_ user-facing change? Yes, it adds a new `additional_config` option: `"enable_async_exponential": 1`. When `enable_async_exponential` is set to 1, async exponential is enabled and overlaps with the model runner. When it is set to 0 (the default), async exponential is disabled, but the exponential operator still runs on a separate stream, the one introduced in #4908. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: YuhanBai <yuhan.bai0830@gmail.com>	2025-12-20 21:23:21 +08:00
wangxiyuan	fef18b60bc	Refactor e2e CI (#2276 ) Refactor E2E CI to make it clearer and faster 1. remove some useless e2e tests 2. remove some useless functions 3. make sure all tests run with VLLMRunner to avoid OOM errors 4. make sure all ops tests end with torch.empty_cache to avoid OOM errors 5. run the tests one by one to avoid resource limit errors - vLLM version: v0.10.1.1 - vLLM main: `a344a5aa0a` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-02 09:02:22 +08:00
Pr0Wh1teGivee	d13fb0766e	[Perf] add patch to optimize apply_topk_topp (#1732 ) ### What this PR does / why we need it? Performance optimization for apply_top_k_top_p ### Does this PR introduce _any_ user-facing change? Use VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION to enable this feature ### How was this patch tested? e2e & ut - vLLM version: v0.9.2 - vLLM main: `6a9e6b2abf` Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>	2025-07-11 15:32:02 +08:00
wangxiyuan	830332ebfc	Clean up v0.9.1 code (#1672 ) vLLM has released 0.9.2. This PR drops 0.9.1 support. - vLLM version: v0.9.1 - vLLM main: `b942c094e3` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-09 08:52:24 +08:00
wangxiyuan	a45dfde283	[CI] Fix FusedMoEConfig and input batch failure to recover CI (#1602 ) Make CI happy 1. `c1909e7e8c` changed how moeConfig is initialized 2. `48fb076cbc` changed the input batch logic. This PR adapts vllm-ascend to these changes. Closes: https://github.com/vllm-project/vllm-ascend/issues/1600 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-03 18:36:17 +08:00
wangxiyuan	69b817ed65	[CI] Add unit test framework (#1201 ) This PR adds the unit test framework to enable UT for vLLM Ascend. Unit tests run on CPU machines. They run once the lint check has passed, the same as the e2e tests. For unit tests, this PR creates a new folder called `ut` under the `tests` module. The test files in `ut` should mirror the layout of the code in `vllm-ascend`, and each file name should start with the `test_` prefix. For example, in this PR `test_ascend_config.py` is added to test `ascend_config.py`. A new file `worker/test_worker_v1.py` is also added as a placeholder; it should hold the unit tests for `vllm-ascend/worker/worker_v1.py`. Additionally, a new `fake_weight` folder is added. It contains the config.json from `facebook/opt-125m`, so that the tests do not need to access Hugging Face every time. TODO: We should add the remaining unit test files one by one in the future. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-16 18:32:28 +08:00
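
The unit test layout described in #1201 (test files under `tests/ut` mirror the source tree, with a `test_` prefix on the file name) can be illustrated with a small path-mapping sketch. The helper name `ut_path_for` and the default `package_dir="vllm_ascend"` are assumptions made for illustration; the commit message writes the source directory as `vllm-ascend`, so pass whichever spelling matches your checkout.

```python
from pathlib import PurePosixPath


def ut_path_for(
    source_path: str,
    package_dir: str = "vllm_ascend",  # assumed package directory name
    ut_root: str = "tests/ut",
) -> str:
    """Return the mirrored unit-test path for a source file.

    Hypothetical helper: it only encodes the naming convention from the
    commit message and is not part of the repository.
    """
    src = PurePosixPath(source_path)
    # relative_to raises ValueError if the file is outside package_dir.
    rel = src.relative_to(package_dir)
    # Keep the sub-directory, prefix the file name with "test_".
    return str(PurePosixPath(ut_root) / rel.parent / f"test_{rel.name}")


if __name__ == "__main__":
    print(ut_path_for("vllm_ascend/ascend_config.py"))
    print(ut_path_for("vllm_ascend/worker/worker_v1.py"))
```

With the defaults, the two calls correspond to the `test_ascend_config.py` and `worker/test_worker_v1.py` examples named in the commit message.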

10 Commits
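
As a rough illustration of the `additional_config` switch described in #4501, the sketch below builds and validates the `enable_async_exponential` entry and serializes it to JSON. The helper names are hypothetical, and the accepted values (0 and 1) are taken only from that commit message; #5296 later corrected the option's type, so check the current additional config documentation before relying on integer values.

```python
import json


def async_exponential_config(enabled: bool) -> dict:
    """Build the additional_config fragment described in #4501.

    Hypothetical helper: 1 enables async exponential, 0 (the default)
    disables it.
    """
    return {"enable_async_exponential": 1 if enabled else 0}


def validate_async_exponential(config: dict) -> None:
    """Reject values other than 0 or 1 (the old value 2 was removed)."""
    value = config.get("enable_async_exponential", 0)
    if value not in (0, 1):
        raise ValueError(
            f"enable_async_exponential must be 0 or 1, got {value!r}"
        )


if __name__ == "__main__":
    cfg = async_exponential_config(enabled=True)
    validate_async_exponential(cfg)
    # JSON form of the fragment, e.g. for passing on a command line.
    print(json.dumps(cfg))
```

How the resulting dictionary is handed to the engine depends on the vLLM and vllm-ascend versions in use, so that step is deliberately left out.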