xc-llm-ascend

Author	SHA1	Message	Date
yupeng	29f195a91c	[Bugfix][LoRA] Fix the bug when running Qwen3-Reranker-0.6B with LoRA. (#7156) ### What this PR does / why we need it? Fix the error reported while initializing the qwen3-reranker-0.6b model with `--enable-lora`, and add a test case to verify the fix. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: paulyu12 <507435917@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2026-03-15 17:55:42 +08:00
yupeng	830f39dd70	[Bugfix][LoRA] Fix the issue when enabling LoRA + tp + fully_sharded_loras (#6650) ### What this PR does / why we need it? Fix issue #6143. ### Does this PR introduce _any_ user-facing change? Allows starting the server with "--enable-lora && --fully-sharded-loras && --tensor_parallel_size 2". ### How was this patch tested? pytest -sv tests/e2e/multicard/2-cards/test_llama32_lora_tp2.py - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` --------- Signed-off-by: paulyu12 <507435917@qq.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-03-11 15:43:15 +08:00
SILONG ZENG	6ccccad102	[Lint] Style: Convert `vllm-ascend/` to ruff format (Batch #5) (#5996) ### What this PR does / why we need it? Scope of changes (file paths): `.../distributed/kv_transfer/kv_pool/ascend_store/ascend_store_connector.py`, `vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/backend/backend.py`, `.../distributed/kv_transfer/kv_pool/ascend_store/backend/memcache_backend.py`, `.../distributed/kv_transfer/kv_pool/ascend_store/backend/mooncake_backend.py`, `vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/config_data.py`, `vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/kv_transfer.py`, `vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/pool_scheduler.py`, `vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/pool_worker.py`, `.../distributed/kv_transfer/kv_pool/cpu_offload/cpu_kv_cache_manager.py`, `.../distributed/kv_transfer/kv_pool/cpu_offload/cpu_offload_connector.py`, `vllm_ascend/distributed/kv_transfer/kv_pool/cpu_offload/metadata.py`, `vllm_ascend/distributed/kv_transfer/kv_pool/ucm_connector.py`, `vllm_ascend/distributed/kv_transfer/utils/mooncake_transfer_engine.py`, `vllm_ascend/distributed/kv_transfer/utils/utils.py`, `vllm_ascend/kv_offload/cpu_npu.py`, `vllm_ascend/kv_offload/npu.py`, `vllm_ascend/lora/lora_ops.py`, `vllm_ascend/lora/punica_npu.py`, `vllm_ascend/lora/utils.py` ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `2c24bc6996` --------- Signed-off-by: MrZ20 <2609716663@qq.com> Signed-off-by: SILONG ZENG <2609716663@qq.com>	2026-01-24 22:45:38 +08:00
yupeng	9caf6fbaf5	[Bugfix][LoRA] Fix LoRA bug after supporting Qwen3-Next (#3044) ### What this PR does / why we need it? The LoRA e2e test uses the ilama-3.2-1B model, which uses the transformers.py model file. Its self-attention layer names end with "\.attn", not "\.self_attn". Attention layer names in some other models also end with "*.attn", for example in baichuan.py and bert.py. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? pytest -sv tests/e2e/singlecard/test_ilama_lora.py pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py - vLLM version: v0.10.2 - vLLM main: `17b4c6685c` --------- Signed-off-by: paulyu12 <507435917@qq.com>	2025-09-26 11:12:45 +08:00
wangxiyuan	7d6d9449a8	[Misc] Move lora patch file into lora module (#2797) Clean up a useless file in the patch module. Updating the LoRA support list within vLLM Ascend is sufficient, so there is no need to patch vLLM. - vLLM version: v0.10.1.1 - vLLM main: `f4962a6d55` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-08 21:42:12 +08:00

5 Commits