[Platform][Worker][ModelRunner] Add LoRA & Multi-LoRA support (#521)
### What this PR does / why we need it?
According to the RFC [[RFC]: Join the Multi-LoRA and Multi-LoRA Dynamic
Serving feature development
#396](https://github.com/vllm-project/vllm-ascend/issues/396) and the
[vLLM Ascend Roadmap Q2 2025
#448](https://github.com/vllm-project/vllm-ascend/issues/448), this PR
adds the code needed to support (1) Multi-LoRA and (2) Multi-LoRA
Dynamic Serving.

For background, see the vLLM [LoRA
documentation](https://docs.vllm.ai/en/latest/features/lora.html).
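
Since feature (1) attaches an adapter per request, here is a minimal offline sketch of Multi-LoRA usage with the public vLLM API; the model name and adapter path are illustrative placeholders, not part of this PR:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Enable LoRA support when constructing the engine.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Multi-LoRA: each generate call can carry its own adapter via a
# LoRARequest(name, integer id, adapter path).
outputs = llm.generate(
    ["Write a SQL query that lists all users."],
    sampling_params,
    lora_request=LoRARequest("sql-lora", 1, "/path/to/sql-lora"),
)
for out in outputs:
    print(out.outputs[0].text)
```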

### Does this PR introduce _any_ user-facing change?

Yes. The following OpenAI-compatible HTTP APIs will be supported (see the sketch below):

- `/v1/load_lora_adapter`
- `/v1/unload_lora_adapter`
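
A minimal sketch of exercising these endpoints, assuming a server started with `--enable-lora` and `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`; the adapter name, path, and port are hypothetical:

```python
import requests  # third-party HTTP client, used here for brevity

BASE_URL = "http://localhost:8000"  # hypothetical server address

# Load a LoRA adapter at runtime.
resp = requests.post(
    f"{BASE_URL}/v1/load_lora_adapter",
    json={"lora_name": "sql-lora", "lora_path": "/path/to/sql-lora"},
)
print(resp.status_code, resp.text)

# Unload the same adapter.
resp = requests.post(
    f"{BASE_URL}/v1/unload_lora_adapter",
    json={"lora_name": "sql-lora"},
)
print(resp.status_code, resp.text)
```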

### How was this patch tested?
Tested by running the upstream Multi-LoRA offline inference example:
git clone https://github.com/vllm-project/vllm.git
cd vllm/examples/offline_inference/ && python3 multilora_inference.py

---------

Signed-off-by: paulyu <paulyu0307@gmail.com>
Co-authored-by: paulyu <paulyu0307@gmail.com>