| Author | Commit | Message | Date |
|---|---|---|---|
| Yineng Zhang | 033c715b46 | cleanup models dependencies 1/n (#2948) | 2025-01-17 23:46:48 +08:00 |
| Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318) Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-02 23:22:13 +08:00 |
| Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00 |
| Yineng Zhang | 8bee20f80b | Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com> | 2024-10-19 20:45:41 -07:00 |
| Lianmin Zheng | 6a5b352aaf | Use is_flashinfer_available to replace is_hip for flashinfer check (#1596) Co-authored-by: Zhang Liangang <liangang.zhang@intel.com> | 2024-10-06 22:54:05 -07:00 |
| Ying Sheng | 9c064bf78a | [LoRA, Performance] Speedup multi-LoRA serving - Step 1 (#1587) | 2024-10-06 10:33:44 -07:00 |
| Minsang Song | e6852b0dd2 | [Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536) Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-10-02 20:41:15 -07:00 |
| Ying Sheng | 0f4fb19bc8 | [Fix, LoRA] fix LoRA with updates in main (#1545) | 2024-09-30 10:06:08 -07:00 |
| Lianmin Zheng | 36d5acfca5 | Rename InputMetadata -> ForwardBatch (#1543) | 2024-09-30 02:41:11 -07:00 |
| Lianmin Zheng | 3f0fe08d37 | Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541) | 2024-09-29 20:28:45 -07:00 |
| HAI | 3a6e04185b | [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420) | 2024-09-17 07:43:52 +00:00 |
| Ying Sheng | 37963394aa | [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) | 2024-09-15 12:46:04 -07:00 |
| Ying Sheng | 712216928f | [Feature] Initial support for multi-LoRA serving (#1307) | 2024-09-12 16:46:14 -07:00 |