Commit Graph

62 Commits

Author SHA1 Message Date
fzyzcjy
39efad4fbc Tiny disable model that does not work (#5175) 2025-04-08 18:42:37 -07:00
Baizhou Zhang
42873eac09 [Fix] Improve Lora tests and reduce CI runtime (#4925) 2025-03-30 19:40:14 -07:00
Lianmin Zheng
4ede6770cd Fix retract for page size > 1 (#4914) 2025-03-30 02:57:15 -07:00
chaobo jia
ef9a378a20 [Feature] add multi-rank support for Lora (#4492) 2025-03-28 09:38:44 -07:00
Co-authored-by: rudy152 <czh1137892874@gmail.com>
Qiaolin Yu
9fdc6d6abc Fix the lora adapter when lora path is none (#4799) 2025-03-27 21:03:08 -07:00
Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
Pan Lyu
c913ed4046 support clip embedding model (#4506) 2025-03-27 00:18:15 -07:00
fzyzcjy
15ddd84322 Add retry for flaky tests in CI (#4755) 2025-03-25 16:53:12 -07:00
Ximingwang-09
22c3702e1e [Model] Support Qwen2ForSequenceClassification (#4609) 2025-03-24 19:13:44 -07:00
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
aoshen524
588865f0e0 [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) 2025-03-18 20:33:07 -07:00
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Lianmin Zheng
f0afaf5289 Add a dummy grok test case (#4399) 2025-03-13 15:29:48 -07:00
Lianmin Zheng
e8a69e4d0c Clean up fp8 support (#4230) 2025-03-09 21:46:35 -07:00
Pan Lyu
361971b859 Add Support for Qwen2-VL Multi-modal Embedding Models (#3694) 2025-03-06 16:46:20 -08:00
Lianmin Zheng
77a3954bf7 Simplify eagle tests and TP sync in grammar backend (#4066) 2025-03-04 13:40:40 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) 2025-02-28 09:53:10 -08:00
Lianmin Zheng
27a46317b6 Fix dependency (#3813) 2025-02-24 03:50:58 -08:00
aoshen524
e79f7420be [Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) 2025-02-20 11:51:57 -08:00
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Baizhou Zhang
c45cab1c00 [Fix] Fix accuracy bug and refactor codes for lora (#3413) 2025-02-10 13:29:00 +08:00
Baizhou Zhang
70817a7eae [Feature] Define backends and add Triton backend for Lora (#3161) 2025-02-03 22:09:13 -08:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Lianmin Zheng
a4331cd260 Add accuracy and latency tests of eagle into CI (#3027) 2025-01-21 02:55:14 -08:00
Lianmin Zheng
61f42b5732 Move sgl.Runtime under sglang/lang (#2990) 2025-01-19 17:10:29 -08:00
Ke Bao
d47c5101f1 Add ut for qwen model (#2947) 2025-01-18 00:03:54 +08:00
Fred Reiss
993956c6b1 Add support for IBM Granite 3.x models (#2437) 2024-12-11 06:30:23 -08:00
Jani Monoses
db674e3d24 Add OLMo2 model. (#2233) 2024-11-28 00:15:20 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
James Xu
f6f713797b Add support for Qwen2-VL-based embedding models (#2055) 2024-11-21 14:24:25 -08:00
Tanjiro
8c280cee55 add phi-3 small support (#2062) 2024-11-17 18:47:43 -08:00
Co-authored-by: Tushar Goel <114812108+AI-Tushar@users.noreply.github.com>
Xiaoyu Zhang
eff468dd5a fix test_embedding_models prompt length too long's bug (#2015) 2024-11-12 23:21:16 +08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Chayenne
704f8e8ed1 Add Reward API Docs etc (#1910) 2024-11-03 22:33:03 -08:00
Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
Lianmin Zheng
2ce32db6fb Let reward model take text inputs instead of message lists (#1907) 2024-11-03 13:27:12 -08:00
Co-authored-by: Kyle Corbitt <kyle@corbt.com>
DanielC12321
5e00ddebc0 Add new model: Gpt2 (#1833) 2024-10-29 17:52:33 -07:00
Lianmin Zheng
00611286a1 Fix sliding window attention and gemma-2 unit tests in CI (#1746) 2024-10-21 13:47:12 -07:00
sixgod
45d5af2416 Add GLM-4 TextGeneration Model support for SGLang (#1736) 2024-10-21 04:08:30 +00:00
Lianmin Zheng
7feba41584 Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) 2024-10-17 09:23:29 -07:00
Lianmin Zheng
30ee36305e Fix the failed unit tests (#1699) 2024-10-17 08:13:29 -07:00
Jani Monoses
a5114b6f91 Add OLMo model (#1676) 2024-10-16 00:11:18 -07:00
Lianmin Zheng
aba9eae4c6 Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py (#1631) 2024-10-11 05:03:20 -07:00
Minsang Song
e6852b0dd2 [Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536) 2024-10-02 20:41:15 -07:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Theresa Barton
2c7d0a5b8b [Fix] Fix all the Huggingface paths (#1553) 2024-10-02 10:12:07 -07:00
Ying Sheng
0f4fb19bc8 [Fix, LoRA] fix LoRA with updates in main (#1545) 2024-09-30 10:06:08 -07:00
Lianmin Zheng
3f0fe08d37 Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541) 2024-09-29 20:28:45 -07:00
Ying Sheng
9aa6553d2a [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) 2024-09-27 23:32:11 -07:00
TianyiQ
3c93187caf Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508) 2024-09-24 21:50:20 -07:00
Lianmin Zheng
fb2d0680e0 [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) 2024-09-24 21:37:33 -07:00
Lianmin Zheng
167591e864 Better unit tests for adding a new model (#1488) 2024-09-22 01:50:37 -07:00
Ying Sheng
712216928f [Feature] Initial support for multi-LoRA serving (#1307) 2024-09-12 16:46:14 -07:00
Ying Sheng
689ff588ec [CI] Return output logprobs in unit test (#1361) 2024-09-09 13:05:13 -07:00
Yineng Zhang
c411f32e1c feat: replace GeluAndMul (#1234) 2024-08-28 14:07:02 +00:00
Yineng Zhang
66975360e7 fix: increase max_new_tokens when testing generation models (#1244) 2024-08-28 22:12:36 +10:00