Author | Commit | Message | Date
fzyzcjy | 39efad4fbc | Tiny disable model that does not work (#5175) | 2025-04-08 18:42:37 -07:00
Baizhou Zhang | 42873eac09 | [Fix] Improve Lora tests and reduce CI runtime (#4925) | 2025-03-30 19:40:14 -07:00
Lianmin Zheng | 4ede6770cd | Fix retract for page size > 1 (#4914) | 2025-03-30 02:57:15 -07:00
chaobo jia | ef9a378a20 | [Feature] add multi-rank support for Lora (#4492) (Co-authored-by: rudy152 <czh1137892874@gmail.com>) | 2025-03-28 09:38:44 -07:00
Qiaolin Yu | 9fdc6d6abc | Fix the lora adapter when lora path is none (#4799) (Co-authored-by: Beichen Ma <mabeichen12@gmail.com>) | 2025-03-27 21:03:08 -07:00
Pan Lyu | c913ed4046 | support clip embedding model (#4506) | 2025-03-27 00:18:15 -07:00
fzyzcjy | 15ddd84322 | Add retry for flaky tests in CI (#4755) | 2025-03-25 16:53:12 -07:00
Ximingwang-09 | 22c3702e1e | [Model] Support Qwen2ForSequenceClassification (#4609) (Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>) | 2025-03-24 19:13:44 -07:00
aoshen524 | 588865f0e0 | [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) (Co-authored-by: ShenAo1111 <1377693092@qq.com>) (Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>) | 2025-03-18 20:33:07 -07:00
Lianmin Zheng | f0afaf5289 | Add a dummy grok test case (#4399) | 2025-03-13 15:29:48 -07:00
Lianmin Zheng | e8a69e4d0c | Clean up fp8 support (#4230) | 2025-03-09 21:46:35 -07:00
Pan Lyu | 361971b859 | Add Support for Qwen2-VL Multi-modal Embedding Models (#3694) | 2025-03-06 16:46:20 -08:00
Lianmin Zheng | 77a3954bf7 | Simplify eagle tests and TP sync in grammar backend (#4066) | 2025-03-04 13:40:40 -08:00
fzyzcjy | e3e0bc50a9 | [Feature] SPMD for SGLang + Verl (#3852) | 2025-02-28 09:53:10 -08:00
Lianmin Zheng | 27a46317b6 | Fix dependency (#3813) | 2025-02-24 03:50:58 -08:00
aoshen524 | e79f7420be | [Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) (Co-authored-by: ShenAo1111 <1377693092@qq.com>) (Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>) | 2025-02-20 11:51:57 -08:00
Baizhou Zhang | c45cab1c00 | [Fix] Fix accuracy bug and refactor codes for lora (#3413) | 2025-02-10 13:29:00 +08:00
Baizhou Zhang | 70817a7eae | [Feature] Define backends and add Triton backend for Lora (#3161) (Co-authored-by: Ying Sheng <sqy1415@gmail.com>) | 2025-02-03 22:09:13 -08:00
Lianmin Zheng | a4331cd260 | Add accuracy and latency tests of eagle into CI (#3027) | 2025-01-21 02:55:14 -08:00
Lianmin Zheng | 61f42b5732 | Move sgl.Runtime under sglang/lang (#2990) | 2025-01-19 17:10:29 -08:00
Ke Bao | d47c5101f1 | Add ut for qwen model (#2947) | 2025-01-18 00:03:54 +08:00
Fred Reiss | 993956c6b1 | Add support for IBM Granite 3.x models (#2437) | 2024-12-11 06:30:23 -08:00
Jani Monoses | db674e3d24 | Add OLMo2 model. (#2233) | 2024-11-28 00:15:20 -08:00
Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00
James Xu | f6f713797b | Add support for Qwen2-VL-based embedding models (#2055) | 2024-11-21 14:24:25 -08:00
Tanjiro | 8c280cee55 | add phi-3 small support (#2062) (Co-authored-by: Tushar Goel <114812108+AI-Tushar@users.noreply.github.com>) | 2024-11-17 18:47:43 -08:00
Xiaoyu Zhang | eff468dd5a | fix test_embedding_models prompt length too long's bug (#2015) | 2024-11-12 23:21:16 +08:00
Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00
Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00
Chayenne | 704f8e8ed1 | Add Reward API Docs etc (#1910) (Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>) | 2024-11-03 22:33:03 -08:00
Lianmin Zheng | 2ce32db6fb | Let reward model take text inputs instead of message lists (#1907) (Co-authored-by: Kyle Corbitt <kyle@corbt.com>) | 2024-11-03 13:27:12 -08:00
DanielC12321 | 5e00ddebc0 | Add new model: Gpt2 (#1833) | 2024-10-29 17:52:33 -07:00
Lianmin Zheng | 00611286a1 | Fix sliding window attention and gemma-2 unit tests in CI (#1746) | 2024-10-21 13:47:12 -07:00
sixgod | 45d5af2416 | Add GLM-4 TextGeneration Model support for SGLang (#1736) | 2024-10-21 04:08:30 +00:00
Lianmin Zheng | 7feba41584 | Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) | 2024-10-17 09:23:29 -07:00
Lianmin Zheng | 30ee36305e | Fix the failed unit tests (#1699) | 2024-10-17 08:13:29 -07:00
Jani Monoses | a5114b6f91 | Add OLMo model (#1676) | 2024-10-16 00:11:18 -07:00
Lianmin Zheng | aba9eae4c6 | Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py (#1631) | 2024-10-11 05:03:20 -07:00
Minsang Song | e6852b0dd2 | [Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536) (Co-authored-by: Ying Sheng <sqy1415@gmail.com>) | 2024-10-02 20:41:15 -07:00
Theresa Barton | 2c7d0a5b8b | [Fix] Fix all the Huggingface paths (#1553) | 2024-10-02 10:12:07 -07:00
Ying Sheng | 0f4fb19bc8 | [Fix, LoRA] fix LoRA with updates in main (#1545) | 2024-09-30 10:06:08 -07:00
Lianmin Zheng | 3f0fe08d37 | Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541) | 2024-09-29 20:28:45 -07:00
Ying Sheng | 9aa6553d2a | [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) | 2024-09-27 23:32:11 -07:00
TianyiQ | 3c93187caf | Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508) | 2024-09-24 21:50:20 -07:00
Lianmin Zheng | fb2d0680e0 | [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) | 2024-09-24 21:37:33 -07:00
Lianmin Zheng | 167591e864 | Better unit tests for adding a new model (#1488) | 2024-09-22 01:50:37 -07:00
Ying Sheng | 712216928f | [Feature] Initial support for multi-LoRA serving (#1307) | 2024-09-12 16:46:14 -07:00
Ying Sheng | 689ff588ec | [CI] Return output logprobs in unit test (#1361) | 2024-09-09 13:05:13 -07:00
Yineng Zhang | c411f32e1c | feat: replace GeluAndMul (#1234) | 2024-08-28 14:07:02 +00:00
Yineng Zhang | 66975360e7 | fix: increase max_new_tokens when testing generation models (#1244) | 2024-08-28 22:12:36 +10:00