sglang

Author	SHA1	Message	Date
Fred Reiss	993956c6b1	Add support for IBM Granite 3.x models (#2437)	2024-12-11 06:30:23 -08:00
Jani Monoses	db674e3d24	Add OLMo2 model. (#2233)	2024-11-28 00:15:20 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
James Xu	f6f713797b	Add support for Qwen2-VL-based embedding models (#2055)	2024-11-21 14:24:25 -08:00
Tanjiro	8c280cee55	add phi-3 small support (#2062) Co-authored-by: Tushar Goel <114812108+AI-Tushar@users.noreply.github.com>	2024-11-17 18:47:43 -08:00
Xiaoyu Zhang	eff468dd5a	fix test_embedding_models prompt length too long's bug (#2015)	2024-11-12 23:21:16 +08:00
Chayenne	c77c1e05ba	fix black in pre-commit (#1940)	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Chayenne	704f8e8ed1	Add Reward API Docs etc (#1910) Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>	2024-11-03 22:33:03 -08:00
Lianmin Zheng	2ce32db6fb	Let reward model take text inputs instead of message lists (#1907) Co-authored-by: Kyle Corbitt <kyle@corbt.com>	2024-11-03 13:27:12 -08:00
DanielC12321	5e00ddebc0	Add new model: Gpt2 (#1833)	2024-10-29 17:52:33 -07:00
Lianmin Zheng	00611286a1	Fix sliding window attention and gemma-2 unit tests in CI (#1746)	2024-10-21 13:47:12 -07:00
sixgod	45d5af2416	Add GLM-4 TextGeneration Model support for SGLang (#1736)	2024-10-21 04:08:30 +00:00
Lianmin Zheng	7feba41584	Fix failed ci tests on long prompts; Better error messages for embedding models (#1700)	2024-10-17 09:23:29 -07:00
Lianmin Zheng	30ee36305e	Fix the failed unit tests (#1699)	2024-10-17 08:13:29 -07:00
Jani Monoses	a5114b6f91	Add OLMo model (#1676)	2024-10-16 00:11:18 -07:00
Lianmin Zheng	aba9eae4c6	Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py (#1631)	2024-10-11 05:03:20 -07:00
Minsang Song	e6852b0dd2	[Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-10-02 20:41:15 -07:00
Theresa Barton	2c7d0a5b8b	[Fix] Fix all the Huggingface paths (#1553)	2024-10-02 10:12:07 -07:00
Ying Sheng	0f4fb19bc8	[Fix, LoRA] fix LoRA with updates in main (#1545)	2024-09-30 10:06:08 -07:00
Lianmin Zheng	3f0fe08d37	Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541)	2024-09-29 20:28:45 -07:00
Ying Sheng	9aa6553d2a	[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)	2024-09-27 23:32:11 -07:00
TianyiQ	3c93187caf	Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508)	2024-09-24 21:50:20 -07:00
Lianmin Zheng	fb2d0680e0	[Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510)	2024-09-24 21:37:33 -07:00
Lianmin Zheng	167591e864	Better unit tests for adding a new model (#1488)	2024-09-22 01:50:37 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00
Ying Sheng	689ff588ec	[CI] Return output logprobs in unit test (#1361)	2024-09-09 13:05:13 -07:00
Yineng Zhang	c411f32e1c	feat: replace GeluAndMul (#1234)	2024-08-28 14:07:02 +00:00
Yineng Zhang	66975360e7	fix: increase max_new_tokens when testing generation models (#1244)	2024-08-28 22:12:36 +10:00
Mingyi	7514b9f8d3	[CI] Fix CI (#1217)	2024-08-26 02:56:42 +00:00
Ying Sheng	308d024092	[CI] Fix the issue of unit test hanging (#1211)	2024-08-25 16:21:37 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Ying Sheng	1cb4da5c5f	[Fix] the issue of random order when input is a list (#1199)	2024-08-24 21:43:03 -07:00
Lianmin Zheng	f6af3a6561	Cleanup readme, llava examples, usage examples and nccl init (#1194)	2024-08-24 08:02:23 -07:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056)	2024-08-13 17:01:26 -07:00
Ying Sheng	32f6144323	fix: Fix returned prefill logits and add output str test (#1046)	2024-08-12 06:13:45 +00:00
Ying Sheng	b68c4c073b	fix: force max new tokens to be 1 for embedding request (#1019)	2024-08-10 13:46:42 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Yineng Zhang	c31f084c71	chore: update vllm to 0.5.4 (#966)	2024-08-07 21:15:41 +10:00
Ying Sheng	995af5a54b	Improve the structure of CI (#911)	2024-08-03 23:09:21 -07:00
Ying Sheng	70cc0749ce	Add model accuracy test - step 1 (#866)	2024-08-03 18:20:50 -07:00

41 Commits