Commit Graph

41 Commits

Author SHA1 Message Date
Fred Reiss
993956c6b1 Add support for IBM Granite 3.x models (#2437) 2024-12-11 06:30:23 -08:00
Jani Monoses
db674e3d24 Add OLMo2 model. (#2233) 2024-11-28 00:15:20 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
James Xu
f6f713797b Add support for Qwen2-VL-based embedding models (#2055) 2024-11-21 14:24:25 -08:00
Tanjiro
8c280cee55 add phi-3 small support (#2062)
Co-authored-by: Tushar Goel <114812108+AI-Tushar@users.noreply.github.com>
2024-11-17 18:47:43 -08:00
Xiaoyu Zhang
eff468dd5a fix test_embedding_models prompt length too long's bug (#2015) 2024-11-12 23:21:16 +08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Chayenne
704f8e8ed1 Add Reward API Docs etc (#1910)
Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
2024-11-03 22:33:03 -08:00
Lianmin Zheng
2ce32db6fb Let reward model take text inputs instead of message lists (#1907)
Co-authored-by: Kyle Corbitt <kyle@corbt.com>
2024-11-03 13:27:12 -08:00
DanielC12321
5e00ddebc0 Add new model: Gpt2 (#1833) 2024-10-29 17:52:33 -07:00
Lianmin Zheng
00611286a1 Fix sliding window attention and gemma-2 unit tests in CI (#1746) 2024-10-21 13:47:12 -07:00
sixgod
45d5af2416 Add GLM-4 TextGeneration Model support for SGLang (#1736) 2024-10-21 04:08:30 +00:00
Lianmin Zheng
7feba41584 Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) 2024-10-17 09:23:29 -07:00
Lianmin Zheng
30ee36305e Fix the failed unit tests (#1699) 2024-10-17 08:13:29 -07:00
Jani Monoses
a5114b6f91 Add OLMo model (#1676) 2024-10-16 00:11:18 -07:00
Lianmin Zheng
aba9eae4c6 Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py (#1631) 2024-10-11 05:03:20 -07:00
Minsang Song
e6852b0dd2 [Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-10-02 20:41:15 -07:00
Theresa Barton
2c7d0a5b8b [Fix] Fix all the Huggingface paths (#1553) 2024-10-02 10:12:07 -07:00
Ying Sheng
0f4fb19bc8 [Fix, LoRA] fix LoRA with updates in main (#1545) 2024-09-30 10:06:08 -07:00
Lianmin Zheng
3f0fe08d37 Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541) 2024-09-29 20:28:45 -07:00
Ying Sheng
9aa6553d2a [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) 2024-09-27 23:32:11 -07:00
TianyiQ
3c93187caf Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508) 2024-09-24 21:50:20 -07:00
Lianmin Zheng
fb2d0680e0 [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) 2024-09-24 21:37:33 -07:00
Lianmin Zheng
167591e864 Better unit tests for adding a new model (#1488) 2024-09-22 01:50:37 -07:00
Ying Sheng
712216928f [Feature] Initial support for multi-LoRA serving (#1307) 2024-09-12 16:46:14 -07:00
Ying Sheng
689ff588ec [CI] Return output logprobs in unit test (#1361) 2024-09-09 13:05:13 -07:00
Yineng Zhang
c411f32e1c feat: replace GeluAndMul (#1234) 2024-08-28 14:07:02 +00:00
Yineng Zhang
66975360e7 fix: increase max_new_tokens when testing generation models (#1244) 2024-08-28 22:12:36 +10:00
Mingyi
7514b9f8d3 [CI] Fix CI (#1217) 2024-08-26 02:56:42 +00:00
Ying Sheng
308d024092 [CI] Fix the issue of unit test hanging (#1211) 2024-08-25 16:21:37 -07:00
Chayenne
30b4f771b0 Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-08-25 10:29:12 -07:00
Ying Sheng
1cb4da5c5f [Fix] the issue of random order when input is a list (#1199) 2024-08-24 21:43:03 -07:00
Lianmin Zheng
f6af3a6561 Cleanup readme, llava examples, usage examples and nccl init (#1194) 2024-08-24 08:02:23 -07:00
Ying Sheng
0909bb0d2f [Feat] Add window attention for gemma-2 (#1056) 2024-08-13 17:01:26 -07:00
Ying Sheng
32f6144323 fix: Fix returned prefill logits and add output str test (#1046) 2024-08-12 06:13:45 +00:00
Ying Sheng
b68c4c073b fix: force max new tokens to be 1 for embedding request (#1019) 2024-08-10 13:46:42 -07:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00
Yineng Zhang
c31f084c71 chore: update vllm to 0.5.4 (#966) 2024-08-07 21:15:41 +10:00
Ying Sheng
995af5a54b Improve the structure of CI (#911) 2024-08-03 23:09:21 -07:00
Ying Sheng
70cc0749ce Add model accuracy test - step 1 (#866) 2024-08-03 18:20:50 -07:00