| Author | Commit | Message | Date |
|---|---|---|---|
| zhyncs | 2e341cd493 | misc: add pre-commit config (#637) | 2024-07-17 11:55:39 -07:00 |
| zhyncs | a8552cb18b | feat: support internlm2 (#636) | 2024-07-16 22:40:03 -07:00 |
| Ying Sheng | a470e60c97 | clean up step function (#635) | 2024-07-16 20:15:24 -07:00 |
| Liangsheng Yin | 5ff60eda78 | Fix vertexai (#633) | 2024-07-16 16:07:19 -07:00 |
| Aidan Cooper | c193002297 | Add support for VertexAI safety settings (#624) | 2024-07-16 11:54:42 -07:00 |
| ylying | fe3be1595d | Add qwen2 tie word embedding (#630) | 2024-07-16 11:48:49 -07:00 |
| Ying Sheng | 0aa189f150 | Disable NCCL_NVLS by default (#631) | 2024-07-16 09:05:10 -07:00 |
| Liangsheng Yin | c9ee3d3559 | Fix model forward grad (#628) | 2024-07-15 22:09:09 -07:00 |
| Lianmin Zheng | 41d1f67704 | Fix flush cache (#627) | 2024-07-15 20:44:04 -07:00 |
| Ying Sheng | 56f5fc4ab5 | Bump version to 0.1.21 (#626) | 2024-07-15 13:10:53 -07:00 |
| Ying Sheng | 6a2941f4d0 | Improve tensor parallel performance (#625) Co-authored-by: Mingyi <wisclmy0611@gmail.com> | 2024-07-15 07:10:51 -07:00 |
| Mingyi | 5ac8b80677 | Simplify mem state (#623) | 2024-07-15 02:01:09 -07:00 |
| Liangsheng Yin | a56858ba67 | Unify index operations (#620) | 2024-07-14 12:55:55 -07:00 |
| Liangsheng Yin | 564a898ad9 | Optimize mem indices management (#619) | 2024-07-13 23:39:37 -07:00 |
| Lianmin Zheng | 5d264a90ac | Bump version to 0.1.20 (#618) | 2024-07-13 17:27:55 -07:00 |
| Ying Sheng | 5949b1ca0e | Fix memory pool index error (#616) | 2024-07-13 16:45:11 -07:00 |
| Lianmin Zheng | 0feca02dd9 | Improve benchmark scripts (#615) | 2024-07-13 15:59:04 -07:00 |
| Liangsheng Yin | 10143e1a5f | Memorypool chunked prefetch (#614) | 2024-07-13 15:24:03 -07:00 |
| Lianmin Zheng | 65c6577696 | Improve benchmark scripts & fix llava (#613) | 2024-07-13 15:00:26 -07:00 |
| Lianmin Zheng | 665815969a | Enable cuda graph by default (#612) | 2024-07-13 05:29:46 -07:00 |
| Lianmin Zheng | 396a69240f | Cleanup attention backend: flashinfer and triton (#611) | 2024-07-12 18:21:11 -07:00 |
| Lianmin Zheng | af4e7910e7 | Clean up the usage of flashinfer (#610) | 2024-07-12 13:00:03 -07:00 |
| Lianmin Zheng | 519e20cfda | Code clean up: Remove deprecated prefill move InputMetadata to infer_batch.py (#609) | 2024-07-12 12:28:09 -07:00 |
| Lianmin Zheng | d9a6902986 | Fix bench latency (#607) | 2024-07-11 14:37:01 -07:00 |
| Lianmin Zheng | ad872feb14 | bump version to 0.1.19 | 2024-07-09 02:23:14 -07:00 |
| Lianmin Zheng | da2e5d6546 | Fix the default argument of OpenAI Chat completion (#605) | 2024-07-09 02:04:43 -07:00 |
| 胡译文 | 02b7258658 | [Feat] Expose logprob options to sgl.gen API (#503) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2024-07-09 00:35:39 -07:00 |
| prophe | d557e9f3b7 | Update chat template for qwen and yi-1.5. (#530) | 2024-07-08 23:55:44 -07:00 |
| Tommy Yang | 740c46a152 | Add Qwen2 MoE support (#603) | 2024-07-08 23:44:59 -07:00 |
| Tommy Yang | b38687226a | Make sglang compat with vllm 0.5.1 (#598) | 2024-07-08 23:44:22 -07:00 |
| Pan Lyu | 710f614ebe | add minicpm support (#602) | 2024-07-08 23:27:04 -07:00 |
| Liangsheng Yin | f25b76c02a | add LogitsMetadata (#604) | 2024-07-08 17:46:55 -07:00 |
| Mingyi | f4e885b7c3 | Reduce number of workspaces (#601) | 2024-07-07 19:35:22 -07:00 |
| Liangsheng Yin | 0877f1e75b | Fix streaming (#600) | 2024-07-07 01:55:58 -07:00 |
| Liangsheng Yin | 5304b4ef58 | Add --enable-p2p-check option (#599) | 2024-07-06 23:34:10 -07:00 |
| Pan Lyu | 26908d9568 | * fix(detokenizer_manager.py): fix truncated decoded output (#586) Co-authored-by: hnyls2002 <hnyls2002@gmail.com> | 2024-07-06 14:53:22 -07:00 |
| Mingyi | c0982ac553 | Fix Llava model (#594) | 2024-07-06 00:58:46 -07:00 |
| Ying Sheng | dc1b8bcfaa | Format (#593) | 2024-07-05 10:06:17 -07:00 |
| Ying Sheng | 5a57b8addd | Add Gemma2 (#592) | 2024-07-05 09:48:54 -07:00 |
| Ying Sheng | 2f11936f95 | bump version to 0.1.18 | 2024-07-04 06:27:29 +00:00 |
| Lianmin Zheng | 63fbef9876 | fix flashinfer & http log level | 2024-07-03 23:19:33 -07:00 |
| Ying Sheng | 2a754e57b0 | 2x performance improvement for large prefill & Fix workspace conflicts (#579) | 2024-07-03 16:14:57 -07:00 |
| Liangsheng Yin | 96c503eb60 | fix the broken server args (#585) | 2024-07-03 16:01:19 -07:00 |
| Chen Xuechen Li | 441cca773d | support gptj style rope in llama | 2024-07-03 22:06:58 +00:00 |
| Lianmin Zheng | c7709d3abe | Update install commands (#583) | 2024-07-03 02:10:59 -07:00 |
| Ying Sheng | 9380f50ff9 | Turn on flashinfer by default (#578) | 2024-07-02 02:25:07 -07:00 |
| Daniel Hernandez Garcia | 95dc093b19 | [BugFix] gemma loading weights "lm_head.weight" key error (#577) | 2024-07-01 22:10:07 -07:00 |
| Yueyang Pan | d9ac639202 | Fix flashinfer version (#576) | 2024-07-01 22:08:39 -07:00 |
| Ying Sheng | 75b31a2a88 | Update run_batch interface and max_prefill_tokens (#574) | 2024-06-30 18:26:04 -07:00 |
| sglang | 11616fc6bd | Minor fix in compiler & format (#545) | 2024-06-29 23:42:14 -07:00 |