sglang

Author	SHA1	Message	Date
ylying	fe3be1595d	Add qwen2 tie word embedding (#630 )	2024-07-16 11:48:49 -07:00
Ying Sheng	0aa189f150	Disable NCCL_NVLS by default (#631 )	2024-07-16 09:05:10 -07:00
Liangsheng Yin	c9ee3d3559	Fix model forward grad (#628 )	2024-07-15 22:09:09 -07:00
Lianmin Zheng	41d1f67704	Fix flush cache (#627 )	2024-07-15 20:44:04 -07:00
Ying Sheng	56f5fc4ab5	Bump version to 0.1.21 (#626 )	2024-07-15 13:10:53 -07:00
Ying Sheng	6a2941f4d0	Improve tensor parallel performance (#625 ) Co-authored-by: Mingyi <wisclmy0611@gmail.com>	2024-07-15 07:10:51 -07:00
Mingyi	5ac8b80677	Simplify mem state (#623 )	2024-07-15 02:01:09 -07:00
Liangsheng Yin	a56858ba67	Unify index operations (#620 )	2024-07-14 12:55:55 -07:00
Liangsheng Yin	564a898ad9	Optimize mem indices management (#619 )	2024-07-13 23:39:37 -07:00
Lianmin Zheng	5d264a90ac	Bump version to 0.1.20 (#618 )	2024-07-13 17:27:55 -07:00
Ying Sheng	5949b1ca0e	Fix memory pool index error (#616 )	2024-07-13 16:45:11 -07:00
Lianmin Zheng	0feca02dd9	Improve benchmark scripts (#615 )	2024-07-13 15:59:04 -07:00
Liangsheng Yin	10143e1a5f	Memorypool chunked prefetch (#614 )	2024-07-13 15:24:03 -07:00
Lianmin Zheng	65c6577696	Improve benchmark scripts & fix llava (#613 )	2024-07-13 15:00:26 -07:00
Lianmin Zheng	665815969a	Enable cuda graph by default (#612 )	2024-07-13 05:29:46 -07:00
Lianmin Zheng	396a69240f	Cleanup attention backend: flashinfer and triton (#611 )	2024-07-12 18:21:11 -07:00
Lianmin Zheng	af4e7910e7	Clean up the usage of flashinfer (#610 )	2024-07-12 13:00:03 -07:00
Lianmin Zheng	519e20cfda	Code clean up: Remove deprecated prefill move InputMetadata to infer_batch.py (#609 )	2024-07-12 12:28:09 -07:00
Lianmin Zheng	d9a6902986	Fix bench latency (#607 )	2024-07-11 14:37:01 -07:00
Lianmin Zheng	ad872feb14	bump version to 0.1.19	2024-07-09 02:23:14 -07:00
Lianmin Zheng	da2e5d6546	Fix the default argument of OpenAI Chat completion (#605 )	2024-07-09 02:04:43 -07:00
胡译文	02b7258658	[Feat] Expose logprob options to `sgl.gen` API (#503 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-07-09 00:35:39 -07:00
prophe	d557e9f3b7	Update chat template for qwen and yi-1.5. (#530 )	2024-07-08 23:55:44 -07:00
Tommy Yang	740c46a152	Add Qwen2 MoE support (#603 )	2024-07-08 23:44:59 -07:00
Tommy Yang	b38687226a	Make sglang compat with vllm 0.5.1 (#598 )	2024-07-08 23:44:22 -07:00
Pan Lyu	710f614ebe	add minicpm support (#602 )	2024-07-08 23:27:04 -07:00
Liangsheng Yin	f25b76c02a	add `LogitsMetadata` (#604 )	2024-07-08 17:46:55 -07:00
Mingyi	f4e885b7c3	Reduce number of workspaces (#601 )	2024-07-07 19:35:22 -07:00
Liangsheng Yin	0877f1e75b	Fix streaming (#600 )	2024-07-07 01:55:58 -07:00
Liangsheng Yin	5304b4ef58	Add `--enable-p2p-check` option (#599 )	2024-07-06 23:34:10 -07:00
Pan Lyu	26908d9568	* fix(detokenizer_manager.py): fix truncated decoded output (#586 ) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-07-06 14:53:22 -07:00
Mingyi	c0982ac553	Fix Llava model (#594 )	2024-07-06 00:58:46 -07:00
Ying Sheng	dc1b8bcfaa	Format (#593 )	2024-07-05 10:06:17 -07:00
Ying Sheng	5a57b8addd	Add Gemma2 (#592 )	2024-07-05 09:48:54 -07:00
Ying Sheng	2f11936f95	bump version to 0.1.18	2024-07-04 06:27:29 +00:00
Lianmin Zheng	63fbef9876	fix flashinfer & http log level	2024-07-03 23:19:33 -07:00
Ying Sheng	2a754e57b0	2x performance improvement for large prefill & Fix workspace conflicts (#579 )	2024-07-03 16:14:57 -07:00
Liangsheng Yin	96c503eb60	fix the broken server args (#585 )	2024-07-03 16:01:19 -07:00
Chen Xuechen Li	441cca773d	support gptj style rope in llama	2024-07-03 22:06:58 +00:00
Lianmin Zheng	c7709d3abe	Update install commands (#583 )	2024-07-03 02:10:59 -07:00
Ying Sheng	9380f50ff9	Turn on flashinfer by default (#578 )	2024-07-02 02:25:07 -07:00
Daniel Hernandez Garcia	95dc093b19	[BugFix] gemma loading weights "lm_head.weight" key error (#577 )	2024-07-01 22:10:07 -07:00
Yueyang Pan	d9ac639202	Fix flashinfer version (#576 )	2024-07-01 22:08:39 -07:00
Ying Sheng	75b31a2a88	Update run_batch interface and max_prefill_tokens (#574 )	2024-06-30 18:26:04 -07:00
sglang	11616fc6bd	Minor fix in compiler & format (#545 )	2024-06-29 23:42:14 -07:00
Ying Sheng	9ce89bc14b	Update benchmark script (#571 )	2024-06-28 00:44:22 -07:00
Lianmin Zheng	badf3fa020	Expose dtype argument (#569 )	2024-06-27 23:30:39 -07:00
Lianmin Zheng	2e6e62e156	Increase the number of thread limitation for tp worker managers. (#567 )	2024-06-26 09:33:45 -07:00
Lianmin Zheng	a385ee27bd	Warmup cublas (#566 )	2024-06-25 12:46:00 -07:00
Lianmin Zheng	eb1ae6ae0c	Add sglang.bench_latency for offline benchmark (#564 )	2024-06-25 03:38:04 -07:00

283 Commits