sglang

Author	SHA1	Message	Date
Ke Bao	5303c1ed22	Support Mistral-Nemo (#691)	2024-07-22 03:36:53 +10:00
zhyncs	65bd13386b	misc: recommend to use chat model for benchmark (#690)	2024-07-22 00:13:33 +10:00
Liangsheng Yin	eedc12e12e	Support Deepseek MoE Model (#689)	2024-07-21 03:09:29 -07:00
zhyncs	6a846bb1fd	misc: update output file logic (#686)	2024-07-21 18:07:30 +10:00
zhyncs	0fdb3127a1	feat: update bench serving (#685)	2024-07-21 16:46:58 +10:00
Max Shawabkeh	5ad033a070	Fix StreamExecutor.fork() losing the current role start index. (#684)	2024-07-20 23:32:11 -07:00
Lianmin Zheng	77e592e8e0	support non-streaming benchmark (#682)	2024-07-20 18:36:42 -07:00
Liangsheng Yin	caaad53b52	Support gpt-bigcode model class (#681)	2024-07-20 18:34:37 -07:00
Liangsheng Yin	69d19188fc	Decouple kv (#679)	2024-07-20 14:16:45 -07:00
zhyncs	4b4a67f814	feat: support TRT LLM benchmark and multiple benchmarks (#670)	2024-07-20 11:05:35 -07:00
Ke Bao	0ac94c36cb	Fallback when sampling failed (#678)	2024-07-20 10:44:54 -07:00
Ying Sheng	2b4c646277	Update version to 0.1.22 (#677)	2024-07-20 03:39:50 -07:00
Liangsheng Yin	f424e76d96	Fix illegal tokens during sampling (#676)	2024-07-20 03:11:15 -07:00
Lianmin Zheng	490a1f39dd	Fix cuda graph with flashinfer (#675)	2024-07-20 02:43:55 -07:00
Ying Sheng	06487f126e	refactor model loader: initial refactor (#664)	2024-07-20 02:18:22 -07:00
Liangsheng Yin	39c57317e1	Revert "Temporary fix invalid sample results" (#673)	2024-07-20 02:06:31 -07:00
Lianmin Zheng	9592a1f3bd	Fix random dataset (#671)	2024-07-20 01:57:43 -07:00
Lianmin Zheng	35759efa91	Support random dataset in bench_serving.py (#669)	2024-07-20 01:06:43 -07:00
Liangsheng Yin	8f4b1559e7	Temporary fix invalid sample results (#668)	2024-07-20 00:51:05 -07:00
Mingyi	e3046ea3a8	Update OpenAI API (#667)	2024-07-19 23:20:54 -07:00
yichuan~	49c5e0eca9	Add support for OpenAI API parallel sampling (#640)	2024-07-19 23:10:01 -07:00
Ke Bao	ec2150b294	Fix kill process util (#666)	2024-07-19 21:43:11 -07:00
Liangsheng Yin	7620cd37dd	Fix jump forward when streaming (#665)	2024-07-19 16:42:06 -07:00
Ying Sheng	11c8efff73	Add benchmark instructions (#663)	2024-07-19 11:12:23 -07:00
Ying Sheng	e87c7fd501	Improve docs (#662)	2024-07-19 10:58:03 -07:00
zhyncs	630479c3a6	feat: update check env (#661)	2024-07-19 09:54:15 -07:00
Ying Sheng	51fda1439f	Update Readme (#660) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-07-19 09:54:01 -07:00
zhyncs	dc4e4a6acc	misc: update SGLang package description (#659)	2024-07-19 09:27:39 -07:00
Ying Sheng	2d96da813e	refactor model loader [unreachable code]: initial refactor (#655)	2024-07-19 09:27:06 -07:00
zhyncs	c126a6ccba	feat: add benchmark serving (#657)	2024-07-19 09:15:21 -07:00
zhyncs	ac971ff633	perf: reduce ttft and itl with stream_interval 1 (#658)	2024-07-19 09:14:22 -07:00
Lianmin Zheng	e1792cca24	Remove cached triton launcher (#656)	2024-07-18 23:28:40 -07:00
shrirajh	1b7adbb5a0	`TokenizerManager.context_len` should inherit from `server_args.conte… (#654)	2024-07-18 21:55:29 -07:00
Liangsheng Yin	a9ef49c12c	Detokenize incrementally when streaming (#653)	2024-07-18 17:57:40 -07:00
Ying Sheng	21ba3a88a1	Remove useless variables in infer_batch.py (#651)	2024-07-18 05:31:44 -07:00
zhyncs	9c5cac2450	fix: resolve lint error (#650)	2024-07-18 03:33:21 -07:00
zhyncs	b050d9283f	fix: set ulimit -n 65535 (#647)	2024-07-18 02:35:45 -07:00
zhyncs	6a4dc99697	misc: rm rpyc from PACKAGE_LIST (#649)	2024-07-18 02:35:38 -07:00
Mingyi	d774acad5c	Remove the dependency of rpyc (#646)	2024-07-18 02:13:54 -07:00
zhyncs	d93388da3e	feat: add check_env (#645)	2024-07-17 21:39:28 -07:00
Ying Sheng	476584cb6e	Increase the capacity of the memory pool (#643)	2024-07-17 15:44:41 -07:00
Liangsheng Yin	abd5385ac5	Move `global_server_args_dict` (#642)	2024-07-17 13:49:15 -07:00
Liangsheng Yin	3de2f30a27	Flashinfer sample kernel (#617)	2024-07-17 13:24:43 -07:00
zhyncs	2e341cd493	misc: add pre-commit config (#637)	2024-07-17 11:55:39 -07:00
zhyncs	a8552cb18b	feat: support internlm2 (#636)	2024-07-16 22:40:03 -07:00
Ying Sheng	a470e60c97	clean up step function (#635)	2024-07-16 20:15:24 -07:00
Liangsheng Yin	5ff60eda78	Fix vertexai (#633)	2024-07-16 16:07:19 -07:00
Aidan Cooper	c193002297	Add support for VertexAI safety settings (#624)	2024-07-16 11:54:42 -07:00
ylying	fe3be1595d	Add qwen2 tie word embedding (#630)	2024-07-16 11:48:49 -07:00
Ying Sheng	0aa189f150	Disable NCCL_NVLS by default (#631)	2024-07-16 09:05:10 -07:00

331 Commits