Mingyi
|
c0982ac553
|
Fix Llava model (#594)
|
2024-07-06 00:58:46 -07:00 |
|
Ying Sheng
|
dc1b8bcfaa
|
Format (#593)
|
2024-07-05 10:06:17 -07:00 |
|
Ying Sheng
|
5a57b8addd
|
Add Gemma2 (#592)
|
2024-07-05 09:48:54 -07:00 |
|
Ying Sheng
|
2f11936f95
|
bump version to 0.1.18
|
2024-07-04 06:27:29 +00:00 |
|
Lianmin Zheng
|
63fbef9876
|
fix flashinfer & http log level
|
2024-07-03 23:19:33 -07:00 |
|
Ying Sheng
|
2a754e57b0
|
2x performance improvement for large prefill & Fix workspace conflicts (#579)
|
2024-07-03 16:14:57 -07:00 |
|
Liangsheng Yin
|
96c503eb60
|
fix the broken server args (#585)
|
2024-07-03 16:01:19 -07:00 |
|
Chen Xuechen Li
|
441cca773d
|
support gptj style rope in llama
|
2024-07-03 22:06:58 +00:00 |
|
Lianmin Zheng
|
c7709d3abe
|
Update install commands (#583)
|
2024-07-03 02:10:59 -07:00 |
|
Ying Sheng
|
9380f50ff9
|
Turn on flashinfer by default (#578)
|
2024-07-02 02:25:07 -07:00 |
|
Daniel Hernandez Garcia
|
95dc093b19
|
[BugFix] gemma loading weights "lm_head.weight" key error (#577)
|
2024-07-01 22:10:07 -07:00 |
|
Yueyang Pan
|
d9ac639202
|
Fix flashinfer version (#576)
|
2024-07-01 22:08:39 -07:00 |
|
Ying Sheng
|
75b31a2a88
|
Update run_batch interface and max_prefill_tokens (#574)
|
2024-06-30 18:26:04 -07:00 |
|
sglang
|
11616fc6bd
|
Minor fix in compiler & format (#545)
|
2024-06-29 23:42:14 -07:00 |
|
Ying Sheng
|
9ce89bc14b
|
Update benchmark script (#571)
|
2024-06-28 00:44:22 -07:00 |
|
Lianmin Zheng
|
badf3fa020
|
Expose dtype argument (#569)
|
2024-06-27 23:30:39 -07:00 |
|
Lianmin Zheng
|
2e6e62e156
|
Increase the number of thread limitation for tp worker managers. (#567)
|
2024-06-26 09:33:45 -07:00 |
|
Lianmin Zheng
|
a385ee27bd
|
Warmup cublas (#566)
|
2024-06-25 12:46:00 -07:00 |
|
Lianmin Zheng
|
eb1ae6ae0c
|
Add sglang.bench_latency for offline benchmark (#564)
|
2024-06-25 03:38:04 -07:00 |
|
Lianmin Zheng
|
2187f36237
|
Add a new argument log_level_http to control the HTTP logging (#563)
|
2024-06-25 01:16:20 -07:00 |
|
Lianmin Zheng
|
9465b668b9
|
Allow running with vllm==0.4.3 (#561)
|
2024-06-24 15:24:21 -07:00 |
|
Lianmin Zheng
|
1fa15099d8
|
Add LlamaForClassification (#559)
|
2024-06-22 00:49:31 -07:00 |
|
Lianmin Zheng
|
303ef8883e
|
Clean up logits processor (#558)
|
2024-06-22 00:25:24 -07:00 |
|
Lianmin Zheng
|
e94e60d6fb
|
make flashinfer workspace larger
|
2024-06-21 17:32:36 -07:00 |
|
Lianmin Zheng
|
d2f8bfb2e1
|
Follow-up fixes for flashinfer 0.0.5 (#556)
|
2024-06-20 23:19:52 -07:00 |
|
Lianmin Zheng
|
b7e2f800ac
|
Update flashinfer to 0.0.5 (#554)
|
2024-06-20 20:29:06 -07:00 |
|
Ying Sheng
|
09593e9bc9
|
Multi-node Tensor Parallelism (#550)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2024-06-17 20:41:24 -07:00 |
|
Lianmin Zheng
|
53a7ebd89a
|
Update fused_moe (#553)
|
2024-06-17 09:47:58 -07:00 |
|
Liangsheng Yin
|
ad5f04d6ce
|
Fix the Jump-Forward with Chinese (#551)
|
2024-06-16 21:45:04 +08:00 |
|
Qubitium-modelcloud
|
bbec01c9aa
|
Fix tp worker only checking req[0] for stream (#546)
|
2024-06-14 22:56:10 -07:00 |
|
Ying Sheng
|
fb9296f0ed
|
Higher priority for user input of max_prefill_tokens & format (#540)
|
2024-06-12 21:48:40 -07:00 |
|
Ying Sheng
|
1374334d38
|
Fix dependency & crash issues (#539)
|
2024-06-12 21:23:19 -07:00 |
|
Lianmin Zheng
|
94aead9e8d
|
Fix dependency (#538)
|
2024-06-12 13:17:35 -07:00 |
|
Liangsheng Yin
|
9c902b1954
|
Decode Incrementally (#517)
|
2024-06-11 23:39:12 -07:00 |
|
ZhouXingg
|
111991fe23
|
Fix Regression: Disable p2p for 4090 (#531)
Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>
|
2024-06-11 23:27:17 -07:00 |
|
Qubitium
|
a8c787d2b3
|
Add ChatGLM Model Support (#516)
Co-authored-by: ZX <zx@lbx.dev>
|
2024-06-11 16:39:52 -07:00 |
|
Fabian Preiß
|
5f283991e9
|
[Minor] Correct Optional type hints in api (#526)
|
2024-06-11 16:37:27 -07:00 |
|
Fabian Preiß
|
542bc733d6
|
Fix missing numpy dependency in pyproject.toml (#524)
|
2024-06-10 12:13:50 -07:00 |
|
Lianmin Zheng
|
f6dbd24043
|
Improve doc strings (#518)
|
2024-06-08 02:39:32 -07:00 |
|
Lianmin Zheng
|
e8a2327d52
|
Update version to 0.1.17 (#515)
|
2024-06-07 19:49:18 -07:00 |
|
Lianmin Zheng
|
91f93f141f
|
Crash the server when error or OOM happens (#514)
|
2024-06-07 19:22:34 -07:00 |
|
Qubitium
|
f70f72586a
|
Fix rid state map leak + Refactor .finished (#505)
Co-authored-by: ZX <zx@lbx.dev>
|
2024-06-07 13:20:40 -07:00 |
|
Lianmin Zheng
|
c0ae70c8ed
|
Improve logging & fix litellm dependency. (#512)
|
2024-06-07 13:10:32 -07:00 |
|
胡译文
|
87260b7bfd
|
Litellm Backend (#502)
|
2024-06-07 12:24:28 -07:00 |
|
Amos You
|
651a23ee7c
|
remove redundant pad_input_ids function (#500)
|
2024-06-07 12:23:29 -07:00 |
|
Lianmin Zheng
|
bf3e271fe0
|
Update vllm to v0.4.3 (#511)
Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>
Co-authored-by: ZX <zx@lbx.dev>
|
2024-06-07 12:11:31 -07:00 |
|
Lianmin Zheng
|
3bc01ac137
|
[Minor] improve code style
|
2024-06-03 18:11:34 -07:00 |
|
Lianmin Zheng
|
159cc741e4
|
Make the server random by default (#493)
|
2024-05-31 23:33:34 -07:00 |
|
Ying Sheng
|
83525a1df2
|
Revert "Make the server random by default" (#492)
|
2024-05-31 12:00:21 -07:00 |
|
Lianmin Zheng
|
80a33ce8b0
|
Do not set the default value of global random seed (#488)
|
2024-05-29 18:41:18 -04:00 |
|