bjmsong | e21026690d | benchmark decoding attention kernel with cudnn (#2467); Co-authored-by: root <bjmsong@126.com> | 2024-12-17 03:31:57 -08:00
Lianmin Zheng | 56198b45d9 | Add a benchmark script for in-batch prefix caching (#2494) | 2024-12-16 18:49:02 -08:00
Xiaoyu Zhang | a0592c059f | [Benchmark] add a benchmark for hf/vllm/sglang rmsnorm (#2486) | 2024-12-15 13:52:08 +08:00
bjmsong | f67723940d | decoding attention kernel benchmark (#2425); Co-authored-by: root <bjmsong@126.com> | 2024-12-11 04:46:59 -08:00
Xiaoyu Zhang | 3844feb9bb | Add a unittest for fused_moe (#2416) | 2024-12-08 22:46:10 -08:00
Lianmin Zheng | 07ec07ad1f | Improve torch compile for fused moe (#2327) | 2024-12-03 01:58:25 -08:00
Lianmin Zheng | 33deca81b5 | Add more fused moe benchmark utilities (#2314) | 2024-12-02 04:26:55 -08:00
Xiaoyu Zhang | 262e370f78 | [benchmark] Add fused_moe_triton benchmark and tuning tools (#2225); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>; Co-authored-by: HAI <hixiao@gmail.com> | 2024-11-29 13:36:45 -08:00
Henry Hyeonmok Ko | dbe1729395 | Merged three native APIs into one: get_server_info (#2152) | 2024-11-24 01:37:58 -08:00
Byron Hsu | cbedd1db1d | [router] cache-aware load-balancing router v1 (#2114) | 2024-11-23 08:34:48 -08:00
Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00
Lianmin Zheng | c29b98e043 | Fix json benchmark (#2043) | 2024-11-15 05:33:43 -08:00
DarkSharpness | 954f4e6bd6 | benchmark json schema (#2030) | 2024-11-15 05:06:19 -08:00
Byron Hsu | f9633fa9b9 | [rust] cache-aware DP - approx tree (#1934) | 2024-11-10 21:57:32 -08:00
Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00
Lianmin Zheng | dd3809fad8 | Fix engine unit test (#1701) | 2024-10-17 09:53:32 -07:00
Ying Sheng | 9c064bf78a | [LoRA, Performance] Speedup multi-LoRA serving - Step 1 (#1587) | 2024-10-06 10:33:44 -07:00
Theresa Barton | 2c7d0a5b8b | [Fix] Fix all the Huggingface paths (#1553) | 2024-10-02 10:12:07 -07:00
Ying Sheng | 37963394aa | [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) | 2024-09-15 12:46:04 -07:00
Lianmin Zheng | e4d68afcf0 | [Minor] Many cleanup (#1357) | 2024-09-09 04:14:11 -07:00
Kai-Hsun Chen | c9b75917d5 | [server] Passing model_override_args to launch_server via the CLI. (#1298); Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> | 2024-09-09 02:14:25 -07:00
Yineng Zhang | 62f15eea5a | docs: add conclusion (#1340) | 2024-09-06 04:25:14 +10:00
Yineng Zhang | 79794af52d | docs: highlight ttft itl and throughput (#1337) | 2024-09-06 00:00:06 +10:00
Yineng Zhang | 3494b32c3a | docs: update README (#1336) | 2024-09-05 23:39:44 +10:00
Lianmin Zheng | 57d0bd91ec | Improve benchmark (#1140) | 2024-08-17 17:43:23 -07:00
Lianmin Zheng | 5a261bd055 | Fix the deadlock in multi-node tp (#1122) | 2024-08-16 01:39:24 -07:00
Lianmin Zheng | 326df4bab2 | Use a single workspace for flashinfer (#1077) | 2024-08-14 19:25:37 -07:00
Yineng Zhang | 1c2b5f5240 | docs: update nsys usage (#1103) | 2024-08-15 01:39:15 +08:00
Liangsheng Yin | a34dd86a7d | Use dtype to control generate (#1082); Co-authored-by: zhyncs <me@zhyncs.com> | 2024-08-14 15:58:07 +00:00
Lianmin Zheng | a59636bb5e | Update grok 1 model (#1095) | 2024-08-14 04:40:44 -07:00
Meng, Peng | 41bb1ab10d | fix nsys cannot profile cuda kernel (#957) | 2024-08-07 11:51:21 +08:00
Ke Bao | e1eae1fd15 | Support MLA for DeepSeek-V2 with Triton - step 1 (#905) | 2024-08-05 03:40:33 +10:00
Yineng Zhang | 1edd4e07d6 | chore: bump v0.2.7 (#830) | 2024-07-30 20:41:10 +10:00
Yineng Zhang | a50c8a14b3 | fix: use v0.2.5 for benchmark (#814) | 2024-07-30 12:40:35 +10:00
Ying Sheng | db6089e6f3 | Revert "Organize public APIs" (#815) | 2024-07-29 19:40:28 -07:00
Liangsheng Yin | c8e9fed87a | Organize public APIs (#809) | 2024-07-29 15:34:16 -07:00
Yineng Zhang | 768e05d08f | fix benchmark (#743); Co-authored-by: hnyls2002 <hnyls2002@gmail.com>; Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-07-26 21:26:13 +10:00
Yineng Zhang | fded67441d | misc: update bulid instruction (#724) | 2024-07-25 17:08:11 +10:00
Yineng Zhang | 97e0f7d250 | docs: update comment (#721) | 2024-07-25 10:51:18 +10:00
Ying Sheng | 30d8e130e7 | Improve benchmark scripts (#717) | 2024-07-24 14:44:14 -07:00
Ying Sheng | 08a3bd19cc | docs: update doc (#716) | 2024-07-24 20:44:03 +00:00
Yineng Zhang | 321a963b01 | misc: update doc (#715) | 2024-07-24 13:05:46 -07:00
Yineng Zhang | 2d3ae4e125 | docs: update doc (#713) | 2024-07-25 00:03:17 +10:00
Yineng Zhang | 75f4ccb7dd | docs: update README (#712) | 2024-07-24 23:33:28 +10:00
Lianmin Zheng | 490a1f39dd | Fix cuda graph with flashinfer (#675) | 2024-07-20 02:43:55 -07:00
zhyncs | 2e341cd493 | misc: add pre-commit config (#637) | 2024-07-17 11:55:39 -07:00
Lianmin Zheng | 41d1f67704 | Fix flush cache (#627) | 2024-07-15 20:44:04 -07:00
Ying Sheng | 6a2941f4d0 | Improve tensor parallel performance (#625); Co-authored-by: Mingyi <wisclmy0611@gmail.com> | 2024-07-15 07:10:51 -07:00
Mingyi | 5ac8b80677 | Simplify mem state (#623) | 2024-07-15 02:01:09 -07:00
Ying Sheng | bae9541e4c | Update benchmark script (#621) | 2024-07-14 21:38:53 +00:00