kk
|
7ca751ff7d
|
Fused moe triton cfg opt for rocm (#2612)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2024-12-26 23:38:22 -08:00 |
|
Lianmin Zheng
|
c75adfec59
|
Update CODEOWNERS (#2608)
|
2024-12-26 20:58:08 -08:00 |
|
HAI
|
7722c11c1d
|
Regression fix to AMD/ROCm from recent change (#2606)
|
2024-12-26 20:22:14 -08:00 |
|
fzyzcjy
|
b2ed5c8ea7
|
Tiny code cleanup in tokenizer_manager.py (#2586)
|
2024-12-26 17:53:09 -08:00 |
|
Lianmin Zheng
|
f46f394f4d
|
Update README.md (#2605)
|
2024-12-26 10:58:49 -08:00 |
|
Lianmin Zheng
|
2125898af5
|
Update contributor_guide.md (#2603)
|
2024-12-26 08:36:13 -08:00 |
|
fzyzcjy
|
44f011d224
|
Super tiny typo fix (#2564)
|
2024-12-26 08:28:01 -08:00 |
|
kzhou003
|
ed91e003bb
|
[UTILS] improve makefile a bit by adding help info (#2570)
Co-authored-by: Hongpeng Guo <hpguo@anyscale.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: yigex <yigex@amd.com>
|
2024-12-26 08:24:18 -08:00 |
|
yudian0504
|
531d6ea968
|
fix: package data missing (#2521)
|
2024-12-26 08:16:48 -08:00 |
|
Lianmin Zheng
|
dc3bee4815
|
Fix test and benchmark scripts (#2598)
|
2024-12-26 07:56:26 -08:00 |
|
Zhizhou Sha
|
a74d194146
|
[unittest] add unit test to test quant args of srt engine (#2574)
|
2024-12-26 06:54:43 -08:00 |
|
fzyzcjy
|
3169e66c23
|
Fix duplicated handling of GetWeightsByNameReqInput (#2565)
|
2024-12-26 06:49:32 -08:00 |
|
Lianmin Zheng
|
773951548d
|
Fix logprob_start_len for multi modal models (#2597)
Co-authored-by: libra <lihu723@gmail.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com>
|
2024-12-26 06:27:45 -08:00 |
|
fsygd
|
637de9e8ce
|
update readme of DeepSeek V3 (#2596)
|
2024-12-26 21:31:56 +08:00 |
|
Adarsh Shirawalmath
|
acb340728c
|
[Feature] Support new parameter - EBNF in xgrammar (#2526)
|
2024-12-26 05:12:41 -08:00 |
|
Sangchun Ha (Patrick)
|
08effbff35
|
Error occurs when loading the gemma model in bitsandbytes format. (#2557)
|
2024-12-26 05:10:37 -08:00 |
|
Lianmin Zheng
|
60bd32723a
|
Update README.md (#2594)
|
2024-12-26 03:31:50 -08:00 |
|
Liangsheng Yin
|
e7ebecf82e
|
Fix cache hit rate when chunked prefill (#2555)
|
2024-12-26 03:14:28 -08:00 |
|
Xiaoyu Zhang
|
9a23c48456
|
h100 tuning fused_moe_triton for qwen2 moe (#2560)
|
2024-12-26 03:13:31 -08:00 |
|
Yineng Zhang
|
635a042623
|
docs: update deepseek v3 example (#2592)
|
2024-12-26 17:43:37 +08:00 |
|
Yineng Zhang
|
2dccecf432
|
fix: only enable moe_align_block_size for now (#2590)
|
2024-12-26 16:56:59 +08:00 |
|
Yineng Zhang
|
75ad0a143f
|
docs: add deepseek v3 launch instructions (#2589)
|
2024-12-25 23:26:54 -08:00 |
|
Yineng Zhang
|
efc52f85e2
|
chore: bump v0.4.1 (#2582)
|
2024-12-26 07:14:51 +08:00 |
|
Yineng Zhang
|
60e2fdcf4f
|
use sgl-kernel moe_align_block_size (#2581)
Co-authored-by: ispobock <ispobaoke@163.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
|
2024-12-26 06:29:08 +08:00 |
|
Yineng Zhang
|
d7c0e872b0
|
chore: bump 0.0.2.post8 for sgl-kernel (#2580)
|
2024-12-26 06:11:39 +08:00 |
|
Yineng Zhang
|
31548116a8
|
fix moe_align_block_size_kernel for shared memory issue (#2579)
Co-authored-by: ispobock <ispobaoke@163.com>
|
2024-12-26 05:31:04 +08:00 |
|
HandH1998
|
53aed988cb
|
Refactor MoE (#2575)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2024-12-26 00:02:14 +08:00 |
|
Ying Sheng
|
8a56b43175
|
[Bench] Flush cache before benchmarking (#2566)
|
2024-12-24 11:21:21 +08:00 |
|
Ke Bao
|
e835a50021
|
Reorg moe code (#2563)
|
2024-12-24 01:10:22 +08:00 |
|
Lianmin Zheng
|
23e5e50fd5
|
Fix gemlite import (#2553)
|
2024-12-22 20:21:17 -08:00 |
|
Shi Shuai
|
25e5d589e3
|
Doc: Update Grammar Backend (#2545)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2024-12-22 17:14:40 -08:00 |
|
Lianmin Zheng
|
41b1db69b8
|
A better aio rwlock that guarantees the order (#2547)
|
2024-12-22 15:44:32 -08:00 |
|
Lianmin Zheng
|
8496701934
|
[Misc] Fix metrics, weight update lock, request logging (#2543)
|
2024-12-22 06:27:22 -08:00 |
|
Xiaoyu Zhang
|
7d672d277b
|
[kernel optimize] benchmark write_req_to_token_pool_triton and optimize kernel (#2509)
|
2024-12-22 02:31:02 -08:00 |
|
Yineng Zhang
|
d4b174817d
|
docs: update sponsorship (DataCrunch) (#2523)
|
2024-12-22 02:29:04 -08:00 |
|
Lei
|
19ba2b0ea9
|
Add lora_paths to v1_chat_generate_request (#2529)
|
2024-12-22 02:23:33 -08:00 |
|
Yineng Zhang
|
4e1e3cff20
|
fix #2528 (#2541)
|
2024-12-22 00:14:41 +08:00 |
|
Yineng Zhang
|
8f4d04e540
|
chore: bump v0.4.0.post2 (#2525)
|
2024-12-21 21:16:34 +08:00 |
|
Jerry Zhang
|
feb2b768ba
|
Add integration with gemlite weight only quant (#2528)
|
2024-12-21 00:25:25 +08:00 |
|
Yineng Zhang
|
d95a5f5bf5
|
fix followup #2517 (#2524)
|
2024-12-19 23:24:30 +08:00 |
|
Yineng Zhang
|
4b83db24f1
|
fix: continue to use flashinfer 0.1.6 temporarily (#2517)
|
2024-12-19 14:03:24 +08:00 |
|
Yineng Zhang
|
64456cf023
|
docs: update README (#2516)
|
2024-12-19 13:44:02 +08:00 |
|
Yineng Zhang
|
bb4a922023
|
feat: add llama3 eval (#2515)
|
2024-12-19 13:37:09 +08:00 |
|
Lianmin Zheng
|
21e9e63ad5
|
Print progress bar during cuda graph capture (#2502)
|
2024-12-17 06:33:46 -08:00 |
|
Lianmin Zheng
|
1fc84cf60b
|
Update readme (#2500)
Co-authored-by: Ravi Theja <ravi03071991@gmail.com>
Co-authored-by: “yixin-huang1” <yixinhuang1@berkeley.edu>
|
2024-12-17 04:33:36 -08:00 |
|
Lianmin Zheng
|
361ea8d912
|
Fix openai protocols and pass top_k, min_p (#2499)
|
2024-12-17 04:14:14 -08:00 |
|
Lei
|
33c5ff2845
|
Add lora_path to chat completion (#2438)
|
2024-12-17 03:47:49 -08:00 |
|
Hui Liu
|
5ce9daea59
|
ROCm support for sglang.check_env (#2426)
|
2024-12-17 03:45:14 -08:00 |
|
Ata Fatahi
|
ce094a5d79
|
Clean up GPU memory after killing sglang processes (#2457)
Signed-off-by: Ata Fatahi <immrata@gmail.com>
|
2024-12-17 03:42:40 -08:00 |
|
bjmsong
|
e21026690d
|
benchmark decoding attention kernel with cudnn (#2467)
Co-authored-by: root <bjmsong@126.com>
|
2024-12-17 03:31:57 -08:00 |
|