yizhang2077 | 3900a94afe | Support twoshot kernel (#2688) | 2025-01-06 00:47:16 +08:00
Xiaoyu Zhang | ded9fcd09a | improve moe_align_kernel for deepseek v3 (#2735) | 2025-01-06 00:28:22 +08:00
Yineng Zhang | bc6ad367c2 | fix lint (#2733) | 2025-01-05 14:45:42 +08:00
Lianmin Zheng | 3a22a303d1 | Revert the GLOO_SOCKET_IFNAME change (#2731) | 2025-01-04 20:13:16 -08:00
libra | bdb3929dbb | Refactor SchedulePolicy to improve code organization (#2571) | 2025-01-04 00:05:16 +08:00
Ce Gao | f5d0865b25 | feat: Support VLM in reference_hf (#2726) | 2025-01-03 22:32:30 +08:00
    Signed-off-by: Ce Gao <gaocegege@hotmail.com>
Ce Gao | afdee7b1a9 | [Docs] fix 404 - Contributor Guide, again (#2727) | 2025-01-03 22:21:38 +08:00
    Signed-off-by: Ce Gao <gaocegege@hotmail.com>
Lianmin Zheng | cb34d848ac | Update README.md (#2722) | 2025-01-03 00:32:20 -08:00
    Co-authored-by: Yangmin Li <2682000734@qq.com>
    Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>
    Co-authored-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Lianmin Zheng | 0f9cc6d8d3 | Fix package loss for small models (#2717) | 2025-01-02 18:25:26 -08:00
    Co-authored-by: sdli1995 < mmlmonkey@163.com>
yigex | c7ae474a49 | [Feature, Hardware] Enable DeepseekV3 on AMD GPUs (#2601) | 2025-01-02 16:23:19 -08:00
    Co-authored-by: root <root@banff-cyxtera-s83-5.amd.com>
    Co-authored-by: HAI <hixiao@gmail.com>
    Co-authored-by: Bruce Xue <yigex@xilinx.com>
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | bdf946bf81 | Support loading pre-sharded moe weights (#2716) | 2025-01-02 15:07:37 -08:00
yukavio | 8c8779cd05 | [Fix] fix retract error in eagle speculative decoding (#2711) | 2025-01-02 10:28:39 -08:00
    Co-authored-by: kavioyu <kavioyu@tencent.com>
Mick | 1775b963db | [Fix] fix incorrectly overwriting the port specified in ServerArgs (#2714) | 2025-01-02 10:28:22 -08:00
Shi Shuai | dd2e2d275f | Docs: Update documentation workflow and contribution guide (#2704) | 2025-01-02 09:18:31 -08:00
    Co-authored-by: Chayenne <zhaochen20@outlook.com>
Rodrigo Garcia | a990daff9c | Included multi-node DeepSeekv3 example (#2707) | 2025-01-02 22:17:03 +08:00
Yineng Zhang | ba5112ff69 | feat: support moe_align_block_size_triton (#2712) | 2025-01-02 21:47:44 +08:00
    Co-authored-by: WANDY666 <1060304770@qq.com>
yukavio | 815dce0554 | Eagle speculative decoding part 4: Add EAGLE2 worker (#2150) | 2025-01-02 03:22:34 -08:00
    Co-authored-by: kavioyu <kavioyu@tencent.com>
    Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Lianmin Zheng | ad20b7957e | Eagle speculative decoding part 3: small modifications to the general scheduler (#2709) | 2025-01-02 02:09:08 -08:00
    Co-authored-by: kavioyu <kavioyu@tencent.com>
fzyzcjy | 9183c23eca | Speed up update_weights_from_tensor (#2695) | 2025-01-02 02:05:19 -08:00
kk | 148254d4db | Improve moe reduce sum kernel performance (#2705) | 2025-01-02 01:11:06 -08:00
    Co-authored-by: wunhuang <wunhuang@amd.com>
Xiaotong Jiang | a4d6d6f1dd | [feat]: Add math eval to CI nightly run (#2663) | 2025-01-01 15:29:35 -08:00
    Co-authored-by: Chayenne <zhaochen20@outlook.com>
Shi Shuai | 062c48d2bd | [Docs] Add Support for Pydantic Structured Output Format (#2697) | 2025-01-01 15:08:43 -08:00
kk | b6e0cfb5e1 | ROCm base image update (#2692) | 2025-01-01 12:12:19 +08:00
    Co-authored-by: wunhuang <wunhuang@amd.com>
Chayenne | 0d8d97b8e6 | Doc: Rename contribution_guide.md (#2691) | 2024-12-31 14:35:48 -08:00
Shi Shuai | 0a765bbccc | Docs: Refactor Contribution Guide (#2690) | 2024-12-31 14:11:00 -08:00
Xiaoyu Zhang | 286cad3ee3 | h200 tuning fused_moe_triton config for Mixtral 8x7B/8x22B and Qwen2 57BA14B (#2689) | 2024-12-31 23:17:36 +08:00
Ying Sheng | dc7eb01f19 | [Fix] fix openai adapter (#2685) | 2024-12-31 10:48:19 +00:00
Lianmin Zheng | b0524c3789 | Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684) | 2024-12-31 02:25:05 -08:00
    Co-authored-by: yukavio <kavioyu@gmail.com>
Lianmin Zheng | 6c42fa229d | Update README.md (#2683) | 2024-12-31 00:13:10 -08:00
Yineng Zhang | d49b13c6f8 | feat: use CUDA 12.4 by default (for FA3) (#2682) | 2024-12-31 15:52:09 +08:00
Yineng Zhang | bedc4c7a50 | misc: update CODEOWNERS (#2680) | 2024-12-31 15:04:50 +08:00
Lianmin Zheng | f44d143949 | Support target model verification in the attention backend (#2678) | 2024-12-30 22:58:55 -08:00
    Co-authored-by: yukavio <kavioyu@gmail.com>
Yineng Zhang | b6b57fc200 | minor: cleanup sgl-kernel (#2679) | 2024-12-31 14:52:00 +08:00
Ke Bao | b4403985d0 | Add cutlass submodule for sgl-kernel (#2676) | 2024-12-31 14:28:29 +08:00
Lianmin Zheng | 339c69a243 | Improve the computation for time_per_output_token Prometheus metrics (#2674) | 2024-12-30 21:40:14 -08:00
fzyzcjy | f707470019 | CI: Update scripts to fail fast (#2672) | 2024-12-30 19:04:01 -08:00
Lianmin Zheng | 21ec66e59e | Minor follow-up fixes for the logprob refactor (#2670) | 2024-12-30 05:42:08 -08:00
HAI | c5210dfa38 | AMD DeepSeek_V3 FP8 Numerical fix (#2667) | 2024-12-30 21:31:12 +08:00
mobicham | a29dd9501d | Add GemLite caching after each capture (#2669) | 2024-12-30 05:27:29 -08:00
Lianmin Zheng | 9c6ba2484f | Refactor logprob computation to return the real logprob used in sampling (#2664) | 2024-12-30 04:51:38 -08:00
Ke Bao | b02da24a5b | Refactor sgl-kernel build (#2642) | 2024-12-30 18:07:01 +08:00
Lianmin Zheng | bdd2827a80 | Update structured_outputs.ipynb (#2666) | 2024-12-30 00:46:41 -08:00
Lianmin Zheng | 8c3b420eec | [Docs] clean up structured outputs docs (#2654) | 2024-12-29 23:57:16 -08:00
HAI | e6f523b5f2 | fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) | 2024-12-29 23:45:02 -08:00
Lianmin Zheng | 3231817861 | Revert "[feat] Add math eval to CI" (#2656) | 2024-12-30 15:05:50 +08:00
Xiaotong Jiang | a11f8d5f6a | [feat] Add math eval to CI (#2652) | 2024-12-30 14:49:41 +08:00
Yineng Zhang | 098d659c0e | docs: update README (#2651) | 2024-12-30 13:33:29 +08:00
Lzhang-hub | 76d14f8cb9 | add 2*h20 node serving example for deepseek v3 (#2650) | 2024-12-30 13:04:38 +08:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | b08c308ebc | Update the timeout in nightly-test.yml (#2649) | 2024-12-29 14:51:07 -08:00
Lianmin Zheng | 03d5fbfd44 | Release 0.4.1.post3 - upload the config.json to PyPI (#2647) | 2024-12-29 14:25:53 -08:00