| Author | Commit | Message | Date |
|---|---|---|---|
| zixuanzhang226 | f3cbd24541 | feat: send kvmetrics from sglang scheduler (#6721) | 2025-06-25 01:57:49 -07:00 |
| Chunyuan WU | 7eb47b0f3d | [CPU] [BF16] Call fused_experts_cpu, weight_packed_linear and bmm_cpu kernel in DeepSeek model (#6641); Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg> | 2025-06-25 01:43:33 -07:00 |
| Chang Su | 112b496a6c | misc: Improvement to serving_chat.py and add more ut (#7489) | 2025-06-24 17:19:51 -07:00 |
| Chang Su | fa42e41962 | ci: Revert openai_server related tests in AMD suites (#7449) | 2025-06-23 15:28:22 -07:00 |
| Chang Su | 34b6b8426f | feat(func_call): Add more check in BaseFormatDetector.parse_streaming_increment (#7479) | 2025-06-23 11:15:47 -07:00 |
| Chang Su | b7a2df0a44 | refactor(test): reorganize OpenAI test file structure (#7408) | 2025-06-21 19:37:48 -07:00 |
| Chang Su | 72676cd6c0 | feat(oai refactor): Replace openai_api with entrypoints/openai (#7351); Co-authored-by: Jin Pan <jpan236@wisc.edu> | 2025-06-21 13:21:06 -07:00 |
| Keyang Ru | 5e7fdc79fa | [OAI Server Refactor] [ChatCompletions & Completions] Support Return Hidden State (#7329); Signed-off-by: keru <rukeyang@gmail.com> | 2025-06-20 19:18:53 -07:00 |
| Cheng Wan | e879d8b7a8 | [Feature] Comprehensive Hybrid Parallelism Support (#6389) | 2025-06-20 14:43:11 -07:00 |
| Xinyuan Tong | 0998808009 | Refine OpenAI serving entrypoint to remove batch requests (#7372); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>; Co-authored-by: Chang Su <csu272@usc.edu> | 2025-06-20 14:33:43 -07:00 |
| Ata Fatahi | 1ab6be1b26 | Purge VerlEngine (#7326); Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2025-06-19 23:47:21 -07:00 |
| woodx | 4df5fc2156 | Feat/refactor embedding server (#7322) | 2025-06-19 23:46:01 -07:00 |
| Stefan He | 3774f07825 | Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099) | 2025-06-19 00:56:37 -07:00 |
| Chunyuan WU | 9179ea1595 | add seed in CPU UTs to avoid flaky failure (#7333) | 2025-06-18 19:12:14 -07:00 |
| Jinn | ffd1a26e09 | Add more refactored openai test & in CI (#7284) | 2025-06-18 13:52:55 -07:00 |
| YanbingJiang | 094c116f7d | Update python API of activation, topk, norm and rope and remove vllm dependency (#6614); Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>; Co-authored-by: jianan-gu <jianan.gu@intel.com>; Co-authored-by: sdp <sdp@gnr799219.jf.intel.com> | 2025-06-17 22:11:50 -07:00 |
| Chang Su | fc554105f6 | ci: Fix test_ebnf_generate_all_optional_function_params (#7288) | 2025-06-17 16:39:42 -07:00 |
| Chang Su | e726131523 | bugfix(tool call ebnf): Fix EBNF generation for optional function parameters (#7283) | 2025-06-17 13:36:07 -07:00 |
| u4lr451 | 10d60cd41b | feat: mtp support dp-attention (#6081); Co-authored-by: austindeng <austindeng@tencent.com>; Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>; Co-authored-by: Qiaolin Yu <liin1211@outlook.com>; Co-authored-by: ch-wan <cwan39@gatech.edu> | 2025-06-17 00:33:28 -07:00 |
| Xinyuan Tong | 70c471a868 | [Refactor] OAI Server components (#7167); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-06-16 20:45:20 -07:00 |
| KavioYu | 873ae12cee | support custom weight loader for model runner (#7122); Co-authored-by: kavioyu <kavioyu@tencent.com> | 2025-06-16 16:28:15 -07:00 |
| Sai Enduri | 62a7aa2efc | Update CI flakes. (#7244) | 2025-06-16 15:19:32 -07:00 |
| woodx | e30ef368ab | Feat/support rerank (#6058) | 2025-06-16 10:50:01 -07:00 |
| Lianmin Zheng | 53a525bf33 | [Eagle] Fix kernel call after updating speculative sampling kernels (#7231) | 2025-06-16 07:25:59 -07:00 |
| Lianmin Zheng | b1286a116a | [EAGLE] Refactor code for page size > 1 & more simplifications (#7213) | 2025-06-16 03:04:29 -07:00 |
| Lianmin Zheng | fff10809bf | Revert "[EAGLE] Refactor code for page size > 1 & more simplifications" (#7210) | 2025-06-15 02:48:00 -07:00 |
| Lianmin Zheng | 5f1ab32717 | [EAGLE] Refactor code for page size > 1 & more simplifications (#7163) | 2025-06-14 23:16:23 -07:00 |
| Lianmin Zheng | a023856b12 | Move host memory pools into a separate file (#7200) | 2025-06-14 21:31:42 -07:00 |
| Byron Hsu | db0cc57e75 | [PD] Support decode retract and update decode.py (#7196) | 2025-06-14 19:48:05 -07:00 |
| Lianmin Zheng | f47a1b1d0f | Increase timeout in test/srt/test_disaggregation.py (#7175) | 2025-06-13 23:12:14 -07:00 |
| Lianmin Zheng | ba589b88fc | Improve test cases for eagle infer (#7173) | 2025-06-13 22:25:13 -07:00 |
| Jinn | 50876abc47 | Add test for refactored openai server (#7161) | 2025-06-13 20:42:57 -07:00 |
| Lianmin Zheng | 0fc3d992bb | Split the eagle test into two files (#7170) | 2025-06-13 20:14:26 -07:00 |
| Zijian | 31d6dee5c4 | Support VILA models (#6106) | 2025-06-11 11:47:25 -07:00 |
| Baizhou Zhang | 2a5f0100e0 | Fix GGuf and add back test_gguf.py (#7067) | 2025-06-10 21:07:20 -07:00 |
| Yudi Xue | 14c18d25df | Frontend language separate reasoning support (#6031) | 2025-06-10 17:11:29 -07:00 |
| Brayden Zhong | ca9291181d | [Feature] Add Logit Bias (#6579); Co-authored-by: Cinjon Resnick <cinjon.resnick@gmail.com> | 2025-06-10 15:39:25 -07:00 |
| kyle-pena-kuzco | b56de8f943 | Open AI API hidden states (#6716) | 2025-06-10 14:37:29 -07:00 |
| Yineng Zhang | 2f58445531 | Revert "Add sanity checks when a test file is not added to CI (#6947)" (#7063) | 2025-06-10 12:43:25 -07:00 |
| fzyzcjy | fe55947acd | Add sanity checks when a test file is not added to CI (#6947) | 2025-06-10 12:34:57 -07:00 |
| Baizhou Zhang | 3b014bc13d | Fix test_lora.py CI (#7061) | 2025-06-10 12:24:46 -07:00 |
| Lianmin Zheng | 019851d099 | Fix eagle on AMD (#7051) | 2025-06-10 05:22:40 -07:00 |
| YanbingJiang | fcde67b016 | CPU: map changes from developing branch in sgl-kernel (#6833); Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-06-10 01:08:15 -07:00 |
| Emmanuel Ferdman | f40942ad63 | Migrate to assertEqual (#6741); Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com> | 2025-06-09 16:47:39 -07:00 |
| Lianmin Zheng | dc0705a504 | Simplify prepare_extend_after_decode (#6987) | 2025-06-09 16:39:21 -07:00 |
| Sai Enduri | 3465d7ae78 | Update amd nightly models CI. (#6992) | 2025-06-09 10:54:08 -07:00 |
| Yineng Zhang | 56ccd3c22c | chore: upgrade flashinfer v0.2.6.post1 jit (#6958); Co-authored-by: alcanderian <alcanderian@gmail.com>; Co-authored-by: Qiaolin Yu <qy254@cornell.edu>; Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>; Co-authored-by: Mick <mickjagger19@icloud.com>; Co-authored-by: ispobock <ispobaoke@gmail.com> | 2025-06-09 09:22:39 -07:00 |
| Pan Lyu | 451ffe74d9 | support qwen3 emebedding (#6990) | 2025-06-09 01:32:49 -07:00 |
| Sai Enduri | 2c18642502 | Enable more unit tests for AMD CI. (#6983) | 2025-06-08 19:41:55 -07:00 |
| Lianmin Zheng | 9ecb18568b | Fix triton sliding window test case (#6981) | 2025-06-08 17:20:46 -07:00 |