Commit Graph

715 Commits

Author SHA1 Message Date
Zijian
31d6dee5c4 Support VILA models (#6106) 2025-06-11 11:47:25 -07:00
Baizhou Zhang
2a5f0100e0 Fix GGuf and add back test_gguf.py (#7067) 2025-06-10 21:07:20 -07:00
Yudi Xue
14c18d25df Frontend language separate reasoning support (#6031) 2025-06-10 17:11:29 -07:00
Brayden Zhong
ca9291181d [Feature] Add Logit Bias (#6579)
Co-authored-by: Cinjon Resnick <cinjon.resnick@gmail.com>
2025-06-10 15:39:25 -07:00
kyle-pena-kuzco
b56de8f943 Open AI API hidden states (#6716) 2025-06-10 14:37:29 -07:00
Yineng Zhang
2f58445531 Revert "Add sanity checks when a test file is not added to CI (#6947)" (#7063) 2025-06-10 12:43:25 -07:00
fzyzcjy
fe55947acd Add sanity checks when a test file is not added to CI (#6947) 2025-06-10 12:34:57 -07:00
Baizhou Zhang
3b014bc13d Fix test_lora.py CI (#7061) 2025-06-10 12:24:46 -07:00
Lianmin Zheng
019851d099 Fix eagle on AMD (#7051) 2025-06-10 05:22:40 -07:00
YanbingJiang
fcde67b016 CPU: map changes from developing branch in sgl-kernel (#6833)
Co-authored-by: mingfeima <mingfei.ma@intel.com>
2025-06-10 01:08:15 -07:00
Emmanuel Ferdman
f40942ad63 Migrate to assertEqual (#6741)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-06-09 16:47:39 -07:00
Lianmin Zheng
dc0705a504 Simplify prepare_extend_after_decode (#6987) 2025-06-09 16:39:21 -07:00
Sai Enduri
3465d7ae78 Update amd nightly models CI. (#6992) 2025-06-09 10:54:08 -07:00
Yineng Zhang
56ccd3c22c chore: upgrade flashinfer v0.2.6.post1 jit (#6958)
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Qiaolin Yu <qy254@cornell.edu>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2025-06-09 09:22:39 -07:00
Pan Lyu
451ffe74d9 support qwen3 emebedding (#6990) 2025-06-09 01:32:49 -07:00
Sai Enduri
2c18642502 Enable more unit tests for AMD CI. (#6983) 2025-06-08 19:41:55 -07:00
Lianmin Zheng
9ecb18568b Fix triton sliding window test case (#6981) 2025-06-08 17:20:46 -07:00
Ke Bao
cc74499d51 Fix draft extend ut stability with flush cache (#6979) 2025-06-08 17:09:32 -07:00
Lianmin Zheng
0c1f03a23d Sync cuda graph runners (#6976) 2025-06-08 16:12:25 -07:00
Lianmin Zheng
20d3ad3b58 Fix CI and triton moe Configs (#6974) 2025-06-08 05:06:46 -07:00
Hubert Lu
4740288303 [AMD] Add more tests to per-commit-amd (#6926) 2025-06-08 01:08:37 -07:00
HAI
b819381fec AITER backend extension and workload optimizations (#6838)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-06-05 23:00:18 -07:00
Zaili Wang
562f279a2d [CPU] enable CI for PRs, add Dockerfile and auto build task (#6458)
Co-authored-by: diwei sun <diwei.sun@intel.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-06-05 13:43:54 -07:00
Chang Su
8b2474898b bugfix(OAI): Fix image_data processing for jinja chat templates (#6877) 2025-06-05 13:37:01 -07:00
fzyzcjy
0de5e7d40f Support layerwise rebalancing experts (#6851) 2025-06-05 00:05:52 -07:00
zyksir
8e3797be1c support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) 2025-06-04 22:11:24 -07:00
Lifu Huang
4474eaf552 Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. (#6861) 2025-06-04 22:08:30 -07:00
ishandhanani
f0f84975f4 feat: add dp-rank to KV events (#6852) 2025-06-04 15:29:34 -07:00
Chanh Nguyen
3f1e433903 Decoder-only Scoring API (#6460)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
2025-06-04 14:14:54 -07:00
Xinyuan Tong
cf9815ba69 [Refactor] Multimodal data processing for VLM (#6659)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-04 11:22:33 -07:00
Marc Sun
37f1547587 [FEAT] Add transformers backend support (#5929) 2025-06-03 21:05:29 -07:00
jianan-gu
ff00895c46 Add CPU optimized kernels for topk and rope fusions (#6456) 2025-06-02 17:37:34 -07:00
Ke Bao
a2cb5913a0 Add draft extend CUDA graph for flashinfer backend (#6805) 2025-06-02 01:51:26 -07:00
Ravi Theja
c6a0cacc35 Update CI tests for Llama4 models (#6421) 2025-06-01 11:52:15 +08:00
Lianmin Zheng
2d72fc47cf Improve profiler and integrate profiler in bench_one_batch_server (#6787) 2025-05-31 15:53:55 -07:00
Chang Su
e39bca0756 ci: relax test_function_call_required (#6786) 2025-05-30 19:18:42 -07:00
Jianan Ji
a2bb856543 Temporarily lower mmlu threshold for triton sliding window backend (#6785) 2025-05-30 18:40:50 -07:00
Chang Su
f18b068f15 feat(tool call): Enhance Llama32Detector for improved JSON parsing in non-stream (#6784) 2025-05-30 17:05:17 -07:00
Chao Yang
4fac524b14 update llama4 chat template and pythonic parser (#6679)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-05-30 17:01:22 -07:00
Jianan Ji
22630ca242 Support sliding window in triton backend (#6509) 2025-05-30 01:11:53 -07:00
Chang Su
c673727e0e refactor(tool call): Fix BaseFormatDetector tool_index issue and refactor parse_streaming_increment (#6715) 2025-05-29 00:08:45 -07:00
iLeGend
e06b076105 Fix PP for Qwen3 MoE (#6709) 2025-05-28 23:06:18 -07:00
shangmingc
d63e76f735 [CI] Fix setup of disaggregation with different tp (#6706)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-28 11:17:27 -07:00
shangmingc
c25231c679 [CI] Fix flaky pp single node test (#6689)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-28 00:40:26 -07:00
Sai Enduri
f4a8987f69 Update amd docker and nightly models. (#6687) 2025-05-28 00:08:08 -07:00
Chang Su
41ba767f0c feat: Add warnings for invalid tool_choice and UTs (#6582) 2025-05-27 16:53:19 -07:00
Ke Bao
f127355a30 Add batch test for draft extend (#6672) 2025-05-27 16:32:05 -07:00
fzyzcjy
87068b5cc7 Support gathering expert distribution details (#6665) 2025-05-27 15:32:59 -07:00
Junrong Lin
2103b80607 [CI] update verlengine ci to 4-gpu test (#6007) 2025-05-27 14:32:23 -07:00
Sai Enduri
eb8f02dd87 Update nightly thresholds and dependencies. (#6635) 2025-05-26 11:44:13 -07:00