537 Commits

Author SHA1 Message Date
Lianmin Zheng
30ceccc74a Update hyperparameter_tuning.md (#7454) 2025-06-22 22:42:55 -07:00
Chang Su
72676cd6c0 feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
2025-06-21 13:21:06 -07:00
Jinn
ab74f8f09d Remove batches api in docs & example (#7400) 2025-06-20 19:46:31 -07:00
woodx
97011abc8a [Doc] add embedding rerank doc (#7364) 2025-06-19 21:53:54 -07:00
Yineng Zhang
fadf18fdd5 docs: update installation (#7366) 2025-06-19 12:00:19 -07:00
linzhuo
1de4db9bef update invalid link in doc (#7297) 2025-06-18 01:37:36 -07:00
Yijie Zhu
a39d928782 support qwen2 running on ascend npu device (#7022)
Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
2025-06-17 11:24:10 -07:00
Yineng Zhang
f9dc9dd28b chore: bump v0.4.7.post1 (#7248) 2025-06-16 15:20:29 -07:00
Lianmin Zheng
21615cc3fe Minor style and doc fix (#7228) 2025-06-16 01:03:13 -07:00
Lifu Huang
98538822d5 Add Phi-4-mm to supported VLM supported model list. (#7178) 2025-06-13 23:17:40 -07:00
Povilas Kanapickas
bd7cfbd2f8 [Fix] Reduce busy polling when scheduler is idle (#6026) 2025-06-12 14:58:22 -07:00
Lianmin Zheng
dbdf76ca98 Clean up docs for server args and sampling parameters (generated by grok) (#7076) 2025-06-10 19:55:42 -07:00
Ximingwang-09
f2a75a66c4 update doc (#7046)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-06-11 10:02:01 +08:00
Lianmin Zheng
0f218731e3 Do not run frontend_reasoning.ipynb to reduce the CI load (#7073) 2025-06-10 17:15:31 -07:00
Yudi Xue
14c18d25df Frontend language separate reasoning support (#6031) 2025-06-10 17:11:29 -07:00
Lianmin Zheng
90bd3e32d6 Improve perf tuning docs (#7071) 2025-06-10 16:55:04 -07:00
kyle-pena-kuzco
b56de8f943 Open AI API hidden states (#6716) 2025-06-10 14:37:29 -07:00
Lianmin Zheng
bb185b0e92 Update README.md (#7040) 2025-06-10 01:59:14 -07:00
Yineng Zhang
4f723edd3b chore: bump v0.4.7 (#7038) 2025-06-10 01:56:20 -07:00
Yueyang Pan
98c00a2df1 Fix torch profiler bugs for bench_offline_throughput.py (#6557) 2025-06-09 20:33:41 +08:00
HAI
b819381fec AITER backend extension and workload optimizations (#6838)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-06-05 23:00:18 -07:00
shangmingc
dd1012fcbe [PD] Fix potential perf spike caused by tracker gc and optimize doc (#6764)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-05 10:56:02 -07:00
zyksir
8e3797be1c support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) 2025-06-04 22:11:24 -07:00
Xinyuan Tong
cf9815ba69 [Refactor] Multimodal data processing for VLM (#6659)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-04 11:22:33 -07:00
Marc Sun
37f1547587 [FEAT] Add transformers backend support (#5929) 2025-06-03 21:05:29 -07:00
Lianmin Zheng
2d72fc47cf Improve profiler and integrate profiler in bench_one_batch_server (#6787) 2025-05-31 15:53:55 -07:00
shangmingc
6cb00c6398 [PD] Optimize time out logic and add env var doc for mooncake (#6761)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-30 00:45:02 -07:00
Baizhou Zhang
791b3bfabb [Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479) 2025-05-28 16:03:43 -07:00
Trevor Morris
e806f708c9 [PD] Make bootstrap code common between NIXL and Mooncake (#6473) 2025-05-27 12:47:38 -07:00
Vincent Zhong
45a31a82e4 docs: Update documentation to reflect xgrammar as default grammar backend (#6601)
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
2025-05-27 13:29:13 +08:00
Brayden Zhong
1aa0fbf416 Add note to add supported model to documentation (#6640) 2025-05-27 13:18:46 +08:00
linzhuo
7a0bbe6a64 update toc for doc and dockerfile code style format (#6450)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-05-27 13:05:11 +08:00
fzyzcjy
25be63d0b2 Auto handle PD disaggregation in bench_serving (#6587)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-25 22:41:27 -07:00
simveit
e235be16fe Fix some issues with current docs. (#6588) 2025-05-26 01:04:34 +08:00
Yineng Zhang
7e257cd666 chore: bump v0.4.6.post5 (#6566) 2025-05-24 00:48:05 -07:00
Chang Su
ed0c3035cd feat(Tool Calling): Support required and specific function mode (#6550) 2025-05-23 21:00:37 -07:00
ryang
a6ae3af15e Support XiaomiMiMo inference with mtp (#6059) 2025-05-22 14:14:49 -07:00
Byron Hsu
7513558074 [PD] Add doc and simplify sender.send (#6019) 2025-05-21 21:22:21 -07:00
Wenxuan Tan
66324895c6 [docs] Fix torch version (#6472) 2025-05-20 10:53:14 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
simveit
506e5de8fe Improve supported models doc (#6430) 2025-05-20 01:43:35 +08:00
applesaucethebun
6dc6b30637 Add missing model to doc (#6396)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-18 12:57:58 -07:00
Vincent Zhong
e9ef39d2e9 docs: Update the MD files (#6373)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-17 09:23:16 -07:00
Kiv Chen
64825b8395 model(vlm): mistral 3.1 (#5099)
Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>
2025-05-16 18:36:18 -07:00
Yury Sulsky
f19a9204cd Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136)
Co-authored-by: Yury Sulsky <ysulsky@tesla.com>
2025-05-16 12:26:15 -07:00
quinnrong94
2e4babdb0a [Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109)
Co-authored-by: Yingyi <yingyihuang2000@outlook.com>
Co-authored-by: neiltian <neiltian@tencent.com>
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
Co-authored-by: kexueyu <kexueyu@tencent.com>
Co-authored-by: vincentmeng <vincentmeng@tencent.com>
Co-authored-by: pengmeng <pengmeng@tencent.com>
2025-05-15 00:48:09 -07:00
Brayden Zhong
9a91fa0ed1 docs: fix a bad redirect (#6300) 2025-05-14 10:27:19 -07:00
Mick
cd7c8a8de6 doc: update developer guide regarding mllms (#6138)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-14 23:13:13 +08:00
Yineng Zhang
16267d4fa7 chore: bump v0.4.6.post4 (#6245) 2025-05-13 01:57:51 -07:00
Kiv Chen
5380cd7ea3 model(vlm): pixtral (#5084) 2025-05-13 00:16:10 -07:00