Lianmin Zheng
|
676a7b51bd
|
make --speculative-draft-model an alias of --speculative-draft-model-path (#10246)
|
2025-09-09 19:12:24 -07:00 |
|
Kevin Tuan
|
15f993472c
|
refactor(InternVL): Use gpu to preprocess the input image (#9795)
|
2025-09-09 19:09:04 -07:00 |
|
Lianmin Zheng
|
bcf1955f7e
|
Revert "chore: upgrade v0.3.9 sgl-kernel" (#10245)
|
2025-09-09 19:05:20 -07:00 |
|
Lianmin Zheng
|
a06bf66425
|
[Auto Sync] Update collector.py, startup_func_log_and_timer... (20250910) (#10242)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
|
2025-09-09 18:05:16 -07:00 |
|
Lianmin Zheng
|
bf72b80122
|
[Auto Sync] Update io_struct.py (20250909) (#10236)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
|
2025-09-09 14:15:21 -07:00 |
|
Teng Ma
|
8471e5e616
|
[HiCache] feat: add mooncake backend extra config (#10213)
|
2025-09-09 12:50:00 -07:00 |
|
Lianmin Zheng
|
4582931ac3
|
Revert "Revert the changes on NCCL symmetric memory" (#10238)
|
2025-09-09 12:11:49 -07:00 |
|
Lianmin Zheng
|
d352c29aa0
|
Revert the changes on NCCL symmetric memory (#10210)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-09-09 11:01:33 -07:00 |
|
Yineng Zhang
|
d3ee70985f
|
chore: upgrade v0.3.9 sgl-kernel (#10220)
|
2025-09-09 03:16:25 -07:00 |
|
Rain H
|
71fc7b7fad
|
[Fix] KV-cache eviction mismatch across PP ranks in DeepSeek V3/R1 (#10214)
|
2025-09-09 02:07:13 -07:00 |
|
shaharmor98
|
9ab72f9895
|
add variable TP Decode > Prefill size support (#9960)
Signed-off-by: Shahar Mor <smor@nvidia.com>
|
2025-09-09 16:47:26 +08:00 |
|
Lianmin Zheng
|
71133a0426
|
[Auto Sync] Update sampling_batch_info.py (20250909) (#10212)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
|
2025-09-09 01:29:52 -07:00 |
|
Shangming Cai
|
f5f6b3b4b5
|
Refactor fused_add_rmsnorm import logic (#10207)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-09 00:23:58 -07:00 |
|
Yineng Zhang
|
94fb4e9e54
|
feat: support fa cute in sgl-kernel (#10205)
Co-authored-by: cicirori <32845984+cicirori@users.noreply.github.com>
|
2025-09-09 00:14:39 -07:00 |
|
blzheng
|
d1d4074c4e
|
[CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300)
|
2025-09-08 23:23:13 -07:00 |
|
DarkSharpness
|
948b01a04c
|
[Refactor] Remove Hicache Load & Write threads (#10127)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-09-08 22:18:50 -07:00 |
|
wenhuipeng
|
16ff3d4b05
|
Support opt model (#10165)
|
2025-09-09 12:45:00 +08:00 |
|
Liangsheng Yin
|
83d55ac51f
|
[1/N]DP refactor: Improve dp rank scheduling in PD disaggregation mode. (#10169)
|
2025-09-09 12:27:55 +08:00 |
|
blzheng
|
97fff98c68
|
[CPU] Fix phi4-mm prompt issue in bench_serving (#9900)
|
2025-09-08 20:12:32 -07:00 |
|
Caproni
|
96784a65fd
|
[Fix] Orphan process in data parallel (#7995)
Signed-off-by: Capronir <839972205@qq.com>
|
2025-09-09 11:09:09 +08:00 |
|
Rain Jiang
|
df5407fb53
|
Revert "feat: add fused moe config for Qwen3-30B-A3B on B200" (#10185)
|
2025-09-08 18:11:15 -07:00 |
|
Baizhou Zhang
|
8ad700f735
|
Cleaning codes for speculative attention mode (#10149)
|
2025-09-08 17:38:06 -07:00 |
|
Rain Jiang
|
7a40e4f4a6
|
fix the cutlass moe tests (#10182)
|
2025-09-08 16:24:55 -07:00 |
|
Yineng Zhang
|
19d64f2b72
|
fix: resolve lint issue (#10181)
|
2025-09-08 15:09:55 -07:00 |
|
Teng Ma
|
a02071a12c
|
[Bench] feat: mooncake trace integration (#9839)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
|
2025-09-09 02:50:54 +08:00 |
|
Yineng Zhang
|
45b3a6a256
|
Revert "[ModelOpt] Fix Weight Loading for DSR1-FP4 Quantization (#9712)" (#10176)
|
2025-09-08 11:28:15 -07:00 |
|
LukasBluebaum
|
9a18aa54c2
|
[fix] Relax white space rules in EBNFComposer (#9595)
|
2025-09-08 10:47:19 -07:00 |
|
Zhiy-Zhang
|
91f0fd95a4
|
pref: Add H20 fp8 fused MoE kernel configs for Qwen3 (#10166)
Co-authored-by: qiufan.zzy <qiufan.zzy@antgroup.com>
|
2025-09-08 09:57:21 -07:00 |
|
alanhe151220037
|
8085aca791
|
[Bug fix] Fix ascend mla in aclgraph (#9925)
|
2025-09-08 09:49:43 -07:00 |
|
Liangsheng Yin
|
72f9fc5f11
|
Monkey patch uvicorn multi worker is_alive timeout (#10159)
Co-authored-by: Huang Long <121648372+llll114@users.noreply.github.com>
|
2025-09-08 17:43:23 +08:00 |
|
hzh0425
|
ec99668ab7
|
[Hicache]: Add E2E CI For 3FS-KVStore (#10131)
|
2025-09-08 01:54:50 -07:00 |
|
Liangsheng Yin
|
78f139812a
|
[1/N] DP-Refactor: move communicators into tokenizer_communicator_mixin (#10028)
|
2025-09-08 16:27:37 +08:00 |
|
Swipe4057
|
bfd7a18d8d
|
update xgrammar 0.1.24 and transformers 4.56.1 (#10155)
|
2025-09-08 01:20:31 -07:00 |
|
ssshinigami
|
5dd8c6444b
|
[Bug fix] Fix Gemma 2 and fix Gemma 3 multimodal with bs > 1 on NPU (#9871)
Co-authored-by: Maksim <makcum888e@mail.ru>
|
2025-09-08 01:19:40 -07:00 |
|
Huaiyu, Zheng
|
ee21817c6b
|
enable llama3.1-8B on xpu (#9434)
|
2025-09-07 22:34:20 -07:00 |
|
Yineng Zhang
|
b7d1f17b8d
|
Revert "enable auto-round quantization model (#6226)" (#10148)
|
2025-09-07 22:31:11 -07:00 |
|
Weiwei
|
c8295d2353
|
enable auto-round quantization model (#6226)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
|
2025-09-07 22:05:35 -07:00 |
|
Even Zhou
|
b67c277f86
|
[Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph (#10013)
|
2025-09-07 21:50:49 -07:00 |
|
Xinyuan Tong
|
8116804e4f
|
Fix: (glm4v) Add missing field (#10147)
|
2025-09-07 21:47:14 -07:00 |
|
cicirori
|
8c5930f08a
|
Add speculator attention backend switch (#9981)
|
2025-09-07 21:44:36 -07:00 |
|
Zhiqiang Xie
|
3b99f23c44
|
[Bugfix] Retract not releasing enough memory when page size > 1 (#9989)
|
2025-09-07 21:41:50 -07:00 |
|
Cao E
|
7577f0e40f
|
Add graph runner support with torch compile on CPU (#7843)
|
2025-09-07 21:33:58 -07:00 |
|
Qiaolin Yu
|
8cda5a622c
|
Standalone speculative decoding (#10090)
|
2025-09-07 20:55:09 -07:00 |
|
kk
|
400d3b97ae
|
Fix run time error in dsv3-fp8 model on mi35x (#10104)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-09-07 20:45:17 -07:00 |
|
Lzhang-hub
|
37d83c6e6d
|
Qwen2.5-VL eagle3 infer (#8801)
|
2025-09-07 20:44:34 -07:00 |
|
Rain Jiang
|
7802586cab
|
fix the fp8 topk_config.correction_bias is none bug (#10040)
|
2025-09-07 20:28:14 -07:00 |
|
fzyzcjy
|
bc5fc332f7
|
Fix slow fused add RMSNorm (#10141)
|
2025-09-07 20:20:39 -07:00 |
|
Xinyuan Tong
|
f3440adcb5
|
vlm: enable GLM4.1V server testing & fix video processing (#10095)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
|
2025-09-08 03:53:08 +01:00 |
|
Cheng Wan
|
5a7e10fe4c
|
[MoE] fix: incorrect weight initialization for cutlass_fused_experts_fp8 (#10144)
|
2025-09-07 19:43:59 -07:00 |
|
Shisong Ma
|
33467c05a4
|
[BUG FIX] add fail check when get fail in case wait complete block (#9971)
Co-authored-by: mashisong <mashisong@bytedance.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-09-07 18:34:04 -07:00 |
|