17 Commits

SHA1 Message Date
1902c81fdd fix issue of loading weight 2026-06-30 09:55:13 +08:00
f89bc60d59 fix multiple issues 2026-06-26 17:23:55 +08:00
c84151eef9 fix issues 2026-06-26 12:55:02 +08:00
b5806731e0 some op overhead optimization 2026-06-19 11:19:39 +08:00
47a4d9e72a fix no reasoning token issue 2026-06-18 12:21:05 +08:00
3b8a567e9e fix serving issues when requesting real data 2026-06-12 17:57:23 +08:00
50e3a05fb0 fix incorrect MoE step to ensure decoding speed 2026-06-12 11:44:50 +08:00
629f878c28 initial commit for qwen3.6-moe adaptation 2026-06-12 10:10:49 +08:00
365da18436 Add reasoning parser mechanism + qwen3 parser + bugfixes 2026-06-10 18:22:29 +08:00
4ab36b51d5 Add qwen3_coder tool calling parser 2026-06-10 14:38:54 +08:00
d972854fb7 fix completion token statistic bug when input context is large 2026-06-08 15:04:34 +08:00
c2de1c83b0 Utilize chunked prefill + K-tiling techniques to ensure 100K context 2026-06-05 17:00:41 +08:00
2d1ef50992 chunked prefill support and memory opts 2026-06-05 16:03:34 +08:00
8c047a70ea some modifications to ensure 50K context input 2026-06-04 17:56:29 +08:00
1c33ef1355 add paged_attn 2026-05-29 16:53:39 +08:00
3ef8227384 initial version of adding chunked attention, ensuring 20K context 2026-05-29 16:49:33 +08:00
0e89906481 Qwen3.6-27B iluvatar bi-v100 adaptation 2026-05-21 16:37:24 +08:00