Commit Graph

22 Commits

Author SHA1 Message Date
liwei02
3d62430fd7 Adjust configuration parameters 2026-06-25 17:36:43 +08:00
72aa7e690a Add README and start commands 2026-06-23 17:17:22 +08:00
b5806731e0 some op overhead optimization 2026-06-19 11:19:39 +08:00
47a4d9e72a fix no reasoning token issue 2026-06-18 12:21:05 +08:00
3b8a567e9e fix serving issues when requesting real data 2026-06-12 17:57:23 +08:00
50e3a05fb0 fix incorrect MoE step to ensure decoding speed 2026-06-12 11:44:50 +08:00
629f878c28 initial commit for qwen3.6-moe adaptation 2026-06-12 10:10:49 +08:00
365da18436 Add reasoning parser mechanism + qwen3 parser + bugfixes 2026-06-10 18:22:29 +08:00
4ab36b51d5 Add qwen3_coder tool calling parser 2026-06-10 14:38:54 +08:00
d972854fb7 fix completion token statistic bug when input context is large 2026-06-08 15:04:34 +08:00
c2de1c83b0 Utilize chunked prefill + K-tiling techniques to ensure 100K context 2026-06-05 17:00:41 +08:00
2d1ef50992 chunked prefill support and memory opts 2026-06-05 16:03:34 +08:00
8c047a70ea some modifications to ensure 50K context input 2026-06-04 17:56:29 +08:00
1c33ef1355 add paged_attn 2026-05-29 16:53:39 +08:00
3ef8227384 initial version of adding chunked attention, ensuring 20K context 2026-05-29 16:49:33 +08:00
0e89906481 Qwen3.6-27B iluvatar bi-v100 adaptation 2026-05-21 16:37:24 +08:00
fad74b701b Update to new version of base image 2025-10-24 15:45:06 +08:00
ee04aead1e add dataset and more models 2025-10-17 16:52:12 +08:00
8f07ba339a Update README 2025-08-29 15:40:07 +08:00
zhousha
37e89f390e update Dockerfile images 2025-08-25 14:19:36 +08:00
99fb9f5cb0 First commit 2025-08-05 19:02:46 +08:00
9efe891f99 Add README.md 2025-08-04 16:57:34 +08:00