365da18436  2026-06-10 18:22:29 +08:00  Add reasoning parser mechanism + qwen3 parser + bugfixes
4ab36b51d5  2026-06-10 14:38:54 +08:00  Add qwen3_coder tool calling parser
d972854fb7  2026-06-08 15:04:34 +08:00  fix completion token statistic bug when input context is large
c2de1c83b0  2026-06-05 17:00:41 +08:00  Utilize chunked prefill + K-tiling techniques to ensure 100K context
2d1ef50992  2026-06-05 16:03:34 +08:00  chunked prefill support and memory opts
8c047a70ea  2026-06-04 17:56:29 +08:00  some modifications to ensure 50K context input
1c33ef1355  2026-05-29 16:53:39 +08:00  add paged_attn
3ef8227384  2026-05-29 16:49:33 +08:00  initial version of adding chunked attention, ensuring 20K context
0e89906481  2026-05-21 16:37:24 +08:00  Qwen3.6-27B iluvatar bi-v100 adaptation