|
|
1902c81fdd
|
fix issue of loading weight
|
2026-06-30 09:55:13 +08:00 |
|
|
|
f89bc60d59
|
fix multiple issues
|
2026-06-26 17:23:55 +08:00 |
|
|
|
810874ddb8
|
enable prefix caching
|
2026-06-26 13:27:52 +08:00 |
|
|
|
c84151eef9
|
fix issues
|
2026-06-26 12:55:02 +08:00 |
|
liwei02
|
3d62430fd7
|
Adjust configuration parameters
|
2026-06-25 17:36:43 +08:00 |
|
|
|
72aa7e690a
|
Add README and start commands
|
2026-06-23 17:17:22 +08:00 |
|
|
|
b5806731e0
|
some op overhead optimization
|
2026-06-19 11:19:39 +08:00 |
|
|
|
47a4d9e72a
|
fix no reasoning token issue
|
2026-06-18 12:21:05 +08:00 |
|
|
|
3b8a567e9e
|
fix serving issues when requesting real data
|
2026-06-12 17:57:23 +08:00 |
|
|
|
50e3a05fb0
|
fix incorrect MoE step to ensure decoding speed
|
2026-06-12 11:44:50 +08:00 |
|
|
|
629f878c28
|
initial commit for qwen3.6-moe adaptation
|
2026-06-12 10:10:49 +08:00 |
|
|
|
365da18436
|
Add reasoning parser mechanism + qwen3 parser + bugfixes
|
2026-06-10 18:22:29 +08:00 |
|
|
|
4ab36b51d5
|
Add qwen3_coder tool calling parser
|
2026-06-10 14:38:54 +08:00 |
|
|
|
d972854fb7
|
fix completion token statistic bug when input context is large
|
2026-06-08 15:04:34 +08:00 |
|
|
|
c2de1c83b0
|
Utilize chunked prefill + K-tiling techniques to ensure 100K context
|
2026-06-05 17:00:41 +08:00 |
|
|
|
2d1ef50992
|
chunked prefill support and memory opts
|
2026-06-05 16:03:34 +08:00 |
|
|
|
8c047a70ea
|
some modifications to ensure 50K context input
|
2026-06-04 17:56:29 +08:00 |
|
|
|
1c33ef1355
|
add paged_attn
|
2026-05-29 16:53:39 +08:00 |
|
|
|
3ef8227384
|
initial version of adding chunked attention, ensuring 20K context
|
2026-05-29 16:49:33 +08:00 |
|
|
|
0e89906481
|
Qwen3.6-27B iluvatar bi-v100 adaptation
|
2026-05-21 16:37:24 +08:00 |
|
|
|
fad74b701b
|
Update to new version of base image
|
2025-10-24 15:45:06 +08:00 |
|
|
|
ee04aead1e
|
add dataset and more models
|
2025-10-17 16:52:12 +08:00 |
|
|
|
8f07ba339a
|
Update README
|
2025-08-29 15:40:07 +08:00 |
|
zhousha
|
37e89f390e
|
update Dockerfile images
|
2025-08-25 14:19:36 +08:00 |
|
|
|
99fb9f5cb0
|
First commit
|
2025-08-05 19:02:46 +08:00 |
|
|
|
9efe891f99
|
Add README.md
|
2025-08-04 16:57:34 +08:00 |
|