enginex-vllm-bi100-qwen36

Author	SHA1	Message	Date
Lu Xinlong	b5806731e0	some op overhead optimization	2026-06-19 11:19:39 +08:00
Lu Xinlong	47a4d9e72a	fix no reasoning token issue	2026-06-18 12:21:05 +08:00
Lu Xinlong	3b8a567e9e	fix serving issues when requesting real data	2026-06-12 17:57:23 +08:00
Lu Xinlong	50e3a05fb0	fix incorrect MoE step to ensure decoding speed	2026-06-12 11:44:50 +08:00
Lu Xinlong	629f878c28	initial commit for qwen3.6-moe adaptation	2026-06-12 10:10:49 +08:00
Lu Xinlong	365da18436	Add reasoning parser mechanism + qwen3 parser + bugfixes	2026-06-10 18:22:29 +08:00
Lu Xinlong	4ab36b51d5	Add qwen3_coder tool calling parser	2026-06-10 14:38:54 +08:00
Lu Xinlong	d972854fb7	fix completion token statistic bug when input context is large	2026-06-08 15:04:34 +08:00
Lu Xinlong	c2de1c83b0	Utilize chunked prefill + K-tiling techniques to ensure 100K context	2026-06-05 17:00:41 +08:00
Lu Xinlong	2d1ef50992	chunked prefill support and memory opts	2026-06-05 16:03:34 +08:00
Lu Xinlong	8c047a70ea	some modifications to ensure 50K context input	2026-06-04 17:56:29 +08:00
Lu Xinlong	1c33ef1355	add paged_attn	2026-05-29 16:53:39 +08:00
Lu Xinlong	3ef8227384	initial version of adding chunked attention, ensuring 20K context	2026-05-29 16:49:33 +08:00
Lu Xinlong	0e89906481	Qwen3.6-27B iluvatar bi-v100 adaptation	2026-05-21 16:37:24 +08:00