enginex-vllm-bi100-qwen36/qwen3_6_scripts at 4ab36b51d515932be6ae071d6e95c9bbf5707f76

EngineX-Iluvatar/enginex-vllm-bi100-qwen36

Lu Xinlong 4ab36b51d5 Add qwen3_coder tool calling parser

2026-06-10 14:38:54 +08:00

..

Qwen3.6-27B iluvatar bi-v100 adaptation

2026-05-21 16:37:24 +08:00

mamba_cache.py

Qwen3.6-27B iluvatar bi-v100 adaptation

2026-05-21 16:37:24 +08:00

paged_attn.py

Utilize chunked prefill + K-tiling techniques to ensure 100K context

2026-06-05 17:00:41 +08:00

patch_ops.sh

Add qwen3_coder tool calling parser

2026-06-10 14:38:54 +08:00

patch_transformers_qwen3_5.py

Qwen3.6-27B iluvatar bi-v100 adaptation

2026-05-21 16:37:24 +08:00

patch_vllm_qwen3_5.py

Qwen3.6-27B iluvatar bi-v100 adaptation

2026-05-21 16:37:24 +08:00

patch_vllm_tool_parser.py

Add qwen3_coder tool calling parser

2026-06-10 14:38:54 +08:00

patch_xformers_sdpa_batch_kernel.py

Qwen3.6-27B iluvatar bi-v100 adaptation

2026-05-21 16:37:24 +08:00

patch_xformers_sdpa_batch.py

Qwen3.6-27B iluvatar bi-v100 adaptation

2026-05-21 16:37:24 +08:00

patch_xformers_sdpa_seq_kernel.py

Qwen3.6-27B iluvatar bi-v100 adaptation

2026-05-21 16:37:24 +08:00

patch_xformers_sdpa_seq.py

chunked prefill support and memory opts

2026-06-05 16:03:34 +08:00

qwen3_5.py

chunked prefill support and memory opts

2026-06-05 16:03:34 +08:00

qwen3coder_tool_parser.py

Add qwen3_coder tool calling parser

2026-06-10 14:38:54 +08:00

sequence.py

fix completion token statistic bug when input context is large

2026-06-08 15:04:34 +08:00