Fix Harmony reasoning parser for and auto-separation for gpt-oss models (#9190)

Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: minleminzui <2969413251@qq.com> Co-authored-by: maocheng23 <maocheng@berkeley.edu> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-26 00:26:26 +02:00
parent 24a8cee66d
commit a0a77d937b
8 changed files with 1676 additions and 551 deletions
--- a/python/sglang/srt/server_args.py
+++ b/python/sglang/srt/server_args.py
@@ -2271,6 +2271,7 @@ class ServerArgs:
            if is_mxfp4_quant_format:
                # use bf16 for mxfp4 triton kernels
                self.dtype = "bfloat16"
+
        elif "Llama4" in model_arch:
            assert self.attention_backend in {
                "fa3",