From 95c231e50d97406b0dd1974632415a99fae2e701 Mon Sep 17 00:00:00 2001
From: vzed <207368749+vincentzed@users.noreply.github.com>
Date: Sun, 4 May 2025 16:12:40 -0400
Subject: [PATCH] Tool Call: Add `chat_template_kwargs` documentation (#5679)

---
 docs/backend/openai_api_completions.ipynb | 58 ++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/docs/backend/openai_api_completions.ipynb b/docs/backend/openai_api_completions.ipynb
index 2fc74a7be..5424e45a0 100644
--- a/docs/backend/openai_api_completions.ipynb
+++ b/docs/backend/openai_api_completions.ipynb
@@ -94,7 +94,63 @@
     "\n",
     "The chat completions API accepts OpenAI Chat Completions API's parameters. Refer to [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create) for more details.\n",
     "\n",
-    "Here is an example of a detailed chat completion request:"
+    "SGLang extends the standard API with the `extra_body` parameter, allowing for additional customization. One key option within `extra_body` is `chat_template_kwargs`, which can be used to pass arguments to the chat template processor.\n",
+    "\n",
+    "#### Enabling Model Thinking/Reasoning\n",
+    "\n",
+    "You can use `chat_template_kwargs` to enable or disable the model's internal thinking or reasoning process output. Set `\"enable_thinking\": True` within `chat_template_kwargs` to include the reasoning steps in the response. This requires launching the server with a compatible reasoning parser (e.g., `--reasoning-parser qwen3` for Qwen3 models).\n",
+    "\n",
+    "Here's an example demonstrating how to enable thinking and retrieve the reasoning content separately (using `separate_reasoning: True`):\n",
+    "\n",
+    "```python\n",
+    "# Ensure the server is launched with a compatible reasoning parser, e.g.:\n",
+    "# python3 -m sglang.launch_server --model-path QwQ/Qwen3-32B-250415 --reasoning-parser qwen3 ...\n",
+    "\n",
+    "from openai import OpenAI\n",
+    "\n",
+    "# Modify OpenAI's API key and API base to use SGLang's API server.\n",
+    "openai_api_key = \"EMPTY\"\n",
+    "openai_api_base = f\"http://127.0.0.1:{port}/v1\"  # Use the correct port\n",
+    "\n",
+    "client = OpenAI(\n",
+    "    api_key=openai_api_key,\n",
+    "    base_url=openai_api_base,\n",
+    ")\n",
+    "\n",
+    "model = \"QwQ/Qwen3-32B-250415\"  # Use the model loaded by the server\n",
+    "messages = [{\"role\": \"user\", \"content\": \"9.11 and 9.8, which is greater?\"}]\n",
+    "\n",
+    "response = client.chat.completions.create(\n",
+    "    model=model,\n",
+    "    messages=messages,\n",
+    "    extra_body={\n",
+    "        \"chat_template_kwargs\": {\"enable_thinking\": True},\n",
+    "        \"separate_reasoning\": True\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "print(\"response.choices[0].message.reasoning_content: \\n\", response.choices[0].message.reasoning_content)\n",
+    "print(\"response.choices[0].message.content: \\n\", response.choices[0].message.content)\n",
+    "```\n",
+    "\n",
+    "**Example Output:**\n",
+    "\n",
+    "```\n",
+    "response.choices[0].message.reasoning_content: \n",
+    " Okay, so I need to figure out which number is greater between 9.11 and 9.8. Hmm, let me think. Both numbers start with 9, right? So the whole number part is the same. That means I need to look at the decimal parts to determine which one is bigger.\n",
+    "...\n",
+    "Therefore, after checking multiple methods—aligning decimals, subtracting, converting to fractions, and using a real-world analogy—it's clear that 9.8 is greater than 9.11.\n",
+    "\n",
+    "response.choices[0].message.content: \n",
+    " To determine which number is greater between **9.11** and **9.8**, follow these steps:\n",
+    "...\n",
+    "**Answer**: \n",
+    "9.8 is greater than 9.11.\n",
+    "```\n",
+    "\n",
+    "Setting `\"enable_thinking\": False` (or omitting it) will result in `reasoning_content` being `None`.\n",
+    "\n",
+    "Here is an example of a detailed chat completion request using standard OpenAI parameters:"
    ]
   },
   {