sglang/docs/basic_usage/openai_api_completions.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# OpenAI APIs - Completions\n",
    "\n",
    "SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n",
    "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/api-reference).\n",
    "\n",
    "This tutorial covers the following popular APIs:\n",
    "\n",
    "- `chat/completions`\n",
    "- `completions`\n",
    "\n",
    "Check out other tutorials to learn about [vision APIs](openai_api_vision.ipynb) for vision-language models and [embedding APIs](openai_api_embeddings.ipynb) for embedding models."
   ]
  },
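  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Launch A Server\n",
    "\n",
    "Launch the server and wait for it to initialize. The cell below starts it as a subprocess from the notebook; the same `sglang.launch_server` command also works in a terminal."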
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sglang.test.doc_patch import launch_server_cmd\n",
    "from sglang.utils import wait_for_server, print_highlight, terminate_process\n",
    "\n",
    "server_process, port = launch_server_cmd(\n",
    "    \"python3 -m sglang.launch_server --model-path qwen/qwen2.5-0.5b-instruct --host 0.0.0.0\"\n",
    ")\n",
    "\n",
    "wait_for_server(f\"http://localhost:{port}\")\n",
    "print(f\"Server started on http://localhost:{port}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chat Completions\n",
    "\n",
    "### Usage\n",
    "\n",
    "The server implements the OpenAI-compatible `chat/completions` endpoint.\n",
    "It automatically applies the chat template specified in the Hugging Face tokenizer, if one is available.\n",
    "You can also specify a custom chat template with `--chat-template` when launching the server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "\n",
    "client = openai.Client(base_url=f\"http://127.0.0.1:{port}/v1\", api_key=\"None\")\n",
    "\n",
    "response = client.chat.completions.create(\n",
    "    model=\"qwen/qwen2.5-0.5b-instruct\",\n",
    "    messages=[\n",
    "        {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    "    max_tokens=64,\n",
    ")\n",
    "\n",
    "print_highlight(f\"Response: {response}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameters\n",
    "\n",
    "The chat completions API accepts the same parameters as the OpenAI Chat Completions API. Refer to the [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create) for details.\n",
    "\n",
    "SGLang also accepts additional, non-standard fields, which the OpenAI Python client sends through its `extra_body` parameter. One key option within `extra_body` is `chat_template_kwargs`, which passes arguments to the chat template processor.\n",
    "\n",
    "Here is an example of a detailed chat completion request using standard OpenAI parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.chat.completions.create(\n",
    "    model=\"qwen/qwen2.5-0.5b-instruct\",\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"system\",\n",
    "            \"content\": \"You are a knowledgeable historian who provides concise responses.\",\n",
    "        },\n",
    "        {\"role\": \"user\", \"content\": \"Tell me about ancient Rome\"},\n",
    "        {\n",
    "            \"role\": \"assistant\",\n",
    "            \"content\": \"Ancient Rome was a civilization centered in Italy.\",\n",
    "        },\n",
    "        {\"role\": \"user\", \"content\": \"What were their major achievements?\"},\n",
    "    ],\n",
    "    temperature=0.3,  # Lower temperature for more focused responses\n",
    "    max_tokens=128,  # Cap the length to keep the response concise\n",
    "    top_p=0.95,  # Nucleus sampling over the top 95% of probability mass\n",
    "    presence_penalty=0.2,  # Mildly discourage tokens that have already appeared\n",
    "    frequency_penalty=0.2,  # Mildly discourage frequently repeated tokens\n",
    "    n=1,  # Generate a single completion\n",
    "    seed=42,  # Fixed seed for reproducibility\n",
    ")\n",
    "\n",
    "print_highlight(response.choices[0].message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Streaming mode is also supported. Set `stream=True` and iterate over the returned chunks; each chunk carries an incremental `delta`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stream = client.chat.completions.create(\n",
    "    model=\"qwen/qwen2.5-0.5b-instruct\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Say this is a test\"}],\n",
    "    stream=True,\n",
    ")\n",
    "for chunk in stream:\n",
    "    if chunk.choices[0].delta.content is not None:\n",
    "        print(chunk.choices[0].delta.content, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Enabling Model Thinking/Reasoning\n",
    "\n",
    "You can use `chat_template_kwargs` to enable or disable the model's thinking (reasoning) output. Set `\"enable_thinking\": True` within `chat_template_kwargs` to include the reasoning steps in the response. This requires launching the server with a compatible reasoning parser.\n",
    "\n",
    "**Reasoning Parser Options:**\n",
    "- `--reasoning-parser deepseek-r1`: For DeepSeek-R1 family models (R1, R1-0528, R1-Distill)\n",
    "- `--reasoning-parser qwen3`: For standard Qwen3 models that support the `enable_thinking` parameter, and for Qwen3-Thinking models\n",
    "- `--reasoning-parser qwen3-thinking`: For Qwen3-Thinking models; the force-reasoning version of the `qwen3` parser\n",
    "- `--reasoning-parser kimi`: For Kimi thinking models\n",
    "\n",
    "Here's an example demonstrating how to enable thinking and retrieve the reasoning content separately (using `\"separate_reasoning\": True`):\n",
    "\n",
    "```python\n",
    "# For Qwen3 models with enable_thinking support:\n",
    "# python3 -m sglang.launch_server --model-path QwQ/Qwen3-32B-250415 --reasoning-parser qwen3 ...\n",
    "\n",
    "from openai import OpenAI\n",
    "\n",
    "# Modify OpenAI's API key and API base to use SGLang's API server.\n",
    "openai_api_key = \"EMPTY\"\n",
    "openai_api_base = f\"http://127.0.0.1:{port}/v1\"  # Use the correct port\n",
    "\n",
    "client = OpenAI(\n",
    "    api_key=openai_api_key,\n",
    "    base_url=openai_api_base,\n",
    ")\n",
    "\n",
    "model = \"QwQ/Qwen3-32B-250415\"  # Use the model loaded by the server\n",
    "messages = [{\"role\": \"user\", \"content\": \"9.11 and 9.8, which is greater?\"}]\n",
    "\n",
    "response = client.chat.completions.create(\n",
    "    model=model,\n",
    "    messages=messages,\n",
    "    extra_body={\n",
    "        \"chat_template_kwargs\": {\"enable_thinking\": True},\n",
    "        \"separate_reasoning\": True,\n",
    "    },\n",
    ")\n",
    "\n",
    "print(\"response.choices[0].message.reasoning_content: \\n\", response.choices[0].message.reasoning_content)\n",
    "print(\"response.choices[0].message.content: \\n\", response.choices[0].message.content)\n",
    "```\n",
    "\n",
    "**Example Output:**\n",
    "\n",
    "```\n",
    "response.choices[0].message.reasoning_content: \n",
    " Okay, so I need to figure out which number is greater between 9.11 and 9.8. Hmm, let me think. Both numbers start with 9, right? So the whole number part is the same. That means I need to look at the decimal parts to determine which one is bigger.\n",
    "...\n",
    "Therefore, after checking multiple methods—aligning decimals, subtracting, converting to fractions, and using a real-world analogy—it's clear that 9.8 is greater than 9.11.\n",
    "\n",
    "response.choices[0].message.content: \n",
    " To determine which number is greater between **9.11** and **9.8**, follow these steps:\n",
    "...\n",
    "**Answer**:  \n",
    "9.8 is greater than 9.11.\n",
    "```\n",
    "\n",
    "Setting `\"enable_thinking\": False` (or omitting it) will result in `reasoning_content` being `None`.\n",
    "\n",
    "**Note for Qwen3-Thinking models:** These models always generate thinking content and do not support the `enable_thinking` parameter. Use `--reasoning-parser qwen3-thinking` or `--reasoning-parser qwen3` to parse the thinking content."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Completions\n",
    "\n",
    "### Usage\n",
    "\n",
    "The Completions API is similar to the Chat Completions API, but it takes a plain `prompt` instead of `messages` and does not apply a chat template."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.completions.create(\n",
    "    model=\"qwen/qwen2.5-0.5b-instruct\",\n",
    "    prompt=\"List 3 countries and their capitals.\",\n",
    "    temperature=0,\n",
    "    max_tokens=64,\n",
    "    n=1,\n",
    "    stop=None,\n",
    ")\n",
    "\n",
    "print_highlight(f\"Response: {response}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameters\n",
    "\n",
    "The completions API accepts the same parameters as the OpenAI Completions API. Refer to the [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions/create) for details.\n",
    "\n",
    "Here is an example of a detailed completions request:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.completions.create(\n",
    "    model=\"qwen/qwen2.5-0.5b-instruct\",\n",
    "    prompt=\"Write a short story about a space explorer.\",\n",
    "    temperature=0.7,  # Moderate temperature for creative writing\n",
    "    max_tokens=150,  # Longer response for a story\n",
    "    top_p=0.9,  # Balanced diversity in word choice\n",
    "    stop=[\"\\n\\n\", \"THE END\"],  # Multiple stop sequences\n",
    "    presence_penalty=0.3,  # Encourage novel elements\n",
    "    frequency_penalty=0.3,  # Reduce repetitive phrases\n",
    "    n=1,  # Generate one completion\n",
    "    seed=123,  # For reproducible results\n",
    ")\n",
    "\n",
    "print_highlight(f\"Response: {response}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Structured Outputs (JSON, Regex, EBNF)\n",
    "\n",
    "For the OpenAI-compatible structured outputs API, refer to [Structured Outputs](../advanced_features/structured_outputs.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "terminate_process(server_process)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}