Docs: Fix layout to docs (#3733)
@@ -6,7 +6,7 @@
    "source": [
     "# Tool and Function Calling\n",
     "\n",
-    "This guide demonstrates how to use SGLang's **Tool Calling** functionality."
+    "This guide demonstrates how to use SGLang's [Function calling](https://platform.openai.com/docs/guides/function-calling) functionality."
    ]
   },
   {
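For context, the function-calling guide linked in the hunk above follows the OpenAI-compatible `tools` schema. A minimal sketch of building such a request payload, where the `get_weather` tool, the model name, and the message text are hypothetical placeholders rather than anything from this diff:

```python
# Sketch of an OpenAI-style tool/function-calling request body, as accepted
# by OpenAI-compatible chat completions endpoints such as SGLang's.
# The get_weather tool below is a made-up example for illustration.

def build_tool_call_request(model: str, user_message: str) -> dict:
    """Build a chat-completions request body that offers one tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

request = build_tool_call_request(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    "What's the weather in Paris?",
)
```

The body would then be POSTed to the server's `/v1/chat/completions` endpoint; the response may contain `tool_calls` entries instead of plain text.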
@@ -15,7 +15,7 @@
     "- `completions`\n",
     "- `batches`\n",
     "\n",
-    "Check out other tutorials to learn about vision APIs for vision-language models and embedding APIs for embedding models."
+    "Check out other tutorials to learn about [vision APIs](https://docs.sglang.ai/backend/openai_api_vision.html) for vision-language models and [embedding APIs](https://docs.sglang.ai/backend/openai_api_embeddings.html) for embedding models."
    ]
   },
   {
@@ -13,7 +13,9 @@
     "SGLang supports vision language models such as Llama 3.2, LLaVA-OneVision, and QWen-VL2 \n",
     "- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) \n",
     "- [lmms-lab/llava-onevision-qwen2-72b-ov-chat](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-chat) \n",
-    "- [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) "
+    "- [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) \n",
+    "\n",
+    "As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)."
    ]
   },
   {
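The vision tutorial this hunk extends sends images through the OpenAI-compatible chat format, where a user message carries a list of text and `image_url` content parts. A minimal sketch of building such a message; the example text and URL are placeholders, not taken from this diff:

```python
# Sketch of an OpenAI-style multimodal chat message mixing text and an
# image, as used by vision-language models served behind a
# chat-completions endpoint.

def build_vision_message(text: str, image_url: str) -> dict:
    """Build a single user message containing text plus one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_vision_message(
    "Describe this image.",
    "https://example.com/cat.png",  # placeholder image URL
)
```

The message would be placed in the `messages` list of a chat-completions request, with `model` set to one of the vision models listed above.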
@@ -10,7 +10,7 @@ Online quantization dynamically computes scaling parameters—such as the maximu
 
 ## Offline Quantization
 
-To load already quantized models, simply load the model weights and config. **Again, if the model has been quantized offline, there's no need to add "--quantization" argument when starting the engine. The quantization method will be parsed from the downloaded Hugging Face config. For example, DeepSeek V3/R1 models are already in FP8, so do not add redundant parameters.**
+To load already quantized models, simply load the model weights and config. **Again, if the model has been quantized offline, there's no need to add `--quantization` argument when starting the engine. The quantization method will be parsed from the downloaded Hugging Face config. For example, DeepSeek V3/R1 models are already in FP8, so do not add redundant parameters.**
 
 ```bash
 python3 -m sglang.launch_server \
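The paragraph in the last hunk says the quantization method is parsed from the downloaded Hugging Face config rather than from a `--quantization` flag. A rough sketch of that detection logic, assuming the Hugging Face convention of a `quantization_config` block with a `quant_method` key in `config.json` (the example config dict is illustrative, not copied from any real model):

```python
# Sketch: detecting offline quantization from a model's Hugging Face
# config instead of requiring a --quantization flag. Assumes the HF
# convention of a "quantization_config" entry carrying "quant_method".
from typing import Optional


def detect_quantization(config: dict) -> Optional[str]:
    """Return the quantization method recorded in the config, if any."""
    quant_cfg = config.get("quantization_config")
    if quant_cfg is None:
        return None  # not quantized offline; online quantization may apply
    return quant_cfg.get("quant_method")


# Illustrative FP8 config fragment, similar in shape to what an
# already-quantized checkpoint ships with.
example_config = {
    "model_type": "deepseek_v3",
    "quantization_config": {"quant_method": "fp8"},
}
```

With a config like this, the engine can select the FP8 path on its own, which is why adding a redundant `--quantization` argument is discouraged.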