From e235be16fe720f62aed1ec1dd3dbf7b9dfcf2107 Mon Sep 17 00:00:00 2001
From: simveit <69345428+simveit@users.noreply.github.com>
Date: Sun, 25 May 2025 19:04:34 +0200
Subject: [PATCH] Fix some issues with current docs. (#6588)

---
 docs/backend/openai_api_embeddings.ipynb                  | 4 +---
 docs/backend/openai_api_vision.ipynb                      | 8 +-------
 docs/index.rst                                            | 7 ++++++-
 docs/references/deepseek.md                               | 6 +++---
 docs/references/performance_analysis_and_optimization.rst | 7 +++++++
 docs/references/performance_tuning.rst                    | 7 -------
 docs/start/install.md                                     | 2 --
 7 files changed, 18 insertions(+), 23 deletions(-)
 create mode 100644 docs/references/performance_analysis_and_optimization.rst
 delete mode 100644 docs/references/performance_tuning.rst

diff --git a/docs/backend/openai_api_embeddings.ipynb b/docs/backend/openai_api_embeddings.ipynb
index 89abeb830..e4a40cd5c 100644
--- a/docs/backend/openai_api_embeddings.ipynb
+++ b/docs/backend/openai_api_embeddings.ipynb
@@ -9,9 +9,7 @@
     "SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n",
     "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n",
     "\n",
-    "This tutorial covers the embedding APIs for embedding models, such as \n",
-    "- [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) \n",
-    "- [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) \n"
+    "This tutorial covers the embedding APIs for embedding models. For a list of the supported models, see the [corresponding overview page](https://docs.sglang.ai/supported_models/embedding_models.html).\n"
    ]
   },
   {
diff --git a/docs/backend/openai_api_vision.ipynb b/docs/backend/openai_api_vision.ipynb
index 16f9b8f78..0c80fdc0d 100644
--- a/docs/backend/openai_api_vision.ipynb
+++ b/docs/backend/openai_api_vision.ipynb
@@ -10,13 +10,7 @@ "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/vision).\n",
     "This tutorial covers the vision APIs for vision language models.\n",
     "\n",
-    "SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models): \n",
-    "- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) \n",
-    "- [lmms-lab/llava-onevision-qwen2-72b-ov-chat](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-chat) \n",
-    "- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)\n",
-    "- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)\n",
-    "- [openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V)\n",
-    "- [deepseek-ai/deepseek-vl2](https://huggingface.co/deepseek-ai/deepseek-vl2)\n",
+    "SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models).\n",
     "\n",
     "As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)."
    ]
   }
diff --git a/docs/index.rst b/docs/index.rst
index edd786372..9f736a811 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -28,6 +28,11 @@ The core features include:
    backend/openai_api_embeddings.ipynb
    backend/native_api.ipynb
    backend/offline_engine_api.ipynb
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Advanced Backend Configurations
+
    backend/server_arguments.md
    backend/sampling_params.md
    backend/hyperparameter_tuning.md
@@ -77,4 +82,4 @@ The core features include:
    references/general
    references/hardware
    references/advanced_deploy
-   references/performance_tuning
+   references/performance_analysis_and_optimization
diff --git a/docs/references/deepseek.md b/docs/references/deepseek.md
index 6f0d9afd2..7b464b7c2 100644
--- a/docs/references/deepseek.md
+++ b/docs/references/deepseek.md
@@ -3,7 +3,7 @@
 
 SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.
 This document outlines current optimizations for DeepSeek.
-Additionally, the SGLang team is actively developing enhancements following this [Roadmap](https://github.com/sgl-project/sglang/issues/2591).
+For an overview of the implemented features, see the completed [Roadmap](https://github.com/sgl-project/sglang/issues/2591).
 
 ## Launch DeepSeek V3 with SGLang
 
@@ -221,6 +221,6 @@ Important Notes:
 
 ## FAQ
 
-1. **Question**: What should I do if model loading takes too long and NCCL timeout occurs?
+**Q: Model loading is taking too long, and I'm encountering an NCCL timeout. What should I do?**
 
-   **Answer**: You can try to add `--dist-timeout 3600` when launching the model, this allows for 1-hour timeout.
+A: If you're experiencing extended model loading times and an NCCL timeout, you can try increasing the timeout duration. Add the argument `--dist-timeout 3600` when launching your model. This will set the timeout to one hour, which often resolves the issue.
diff --git a/docs/references/performance_analysis_and_optimization.rst b/docs/references/performance_analysis_and_optimization.rst new file mode 100644 index 000000000..1d70fb51d --- /dev/null +++ b/docs/references/performance_analysis_and_optimization.rst @@ -0,0 +1,7 @@ +Performance Analysis & Optimization +=================================== +.. toctree:: + :maxdepth: 1 + + benchmark_and_profiling.md + accuracy_evaluation.md \ No newline at end of file diff --git a/docs/references/performance_tuning.rst b/docs/references/performance_tuning.rst deleted file mode 100644 index 6cc20e061..000000000 --- a/docs/references/performance_tuning.rst +++ /dev/null @@ -1,7 +0,0 @@ -Performance Tuning -==================== -.. toctree:: - :maxdepth: 1 - - benchmark_and_profiling.md - accuracy_evaluation.md diff --git a/docs/start/install.md b/docs/start/install.md index 168f6f7bd..82af7a92c 100644 --- a/docs/start/install.md +++ b/docs/start/install.md @@ -23,8 +23,6 @@ uv pip install "sglang[all]>=0.4.6.post5" 1. Use `export CUDA_HOME=/usr/local/cuda-` to set the `CUDA_HOME` environment variable. 2. Install FlashInfer first following [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html), then install SGLang as described above. -- If you encounter `ImportError; cannot import name 'is_valid_list_of_images' from 'transformers.models.llama.image_processing_llama'`, try to use the specified version of `transformers` in [pyproject.toml](https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml). Currently, just running `pip install transformers==4.51.1`. - ## Method 2: From source ```bash
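---

The embeddings notebook edited above points readers at SGLang's OpenAI-compatible `/v1/embeddings` endpoint. As a rough sketch of what that compatibility means in practice, the snippet below builds an OpenAI-style embeddings request and (optionally) posts it to a locally running server. The base URL, port, and model name are illustrative assumptions, not taken from this patch; the payload shape follows the OpenAI embeddings API, where `input` may be a single string or a list of strings.

```python
import json
from urllib import request


def embedding_request(texts, model):
    """Build an OpenAI-style /v1/embeddings request body.

    'input' may be a single string or a list of strings.
    """
    return {"model": model, "input": texts}


def post_embeddings(body, base_url="http://127.0.0.1:30000/v1"):
    """POST the body to an OpenAI-compatible server.

    Assumes an SGLang server is already running locally with an
    embedding model loaded (address and port are illustrative).
    """
    req = request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Response shape: {"data": [{"embedding": [...], ...}], ...}
        return json.loads(resp.read())


# Build (but do not send) a request for one input string.
body = embedding_request(["What is SGLang?"], "intfloat/e5-mistral-7b-instruct")
print(json.dumps(body))
```

With a server running, `post_embeddings(body)["data"][0]["embedding"]` would yield the embedding vector; without one, the builder alone shows the wire format the tutorial relies on.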