Fix some issues with current docs. (#6588)

This commit is contained in:
simveit
2025-05-25 19:04:34 +02:00
committed by GitHub
parent 5ccf8fe1a0
commit e235be16fe
7 changed files with 18 additions and 23 deletions


@@ -9,9 +9,7 @@
"SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n", "SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n",
"A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n", "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n",
"\n", "\n",
"This tutorial covers the embedding APIs for embedding models, such as \n", "This tutorial covers the embedding APIs for embedding models. For a list of the supported models see the [corresponding overview page](https://docs.sglang.ai/supported_models/embedding_models.html)\n"
"- [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) \n",
"- [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) \n"
] ]
}, },
{ {
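The notebook text above points readers at the OpenAI-compatible embedding API. A minimal client sketch of that API follows; the base URL, port, and model name are illustrative assumptions (a locally launched SGLang server), not taken from this diff:

```python
import json
import urllib.request


def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Payload for the OpenAI-compatible /v1/embeddings endpoint."""
    return {"model": model, "input": texts}


def get_embeddings(base_url: str, payload: dict) -> list[list[float]]:
    """POST the payload and return one embedding vector per input text."""
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]


if __name__ == "__main__":
    # Assumes a server was started with something like:
    #   python -m sglang.launch_server --model-path intfloat/e5-mistral-7b-instruct
    payload = build_embedding_request("intfloat/e5-mistral-7b-instruct", ["hello world"])
    print(get_embeddings("http://127.0.0.1:30000", payload))
```

The same request shape works with the official `openai` Python client by pointing its `base_url` at the local server.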


@@ -10,13 +10,7 @@
"A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/vision).\n", "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/vision).\n",
"This tutorial covers the vision APIs for vision language models.\n", "This tutorial covers the vision APIs for vision language models.\n",
"\n", "\n",
"SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models): \n", "SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models).\n",
"- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) \n",
"- [lmms-lab/llava-onevision-qwen2-72b-ov-chat](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-chat) \n",
"- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)\n",
"- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)\n",
"- [openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V)\n",
"- [deepseek-ai/deepseek-vl2](https://huggingface.co/deepseek-ai/deepseek-vl2)\n",
"\n", "\n",
"As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)." "As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)."
] ]
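The vision tutorial touched by this hunk uses the OpenAI-compatible chat API, where a user message can mix text and image parts. A sketch of that message format and a matching request helper; the server URL, port, and model name are placeholders for illustration:

```python
import json
import urllib.request


def build_vision_message(prompt: str, image_url: str) -> dict:
    """One user message mixing text and an image, in OpenAI chat format."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def chat(base_url: str, model: str, messages: list[dict]) -> str:
    """Call the OpenAI-compatible /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Assumes a local SGLang server serving a vision language model.
    msg = build_vision_message("Describe this image.", "https://example.com/cat.png")
    print(chat("http://127.0.0.1:30000", "Qwen/Qwen2.5-VL-7B-Instruct", [msg]))
```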


@@ -28,6 +28,11 @@ The core features include:
    backend/openai_api_embeddings.ipynb
    backend/native_api.ipynb
    backend/offline_engine_api.ipynb
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Advanced Backend Configurations
+
    backend/server_arguments.md
    backend/sampling_params.md
    backend/hyperparameter_tuning.md
@@ -77,4 +82,4 @@ The core features include:
    references/general
    references/hardware
    references/advanced_deploy
-   references/performance_tuning
+   references/performance_analysis_and_optimization


@@ -3,7 +3,7 @@
 SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.
 This document outlines current optimizations for DeepSeek.
-Additionally, the SGLang team is actively developing enhancements following this [Roadmap](https://github.com/sgl-project/sglang/issues/2591).
+For an overview of the implemented features see the completed [Roadmap](https://github.com/sgl-project/sglang/issues/2591).

 ## Launch DeepSeek V3 with SGLang
@@ -221,6 +221,6 @@ Important Notes:
 ## FAQ
-1. **Question**: What should I do if model loading takes too long and NCCL timeout occurs?
-**Answer**: You can try to add `--dist-timeout 3600` when launching the model, this allows for 1-hour timeout.
+**Q: Model loading is taking too long, and I'm encountering an NCCL timeout. What should I do?**
+A: If you're experiencing extended model loading times and an NCCL timeout, you can try increasing the timeout duration. Add the argument `--dist-timeout 3600` when launching your model. This will set the timeout to one hour, which often resolves the issue.
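The FAQ answer above names only the `--dist-timeout` flag. A hedged sketch of a full launch command; the model path and `--tp` value are illustrative placeholders, not prescribed by this diff:

```shell
# Allow up to one hour for distributed initialization and weight loading.
# Model path and tensor-parallel size below are placeholders.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --dist-timeout 3600
```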


@@ -0,0 +1,7 @@
+Performance Analysis & Optimization
+===================================
+
+.. toctree::
+   :maxdepth: 1
+
+   benchmark_and_profiling.md
+   accuracy_evaluation.md


@@ -1,7 +0,0 @@
-Performance Tuning
-====================
-
-.. toctree::
-   :maxdepth: 1
-
-   benchmark_and_profiling.md
-   accuracy_evaluation.md


@@ -23,8 +23,6 @@ uv pip install "sglang[all]>=0.4.6.post5"
 1. Use `export CUDA_HOME=/usr/local/cuda-<your-cuda-version>` to set the `CUDA_HOME` environment variable.
 2. Install FlashInfer first following [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html), then install SGLang as described above.
-- If you encounter `ImportError; cannot import name 'is_valid_list_of_images' from 'transformers.models.llama.image_processing_llama'`, try to use the specified version of `transformers` in [pyproject.toml](https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml). Currently, just running `pip install transformers==4.51.1`.

 ## Method 2: From source
 ```bash