Fix some issues with current docs. (#6588)
@@ -9,9 +9,7 @@
 "SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n",
 "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n",
 "\n",
-"This tutorial covers the embedding APIs for embedding models, such as \n",
-"- [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) \n",
-"- [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) \n"
+"This tutorial covers the embedding APIs for embedding models. For a list of the supported models see the [corresponding overview page](https://docs.sglang.ai/supported_models/embedding_models.html)\n"
 ]
 },
 {
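The hunk above points readers at the embeddings overview page instead of an inline model list. As a hedged illustration of what a request against the OpenAI-compatible `/v1/embeddings` endpoint looks like (the model name is illustrative and not part of this commit), the request body can be built like this:

```python
import json

# Request body for an OpenAI-compatible /v1/embeddings endpoint as exposed by
# an SGLang server. The model name is an example; use whichever embedding
# model you launched.
payload = {
    "model": "intfloat/e5-mistral-7b-instruct",
    "input": "Once upon a time",
}

body = json.dumps(payload)
print(body)
```

POSTing this body to the server's `/v1/embeddings` route returns a response whose `data[0].embedding` field holds the vector, matching the OpenAI response shape.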
@@ -10,13 +10,7 @@
 "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/vision).\n",
 "This tutorial covers the vision APIs for vision language models.\n",
 "\n",
-"SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models): \n",
-"- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) \n",
-"- [lmms-lab/llava-onevision-qwen2-72b-ov-chat](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-chat) \n",
-"- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)\n",
-"- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)\n",
-"- [openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V)\n",
-"- [deepseek-ai/deepseek-vl2](https://huggingface.co/deepseek-ai/deepseek-vl2)\n",
+"SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models).\n",
 "\n",
 "As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)."
 ]
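The vision tutorial hunk above references the OpenAI-compatible vision APIs. As a sketch of the chat-completions request shape those APIs accept (the model name and image URL below are illustrative assumptions, not taken from this commit), an image is passed as an `image_url` content part:

```python
import json

# Chat request carrying an image in the OpenAI-compatible multimodal format.
# Model name and image URL are illustrative placeholders.
payload = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
}

body = json.dumps(payload)
print(body)
```

The same body works against the server's `/v1/chat/completions` route; mixing several `text` and `image_url` parts in one message is also allowed by the format.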
@@ -28,6 +28,11 @@ The core features include:
    backend/openai_api_embeddings.ipynb
    backend/native_api.ipynb
    backend/offline_engine_api.ipynb
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Advanced Backend Configurations
+
    backend/server_arguments.md
    backend/sampling_params.md
    backend/hyperparameter_tuning.md
@@ -77,4 +82,4 @@ The core features include:
    references/general
    references/hardware
    references/advanced_deploy
-   references/performance_tuning
+   references/performance_analysis_and_optimization
@@ -3,7 +3,7 @@
 SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.
 
 This document outlines current optimizations for DeepSeek.
-Additionally, the SGLang team is actively developing enhancements following this [Roadmap](https://github.com/sgl-project/sglang/issues/2591).
+For an overview of the implemented features see the completed [Roadmap](https://github.com/sgl-project/sglang/issues/2591).
 
 ## Launch DeepSeek V3 with SGLang
 
@@ -221,6 +221,6 @@ Important Notes:
 
 ## FAQ
 
-1. **Question**: What should I do if model loading takes too long and NCCL timeout occurs?
+**Q: Model loading is taking too long, and I'm encountering an NCCL timeout. What should I do?**
 
-   **Answer**: You can try to add `--dist-timeout 3600` when launching the model, this allows for 1-hour timeout.
+A: If you're experiencing extended model loading times and an NCCL timeout, you can try increasing the timeout duration. Add the argument `--dist-timeout 3600` when launching your model. This will set the timeout to one hour, which often resolves the issue.
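The FAQ change above recommends passing `--dist-timeout 3600` at launch. A minimal launch-command sketch, assuming the DeepSeek-V3 checkpoint and a tensor-parallel size of 8 (both illustrative; only the `--dist-timeout` flag comes from this diff):

```shell
# Launch an SGLang server with a one-hour distributed timeout so that slow
# model loading does not trip NCCL. Model path and --tp value are examples.
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --trust-remote-code \
  --dist-timeout 3600
```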
@@ -0,0 +1,7 @@
+Performance Analysis & Optimization
+===================================
+.. toctree::
+   :maxdepth: 1
+
+   benchmark_and_profiling.md
+   accuracy_evaluation.md
@@ -1,7 +0,0 @@
-Performance Tuning
-====================
-.. toctree::
-   :maxdepth: 1
-
-   benchmark_and_profiling.md
-   accuracy_evaluation.md
@@ -23,8 +23,6 @@ uv pip install "sglang[all]>=0.4.6.post5"
 1. Use `export CUDA_HOME=/usr/local/cuda-<your-cuda-version>` to set the `CUDA_HOME` environment variable.
 2. Install FlashInfer first following [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html), then install SGLang as described above.
 
-- If you encounter `ImportError; cannot import name 'is_valid_list_of_images' from 'transformers.models.llama.image_processing_llama'`, try to use the specified version of `transformers` in [pyproject.toml](https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml). Currently, just running `pip install transformers==4.51.1`.
-
 ## Method 2: From source
 
 ```bash