[Docs]: Fix Multi-User Port Allocation Conflicts (#3601)

Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: simveit <simp.veitner@gmail.com>
This commit is contained in:
Shi Shuai
2025-02-19 19:15:44 +00:00
committed by GitHub
parent 6b0aeb58fd
commit 55de40f782
12 changed files with 168 additions and 117 deletions


@@ -36,42 +36,70 @@ find . -name '*.ipynb' -exec nbstripout {} \;
# After these checks pass, push your changes and open a PR on your branch
pre-commit run --all-files
```
---
### **Port Allocation and CI Efficiency**
If you need to run and shut down an SGLang server or engine, follow these examples:
1. Launch and close Server:
**To launch and kill the server:**
```python
# Launch Server
from sglang.test.test_utils import is_in_ci
from sglang.utils import wait_for_server, print_highlight, terminate_process
from sglang.utils import (
    execute_shell_command,
    wait_for_server,
    terminate_process,
    print_highlight,
)
if is_in_ci():
    from patch import launch_server_cmd
else:
    from sglang.utils import launch_server_cmd
server_process, port = launch_server_cmd(
"""
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
--host 0.0.0.0
"""
)
server_process = execute_shell_command(
    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000 --host 0.0.0.0"
)
wait_for_server("http://localhost:30000")
# Terminate Sever
wait_for_server(f"http://localhost:{port}")
# Terminate Server
terminate_process(server_process)
```
2. Launch Engine and close Engine
**To launch and kill the engine:**
```python
# Launch Engine
import sglang as sgl
import asyncio
from sglang.test.test_utils import is_in_ci
if is_in_ci():
    import patch
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
# Terminate Engine
llm.shutdown()
```
### **Why this approach?**
- **Dynamic Port Allocation**: Avoids port conflicts by selecting an available port at runtime, enabling multiple server instances to run in parallel.
- **Optimized for CI**: The `patch` version of `launch_server_cmd` and `sgl.Engine()` in CI environments helps manage GPU memory dynamically, preventing conflicts and improving test parallelism.
- **Better Parallel Execution**: Ensures smooth concurrent tests by avoiding fixed port collisions and optimizing memory usage.
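To illustrate the idea behind dynamic port allocation, here is a minimal sketch using only the Python standard library. It is not taken from SGLang and may differ from what `launch_server_cmd` does internally; the helper name `find_free_port` is hypothetical and used only for illustration.

```python
import socket
from contextlib import closing


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        # Binding to port 0 lets the OS pick any free ephemeral port.
        sock.bind((host, 0))
        # getsockname() returns (host, port) for an IPv4 socket.
        return sock.getsockname()[1]


# Build the URL from the chosen port instead of hard-coding 30000.
port = find_free_port()
base_url = f"http://localhost:{port}"
print(base_url)
```

Note that the socket is closed before the port is reused, so another process could claim the port in between. A sketch like this reduces collisions between users on a shared machine but does not rule them out.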
### **Model Selection**
For demonstrations in the docs, **prefer smaller models** to reduce memory consumption and speed up inference. Running larger models in CI can lead to instability due to memory constraints.
### **Prompt Alignment Example**
When designing prompts, ensure they align with SGLang's structured formatting. For example:
```python
prompt = """You are an AI assistant. Answer concisely and accurately.
User: What is the capital of France?
Assistant: The capital of France is Paris."""
```
This keeps responses aligned with expected behavior and improves reliability across different files.