[Docs]: Fix Multi-User Port Allocation Conflicts (#3601)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: simveit <simp.veitner@gmail.com>
This commit is contained in:
@@ -36,42 +36,70 @@ find . -name '*.ipynb' -exec nbstripout {} \;
|
||||
# After these checks pass, push your changes and open a PR on your branch
|
||||
pre-commit run --all-files
|
||||
```
|
||||
---
|
||||
|
||||
### **Port Allocation and CI Efficiency**
|
||||
|
||||
If you need to run and shut down a SGLang server or engine, following these examples:
|
||||
|
||||
1. Launch and close Sever:
|
||||
**To launch and kill the server:**
|
||||
|
||||
```python
|
||||
#Launch Sever
|
||||
from sglang.test.test_utils import is_in_ci
|
||||
from sglang.utils import wait_for_server, print_highlight, terminate_process
|
||||
|
||||
from sglang.utils import (
|
||||
execute_shell_command,
|
||||
wait_for_server,
|
||||
terminate_process,
|
||||
print_highlight,
|
||||
if is_in_ci():
|
||||
from patch import launch_server_cmd
|
||||
else:
|
||||
from sglang.utils import launch_server_cmd
|
||||
|
||||
server_process, port = launch_server_cmd(
|
||||
"""
|
||||
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
|
||||
--host 0.0.0.0
|
||||
"""
|
||||
)
|
||||
|
||||
server_process = execute_shell_command(
|
||||
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000 --host 0.0.0.0"
|
||||
)
|
||||
|
||||
wait_for_server("http://localhost:30000")
|
||||
|
||||
# Terminate Sever
|
||||
wait_for_server(f"http://localhost:{port}")
|
||||
|
||||
# Terminate Server
|
||||
terminate_process(server_process)
|
||||
```
|
||||
2. Launch Engine and close Engine
|
||||
|
||||
**To launch and kill the engine:**
|
||||
|
||||
```python
|
||||
# Launch Engine
|
||||
|
||||
import sglang as sgl
|
||||
import asyncio
|
||||
from sglang.test.test_utils import is_in_ci
|
||||
|
||||
if is_in_ci():
|
||||
import patch
|
||||
|
||||
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
|
||||
|
||||
# Terminalte Engine
|
||||
llm.shutdown()
|
||||
```
|
||||
|
||||
### **Why this approach?**
|
||||
|
||||
- **Dynamic Port Allocation**: Avoids port conflicts by selecting an available port at runtime, enabling multiple server instances to run in parallel.
|
||||
- **Optimized for CI**: The `patch` version of `launch_server_cmd` and `sgl.Engine()` in CI environments helps manage GPU memory dynamically, preventing conflicts and improving test parallelism.
|
||||
- **Better Parallel Execution**: Ensures smooth concurrent tests by avoiding fixed port collisions and optimizing memory usage.
|
||||
|
||||
### **Model Selection**
|
||||
|
||||
For demonstrations in the docs, **prefer smaller models** to reduce memory consumption and speed up inference. Running larger models in CI can lead to instability due to memory constraints.
|
||||
|
||||
### **Prompt Alignment Example**
|
||||
|
||||
When designing prompts, ensure they align with SGLang’s structured formatting. For example:
|
||||
|
||||
```python
|
||||
prompt = """You are an AI assistant. Answer concisely and accurately.
|
||||
|
||||
User: What is the capital of France?
|
||||
Assistant: The capital of France is Paris."""
|
||||
```
|
||||
|
||||
This keeps responses aligned with expected behavior and improves reliability across different files.
|
||||
|
||||
Reference in New Issue
Block a user