[Docs]: Fix Multi-User Port Allocation Conflicts (#3601)

Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: simveit <simp.veitner@gmail.com>
2025-02-19 19:15:44 +00:00
parent 6b0aeb58fd
commit 55de40f782
12 changed files with 168 additions and 117 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -36,42 +36,70 @@ find . -name '*.ipynb' -exec nbstripout {} \;
 # After these checks pass, push your changes and open a PR on your branch
 pre-commit run --all-files
 ```
+---

+### **Port Allocation and CI Efficiency**

-If you need to run and shut down a SGLang server or engine, following these examples:
-
-1. Launch and close Sever:
+**To launch and kill the server:**

 ```python
-#Launch Sever
+from sglang.test.test_utils import is_in_ci
+from sglang.utils import wait_for_server, print_highlight, terminate_process

-from sglang.utils import (
-    execute_shell_command,
-    wait_for_server,
-    terminate_process,
-    print_highlight,
+if is_in_ci():
+    from patch import launch_server_cmd
+else:
+    from sglang.utils import launch_server_cmd
+
+server_process, port = launch_server_cmd(
+    """
+python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
+ --host 0.0.0.0
+"""
 )

-server_process = execute_shell_command(
-    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000 --host 0.0.0.0"
-)
-
-wait_for_server("http://localhost:30000")
-
-# Terminate Sever
+wait_for_server(f"http://localhost:{port}")

+# Terminate Server
 terminate_process(server_process)
 ```
-2. Launch Engine and close Engine
+
+**To launch and kill the engine:**

 ```python
 # Launch Engine
-
 import sglang as sgl
 import asyncio
+from sglang.test.test_utils import is_in_ci
+
+if is_in_ci():
+    import patch

 llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

 # Terminate Engine
 llm.shutdown()
 ```
+
+### **Why this approach?**
+
+- **Dynamic Port Allocation**: Avoids port conflicts by selecting an available port at runtime, enabling multiple server instances to run in parallel.
+- **Optimized for CI**: In CI environments, the `patch` versions of `launch_server_cmd` and `sgl.Engine()` manage GPU memory dynamically, which prevents memory conflicts between concurrent tests and improves test parallelism.
+- **Better Parallel Execution**: Tests can run concurrently because they no longer share a hard-coded port (previously `30000`) and memory usage is optimized.
+
+### **Model Selection**
+
+For demonstrations in the docs, **prefer smaller models** to reduce memory consumption and speed up inference. Running larger models in CI can lead to instability due to memory constraints.
+
+### **Prompt Alignment Example**
+
+When designing prompts, ensure they align with SGLang’s structured formatting. For example:
+
+```python
+prompt = """You are an AI assistant. Answer concisely and accurately.
+
+User: What is the capital of France?
+Assistant: The capital of France is Paris."""
+```
+
+Using a consistent prompt format keeps responses aligned with expected behavior and makes results more reliable across the documentation files.
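
For illustration, the dynamic port allocation described under "Why this approach?" can be sketched with the standard library alone. This is a minimal sketch and not the actual implementation of `launch_server_cmd`: the helper name `find_free_port` is hypothetical, and the approach shown (binding to port 0 so the operating system picks a free port) is an assumption about one common way to do it. The `--port` flag is the one used in the removed example above.

```python
import socket
from contextlib import closing


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        # Binding to port 0 lets the OS choose any free port.
        sock.bind((host, 0))
        # getsockname() returns (host, port) for an IPv4 socket.
        return sock.getsockname()[1]


# Build a launch command that uses the port chosen at runtime
# instead of a hard-coded value such as 30000.
port = find_free_port()
command = (
    "python -m sglang.launch_server "
    "--model-path meta-llama/Meta-Llama-3.1-8B-Instruct "
    f"--host 0.0.0.0 --port {port}"
)
print(command)
```

Note that the port is released before the server starts, so another process could claim it in between. A sketch like this narrows the chance of collisions between users on a shared machine but does not eliminate it.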