Fix formatting in long code blocks (#10528)
parent 0abb41c70d
commit 7f028b07c4
@@ -158,7 +158,14 @@ The precompilation process typically takes around 10 minutes to complete.
 **Usage**:
 Add arguments `--speculative-algorithm`, `--speculative-num-steps`, `--speculative-eagle-topk` and `--speculative-num-draft-tokens` to enable this feature. For example:
 ```
-python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 --speculative-algorithm EAGLE --speculative-num-steps 1 --speculative-eagle-topk 1 --speculative-num-draft-tokens 2 --trust-remote-code --tp 8
+python3 -m sglang.launch_server \
+    --model-path deepseek-ai/DeepSeek-V3-0324 \
+    --speculative-algorithm EAGLE \
+    --speculative-num-steps 1 \
+    --speculative-eagle-topk 1 \
+    --speculative-num-draft-tokens 2 \
+    --trust-remote-code \
+    --tp 8
 ```
 - The best configuration for `--speculative-num-steps`, `--speculative-eagle-topk` and `--speculative-num-draft-tokens` can be searched with the [bench_speculative.py](https://github.com/sgl-project/sglang/blob/main/scripts/playground/bench_speculative.py) script for a given batch size. The minimum configuration is `--speculative-num-steps 1 --speculative-eagle-topk 1 --speculative-num-draft-tokens 2`, which can achieve a speedup for larger batch sizes.
 - The FlashAttention3, FlashMLA, and Triton backends fully support MTP. For the FlashInfer backend (`--attention-backend flashinfer`) with speculative decoding, the `--speculative-eagle-topk` parameter should be set to `1`. MTP support for the CutlassMLA and TRTLLM MLA backends is still under development.
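
A quick way to sanity-check the effect of these flags is a client-side throughput probe; a minimal sketch against the OpenAI-compatible endpoint (the prompt is illustrative, and port 30000 plus the `usage` field layout follow SGLang defaults — verify them for your deployment):

```python
import time
import requests

# Rough client-side throughput probe: run it once against a server launched
# with the speculative-decoding flags and once without, then compare.
payload = {
    "model": "deepseek-ai/DeepSeek-V3-0324",
    "messages": [{"role": "user", "content": "Write 200 words about volcanoes."}],
    "max_tokens": 256,
}
t0 = time.time()
resp = requests.post("http://localhost:30000/v1/chat/completions", json=payload)
elapsed = time.time() - t0
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```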
@@ -177,7 +184,14 @@ See [Reasoning Parser](https://docs.sglang.ai/advanced_features/separate_reasoni
 Add arguments `--tool-call-parser deepseekv3` and `--chat-template ./examples/chat_template/tool_chat_template_deepseekv3.jinja` (recommended) to enable this feature. For example (running on 1 * H20 node):
 
 ```
-python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3-0324 --tp 8 --port 30000 --host 0.0.0.0 --mem-fraction-static 0.9 --tool-call-parser deepseekv3 --chat-template ./examples/chat_template/tool_chat_template_deepseekv3.jinja
+python3 -m sglang.launch_server \
+    --model deepseek-ai/DeepSeek-V3-0324 \
+    --tp 8 \
+    --port 30000 \
+    --host 0.0.0.0 \
+    --mem-fraction-static 0.9 \
+    --tool-call-parser deepseekv3 \
+    --chat-template ./examples/chat_template/tool_chat_template_deepseekv3.jinja
 ```
 
 Sample Request:
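
The sample request body itself falls outside this hunk. For orientation, a sketch of a tool-call request against the OpenAI-compatible endpoint; the `get_weather` tool schema is made up for illustration, not taken from the diff:

```python
import requests

# Hypothetical tool schema, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "deepseek-ai/DeepSeek-V3-0324",
        "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
        "tools": tools,
    },
)
# With --tool-call-parser deepseekv3, parsed calls surface as structured
# tool_calls entries on the assistant message.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```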
@@ -43,7 +43,12 @@ export PYTHON_EXECUTION_BACKEND=UV
 
 Launch the server with the demo tool server:
 
-`python3 -m sglang.launch_server --model-path openai/gpt-oss-120b --tool-server demo --tp 2`
+```bash
+python3 -m sglang.launch_server \
+    --model-path openai/gpt-oss-120b \
+    --tool-server demo \
+    --tp 2
+```
 
 For production usage, sglang can act as an MCP client for multiple services. An [example tool server](https://github.com/openai/gpt-oss/tree/main/gpt-oss-mcp-server) is provided. Start the servers and point sglang to them:
 ```bash
@@ -11,7 +11,10 @@ Ongoing optimizations are tracked in the [Roadmap](https://github.com/sgl-projec
 To serve Llama 4 models on 8xH100/H200 GPUs:
 
 ```bash
-python3 -m sglang.launch_server --model-path meta-llama/Llama-4-Scout-17B-16E-Instruct --tp 8 --context-length 1000000
+python3 -m sglang.launch_server \
+    --model-path meta-llama/Llama-4-Scout-17B-16E-Instruct \
+    --tp 8 \
+    --context-length 1000000
 ```
 
 ### Configuration Tips
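
Once the server is up, SGLang's native `/generate` endpoint is a convenient smoke test; a minimal sketch (the prompt is illustrative, and the request/response shape follows the documented native API, worth re-checking for your version):

```python
import requests

# Illustrative request against SGLang's native /generate endpoint; the request
# format is unchanged by the large --context-length setting.
resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Summarize the following document:\n" + "lorem ipsum " * 1000,
        "sampling_params": {"max_new_tokens": 128, "temperature": 0},
    },
)
print(resp.json()["text"])
```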
@@ -29,7 +32,16 @@ python3 -m sglang.launch_server --model-path meta-llama/Llama-4-Scout-17B-16E-In
 **Usage**:
 Add arguments `--speculative-draft-model-path`, `--speculative-algorithm`, `--speculative-num-steps`, `--speculative-eagle-topk` and `--speculative-num-draft-tokens` to enable this feature. For example:
 ```
-python3 -m sglang.launch_server --model-path meta-llama/Llama-4-Maverick-17B-128E-Instruct --speculative-algorithm EAGLE3 --speculative-draft-model-path nvidia/Llama-4-Maverick-17B-128E-Eagle3 --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --trust-remote-code --tp 8 --context-length 1000000
+python3 -m sglang.launch_server \
+    --model-path meta-llama/Llama-4-Maverick-17B-128E-Instruct \
+    --speculative-algorithm EAGLE3 \
+    --speculative-draft-model-path nvidia/Llama-4-Maverick-17B-128E-Eagle3 \
+    --speculative-num-steps 3 \
+    --speculative-eagle-topk 1 \
+    --speculative-num-draft-tokens 4 \
+    --trust-remote-code \
+    --tp 8 \
+    --context-length 1000000
 ```
 
 - **Note**: The Llama 4 draft model *nvidia/Llama-4-Maverick-17B-128E-Eagle3* can only recognize conversations in chat mode.
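
Given that note, chat-formatted requests are the safe way to exercise this draft model; a sketch using the chat completions endpoint (the prompt is illustrative):

```python
import requests

# The EAGLE3 draft model only recognizes chat-mode conversations, so use the
# chat completions endpoint rather than raw text completion.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
        "messages": [{"role": "user", "content": "Give me three facts about the Moon."}],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```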
@@ -50,11 +62,21 @@ Commands:
 
 ```bash
 # Llama-4-Scout-17B-16E-Instruct model
-python -m sglang.launch_server --model-path meta-llama/Llama-4-Scout-17B-16E-Instruct --port 30000 --tp 8 --mem-fraction-static 0.8 --context-length 65536
+python -m sglang.launch_server \
+    --model-path meta-llama/Llama-4-Scout-17B-16E-Instruct \
+    --port 30000 \
+    --tp 8 \
+    --mem-fraction-static 0.8 \
+    --context-length 65536
 lm_eval --model local-chat-completions --model_args model=meta-llama/Llama-4-Scout-17B-16E-Instruct,base_url=http://localhost:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks mmlu_pro --batch_size 128 --apply_chat_template --num_fewshot 0
 
 # Llama-4-Maverick-17B-128E-Instruct
-python -m sglang.launch_server --model-path meta-llama/Llama-4-Maverick-17B-128E-Instruct --port 30000 --tp 8 --mem-fraction-static 0.8 --context-length 65536
+python -m sglang.launch_server \
+    --model-path meta-llama/Llama-4-Maverick-17B-128E-Instruct \
+    --port 30000 \
+    --tp 8 \
+    --mem-fraction-static 0.8 \
+    --context-length 65536
 lm_eval --model local-chat-completions --model_args model=meta-llama/Llama-4-Maverick-17B-128E-Instruct,base_url=http://localhost:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks mmlu_pro --batch_size 128 --apply_chat_template --num_fewshot 0
 ```
 
@@ -21,7 +21,13 @@ python3 -m sglang.launch_server --model Qwen/Qwen3-Next-80B-A3B-Instruct --tp 4
 Add arguments `--speculative-algorithm`, `--speculative-num-steps`, `--speculative-eagle-topk` and `--speculative-num-draft-tokens` to enable this feature. For example:
 
 ```bash
-python3 -m sglang.launch_server --model Qwen/Qwen3-Next-80B-A3B-Instruct --tp 4 --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --speculative-algo NEXTN
+python3 -m sglang.launch_server \
+    --model Qwen/Qwen3-Next-80B-A3B-Instruct \
+    --tp 4 \
+    --speculative-num-steps 3 \
+    --speculative-eagle-topk 1 \
+    --speculative-num-draft-tokens 4 \
+    --speculative-algo NEXTN
 ```
 
 Details can be seen in [this PR](https://github.com/sgl-project/sglang/pull/10233).
@@ -258,7 +258,10 @@ Detailed example in [structured outputs](../advanced_features/structured_outputs
 Launch a server with the `--enable-custom-logit-processor` flag on.
 
 ```bash
-python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000 --enable-custom-logit-processor
+python -m sglang.launch_server \
+    --model-path meta-llama/Meta-Llama-3-8B-Instruct \
+    --port 30000 \
+    --enable-custom-logit-processor
 ```
 
 Define a custom logit processor that will always sample a specific token id.
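
The processor definition itself sits outside this hunk. Following the pattern in the SGLang docs, a sketch of such a processor (the import path and `__call__` signature are assumptions to verify against the current docs):

```python
# Sketch of a custom logit processor that forces one token id; assumed to
# follow SGLang's CustomLogitProcessor interface -- verify path and signature.
from sglang.srt.sampling.custom_logit_processor import CustomLogitProcessor


class DeterministicLogitProcessor(CustomLogitProcessor):
    """Always sample the token id passed via custom_params."""

    def __call__(self, logits, custom_param_list):
        # One parameter dict per row in the logits batch.
        assert logits.shape[0] == len(custom_param_list)
        for i, params in enumerate(custom_param_list):
            logits[i, :] = -float("inf")          # mask every token...
            logits[i, params["token_id"]] = 0.0   # ...except the requested one
        return logits
```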