Please refer to our dedicated guide on [constrained decoding](./structured_outpu).
| ignore_eos | `bool = False` | Don't stop generation when the EOS token is sampled. |
| skip_special_tokens | `bool = True` | Remove special tokens during decoding. |
| custom_params | `Optional[List[Optional[Dict[str, Any]]]] = None` | Used with a `CustomLogitProcessor`. For usage, see below. |
| thinking_budget | `Optional[int] = None` | The maximum number of reasoning tokens that can be generated for a request. |

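The boolean flags above can be combined in a single request payload; a minimal sketch of building one (the endpoint shown in the examples below is assumed, and no server is contacted here — the prompt text is illustrative):

```python
# Sketch: a /generate payload using the flags from the table above.
payload = {
    "text": "Write a haiku about rivers.",
    "sampling_params": {
        "max_new_tokens": 64,
        "ignore_eos": False,          # stop normally when EOS is sampled
        "skip_special_tokens": True,  # strip special tokens from the decoded output
    },
}

# To actually send it (requires a running server):
# import requests
# print(requests.post("http://localhost:30000/generate", json=payload).json())
print(payload["sampling_params"])
```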
## Examples

```python
response = requests.post(
)
print(response.json())
```

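As the `Optional[List[Optional[Dict[str, Any]]]]` type in the table suggests, `custom_params` carries one optional dict per sequence in a batch, which the server forwards to the registered `CustomLogitProcessor`. A hedged payload sketch (the key `forced_token_id` is purely illustrative — the keys must match whatever your processor expects — and no server is contacted):

```python
# Sketch: per-sequence custom_params for a batched request.
payload = {
    "text": ["First prompt.", "Second prompt."],
    "sampling_params": {
        "max_new_tokens": 32,
        "custom_params": [
            {"forced_token_id": 5},  # params consumed by the processor for sequence 0
            None,                    # no custom params for sequence 1
        ],
    },
}
print(payload["sampling_params"]["custom_params"])
```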
### Thinking Budget

Launch a server with `--reasoning-parser`:

```bash
python3 -m sglang.launch_server --model Qwen/Qwen3-8B --reasoning-parser qwen3
```

Send a request:

```python
import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "9.11 and 9.8, which is greater?",
        "sampling_params": {
            "temperature": 0.3,
            "max_new_tokens": 256,
            "thinking_budget": 20,
        },
    },
)
print(response.json())
```
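With `--reasoning-parser` enabled, the server separates the reasoning segment from the final answer for you. Purely as an illustration of what that split does, here is a client-side sketch over a fabricated output string (the `<think>…</think>` tag convention matches Qwen3; the sample text is made up):

```python
import re

# Fabricated sample in the Qwen3 style: reasoning wrapped in <think> tags.
sample = "<think>9.11 vs 9.8: compare 0.11 and 0.8</think>9.8 is greater."

m = re.match(r"<think>(.*?)</think>(.*)", sample, flags=re.S)
reasoning, answer = m.group(1), m.group(2)
print(answer.strip())  # -> 9.8 is greater.
```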