Revert "feat: add thinking_budget (#6089)" (#6181)

Yineng Zhang
2025-05-10 16:07:45 -07:00
committed by GitHub
parent 921e4a8185
commit 66fc63d6b1
9 changed files with 5 additions and 196 deletions

@@ -64,7 +64,6 @@ Please refer to our dedicated guide on [constrained decoding](./structured_outpu
| ignore_eos | `bool = False` | Don't stop generation when EOS token is sampled. |
| skip_special_tokens | `bool = True` | Remove special tokens during decoding. |
| custom_params | `Optional[List[Optional[Dict[str, Any]]]] = None` | Used when employing `CustomLogitProcessor`. For usage, see below. |
-| thinking_budget | `Optional[int] = None` | The maximum number of reasoning tokens that can be generated for a request. |
## Examples
@@ -297,29 +296,3 @@ response = requests.post(
)
print(response.json())
```
-### Thinking Budget
-Launch a server with `--reasoning-parser`.
-```bash
-python3 -m sglang.launch_server --model Qwen/Qwen3-8B --reasoning-parser qwen3
-```
-Send a request:
-```python
-import requests
-response = requests.post(
-    "http://localhost:30000/generate",
-    json={
-        "text": "9.11 and 9.8, which is greater?",
-        "sampling_params": {
-            "temperature": 0.3,
-            "max_new_tokens": 256,
-            "thinking_budget": 20,
-        },
-    },
-)
-print(response.json())
-```
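Since this revert drops `thinking_budget` from the documented sampling parameters, a client that talks to builds from both before and after this commit may prefer to send the field only when it is explicitly requested. The following is a minimal sketch of that idea, not code from this repository: `build_generate_payload` is a hypothetical helper, and the payload shape and field names are assumed from the removed example above. How a post-revert server treats an unexpected `thinking_budget` key is not stated in this commit.

```python
from typing import Any, Dict, Optional


def build_generate_payload(
    text: str,
    temperature: float = 0.3,
    max_new_tokens: int = 256,
    thinking_budget: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a /generate request body (hypothetical helper, not an SGLang API).

    The "thinking_budget" key is added only when a value is given, so the
    default payload contains no field that this revert removed from the docs.
    """
    sampling_params: Dict[str, Any] = {
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }
    if thinking_budget is not None:
        # Assumption: only builds that still include #6089 understand this key.
        sampling_params["thinking_budget"] = thinking_budget
    return {"text": text, "sampling_params": sampling_params}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the removed example.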