Add examples for returning hidden states when using the server (#4074)

Qiaolin Yu
2025-03-04 22:31:50 -05:00
committed by GitHub
parent 77a3954bf7
commit 4725e3f652
4 changed files with 71 additions and 2 deletions

@@ -17,7 +17,7 @@ The `/generate` endpoint accepts the following parameters in JSON format. For in
* `stream: bool = False` Whether to stream the output.
* `lora_path: Optional[Union[List[Optional[str]], Optional[str]]] = None` Path to LoRA weights.
* `custom_logit_processor: Optional[Union[List[Optional[str]], str]] = None` Custom logit processor for advanced sampling control. For usage see below.
-* `return_hidden_states: bool = False` Whether to return hidden states of the model. Note that each time it changes, the cuda graph will be recaptured, which might lead to a performance hit. See the [examples](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py) for more information.
+* `return_hidden_states: bool = False` Whether to return hidden states of the model. Note that each time it changes, the cuda graph will be recaptured, which might lead to a performance hit. See the [examples](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states) for more information.
## Sampling params
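
A minimal sketch of how the `return_hidden_states` parameter from the hunk above might be sent to a running server's `/generate` endpoint. The helper names (`build_generate_payload`, `post_generate`, `demo`), the address `http://localhost:30000`, the `text` and `sampling_params` request fields, and the assumption that hidden states come back under `meta_info["hidden_states"]` are illustrative and not taken from this commit; the linked examples directory is the authoritative reference.

```python
import json
import urllib.request


def build_generate_payload(prompt, max_new_tokens=16, return_hidden_states=True):
    """Build a JSON body for /generate (field names other than
    return_hidden_states are assumptions for illustration)."""
    return {
        "text": prompt,
        "sampling_params": {"temperature": 0.0, "max_new_tokens": max_new_tokens},
        # The documented flag; toggling it between requests may trigger
        # a CUDA graph recapture, so keep it stable where possible.
        "return_hidden_states": return_hidden_states,
    }


def post_generate(base_url, payload, timeout=60):
    """POST the payload to <base_url>/generate and decode the JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def demo(base_url="http://localhost:30000"):
    """Requires a running server; the address is an assumed default."""
    result = post_generate(base_url, build_generate_payload("The capital of France is"))
    # Assumed response layout: hidden states nested under "meta_info".
    hidden_states = result.get("meta_info", {}).get("hidden_states")
    print(result.get("text"))
    print("hidden states returned:", hidden_states is not None)
```

Calling `demo()` only works with a server already listening at the given address. Because the docs line notes that the CUDA graph is recaptured each time the flag changes, it is probably cheaper to send the same `return_hidden_states` value on every request in a batch of calls than to alternate it.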