Add return hidden state in the native API (#3897)

Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Qiaolin Yu
2025-02-27 01:06:54 -05:00
committed by GitHub
parent 71ed01833d
commit d6898dd253
9 changed files with 112 additions and 34 deletions


@@ -55,6 +55,7 @@ Please refer to our dedicated guide on [constrained decoding](https://docs.sglan
* `ignore_eos`: Don't stop generation when EOS token is sampled.
* `skip_special_tokens`: Remove special tokens during decoding.
* `custom_params`: Used when employing `CustomLogitProcessor`. For usage see below.
* `return_hidden_states`: Whether to return the model's hidden states. Note that each time its value changes, the CUDA graph is recaptured, which can degrade performance. See the [examples](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py) for more information.
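
As a rough illustration of requesting hidden states through the native HTTP API, the sketch below only builds the JSON body for a `/generate` request and does not contact a server. It assumes, as the parameter list above suggests, that `return_hidden_states` is passed inside `sampling_params`. The helper `build_generate_payload` is hypothetical, and the server address in the comment is an assumed default; the linked example remains the authoritative reference.

```python
import json


def build_generate_payload(prompt: str, return_hidden_states: bool = True) -> dict:
    """Hypothetical helper: build a request body for the native /generate endpoint."""
    sampling_params = {
        "temperature": 0.8,
        "max_new_tokens": 16,
        # Assumption: the flag travels inside sampling_params, per the list above.
        "return_hidden_states": return_hidden_states,
    }
    return {"text": prompt, "sampling_params": sampling_params}


payload = build_generate_payload("The capital of France is")
body = json.dumps(payload)

# Sending it requires a running server, e.g. (address assumed):
#   requests.post("http://localhost:30000/generate", json=payload)
print(body)
```

Because toggling the flag triggers a CUDA graph recapture, a client that only needs hidden states for some workloads may prefer to keep the value fixed across a run rather than alternating it request by request.
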
### Custom Logit Processor