Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
@@ -55,6 +55,7 @@ Please refer to our dedicated guide on [constrained decoding](https://docs.sglan
* `ignore_eos`: Don't stop generation when EOS token is sampled.
* `skip_special_tokens`: Remove special tokens during decoding.
* `custom_params`: Used when employing `CustomLogitProcessor`. For usage see below.
* `return_hidden_states`: Whether to return the model's hidden states. Note that each time this value changes, the CUDA graph is recaptured, which may cause a performance hit. See the [examples](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py) for more information.
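
Because toggling `return_hidden_states` triggers a CUDA graph recapture, it may help to group requests so the flag changes as rarely as possible. The following is a minimal sketch only: it builds request bodies without contacting a server, assumes the flag sits inside `sampling_params` as this parameter list suggests, and assumes a `text` field for the prompt. The helpers `build_generate_payload`, `count_flag_switches`, and `group_by_flag` are hypothetical names introduced for illustration and are not part of SGLang.

```python
import json


def build_generate_payload(prompt, return_hidden_states=False, max_new_tokens=16):
    """Build a JSON-serializable request body (hypothetical helper, assumed layout)."""
    return {
        "text": prompt,  # assumed name of the prompt field
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "return_hidden_states": return_hidden_states,
        },
    }


def count_flag_switches(payloads):
    """Count how often return_hidden_states differs between consecutive requests."""
    flags = [p["sampling_params"]["return_hidden_states"] for p in payloads]
    return sum(1 for prev, cur in zip(flags, flags[1:]) if prev != cur)


def group_by_flag(payloads):
    """Stable sort that places requests without hidden states first."""
    return sorted(payloads, key=lambda p: p["sampling_params"]["return_hidden_states"])


payloads = [
    build_generate_payload("The capital of France is", return_hidden_states=flag)
    for flag in (True, False, True, False)
]

print(count_flag_switches(payloads))                 # 3 switches when interleaved
print(count_flag_switches(group_by_flag(payloads)))  # 1 switch after grouping
print(json.dumps(payloads[0], indent=2))
```

Each switch counted here corresponds to a point where the server would see the flag change between consecutive requests, so grouping reduces the number of potential recaptures in this toy batch from three to one. Whether reordering is acceptable depends on the application's latency and ordering requirements.
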
### Custom Logit Processor