Add examples to token-in-token-out for LLM (#4010)

Chayenne
2025-03-02 21:03:49 -08:00
committed by GitHub
parent 9e1014cf99
commit 728e175fc4
2 changed files with 4 additions and 5 deletions


@@ -52,7 +52,7 @@ Please consult the documentation below to learn more about the parameters you ma
* `chat_template`: The chat template to use. Deviating from the default might lead to unexpected responses. For multi-modal chat templates, refer to [here](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template).
* `is_embedding`: Set to true to perform [embedding](https://docs.sglang.ai/backend/openai_api_embeddings.html) / [encode](https://docs.sglang.ai/backend/native_api.html#Encode-(embedding-model)) and [reward](https://docs.sglang.ai/backend/native_api.html#Classify-(reward-model)) tasks.
* `revision`: Adjust if a specific version of the model should be used.
-* `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF. Please see this [example for reference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/input_ids.py).
+* `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF. Please see this [example for reference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/token_in_token_out_llm.py).
* `json_model_override_args`: Override model config with the provided JSON.
* `delete_ckpt_after_loading`: Delete the model checkpoint after loading the model.
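The arguments above map directly onto keyword arguments of `sgl.Engine`. A minimal sketch of how a few of them fit together; the `engine_kwargs` helper and the model path are illustrative assumptions, not part of sglang's API, and constructing the engine itself requires sglang and a GPU:

```python
import json


def engine_kwargs(model_path, is_embedding=False, revision=None,
                  skip_tokenizer_init=False, model_override=None):
    """Illustrative helper (not part of sglang) that collects a few of the
    engine arguments described above into a kwargs dict."""
    kwargs = {"model_path": model_path}
    if is_embedding:
        kwargs["is_embedding"] = True
    if revision is not None:
        kwargs["revision"] = revision
    if skip_tokenizer_init:
        kwargs["skip_tokenizer_init"] = True
    if model_override is not None:
        # json_model_override_args takes a JSON string, not a dict.
        kwargs["json_model_override_args"] = json.dumps(model_override)
    return kwargs


def build_engine(model_path):
    """Construct an engine for a token-in-token-out workflow.
    Requires sglang and a GPU, so it is defined but not called here."""
    import sglang as sgl  # assumption: sglang is installed
    return sgl.Engine(**engine_kwargs(model_path, skip_tokenizer_init=True))
```

Passing the dict through `sgl.Engine(**kwargs)` keeps the per-run overrides (revision, model config overrides) in one place.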


@@ -1,5 +1,5 @@
"""
-This example demonstrates how to provide tokenized ids as input instead of text prompt
+This example demonstrates how to provide tokenized ids to LLM as input instead of text prompt, i.e. a token-in-token-out workflow.
"""
import sglang as sgl
@@ -24,14 +24,13 @@ def main():
token_ids_list = [tokenizer.encode(prompt) for prompt in prompts]
# Create an LLM.
# You can also specify `skip_tokenizer_init=True`, but it requires explicit detokenization at the end
-llm = sgl.Engine(model_path=MODEL_PATH)
+llm = sgl.Engine(model_path=MODEL_PATH, skip_tokenizer_init=True)
outputs = llm.generate(input_ids=token_ids_list, sampling_params=sampling_params)
# Print the outputs.
for prompt, output in zip(prompts, outputs):
    print("===============================")
-    print(f"Prompt: {prompt}\nGenerated Text: {output['text']}")
+    print(f"Prompt: {prompt}\nGenerated token ids: {output['token_ids']}")
# The __main__ condition is necessary here because we use "spawn" to create subprocesses
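Applied to the old script, the diff yields roughly the following token-in-token-out example. This is a sketch, not the exact file from the commit: the model path, the prompts, and the `detokenize_outputs` helper are illustrative assumptions, and running it requires sglang, transformers, and a GPU.

```python
def detokenize_outputs(tokenizer, outputs):
    """Decode the token ids returned by the engine back into text.

    With skip_tokenizer_init=True the engine returns token ids only, so
    detokenization has to happen explicitly on the client side."""
    return [
        tokenizer.decode(out["token_ids"], skip_special_tokens=True)
        for out in outputs
    ]


def main():
    import sglang as sgl                    # assumption: sglang is installed
    from transformers import AutoTokenizer  # assumption: transformers is installed

    MODEL_PATH = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder model path
    prompts = ["Hello, my name is", "The capital of France is"]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}

    # Tokenize on the client side; the engine never sees raw text.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    token_ids_list = [tokenizer.encode(prompt) for prompt in prompts]

    # skip_tokenizer_init=True makes the engine consume and produce token ids.
    llm = sgl.Engine(model_path=MODEL_PATH, skip_tokenizer_init=True)
    outputs = llm.generate(input_ids=token_ids_list, sampling_params=sampling_params)

    for prompt, text in zip(prompts, detokenize_outputs(tokenizer, outputs)):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated Text: {text}")


# In a real script, invoke with `if __name__ == "__main__": main()`; the guard
# matters because sglang uses "spawn" to create worker subprocesses.
```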