[Feature] Prefill assistant response - add continue_final_message parameter (#4226)

Co-authored-by: Chayenne <zhaochen20@outlook.com>
This commit is contained in:
Adarsh Shirawalmath
2025-04-21 06:07:18 +05:30
committed by GitHub
parent 5156d5a413
commit 8b39274e34
6 changed files with 82 additions and 23 deletions

View File

@@ -8,7 +8,9 @@ The below examples will mostly need you to start a server in a separate terminal
* `multimodal_embedding.py`: An example how perform [multi modal embedding](Alibaba-NLP/gme-Qwen2-VL-2B-Instruct).
* `openai_batch_chat.py`: An example how to process batch requests for chat completions.
* `openai_batch_complete.py`: An example how to process batch requests for text completions.
* `openai_chat_with_response_prefill.py`: An example how to [prefill](https://eugeneyan.com/writing/prompting/#prefill-claudes-responses) a response using OpenAI API.
* **`openai_chat_with_response_prefill.py`**:
An example that demonstrates how to [prefill a response](https://eugeneyan.com/writing/prompting/#prefill-claudes-responses) using the OpenAI API by enabling the `continue_final_message` parameter.
When enabled, the final (partial) assistant message is removed and its content is used as a prefill so that the model continues that message rather than starting a new turn. See [Anthropic's prefill example](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#example-structured-data-extraction-with-prefilling) for more context.
* `reward_model.py`: An example how to extract scores from a reward model.
* `vertex_predict.py`: An example how to deploy a model to [Vertex AI](https://cloud.google.com/vertex-ai?hl=en).