[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)

2024-09-02 21:44:45 -07:00
parent a5a134f39f
commit f64eae3a29
17 changed files with 105 additions and 158 deletions
--- a/docs/en/custom_chat_template.md
+++ b/docs/en/custom_chat_template.md
@@ -1,6 +1,9 @@
 # Custom Chat Template in SGLang Runtime

-By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+**NOTE**: There are two chat template systems in the SGLang project. This document is about setting a custom chat template for the OpenAI-compatible API server (defined at [conversation.py](../../python/sglang/srt/conversation.py)). It is NOT related to the chat template used in the SGLang language frontend (defined at [chat_template.py](../../python/sglang/lang/chat_template.py)).
+
+By default, the server uses the chat template specified in the model tokenizer from Hugging Face.
+It should just work for most official models such as Llama-2/Llama-3.

 If needed, you can also override the chat template when launching the server: