Wang Ran (汪然)
2025-03-16 12:27:58 +08:00
committed by GitHub
parent 8ec2ce0726
commit 158430473e
2 changed files with 2 additions and 2 deletions

@@ -14,7 +14,7 @@
 """
 The entry point of inference server. (SRT = SGLang Runtime)
-This file implements HTTP APIs for the inferenc engine via fastapi.
+This file implements HTTP APIs for the inference engine via fastapi.
 """
 import asyncio

@@ -19,7 +19,7 @@ from sglang.srt.torch_memory_saver_adapter import TorchMemorySaverAdapter
 Memory pool.
 SGLang has two levels of memory pool.
-ReqToTokenPool maps a a request to its token locations.
+ReqToTokenPool maps a request to its token locations.
 TokenToKVPoolAllocator manages the indices to kv cache data.
 KVCache actually holds the physical kv cache.
 """