Wang Ran (汪然)
2025-03-16 12:27:58 +08:00
committed by GitHub
parent 8ec2ce0726
commit 158430473e
2 changed files with 2 additions and 2 deletions

@@ -14,7 +14,7 @@
"""
The entry point of inference server. (SRT = SGLang Runtime)
-This file implements HTTP APIs for the inferenc engine via fastapi.
+This file implements HTTP APIs for the inference engine via fastapi.
"""
import asyncio

@@ -19,7 +19,7 @@ from sglang.srt.torch_memory_saver_adapter import TorchMemorySaverAdapter
Memory pool.
SGLang has two levels of memory pool.
-ReqToTokenPool maps a a request to its token locations.
+ReqToTokenPool maps a request to its token locations.
TokenToKVPoolAllocator manages the indices to kv cache data.
KVCache actually holds the physical kv cache.
"""
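
For readers unfamiliar with the layering described in the docstring above, here is a minimal toy sketch of the idea: a per-request map of token locations, an allocator that hands out indices, and a store that holds the data. This is not SGLang's implementation. Every class and method name below (ToyReqToTokenMap, ToyTokenSlotAllocator, ToyKVCache, demo) is hypothetical and chosen only for illustration, and plain Python lists stand in for GPU tensors.

```python
from typing import Dict, List


class ToyTokenSlotAllocator:
    """Hypothetical stand-in: hands out free slot indices into the KV store."""

    def __init__(self, num_slots: int) -> None:
        self.free_slots: List[int] = list(range(num_slots))

    def alloc(self, n: int) -> List[int]:
        if n > len(self.free_slots):
            raise MemoryError("not enough free KV slots")
        taken, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        return taken

    def free(self, slots: List[int]) -> None:
        # Return the slots to the free list so later requests can reuse them.
        self.free_slots.extend(slots)


class ToyKVCache:
    """Hypothetical stand-in: holds the per-slot data (plain objects here)."""

    def __init__(self, num_slots: int) -> None:
        self.data: List[object] = [None] * num_slots

    def write(self, slot: int, value: object) -> None:
        self.data[slot] = value

    def read(self, slot: int) -> object:
        return self.data[slot]


class ToyReqToTokenMap:
    """Hypothetical stand-in: request id -> ordered list of slot indices."""

    def __init__(self) -> None:
        self.table: Dict[str, List[int]] = {}

    def append(self, req_id: str, slots: List[int]) -> None:
        self.table.setdefault(req_id, []).extend(slots)

    def release(self, req_id: str) -> List[int]:
        # Forget the request and report which slots it was using.
        return self.table.pop(req_id, [])


def demo() -> List[object]:
    allocator = ToyTokenSlotAllocator(num_slots=8)
    kv = ToyKVCache(num_slots=8)
    req_map = ToyReqToTokenMap()

    # Reserve three slots for one request and record where its tokens live.
    slots = allocator.alloc(3)
    req_map.append("req-0", slots)
    for slot, token in zip(slots, ["a", "b", "c"]):
        kv.write(slot, f"kv({token})")

    # Reading goes request -> slot indices -> stored data.
    values = [kv.read(s) for s in req_map.table["req-0"]]

    # When the request finishes, its slots go back to the allocator.
    allocator.free(req_map.release("req-0"))
    return values
```

Keeping the index bookkeeping separate from the storage means a finished request only has to return its indices; the stored data can simply be overwritten by whichever request receives those indices next.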