Merged three native APIs into one: get_server_info (#2152)
committed by GitHub
parent 84a1698d67
commit dbe1729395
@@ -9,13 +9,11 @@
  "Apart from the OpenAI compatible APIs, the SGLang Runtime also provides its native server APIs. We introduce these following APIs:\n",
  "\n",
  "- `/generate` (text generation model)\n",
- "- `/get_server_args`\n",
  "- `/get_model_info`\n",
+ "- `/get_server_info`\n",
  "- `/health`\n",
  "- `/health_generate`\n",
  "- `/flush_cache`\n",
- "- `/get_memory_pool_size`\n",
- "- `/get_max_total_num_tokens`\n",
  "- `/update_weights`\n",
  "- `/encode`(embedding model)\n",
  "- `/classify`(reward model)\n",
@@ -75,26 +73,6 @@
  "print_highlight(response.json())"
  ]
  },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Get Server Args\n",
- "Get the arguments of a server."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "url = \"http://localhost:30010/get_server_args\"\n",
- "\n",
- "response = requests.get(url)\n",
- "print_highlight(response.json())"
- ]
- },
  {
  "cell_type": "markdown",
  "metadata": {},
@@ -123,6 +101,32 @@
  "assert response_json.keys() == {\"model_path\", \"is_generation\"}"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Get Server Info\n",
+ "Gets the server information including CLI arguments, token limits, and memory pool sizes.\n",
+ "- Note: `get_server_info` merges the following deprecated endpoints:\n",
+ " - `get_server_args`\n",
+ " - `get_memory_pool_size` \n",
+ " - `get_max_total_num_tokens`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# get_server_info\n",
+ "\n",
+ "url = \"http://localhost:30010/get_server_info\"\n",
+ "\n",
+ "response = requests.get(url)\n",
+ "print_highlight(response.text)"
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {},
@@ -179,52 +183,6 @@
  "print_highlight(response.text)"
  ]
  },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Get Memory Pool Size\n",
- "\n",
- "Get the memory pool size in number of tokens.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# get_memory_pool_size\n",
- "\n",
- "url = \"http://localhost:30010/get_memory_pool_size\"\n",
- "\n",
- "response = requests.get(url)\n",
- "print_highlight(response.text)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Get Maximum Total Number of Tokens\n",
- "\n",
- "Exposes the maximum number of tokens SGLang can handle based on the current configuration."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# get_max_total_num_tokens\n",
- "\n",
- "url = \"http://localhost:30010/get_max_total_num_tokens\"\n",
- "\n",
- "response = requests.get(url)\n",
- "print_highlight(response.text)"
- ]
- },
  {
  "cell_type": "markdown",
  "metadata": {},
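
For callers that may talk to servers from before and after this change, the merge of `get_server_args`, `get_memory_pool_size`, and `get_max_total_num_tokens` into `get_server_info` could be handled roughly as sketched below. This is only an illustration under stated assumptions: `fetch_server_info` and `http_get_json` are hypothetical helper names that are not part of SGLang, an older server is assumed to answer an unknown route with HTTP 404, and every endpoint is assumed to return a JSON body. The response schema is not described in this commit, so payloads are passed through unchanged.

```python
import json
import urllib.error
import urllib.request

# Endpoints this commit lists as merged into /get_server_info.
DEPRECATED_ENDPOINTS = (
    "get_server_args",
    "get_memory_pool_size",
    "get_max_total_num_tokens",
)


def http_get_json(url):
    """Fetch a URL and parse the body as JSON; raise LookupError on HTTP 404.

    Assumption: the endpoint returns a JSON body.
    """
    try:
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise LookupError(url) from err
        raise


def fetch_server_info(base_url, get_json=http_get_json):
    """Hypothetical helper: prefer /get_server_info, else use the old endpoints."""
    try:
        data = get_json(f"{base_url}/get_server_info")
        return {"source": "get_server_info", "data": data}
    except LookupError:
        # Assumed older server: keep each deprecated endpoint's payload
        # under its own endpoint name, without interpreting it.
        data = {name: get_json(f"{base_url}/{name}") for name in DEPRECATED_ENDPOINTS}
        return {"source": "deprecated", "data": data}
```

With a server running on the port used in the notebook, a call would look like `fetch_server_info("http://localhost:30010")`. Passing `get_json` as a parameter keeps the fallback logic testable without a live server.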