Merged three native APIs into one: get_server_info (#2152)

2024-11-24 01:37:58 -08:00
parent 84a1698d67
commit dbe1729395
10 changed files with 81 additions and 126 deletions
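Clients that still call the three removed endpoints can switch to a single request. The sketch below is a minimal, hypothetical migration helper, not part of SGLang. It assumes the server from the notebook is listening on `localhost:30010`, uses the standard library's `urllib` instead of `requests`, and assumes the merged payload exposes the former values under the keys `memory_pool_size` and `max_total_num_tokens`, with every remaining key being a server argument. Check those key names against the JSON your SGLang version actually returns.

```python
import json
import urllib.request

# Port taken from the notebook in this commit; adjust for your deployment.
BASE_URL = "http://localhost:30010"

# ASSUMPTION: key names under which get_server_info reports the values that the
# removed get_memory_pool_size / get_max_total_num_tokens endpoints returned.
MEMORY_POOL_KEY = "memory_pool_size"
MAX_TOKENS_KEY = "max_total_num_tokens"


def fetch_server_info(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send GET /get_server_info and decode the JSON response body."""
    with urllib.request.urlopen(f"{base_url}/get_server_info", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def split_server_info(info: dict) -> dict:
    """Hypothetical helper: rebuild the three legacy views from the merged payload.

    Every key except the two assumed scheduler keys is treated as a server argument.
    Missing scheduler keys map to None instead of raising.
    """
    server_args = {
        k: v for k, v in info.items() if k not in (MEMORY_POOL_KEY, MAX_TOKENS_KEY)
    }
    return {
        "get_server_args": server_args,
        "get_memory_pool_size": info.get(MEMORY_POOL_KEY),
        "get_max_total_num_tokens": info.get(MAX_TOKENS_KEY),
    }
```

With a running server, `split_server_info(fetch_server_info())["get_max_total_num_tokens"]` replaces a call to the old `/get_max_total_num_tokens` endpoint. Keeping the splitting logic in a pure function means it can be checked without a live server.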
--- a/docs/backend/native_api.ipynb
+++ b/docs/backend/native_api.ipynb
@@ -9,13 +9,11 @@
    "Apart from the OpenAI-compatible APIs, the SGLang Runtime also provides its native server APIs. We introduce the following APIs:\n",
    "\n",
    "- `/generate` (text generation model)\n",
-    "- `/get_server_args`\n",
    "- `/get_model_info`\n",
+    "- `/get_server_info`\n",
    "- `/health`\n",
    "- `/health_generate`\n",
    "- `/flush_cache`\n",
-    "- `/get_memory_pool_size`\n",
-    "- `/get_max_total_num_tokens`\n",
    "- `/update_weights`\n",
    "- `/encode` (embedding model)\n",
    "- `/classify` (reward model)\n",
@@ -75,26 +73,6 @@
    "print_highlight(response.json())"
   ]
  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Get Server Args\n",
-    "Get the arguments of a server."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "url = \"http://localhost:30010/get_server_args\"\n",
-    "\n",
-    "response = requests.get(url)\n",
-    "print_highlight(response.json())"
-   ]
-  },
  {
   "cell_type": "markdown",
   "metadata": {},
@@ -123,6 +101,32 @@
    "assert response_json.keys() == {\"model_path\", \"is_generation\"}"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Get Server Info\n",
+    "Get the server information, including CLI arguments, token limits, and memory pool sizes.\n",
+    "- Note: `get_server_info` merges the following deprecated endpoints:\n",
+    "  - `get_server_args`\n",
+    "  - `get_memory_pool_size`\n",
+    "  - `get_max_total_num_tokens`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# get_server_info\n",
+    "\n",
+    "url = \"http://localhost:30010/get_server_info\"\n",
+    "\n",
+    "response = requests.get(url)\n",
+    "print_highlight(response.text)"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {},
@@ -179,52 +183,6 @@
    "print_highlight(response.text)"
   ]
  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Get Memory Pool Size\n",
-    "\n",
-    "Get the memory pool size in number of tokens.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# get_memory_pool_size\n",
-    "\n",
-    "url = \"http://localhost:30010/get_memory_pool_size\"\n",
-    "\n",
-    "response = requests.get(url)\n",
-    "print_highlight(response.text)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Get Maximum Total Number of Tokens\n",
-    "\n",
-    "Exposes the maximum number of tokens SGLang can handle based on the current configuration."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# get_max_total_num_tokens\n",
-    "\n",
-    "url = \"http://localhost:30010/get_max_total_num_tokens\"\n",
-    "\n",
-    "response = requests.get(url)\n",
-    "print_highlight(response.text)"
-   ]
-  },
  {
   "cell_type": "markdown",
   "metadata": {},