Native API (#1886)

Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
Chayenne
2024-11-02 01:02:17 -07:00
committed by GitHub
parent 5a9a4f41c6
commit 3b60558dd7
8 changed files with 184 additions and 66 deletions

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Native Server API\n",
"# Native API\n",
"\n",
"Apart from the OpenAI compatible API, the SGLang Runtime also provides its native server API. We introduce these following API:\n",
"\n",
@@ -254,7 +254,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [

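The file above renames the notebook that documents SGLang's native server API. That API can be exercised with a plain HTTP call; below is a minimal sketch of building a request for the `/generate` endpoint. The port `30010` and the sampling parameters are assumptions for illustration, and the actual send is left commented out because it needs a running server.

```python
# Sketch: build a request payload for SGLang's native /generate endpoint.
# The base URL/port and sampling parameters below are assumptions for
# illustration; adjust them to match your running server.

def build_generate_request(prompt, base_url="http://localhost:30010"):
    """Return the URL and JSON payload for a native /generate call."""
    url = f"{base_url}/generate"
    payload = {
        "text": prompt,
        "sampling_params": {
            "temperature": 0.0,
            "max_new_tokens": 32,
        },
    }
    return url, payload


url, payload = build_generate_request("The capital of France is")
print(url)

# To actually send it (requires a running server):
# import requests
# response = requests.post(url, json=payload)
# print(response.json()["text"])
```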
View File

@@ -36,7 +36,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -69,7 +69,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:16.624550Z",
@@ -110,7 +110,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:18.090228Z",
@@ -151,12 +151,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Streaming mode is also supported"
"Streaming mode is also supported."
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:21.195226Z",
@@ -190,7 +190,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:21.676813Z",
@@ -226,7 +226,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:23.186337Z",
@@ -272,7 +272,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:26.772016Z",
@@ -334,7 +334,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:26.796422Z",
@@ -389,7 +389,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:29.812339Z",
@@ -472,7 +472,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:45:54.854018Z",
@@ -567,7 +567,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:46:07.896114Z",
@@ -587,6 +587,18 @@
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,

View File

@@ -36,7 +36,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -63,21 +63,19 @@
"source": [
"## Using cURL\n",
"\n",
"Once the server is up, you can send test requests using curl."
"Once the server is up, you can send test requests using curl or requests."
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"\n",
"curl_command = \"\"\"\n",
"curl http://localhost:30010/v1/chat/completions \\\n",
" -H \"Content-Type: application/json\" \\\n",
" -H \"Authorization: Bearer None\" \\\n",
"curl -s http://localhost:30010/v1/chat/completions \\\n",
" -d '{\n",
" \"model\": \"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n",
" \"messages\": [\n",
@@ -109,14 +107,57 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using OpenAI Python Client\n",
"\n",
"You can use the OpenAI Python API library to send requests."
"## Using OpenAI Compatible API w/ Requests"
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"url = \"http://localhost:30010/v1/chat/completions\"\n",
"\n",
"data = {\n",
" \"model\": \"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n",
" \"messages\": [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"text\",\n",
" \"text\": \"Whats in this image?\"\n",
" },\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": \"https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true\"\n",
" }\n",
" }\n",
" ]\n",
" }\n",
" ],\n",
" \"max_tokens\": 300\n",
"}\n",
"\n",
"response = requests.post(url, json=data)\n",
"print_highlight(response.text)"
]
},
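When a chat-completions request like the one in the cell above succeeds, the generated text sits a few levels deep in the response JSON. A minimal sketch of unpacking it follows; the sample response dict is fabricated for illustration, where a real one would come from `response.json()`.

```python
# Sketch: unpack an OpenAI-style chat completions response.
# The sample dict below is fabricated; a real one comes from response.json().
sample_response = {
    "id": "chatcmpl-123",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A logo on a white background."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28},
}

# The generated text lives under choices[0].message.content.
answer = sample_response["choices"][0]["message"]["content"]
print(answer)
```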
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using OpenAI Python Client\n",
"\n",
"Also, you can use the OpenAI Python API library to send requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -160,7 +201,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -202,7 +243,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -233,6 +274,18 @@
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Embedding Model\n",
"# OpenAI APIs - Embedding\n",
"\n",
"SGLang supports embedding models in the same way as completion models. Here are some example models:\n",
"\n",
@@ -62,7 +62,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use Curl"
"## Using cURL"
]
},
{
@@ -83,8 +83,6 @@
"text = \"Once upon a time\"\n",
"\n",
"curl_text = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n",
" -H \"Content-Type: application/json\" \\\n",
" -H \"Authorization: Bearer None\" \\\n",
" -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": \"{text}\"}}'\"\"\"\n",
"\n",
"text_embedding = json.loads(subprocess.check_output(curl_text, shell=True))[\"data\"][0][\n",
@@ -98,7 +96,37 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using OpenAI Compatible API"
"## Using OpenAI Compatible API w/ Requests"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"text = \"Once upon a time\"\n",
"\n",
"response = requests.post(\n",
" \"http://localhost:30010/v1/embeddings\",\n",
" json={\n",
" \"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\",\n",
" \"input\": text\n",
" }\n",
")\n",
"\n",
"text_embedding = response.json()[\"data\"][0][\"embedding\"]\n",
"\n",
"print_highlight(f\"Text embedding (first 10): {text_embedding[:10]}\")"
]
},
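Embeddings like the one retrieved in the cell above are typically consumed by comparing vectors, most often with cosine similarity. Below is a minimal plain-Python sketch; the 4-dimensional example vectors are made up for illustration (real embeddings from this model are much longer).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional "embeddings" for illustration.
v1 = [0.1, 0.2, 0.3, 0.4]
v2 = [0.1, 0.2, 0.3, 0.4]
v3 = [-0.4, 0.3, -0.2, 0.1]

print(cosine_similarity(v1, v2))  # identical vectors -> 1.0
print(cosine_similarity(v1, v3))
```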
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using OpenAI Python Client"
]
},
{
@@ -160,8 +188,6 @@
"input_ids = tokenizer.encode(text)\n",
"\n",
"curl_ids = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n",
" -H \"Content-Type: application/json\" \\\n",
" -H \"Authorization: Bearer None\" \\\n",
" -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": {json.dumps(input_ids)}}}'\"\"\"\n",
"\n",
"input_ids_embedding = json.loads(subprocess.check_output(curl_ids, shell=True))[\"data\"][\n",
@@ -173,7 +199,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:48:01.875204Z",