sglang/docs/start/send_request.ipynb
Chayenne 3b60558dd7 Native api (#1886)
Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
2024-11-02 01:02:17 -07:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quick Start: Sending Requests\n",
"\n",
"This notebook provides a quick-start guide for using SGLang after installation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch a server\n",
"\n",
"The code cell below is equivalent to running\n",
"\n",
"```bash\n",
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
"--port 30000 --host 0.0.0.0\n",
"```\n",
"\n",
"in your terminal and waiting for the server to be ready. Once the server is running, you can send test requests with cURL or the Python `requests` library. The server implements the [OpenAI-compatible API](https://platform.openai.com/docs/api-reference/chat)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:46:13.611212Z",
"iopub.status.busy": "2024-11-01T02:46:13.611093Z",
"iopub.status.idle": "2024-11-01T02:46:42.810261Z",
"shell.execute_reply": "2024-11-01T02:46:42.809147Z"
}
},
"outputs": [],
"source": [
"from sglang.utils import (\n",
"    execute_shell_command,\n",
"    wait_for_server,\n",
"    terminate_process,\n",
"    print_highlight,\n",
")\n",
"\n",
"server_process = execute_shell_command(\n",
"\"\"\"\n",
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
"--port 30000 --host 0.0.0.0\n",
"\"\"\"\n",
")\n",
"\n",
"wait_for_server(\"http://localhost:30000\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using cURL\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess, json\n",
"\n",
"curl_command = \"\"\"\n",
"curl -s http://localhost:30000/v1/chat/completions \\\n",
"  -H \"Content-Type: application/json\" \\\n",
"  -d '{\"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\", \"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What is an LLM?\"}]}'\n",
"\"\"\"\n",
"\n",
"response = json.loads(subprocess.check_output(curl_command, shell=True))\n",
"print_highlight(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using the OpenAI-Compatible API with `requests`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:46:42.813656Z",
"iopub.status.busy": "2024-11-01T02:46:42.813354Z",
"iopub.status.idle": "2024-11-01T02:46:51.436613Z",
"shell.execute_reply": "2024-11-01T02:46:51.435965Z"
}
},
"outputs": [],
"source": [
"import requests\n",
"\n",
"url = \"http://localhost:30000/v1/chat/completions\"\n",
"\n",
"data = {\n",
"    \"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"    \"messages\": [\n",
"        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
"        {\"role\": \"user\", \"content\": \"What is an LLM?\"}\n",
"    ]\n",
"}\n",
"\n",
"response = requests.post(url, json=data)\n",
"print_highlight(response.json())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using OpenAI Python Client\n",
"\n",
"You can also use the OpenAI Python API library to send requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:46:51.439372Z",
"iopub.status.busy": "2024-11-01T02:46:51.439178Z",
"iopub.status.idle": "2024-11-01T02:46:52.895776Z",
"shell.execute_reply": "2024-11-01T02:46:52.895318Z"
}
},
"outputs": [],
"source": [
"import openai\n",
"\n",
"client = openai.Client(base_url=\"http://127.0.0.1:30000/v1\", api_key=\"None\")\n",
"\n",
"response = client.chat.completions.create(\n",
" model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful AI assistant\"},\n",
" {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n",
" ],\n",
" temperature=0,\n",
" max_tokens=64,\n",
")\n",
"print_highlight(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Native Generation APIs\n",
"\n",
"You can also use the native `/generate` endpoint with `requests`, which provides more flexibility. An API reference is available at [Sampling Parameters](https://sgl-project.github.io/references/sampling_params.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"response = requests.post(\n",
"    \"http://localhost:30000/generate\",\n",
"    json={\n",
"        \"text\": \"The capital of France is\",\n",
"        \"sampling_params\": {\n",
"            \"temperature\": 0,\n",
"            \"max_new_tokens\": 32,\n",
"        },\n",
"    },\n",
")\n",
"\n",
"print_highlight(response.json())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2024-11-01T02:46:52.898411Z",
"iopub.status.busy": "2024-11-01T02:46:52.898149Z",
"iopub.status.idle": "2024-11-01T02:46:54.398382Z",
"shell.execute_reply": "2024-11-01T02:46:54.397564Z"
}
},
"outputs": [],
"source": [
"terminate_process(server_process)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}