{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OpenAI Compatible API\n",
"\n",
"SGLang provides an OpenAI-compatible API for a smooth transition from OpenAI services.\n",
"\n",
"- `chat/completions`\n",
"- `completions`\n",
"- `batches`\n",
"- `embeddings` (refer to [embedding_model.ipynb](embedding_model.ipynb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Chat Completions\n",
"\n",
"### Usage\n",
"\n",
"Similar to [send_request.ipynb](send_request.ipynb), we can send a chat completion request to the SGLang server in the OpenAI API format."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Server is ready. Proceeding with the next steps.\n"
]
}
],
"source": [
"from sglang.utils import execute_shell_command, wait_for_server, terminate_process\n",
"\n",
"server_process = execute_shell_command(\n",
"    \"\"\"\n",
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
"--port 30000 --host 0.0.0.0 --log-level warning\n",
"\"\"\"\n",
")\n",
"\n",
"wait_for_server(\"http://localhost:30000\")\n",
"print(\"Server is ready. Proceeding with the next steps.\")"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ChatCompletion(id='e854540ec7914b2d8c712f16fd9ed2ca', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Here are 3 countries and their capitals:\\n\\n1. **Country:** Japan\\n**Capital:** Tokyo\\n\\n2. **Country:** Australia\\n**Capital:** Canberra\\n\\n3. **Country:** Brazil\\n**Capital:** Brasília', refusal=None, role='assistant', function_call=None, tool_calls=None), matched_stop=128009)], created=1730012326, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=46, prompt_tokens=49, total_tokens=95, prompt_tokens_details=None))\n"
]
}
],
"source": [
"import openai\n",
"\n",
"# The client requires an api_key argument even if the server was launched without one.\n",
"# Setting an API key when launching the server is strongly recommended.\n",
"\n",
"client = openai.Client(base_url=\"http://127.0.0.1:30000/v1\", api_key=\"None\")\n",
"\n",
"# Chat completion example\n",
"\n",
"response = client.chat.completions.create(\n",
"    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"    messages=[\n",
"        {\"role\": \"system\", \"content\": \"You are a helpful AI assistant\"},\n",
"        {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n",
"    ],\n",
"    temperature=0,\n",
"    max_tokens=64,\n",
")\n",
"print(response)"
]
},
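{
"cell_type": "markdown",
"metadata": {},
"source": [
"The full response object printed above is verbose. In most applications you only need the generated text, which is available as `choices[0].message.content`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print only the assistant's reply from the previous response\n",
"print(response.choices[0].message.content)"
]
},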
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parameters\n",
"\n",
"The chat completions API accepts the following parameters (refer to [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create) for more details):\n",
"\n",
"- `messages`: List of messages in the conversation, each containing `role` and `content`\n",
"- `model`: The model identifier to use for completion\n",
"- `max_tokens`: Maximum number of tokens to generate in the response\n",
"- `temperature`: Controls randomness (0-2). Lower values make output more focused and deterministic\n",
"- `top_p`: Alternative to temperature. Controls diversity via nucleus sampling\n",
"- `n`: Number of chat completion choices to generate\n",
"- `stream`: If true, partial message deltas will be sent as they become available\n",
"- `stop`: Sequences where the API will stop generating further tokens\n",
"- `presence_penalty`: Penalizes new tokens based on their presence in the text so far (-2.0 to 2.0)\n",
"- `frequency_penalty`: Penalizes new tokens based on their frequency in the text so far (-2.0 to 2.0)\n",
"- `logit_bias`: Modify the likelihood of specified tokens appearing in the completion\n",
"- `logprobs`: Include log probabilities of tokens in the response\n",
"- `top_logprobs`: Number of most likely tokens to return probabilities for\n",
"- `seed`: Random seed for deterministic results\n",
"- `response_format`: Specify the format of the response (e.g., JSON)\n",
"- `stream_options`: Additional options for streaming responses\n",
"- `user`: A unique identifier representing your end-user\n",
"\n",
"Here is an example of a detailed chat completion request:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ancient Rome's major achievements include:"
]
}
],
"source": [
"response = client.chat.completions.create(\n",
"    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"    messages=[\n",
"        {\n",
"            \"role\": \"system\",\n",
"            \"content\": \"You are a knowledgeable historian who provides concise responses.\",\n",
"        },\n",
"        {\"role\": \"user\", \"content\": \"Tell me about ancient Rome\"},\n",
"        {\n",
"            \"role\": \"assistant\",\n",
"            \"content\": \"Ancient Rome was a civilization centered in Italy.\",\n",
"        },\n",
"        {\"role\": \"user\", \"content\": \"What were their major achievements?\"},\n",
"    ],\n",
"    temperature=0.3,  # Lower temperature for more focused responses\n",
"    max_tokens=100,  # Reasonable length for a concise response\n",
"    top_p=0.95,  # Slightly higher for better fluency\n",
"    stop=[\"\\n\\n\"],  # Simple stop sequence\n",
"    presence_penalty=0.2,  # Mild penalty to avoid repetition\n",
"    frequency_penalty=0.2,  # Mild penalty for more natural language\n",
"    n=1,  # Single response is usually more stable\n",
"    seed=42,  # Keep for reproducibility\n",
"    stream=True,  # Keep streaming for real-time output\n",
")\n",
"\n",
"for chunk in response:\n",
"    print(chunk.choices[0].delta.content or \"\", end=\"\")"
]
},
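{
"cell_type": "markdown",
"metadata": {},
"source": [
"Among the parameters above, `response_format` deserves a quick illustration. The sketch below asks for a JSON-only reply via OpenAI-style JSON mode (`{\"type\": \"json_object\"}`); this assumes your SGLang version supports JSON mode through `response_format` (some versions expect a `json_schema` type instead):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"response = client.chat.completions.create(\n",
"    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"    messages=[\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": \"Give me the name and capital of one country as a JSON object.\",\n",
"        },\n",
"    ],\n",
"    temperature=0,\n",
"    max_tokens=64,\n",
"    response_format={\"type\": \"json_object\"},  # constrain the reply to valid JSON (assumed supported)\n",
")\n",
"\n",
"# The content should now parse as JSON\n",
"print(json.loads(response.choices[0].message.content))"
]
},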
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Completions\n",
"\n",
"### Usage\n",
"\n",
"The Completions API is similar to the Chat Completions API, but it takes a raw `prompt` instead of a list of `messages`. Refer to [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions/create) for more details."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Completion(id='a6e07198f4b445baa0fb08a2178ceb59', choices=[CompletionChoice(finish_reason='length', index=0, logprobs=None, text=' 1. 2. 3.\\n1. United States - Washington D.C. 2. Japan - Tokyo 3. Australia - Canberra\\nList 3 countries and their capitals. 1. 2. 3.\\n1. China - Beijing 2. Brazil - Bras', matched_stop=None)], created=1730012328, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=64, prompt_tokens=9, total_tokens=73, prompt_tokens_details=None))\n"
]
}
],
"source": [
"response = client.completions.create(\n",
"    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"    prompt=\"List 3 countries and their capitals.\",\n",
"    temperature=0,\n",
"    max_tokens=64,\n",
"    n=1,\n",
"    stop=None,\n",
")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parameters\n",
"\n",
"The completions API accepts the following parameters:\n",
"\n",
"- `model`: The model identifier to use for completion\n",
"- `prompt`: Input text to generate completions for. Can be a string, array of strings, or token arrays\n",
"- `best_of`: Generates this many completions server-side and returns the one with the highest log probability per token\n",
"- `echo`: If true, the prompt is echoed back in addition to the completion\n",
"- `frequency_penalty`: Penalizes new tokens based on their frequency in the text so far (-2.0 to 2.0)\n",
"- `logit_bias`: Modify the likelihood of specified tokens appearing in the completion\n",
"- `logprobs`: Include log probabilities of tokens in the response\n",
"- `max_tokens`: Maximum number of tokens to generate in the response (default: 16)\n",
"- `n`: Number of completion choices to generate\n",
"- `presence_penalty`: Penalizes new tokens based on their presence in the text so far (-2.0 to 2.0)\n",
"- `seed`: Random seed for deterministic results\n",
"- `stop`: Sequences where the API will stop generating further tokens\n",
"- `stream`: If true, partial completion deltas will be sent as they become available\n",
"- `stream_options`: Additional options for streaming responses\n",
"- `suffix`: The suffix that comes after the generated completion (for insertion-style prompts)\n",
"- `temperature`: Controls randomness (0-2). Lower values make output more focused and deterministic\n",
"- `top_p`: Alternative to temperature. Controls diversity via nucleus sampling\n",
"- `user`: A unique identifier representing your end-user\n",
"\n",
"Here is an example of a detailed completions request:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Space explorer, Captain Orion Blackwood, had been traveling through the galaxy for 12 years, searching for a new home for humanity. His ship, the Aurora, had been his home for so long that he barely remembered what it was like to walk on solid ground.\n",
"As he navigated through the dense asteroid field, the ship's computer, S.A.R.A. (Self-Aware Reasoning Algorithm), alerted him to a strange reading on one of the asteroids. Captain Blackwood's curiosity was piqued, and he decided to investigate further.\n",
"\"Captain, I'm detecting unusual energy signatures emanating from the asteroid,\" S.A.R.A. said. \"It's unlike anything I've seen before.\"\n",
"Captain Blackwood's eyes narrowed as"
]
}
],
"source": [
"response = client.completions.create(\n",
"    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"    prompt=\"Write a short story about a space explorer.\",\n",
"    temperature=0.7,  # Moderate temperature for creative writing\n",
"    max_tokens=150,  # Longer response for a story\n",
"    top_p=0.9,  # Balanced diversity in word choice\n",
"    stop=[\"\\n\\n\", \"THE END\"],  # Multiple stop sequences\n",
"    presence_penalty=0.3,  # Encourage novel elements\n",
"    frequency_penalty=0.3,  # Reduce repetitive phrases\n",
"    n=1,  # Generate one completion\n",
"    seed=123,  # For reproducible results\n",
"    stream=True,  # Stream the response\n",
")\n",
"\n",
"for chunk in response:\n",
"    print(chunk.choices[0].text or \"\", end=\"\")"
]
},
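{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `echo` and `logprobs` parameters from the list above are handy for inspecting how the model scores text. Below is a minimal sketch, assuming your SGLang version honors the OpenAI-style boolean `echo` flag and integer `logprobs` field on this endpoint:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = client.completions.create(\n",
"    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"    prompt=\"The capital of France is\",\n",
"    temperature=0,\n",
"    max_tokens=5,\n",
"    echo=True,  # include the prompt itself in the returned text (assumed supported)\n",
"    logprobs=1,  # request the log probability of each token (assumed supported)\n",
")\n",
"\n",
"choice = response.choices[0]\n",
"print(choice.text)  # prompt + completion, because echo=True\n",
"if choice.logprobs is not None:\n",
"    print(choice.logprobs.token_logprobs)  # per-token log probabilities\n",
"    print(choice.logprobs.tokens)  # the corresponding tokens"
]
},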
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Batches\n",
"\n",
"We have implemented the batches API for chat completions and completions. You can upload your requests in a `jsonl` file, create a batch job, and retrieve the results once the batch job completes (this takes longer but costs less).\n",
"\n",
"The batches APIs are:\n",
"\n",
"- `batches`\n",
"- `batches/{batch_id}/cancel`\n",
"- `batches/{batch_id}`\n",
"\n",
"Here is an example of a batch job for chat completions; completions work the same way.\n"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Batch job created with ID: batch_03d7f74f-dffe-4c26-b5e7-bb9fb5cb89ff\n"
]
}
],
"source": [
"import json\n",
"import time\n",
"from openai import OpenAI\n",
"\n",
"client = OpenAI(base_url=\"http://127.0.0.1:30000/v1\", api_key=\"None\")\n",
"\n",
"requests = [\n",
"    {\n",
"        \"custom_id\": \"request-1\",\n",
"        \"method\": \"POST\",\n",
"        \"url\": \"/chat/completions\",\n",
"        \"body\": {\n",
"            \"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"            \"messages\": [\n",
"                {\"role\": \"user\", \"content\": \"Tell me a joke about programming\"}\n",
"            ],\n",
"            \"max_tokens\": 50,\n",
"        },\n",
"    },\n",
"    {\n",
"        \"custom_id\": \"request-2\",\n",
"        \"method\": \"POST\",\n",
"        \"url\": \"/chat/completions\",\n",
"        \"body\": {\n",
"            \"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"            \"messages\": [{\"role\": \"user\", \"content\": \"What is Python?\"}],\n",
"            \"max_tokens\": 50,\n",
"        },\n",
"    },\n",
"]\n",
"\n",
"input_file_path = \"batch_requests.jsonl\"\n",
"\n",
"with open(input_file_path, \"w\") as f:\n",
"    for req in requests:\n",
"        f.write(json.dumps(req) + \"\\n\")\n",
"\n",
"with open(input_file_path, \"rb\") as f:\n",
"    file_response = client.files.create(file=f, purpose=\"batch\")\n",
"\n",
"batch_response = client.batches.create(\n",
"    input_file_id=file_response.id,\n",
"    endpoint=\"/v1/chat/completions\",\n",
"    completion_window=\"24h\",\n",
")\n",
"\n",
"print(f\"Batch job created with ID: {batch_response.id}\")"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Batch job status: validating...trying again in 3 seconds...\n",
"Batch job completed successfully!\n",
"Request counts: BatchRequestCounts(completed=2, failed=0, total=2)\n",
"\n",
"Request request-1:\n",
"Response: {'status_code': 200, 'request_id': 'request-1', 'body': {'id': 'request-1', 'object': 'chat.completion', 'created': 1730012333, 'model': 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'choices': {'index': 0, 'message': {'role': 'assistant', 'content': 'Why do programmers prefer dark mode?\\n\\nBecause light attracts bugs.'}, 'logprobs': None, 'finish_reason': 'stop', 'matched_stop': 128009}, 'usage': {'prompt_tokens': 41, 'completion_tokens': 13, 'total_tokens': 54}, 'system_fingerprint': None}}\n",
"\n",
"Request request-2:\n",
"Response: {'status_code': 200, 'request_id': 'request-2', 'body': {'id': 'request-2', 'object': 'chat.completion', 'created': 1730012333, 'model': 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'choices': {'index': 0, 'message': {'role': 'assistant', 'content': '**What is Python?**\\n\\nPython is a high-level, interpreted programming language that is widely used for various purposes, including:\\n\\n* **Web Development**: Building web applications, web services, and web scraping.\\n* **Data Science**: Data analysis'}, 'logprobs': None, 'finish_reason': 'length', 'matched_stop': None}, 'usage': {'prompt_tokens': 39, 'completion_tokens': 50, 'total_tokens': 89}, 'system_fingerprint': None}}\n",
"\n",
"Cleaning up files...\n"
]
}
],
"source": [
"while batch_response.status not in [\"completed\", \"failed\", \"cancelled\"]:\n",
"    time.sleep(3)\n",
"    print(f\"Batch job status: {batch_response.status}...trying again in 3 seconds...\")\n",
"    batch_response = client.batches.retrieve(batch_response.id)\n",
"\n",
"if batch_response.status == \"completed\":\n",
"    print(\"Batch job completed successfully!\")\n",
"    print(f\"Request counts: {batch_response.request_counts}\")\n",
"\n",
"    result_file_id = batch_response.output_file_id\n",
"    file_response = client.files.content(result_file_id)\n",
"    result_content = file_response.read().decode(\"utf-8\")\n",
"\n",
"    results = [\n",
"        json.loads(line) for line in result_content.split(\"\\n\") if line.strip() != \"\"\n",
"    ]\n",
"\n",
"    for result in results:\n",
"        print(f\"\\nRequest {result['custom_id']}:\")\n",
"        print(f\"Response: {result['response']}\")\n",
"\n",
"    print(\"\\nCleaning up files...\")\n",
"    # Only the output file needs to be deleted; file_response holds its content, not a file\n",
"    client.files.delete(result_file_id)\n",
"else:\n",
"    print(f\"Batch job failed with status: {batch_response.status}\")\n",
"    if hasattr(batch_response, \"errors\"):\n",
"        print(f\"Errors: {batch_response.errors}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A batch job can take a while to complete. You can use these two APIs to retrieve the batch job status or cancel it.\n",
"\n",
"1. `batches/{batch_id}`: Retrieve the batch job status.\n",
"2. `batches/{batch_id}/cancel`: Cancel the batch job.\n",
"\n",
"Here is an example of checking the batch job status."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Created batch job with ID: batch_6b9625ac-9ebc-4c4f-bfd5-f84f88b0100d\n",
"Initial status: validating\n",
"Batch job details (check 1/5):\n",
"ID: batch_6b9625ac-9ebc-4c4f-bfd5-f84f88b0100d\n",
"Status: in_progress\n",
"Created at: 1730012334\n",
"Input file ID: backend_input_file-8203d42a-109c-4573-9663-13b5d9cb6a2b\n",
"Output file ID: None\n",
"Request counts:\n",
"Total: 0\n",
"Completed: 0\n",
"Failed: 0\n",
"Batch job details (check 2/5):\n",
"ID: batch_6b9625ac-9ebc-4c4f-bfd5-f84f88b0100d\n",
"Status: in_progress\n",
"Created at: 1730012334\n",
"Input file ID: backend_input_file-8203d42a-109c-4573-9663-13b5d9cb6a2b\n",
"Output file ID: None\n",
"Request counts:\n",
"Total: 0\n",
"Completed: 0\n",
"Failed: 0\n",
"Batch job details (check 3/5):\n",
"ID: batch_6b9625ac-9ebc-4c4f-bfd5-f84f88b0100d\n",
"Status: in_progress\n",
"Created at: 1730012334\n",
"Input file ID: backend_input_file-8203d42a-109c-4573-9663-13b5d9cb6a2b\n",
"Output file ID: None\n",
"Request counts:\n",
"Total: 0\n",
"Completed: 0\n",
"Failed: 0\n",
"Batch job details (check 4/5):\n",
"ID: batch_6b9625ac-9ebc-4c4f-bfd5-f84f88b0100d\n",
"Status: completed\n",
"Created at: 1730012334\n",
"Input file ID: backend_input_file-8203d42a-109c-4573-9663-13b5d9cb6a2b\n",
"Output file ID: backend_result_file-d32f441d-e737-4da3-b07a-c39349425b3a\n",
"Request counts:\n",
"Total: 100\n",
"Completed: 100\n",
"Failed: 0\n",
"Batch job details (check 5/5):\n",
"ID: batch_6b9625ac-9ebc-4c4f-bfd5-f84f88b0100d\n",
"Status: completed\n",
"Created at: 1730012334\n",
"Input file ID: backend_input_file-8203d42a-109c-4573-9663-13b5d9cb6a2b\n",
"Output file ID: backend_result_file-d32f441d-e737-4da3-b07a-c39349425b3a\n",
"Request counts:\n",
"Total: 100\n",
"Completed: 100\n",
"Failed: 0\n"
]
}
],
"source": [
"import json\n",
"import time\n",
"from openai import OpenAI\n",
"\n",
"client = OpenAI(base_url=\"http://127.0.0.1:30000/v1\", api_key=\"None\")\n",
"\n",
"requests = []\n",
"for i in range(100):\n",
"    requests.append(\n",
"        {\n",
"            \"custom_id\": f\"request-{i}\",\n",
"            \"method\": \"POST\",\n",
"            \"url\": \"/chat/completions\",\n",
"            \"body\": {\n",
"                \"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"                \"messages\": [\n",
"                    {\n",
"                        \"role\": \"system\",\n",
"                        \"content\": f\"{i}: You are a helpful AI assistant\",\n",
"                    },\n",
"                    {\n",
"                        \"role\": \"user\",\n",
"                        \"content\": \"Write a detailed story about a topic. Make it very long.\",\n",
"                    },\n",
"                ],\n",
"                \"max_tokens\": 500,\n",
"            },\n",
"        }\n",
"    )\n",
"\n",
"input_file_path = \"batch_requests.jsonl\"\n",
"with open(input_file_path, \"w\") as f:\n",
"    for req in requests:\n",
"        f.write(json.dumps(req) + \"\\n\")\n",
"\n",
"with open(input_file_path, \"rb\") as f:\n",
"    uploaded_file = client.files.create(file=f, purpose=\"batch\")\n",
"\n",
"batch_job = client.batches.create(\n",
"    input_file_id=uploaded_file.id,\n",
"    endpoint=\"/v1/chat/completions\",\n",
"    completion_window=\"24h\",\n",
")\n",
"\n",
"print(f\"Created batch job with ID: {batch_job.id}\")\n",
"print(f\"Initial status: {batch_job.status}\")\n",
"\n",
"time.sleep(10)\n",
"\n",
"max_checks = 5\n",
"for i in range(max_checks):\n",
"    batch_details = client.batches.retrieve(batch_id=batch_job.id)\n",
"    print(f\"Batch job details (check {i+1}/{max_checks}):\")\n",
"    print(f\"ID: {batch_details.id}\")\n",
"    print(f\"Status: {batch_details.status}\")\n",
"    print(f\"Created at: {batch_details.created_at}\")\n",
"    print(f\"Input file ID: {batch_details.input_file_id}\")\n",
"    print(f\"Output file ID: {batch_details.output_file_id}\")\n",
"\n",
"    print(\"Request counts:\")\n",
"    print(f\"Total: {batch_details.request_counts.total}\")\n",
"    print(f\"Completed: {batch_details.request_counts.completed}\")\n",
"    print(f\"Failed: {batch_details.request_counts.failed}\")\n",
"\n",
"    time.sleep(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example of cancelling a batch job."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Created batch job with ID: batch_3d2dd881-ad84-465a-85ee-6d5991794e5e\n",
"Initial status: validating\n",
"Cancellation initiated. Status: cancelling\n",
"Current status: cancelled\n",
"Batch job successfully cancelled\n",
"Successfully cleaned up input file\n"
]
}
],
"source": [
"import json\n",
"import time\n",
"from openai import OpenAI\n",
"\n",
"client = OpenAI(base_url=\"http://127.0.0.1:30000/v1\", api_key=\"None\")\n",
"\n",
"requests = []\n",
"for i in range(500):\n",
"    requests.append(\n",
"        {\n",
"            \"custom_id\": f\"request-{i}\",\n",
"            \"method\": \"POST\",\n",
"            \"url\": \"/chat/completions\",\n",
"            \"body\": {\n",
"                \"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
"                \"messages\": [\n",
"                    {\n",
"                        \"role\": \"system\",\n",
"                        \"content\": f\"{i}: You are a helpful AI assistant\",\n",
"                    },\n",
"                    {\n",
"                        \"role\": \"user\",\n",
"                        \"content\": \"Write a detailed story about a topic. Make it very long.\",\n",
"                    },\n",
"                ],\n",
"                \"max_tokens\": 500,\n",
"            },\n",
"        }\n",
"    )\n",
"\n",
"input_file_path = \"batch_requests.jsonl\"\n",
"with open(input_file_path, \"w\") as f:\n",
"    for req in requests:\n",
"        f.write(json.dumps(req) + \"\\n\")\n",
"\n",
"with open(input_file_path, \"rb\") as f:\n",
"    uploaded_file = client.files.create(file=f, purpose=\"batch\")\n",
"\n",
"batch_job = client.batches.create(\n",
"    input_file_id=uploaded_file.id,\n",
"    endpoint=\"/v1/chat/completions\",\n",
"    completion_window=\"24h\",\n",
")\n",
"\n",
"print(f\"Created batch job with ID: {batch_job.id}\")\n",
"print(f\"Initial status: {batch_job.status}\")\n",
"\n",
"time.sleep(10)\n",
"\n",
"try:\n",
"    cancelled_job = client.batches.cancel(batch_id=batch_job.id)\n",
"    print(f\"Cancellation initiated. Status: {cancelled_job.status}\")\n",
"    assert cancelled_job.status == \"cancelling\"\n",
"\n",
"    # Monitor the cancellation process\n",
"    while cancelled_job.status not in [\"failed\", \"cancelled\"]:\n",
"        time.sleep(3)\n",
"        cancelled_job = client.batches.retrieve(batch_job.id)\n",
"        print(f\"Current status: {cancelled_job.status}\")\n",
"\n",
"    # Verify final status\n",
"    assert cancelled_job.status == \"cancelled\"\n",
"    print(\"Batch job successfully cancelled\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"Error during cancellation: {e}\")\n",
"    raise\n",
"\n",
"finally:\n",
"    try:\n",
"        del_response = client.files.delete(uploaded_file.id)\n",
"        if del_response.deleted:\n",
"            print(\"Successfully cleaned up input file\")\n",
"    except Exception as e:\n",
"        print(f\"Error cleaning up: {e}\")\n",
"        raise"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"terminate_process(server_process)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "AlphaMeemory",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}