344 lines
10 KiB
Plaintext
344 lines
10 KiB
Plaintext
|
|
{
|
||
|
|
"cells": [
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"# Deploy Llama-VARCO-8B-Instruct Model from AWS Marketplace \n"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"\n",
|
||
|
|
"\n",
|
||
|
|
"Llama-VARCO-8B-Instruct is a generative model built with Llama, specifically designed to excel in Korean through additional training. The model uses continual pre-training with both Korean and English datasets to enhance its understanding and generation capabilites in Korean, while also maintaining its proficiency in English. It performs supervised fine-tuning (SFT) and direct preference optimization (DPO) in Korean to align with human preferences.\n",
|
||
|
|
"\n",
|
||
|
|
"This sample notebook shows you how to deploy [Llama-VARCO-8B-Instruct](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e) using Amazon SageMaker.\n",
|
||
|
|
"\n",
|
||
|
|
"> **Note**: This is a reference notebook and it cannot run unless you make changes suggested in the notebook.\n",
|
||
|
|
"\n",
|
||
|
|
"## Pre-requisites:\n",
|
||
|
|
"1. **Note**: This notebook contains elements which render correctly in Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.\n",
|
||
|
|
"1. Ensure that IAM role used has **AmazonSageMakerFullAccess**\n",
|
||
|
|
"1. To deploy this ML model successfully, ensure that:\n",
|
||
|
|
" 1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: \n",
|
||
|
|
" 1. **aws-marketplace:ViewSubscriptions**\n",
|
||
|
|
" 1. **aws-marketplace:Unsubscribe**\n",
|
||
|
|
" 1. **aws-marketplace:Subscribe** \n",
|
||
|
|
"\n",
|
||
|
|
"## Contents:\n",
|
||
|
|
"1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)\n",
|
||
|
|
"2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)\n",
|
||
|
|
"3. [Clean-up](#3.-Clean-up)\n",
|
||
|
|
"\n",
|
||
|
|
" \n",
|
||
|
|
"\n",
|
||
|
|
"## Usage instructions\n",
|
||
|
|
"You can run this notebook one cell at a time (By using Shift+Enter for running a cell)."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## 1. Subscribe to the model package"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"source": [
|
||
|
|
"To subscribe to the model package:\n",
|
||
|
|
"1. Open the model package [listing page](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e)\n",
|
||
|
|
"1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.\n",
|
||
|
|
"1. On the **Subscribe to this software** page, review and click on **\"Accept Offer\"** if you and your organization agrees with EULA, pricing, and support terms. \n",
|
||
|
|
"1. Once you click on **Continue to configuration button** and then choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify the same in the following cell."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"model_package_arn = \"arn:aws:sagemaker:us-west-2:594846645681:model-package/llama-varco-8b-ist-bedrock-37339dbb44f23f488e24f8671eaa0494\""
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"import base64\n",
|
||
|
|
"import json\n",
|
||
|
|
"import uuid\n",
|
||
|
|
"from sagemaker import ModelPackage\n",
|
||
|
|
"import sagemaker as sage\n",
|
||
|
|
"from sagemaker import get_execution_role\n",
|
||
|
|
"from sagemaker import ModelPackage\n",
|
||
|
|
"import boto3\n",
|
||
|
|
"from IPython.display import Image\n",
|
||
|
|
"from PIL import Image as ImageEdit\n",
|
||
|
|
"import numpy as np\n",
|
||
|
|
"import io"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"role = get_execution_role()\n",
|
||
|
|
"\n",
|
||
|
|
"sagemaker_session = sage.Session()\n",
|
||
|
|
"\n",
|
||
|
|
"bucket = sagemaker_session.default_bucket()\n",
|
||
|
|
"runtime = boto3.client(\"runtime.sagemaker\")"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## 2. Create an endpoint and perform real-time inference"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"model_name = \"Llama-VARCO-8B-Instruct\"\n",
|
||
|
|
"\n",
|
||
|
|
"content_type = \"application/json\"\n",
|
||
|
|
"\n",
|
||
|
|
"real_time_inference_instance_type = (\n",
|
||
|
|
" \"ml.g5.12xlarge\"\n",
|
||
|
|
")\n",
|
||
|
|
"batch_transform_inference_instance_type = (\n",
|
||
|
|
" \"ml.g4dn.12xlarge\"\n",
|
||
|
|
")"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### A.Create an endpoint"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"# create a deployable model from the model package.\n",
|
||
|
|
"model = ModelPackage(\n",
|
||
|
|
" role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session\n",
|
||
|
|
")\n",
|
||
|
|
"\n",
|
||
|
|
"# Deploy the model\n",
|
||
|
|
"predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"Once endpoint has been created, you would be able to perform real-time inference."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"source": [
|
||
|
|
"### B.Create input payload"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"input = {\n",
|
||
|
|
" \"messages\": [\n",
|
||
|
|
" {\n",
|
||
|
|
" \"role\":\"user\",\n",
|
||
|
|
" \"content\":\"안녕 넌 누구야?\"\n",
|
||
|
|
" }\n",
|
||
|
|
" ]\n",
|
||
|
|
"}"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### C. Perform real-time inference"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"##### C-1. Stream Inference Example"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"class VarcoInferenceStream():\n",
|
||
|
|
" def __init__(self, sagemaker_runtime, endpoint_name):\n",
|
||
|
|
" self.sagemaker_runtime = sagemaker_runtime\n",
|
||
|
|
" self.endpoint_name = endpoint_name\n",
|
||
|
|
"\n",
|
||
|
|
" def stream_inference(self, request_body):\n",
|
||
|
|
" # Gets a streaming inference response\n",
|
||
|
|
" # from the specified model endpoint:\n",
|
||
|
|
" response = self.sagemaker_runtime\\\n",
|
||
|
|
" .invoke_endpoint_with_response_stream(\n",
|
||
|
|
" EndpointName=self.endpoint_name,\n",
|
||
|
|
" Body=json.dumps(request_body),\n",
|
||
|
|
" ContentType=\"application/json\"\n",
|
||
|
|
" )\n",
|
||
|
|
" # Gets the EventStream object returned by the SDK:\n",
|
||
|
|
" for body in response[\"Body\"]:\n",
|
||
|
|
" raw = body['PayloadPart']['Bytes']\n",
|
||
|
|
" yield raw.decode()\n",
|
||
|
|
"\n",
|
||
|
|
"\n",
|
||
|
|
"sm_runtime = boto3.client(\"sagemaker-runtime\")\n",
|
||
|
|
"varco_inference_stream = VarcoInferenceStream(sm_runtime, model_name)\n",
|
||
|
|
"stream = varco_inference_stream.stream_inference(input)\n",
|
||
|
|
"for part in stream:\n",
|
||
|
|
" print(part, end='')"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {
|
||
|
|
"tags": []
|
||
|
|
},
|
||
|
|
"source": [
|
||
|
|
"## 3. Clean-up"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### A. Delete the endpoint"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"model.sagemaker_session.delete_endpoint(model_name)\n",
|
||
|
|
"model.sagemaker_session.delete_endpoint_config(model_name)"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### B. Delete the model"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"model.delete_model()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### C. Unsubscribe to the listing (optional)"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"If you would like to unsubscribe to the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note - You can find this information by looking at the container name associated with the model. \n",
|
||
|
|
"\n",
|
||
|
|
"**Steps to unsubscribe to product from AWS Marketplace**:\n",
|
||
|
|
"1. Navigate to __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)\n",
|
||
|
|
"2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__ to cancel the subscription.\n",
|
||
|
|
"\n"
|
||
|
|
]
|
||
|
|
}
|
||
|
|
],
|
||
|
|
"metadata": {
|
||
|
|
"instance_type": "ml.t3.medium",
|
||
|
|
"kernelspec": {
|
||
|
|
"display_name": "conda_pytorch_p310",
|
||
|
|
"language": "python",
|
||
|
|
"name": "conda_pytorch_p310"
|
||
|
|
},
|
||
|
|
"language_info": {
|
||
|
|
"codemirror_mode": {
|
||
|
|
"name": "ipython",
|
||
|
|
"version": 3
|
||
|
|
},
|
||
|
|
"file_extension": ".py",
|
||
|
|
"mimetype": "text/x-python",
|
||
|
|
"name": "python",
|
||
|
|
"nbconvert_exporter": "python",
|
||
|
|
"pygments_lexer": "ipython3",
|
||
|
|
"version": "3.10.14"
|
||
|
|
}
|
||
|
|
},
|
||
|
|
"nbformat": 4,
|
||
|
|
"nbformat_minor": 4
|
||
|
|
}
|