Llama-VARCO-8B-Instruct/Llama-3-1-Varco-8B.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploy Llama-VARCO-8B-Instruct Model from AWS Marketplace \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "Llama-VARCO-8B-Instruct is a generative model built with Llama, specifically designed to excel in Korean through additional training. The model uses continual pre-training with both Korean and English datasets to enhance its understanding and generation capabilites in Korean, while also maintaining its proficiency in English. It performs supervised fine-tuning (SFT) and direct preference optimization (DPO) in Korean to align with human preferences.\n",
    "\n",
    "This sample notebook shows you how to deploy [Llama-VARCO-8B-Instruct](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e) using Amazon SageMaker.\n",
    "\n",
    "> **Note**: This is a reference notebook and it cannot run unless you make changes suggested in the notebook.\n",
    "\n",
    "## Pre-requisites:\n",
    "1. **Note**: This notebook contains elements which render correctly in Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.\n",
    "1. Ensure that IAM role used has **AmazonSageMakerFullAccess**\n",
    "1. To deploy this ML model successfully, ensure that:\n",
    "    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: \n",
    "        1. **aws-marketplace:ViewSubscriptions**\n",
    "        1. **aws-marketplace:Unsubscribe**\n",
    "        1. **aws-marketplace:Subscribe**  \n",
    "\n",
    "## Contents:\n",
    "1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)\n",
    "2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)\n",
    "3. [Clean-up](#3.-Clean-up)\n",
    "\n",
    "    \n",
    "\n",
    "## Usage instructions\n",
    "You can run this notebook one cell at a time (By using Shift+Enter for running a cell)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Subscribe to the model package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "To subscribe to the model package:\n",
    "1. Open the model package [listing page](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e)\n",
    "1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.\n",
    "1. On the **Subscribe to this software** page, review and click on **\"Accept Offer\"** if you and your organization agrees with EULA, pricing, and support terms. \n",
    "1. Once you click on **Continue to configuration button** and then choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify the same in the following cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_package_arn = \"arn:aws:sagemaker:us-west-2:594846645681:model-package/llama-varco-8b-ist-bedrock-37339dbb44f23f488e24f8671eaa0494\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import base64\n",
    "import json\n",
    "import uuid\n",
    "from sagemaker import ModelPackage\n",
    "import sagemaker as sage\n",
    "from sagemaker import get_execution_role\n",
    "from sagemaker import ModelPackage\n",
    "import boto3\n",
    "from IPython.display import Image\n",
    "from PIL import Image as ImageEdit\n",
    "import numpy as np\n",
    "import io"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "role = get_execution_role()\n",
    "\n",
    "sagemaker_session = sage.Session()\n",
    "\n",
    "bucket = sagemaker_session.default_bucket()\n",
    "runtime = boto3.client(\"runtime.sagemaker\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Create an endpoint and perform real-time inference"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_name = \"Llama-VARCO-8B-Instruct\"\n",
    "\n",
    "content_type = \"application/json\"\n",
    "\n",
    "real_time_inference_instance_type = (\n",
    "    \"ml.g5.12xlarge\"\n",
    ")\n",
    "batch_transform_inference_instance_type = (\n",
    "    \"ml.g4dn.12xlarge\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A.Create an endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# create a deployable model from the model package.\n",
    "model = ModelPackage(\n",
    "    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session\n",
    ")\n",
    "\n",
    "# Deploy the model\n",
    "predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once endpoint has been created, you would be able to perform real-time inference."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### B.Create input payload"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "input = {\n",
    "    \"messages\": [\n",
    "        {\n",
    "            \"role\":\"user\",\n",
    "            \"content\":\"안녕 넌 누구야?\"\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### C. Perform real-time inference"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### C-1. Stream Inference Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "class VarcoInferenceStream():\n",
    "    def __init__(self, sagemaker_runtime, endpoint_name):\n",
    "        self.sagemaker_runtime = sagemaker_runtime\n",
    "        self.endpoint_name = endpoint_name\n",
    "\n",
    "    def stream_inference(self, request_body):\n",
    "        # Gets a streaming inference response\n",
    "        # from the specified model endpoint:\n",
    "        response = self.sagemaker_runtime\\\n",
    "            .invoke_endpoint_with_response_stream(\n",
    "                EndpointName=self.endpoint_name,\n",
    "                Body=json.dumps(request_body),\n",
    "                ContentType=\"application/json\"\n",
    "        )\n",
    "        # Gets the EventStream object returned by the SDK:\n",
    "        for body in response[\"Body\"]:\n",
    "            raw = body['PayloadPart']['Bytes']\n",
    "            yield raw.decode()\n",
    "\n",
    "\n",
    "sm_runtime = boto3.client(\"sagemaker-runtime\")\n",
    "varco_inference_stream = VarcoInferenceStream(sm_runtime, model_name)\n",
    "stream = varco_inference_stream.stream_inference(input)\n",
    "for part in stream:\n",
    "    print(part, end='')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 3. Clean-up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A. Delete the endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.sagemaker_session.delete_endpoint(model_name)\n",
    "model.sagemaker_session.delete_endpoint_config(model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### B. Delete the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.delete_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### C. Unsubscribe to the listing (optional)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you would like to unsubscribe to the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note - You can find this information by looking at the container name associated with the model. \n",
    "\n",
    "**Steps to unsubscribe to product from AWS Marketplace**:\n",
    "1. Navigate to __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)\n",
    "2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__  to cancel the subscription.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "conda_pytorch_p310",
   "language": "python",
   "name": "conda_pytorch_p310"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
初始化项目，由ModelHub XC社区提供模型 Model: NCSOFT/Llama-VARCO-8B-Instruct Source: Original Platform 2026-05-27 04:22:12 +08:00			`{`
			`"cells": [`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"# Deploy Llama-VARCO-8B-Instruct Model from AWS Marketplace \n"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"\n",`
			`"\n",`
			`"Llama-VARCO-8B-Instruct is a generative model built with Llama, specifically designed to excel in Korean through additional training. The model uses continual pre-training with both Korean and English datasets to enhance its understanding and generation capabilites in Korean, while also maintaining its proficiency in English. It performs supervised fine-tuning (SFT) and direct preference optimization (DPO) in Korean to align with human preferences.\n",`
			`"\n",`
			`"This sample notebook shows you how to deploy [Llama-VARCO-8B-Instruct](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e) using Amazon SageMaker.\n",`
			`"\n",`
			`"> Note: This is a reference notebook and it cannot run unless you make changes suggested in the notebook.\n",`
			`"\n",`
			`"## Pre-requisites:\n",`
			`"1. Note: This notebook contains elements which render correctly in Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.\n",`
			`"1. Ensure that IAM role used has AmazonSageMakerFullAccess\n",`
			`"1. To deploy this ML model successfully, ensure that:\n",`
			`" 1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: \n",`
			`" 1. aws-marketplace:ViewSubscriptions\n",`
			`" 1. aws-marketplace:Unsubscribe\n",`
			`" 1. aws-marketplace:Subscribe \n",`
			`"\n",`
			`"## Contents:\n",`
			`"1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)\n",`
			`"2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)\n",`
			`"3. [Clean-up](#3.-Clean-up)\n",`
			`"\n",`
			`" \n",`
			`"\n",`
			`"## Usage instructions\n",`
			`"You can run this notebook one cell at a time (By using Shift+Enter for running a cell)."`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"## 1. Subscribe to the model package"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"tags": []`
			`},`
			`"source": [`
			`"To subscribe to the model package:\n",`
			`"1. Open the model package [listing page](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e)\n",`
			`"1. On the AWS Marketplace listing, click on the Continue to subscribe button.\n",`
			`"1. On the Subscribe to this software page, review and click on \"Accept Offer\" if you and your organization agrees with EULA, pricing, and support terms. \n",`
			`"1. Once you click on Continue to configuration button and then choose a region, you will see a Product Arn displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify the same in the following cell."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {`
			`"tags": []`
			`},`
			`"outputs": [],`
			`"source": [`
			`"model_package_arn = \"arn:aws:sagemaker:us-west-2:594846645681:model-package/llama-varco-8b-ist-bedrock-37339dbb44f23f488e24f8671eaa0494\""`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {`
			`"tags": []`
			`},`
			`"outputs": [],`
			`"source": [`
			`"import base64\n",`
			`"import json\n",`
			`"import uuid\n",`
			`"from sagemaker import ModelPackage\n",`
			`"import sagemaker as sage\n",`
			`"from sagemaker import get_execution_role\n",`
			`"from sagemaker import ModelPackage\n",`
			`"import boto3\n",`
			`"from IPython.display import Image\n",`
			`"from PIL import Image as ImageEdit\n",`
			`"import numpy as np\n",`
			`"import io"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {`
			`"tags": []`
			`},`
			`"outputs": [],`
			`"source": [`
			`"role = get_execution_role()\n",`
			`"\n",`
			`"sagemaker_session = sage.Session()\n",`
			`"\n",`
			`"bucket = sagemaker_session.default_bucket()\n",`
			`"runtime = boto3.client(\"runtime.sagemaker\")"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"## 2. Create an endpoint and perform real-time inference"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {`
			`"tags": []`
			`},`
			`"outputs": [],`
			`"source": [`
			`"model_name = \"Llama-VARCO-8B-Instruct\"\n",`
			`"\n",`
			`"content_type = \"application/json\"\n",`
			`"\n",`
			`"real_time_inference_instance_type = (\n",`
			`" \"ml.g5.12xlarge\"\n",`
			`")\n",`
			`"batch_transform_inference_instance_type = (\n",`
			`" \"ml.g4dn.12xlarge\"\n",`
			`")"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### A.Create an endpoint"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {`
			`"tags": []`
			`},`
			`"outputs": [],`
			`"source": [`
			`"# create a deployable model from the model package.\n",`
			`"model = ModelPackage(\n",`
			`" role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session\n",`
			`")\n",`
			`"\n",`
			`"# Deploy the model\n",`
			`"predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"Once endpoint has been created, you would be able to perform real-time inference."`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"tags": []`
			`},`
			`"source": [`
			`"### B.Create input payload"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {`
			`"tags": []`
			`},`
			`"outputs": [],`
			`"source": [`
			`"input = {\n",`
			`" \"messages\": [\n",`
			`" {\n",`
			`" \"role\":\"user\",\n",`
			`" \"content\":\"안녕 넌 누구야?\"\n",`
			`" }\n",`
			`" ]\n",`
			`"}"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### C. Perform real-time inference"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"##### C-1. Stream Inference Example"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {`
			`"tags": []`
			`},`
			`"outputs": [],`
			`"source": [`
			`"class VarcoInferenceStream():\n",`
			`" def __init__(self, sagemaker_runtime, endpoint_name):\n",`
			`" self.sagemaker_runtime = sagemaker_runtime\n",`
			`" self.endpoint_name = endpoint_name\n",`
			`"\n",`
			`" def stream_inference(self, request_body):\n",`
			`" # Gets a streaming inference response\n",`
			`" # from the specified model endpoint:\n",`
			`" response = self.sagemaker_runtime\\\n",`
			`" .invoke_endpoint_with_response_stream(\n",`
			`" EndpointName=self.endpoint_name,\n",`
			`" Body=json.dumps(request_body),\n",`
			`" ContentType=\"application/json\"\n",`
			`" )\n",`
			`" # Gets the EventStream object returned by the SDK:\n",`
			`" for body in response[\"Body\"]:\n",`
			`" raw = body['PayloadPart']['Bytes']\n",`
			`" yield raw.decode()\n",`
			`"\n",`
			`"\n",`
			`"sm_runtime = boto3.client(\"sagemaker-runtime\")\n",`
			`"varco_inference_stream = VarcoInferenceStream(sm_runtime, model_name)\n",`
			`"stream = varco_inference_stream.stream_inference(input)\n",`
			`"for part in stream:\n",`
			`" print(part, end='')"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"tags": []`
			`},`
			`"source": [`
			`"## 3. Clean-up"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged."`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### A. Delete the endpoint"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"model.sagemaker_session.delete_endpoint(model_name)\n",`
			`"model.sagemaker_session.delete_endpoint_config(model_name)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### B. Delete the model"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"model.delete_model()"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### C. Unsubscribe to the listing (optional)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"If you would like to unsubscribe to the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note - You can find this information by looking at the container name associated with the model. \n",`
			`"\n",`
			`"Steps to unsubscribe to product from AWS Marketplace:\n",`
			`"1. Navigate to __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)\n",`
			`"2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__ to cancel the subscription.\n",`
			`"\n"`
			`]`
			`}`
			`],`
			`"metadata": {`
			`"instance_type": "ml.t3.medium",`
			`"kernelspec": {`
			`"display_name": "conda_pytorch_p310",`
			`"language": "python",`
			`"name": "conda_pytorch_p310"`
			`},`
			`"language_info": {`
			`"codemirror_mode": {`
			`"name": "ipython",`
			`"version": 3`
			`},`
			`"file_extension": ".py",`
			`"mimetype": "text/x-python",`
			`"name": "python",`
			`"nbconvert_exporter": "python",`
			`"pygments_lexer": "ipython3",`
			`"version": "3.10.14"`
			`}`
			`},`
			`"nbformat": 4,`
			`"nbformat_minor": 4`
			`}`