---
license: apache-2.0
---

# **TinyWand-DPO**
<p align="left">
    <img src="./TinyWand.png" width="150"/>
</p>

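# **Model Description**

**How about an SLM at a trifling 1.63B parameters?**

## **Model Introduction**
**TinyWand-DPO** is a small language model (SLM) with 1.63B parameters. Its small size lets it run on small devices or reach high throughput (tokens/s), while still delivering strong performance for its size.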

## **Model License**
Open (Apache-2.0)

## **Model Performance**
TBD

## **Training Process**
TBD

## **Usage**

**VRAM required for inference**

| Quantization | Input tokens | Output tokens | Memory usage |
|---|---|---|---|
| bf16 (base) | 64 | 256 | 3,888 MiB |
| q4_K_M | 64 | 256 | 1,788 MiB |
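
As a rough sanity check on the table above, the weights-only footprint can be estimated from the parameter count and the number of bits stored per weight. The sketch below is a back-of-the-envelope calculation, not a measurement: `weight_mib` is a hypothetical helper, and the 4.5 bits-per-weight figure used for the 4-bit case is an assumed illustrative value, not the exact rate of q4_K_M.

```python
def weight_mib(num_params: float, bits_per_weight: float) -> float:
    """Estimate the memory taken by the model weights alone, in MiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 2)


NUM_PARAMS = 1.63e9  # parameter count stated in this README

bf16_mib = weight_mib(NUM_PARAMS, 16)  # bf16 stores 16 bits per weight
q4_mib = weight_mib(NUM_PARAMS, 4.5)   # ASSUMPTION: ~4.5 bits per weight, for illustration only

print(f"bf16 weights: ~{bf16_mib:,.0f} MiB")
print(f"4-bit weights (assumed 4.5 bpw): ~{q4_mib:,.0f} MiB")
```

Both estimates come out below the measured figures in the table. The difference would plausibly be taken up by the KV cache, activations, and runtime overhead, none of which this sketch models.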

**Prompt template**

This model uses the Alpaca prompt template.

The template is applied through `apply_chat_template()`; see the [Hugging Face chat templating documentation](https://huggingface.co/docs/transformers/main/chat_templating) for details.
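
For reference, the commonly used Alpaca layout can be sketched in plain Python. This only illustrates the general Alpaca format: `build_alpaca_prompt` is a hypothetical helper, and the exact string produced by this model's chat template (spacing, special tokens) is an assumption that may differ, so `apply_chat_template()` remains the safer way to build prompts.

```python
DEFAULT_SYSTEM = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_alpaca_prompt(instruction: str, system: str = DEFAULT_SYSTEM) -> str:
    """Format an instruction in the common Alpaca (no-input) layout.

    ASSUMPTION: this mirrors the generic Alpaca format, not necessarily
    the exact template shipped with the model's tokenizer.
    """
    return f"{system}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


prompt = build_alpaca_prompt("What are the benefits of a small parameter count?")
print(prompt)
```

The prompt ends right after the `### Response:` header, so the model's generated text would continue from that point.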

**You can load and use the model with the Python code below.**
*`transformers` and `torch` must be installed beforehand.*

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("maywell/TinyWand-DPO")
model = AutoModelForCausalLM.from_pretrained(
    "maywell/TinyWand-DPO",
    device_map="auto",  # places the model on the available device(s); requires the accelerate package
    torch_dtype=torch.bfloat16,  # if your hardware does not support bfloat16, use torch.float16 instead
)

messages = [
    # The system prompt is applied the same way even if left empty.
    {"role": "system", "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request."},
    # User prompt (Korean): "What are the advantages of a language model having fewer parameters?"
    {"role": "user", "content": "언어모델의 파라미터 수가 작으면 어떤 이점이 있어?"},
]

# Build the prompt with the model's chat template and tokenize it.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

# device_map="auto" has already placed the model, so only the inputs need to be moved.
model_inputs = encodeds.to(model.device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```