---
language:
- ko
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- qwen
- qwen2.5
- korean
- merged
- gguf
- conversational
library_name: transformers
pipeline_tag: text-generation
datasets:
- MyeongHo0621/smol-koreantalk
model-index:
- name: Qwen2.5-3B-Korean
  results: []
---

# Qwen2.5-3B-Korean

## Model Description

**Qwen2.5-3B-Korean** is a **merged model**: [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) fine-tuned on Korean data.

This repository provides the **complete model with the LoRA adapter already merged**, along with **GGUF files**.

> If you need the **PEFT/LoRA adapter**, see [MyeongHo0621/Qwen2.5-3B-Korean-QLoRA](https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA).

### 🎯 Key Features

- 🇰🇷 **Korean Optimization**: trained on 200,000 high-quality Korean conversation samples
- 📦 **Ready-to-Use**: LoRA adapter already merged; usable immediately
- 🚀 **Multi-Format**: Safetensors (repository root) + GGUF (`gguf/`)
- 💻 **Multi-Framework**: Transformers, vLLM, SGLang, Ollama, Llama.cpp
- ⚖️ **Apache 2.0**: commercial use permitted

---

## 📦 Available Formats

| Format | Path | Use Case | Size |
|--------|------|----------|------|
| **Safetensors** | `/` (repository root) | Transformers, vLLM, SGLang | ~6GB |
| **GGUF Q4_K_M** | `gguf/qwen25-3b-korean-Q4_K_M.gguf` | Ollama, Llama.cpp (recommended) | ~2GB |
| **GGUF Q5_K_M** | `gguf/qwen25-3b-korean-Q5_K_M.gguf` | Higher quality | ~2.5GB |
| **GGUF Q8_0** | `gguf/qwen25-3b-korean-Q8_0.gguf` | Highest quality | ~3.5GB |
| **GGUF F16** | `gguf/qwen25-3b-korean-F16.gguf` | Benchmarking | ~6GB |
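
The size column can guide which GGUF file to download. The sketch below picks the largest file that fits a given size budget. The helper name `pick_gguf` and the selection rule are illustrative assumptions, not part of this repository, and the sizes are the approximate values from the table. Memory use at inference time is higher than the file size because of the KV cache and runtime buffers, so leave some headroom.

```python
from typing import Optional

# Approximate file sizes in GB, copied from the table above.
GGUF_FILES = [
    ("gguf/qwen25-3b-korean-Q4_K_M.gguf", 2.0),
    ("gguf/qwen25-3b-korean-Q5_K_M.gguf", 2.5),
    ("gguf/qwen25-3b-korean-Q8_0.gguf", 3.5),
    ("gguf/qwen25-3b-korean-F16.gguf", 6.0),
]


def pick_gguf(budget_gb: float) -> Optional[str]:
    """Return the largest GGUF file whose size fits the budget (hypothetical helper)."""
    fitting = [(path, size) for path, size in GGUF_FILES if size <= budget_gb]
    if not fitting:
        return None  # nothing fits; even Q4_K_M needs about 2 GB
    # Larger files correspond to less aggressive quantization.
    return max(fitting, key=lambda item: item[1])[0]


print(pick_gguf(3.0))
```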

---

## 🚀 Quick Start

### 1️⃣ Transformers (simplest)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model
model = AutoModelForCausalLM.from_pretrained(
    "MyeongHo0621/Qwen2.5-3B-Korean",
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("MyeongHo0621/Qwen2.5-3B-Korean")

# Build the prompt with the chat template
messages = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "한국의 수도는 어디인가요?"}  # "What is the capital of Korea?"
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is set explicitly so that temperature takes effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

---

### 2️⃣ vLLM (Production Serving)

```python
from vllm import LLM, SamplingParams

# Load the merged model
llm = LLM(
    model="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes",  # optional: 4-bit quantization
    gpu_memory_utilization=0.6
)

# Note: generate() sends the raw prompt without applying the chat template
prompts = ["한국의 수도는 어디인가요?"]  # "What is the capital of Korea?"
params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(prompts, params)
for output in outputs:
    print(output.outputs[0].text)
```

**Server Mode:**

```bash
vllm serve MyeongHo0621/Qwen2.5-3B-Korean \
    --quantization bitsandbytes \
    --port 8000
```
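
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The following is a minimal client sketch using only the Python standard library. It assumes the server is reachable at `localhost:8000` as in the command above; the helper names `build_chat_payload` and `ask` are illustrative, and the system prompt is borrowed from the Transformers example. Call `ask("한국의 수도는 어디인가요?")` only while the server is up.

```python
import json
import urllib.request

# Assumption: the server started above is listening locally on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "MyeongHo0621/Qwen2.5-3B-Korean"


def build_chat_payload(prompt, temperature=0.7, max_tokens=512):
    """Build the JSON body for a chat completion request."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful Korean assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def ask(prompt):
    """Send one chat request and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```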

---

### 3️⃣ SGLang (Fastest)

```python
import sglang as sgl

runtime = sgl.Runtime(
    model_path="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes"
)

sgl.set_default_backend(runtime)

@sgl.function
def chat(s, prompt):
    s += sgl.user(prompt)
    s += sgl.assistant(sgl.gen("response", max_tokens=512))

state = chat.run(prompt="한국의 수도는?")
print(state["response"])
```

---

### 4️⃣ Ollama (Local Desktop)

```bash
# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Create the Modelfile
cat > Modelfile << 'EOF'
FROM ./gguf/qwen25-3b-korean-Q4_K_M.gguf

TEMPLATE """<|im_start|>system
You are a helpful Korean assistant.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
EOF

# 3. Create and run the model
ollama create qwen25-korean -f Modelfile
ollama run qwen25-korean "한국의 수도는?"
```

---

### 5️⃣ Llama.cpp (CPU/Edge)

```bash
# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Inference (GPU)
# Newer llama.cpp builds name this binary llama-cli instead of main.
# -e makes llama.cpp interpret the \n escapes in the prompt.
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -e -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    --temp 0.7 \
    -ngl 99

# 3. Inference (CPU)
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -e -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    -t 8
```
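
The `-p` strings above and the Ollama `TEMPLATE` follow the same ChatML layout: each turn is wrapped in `<|im_start|>role ... <|im_end|>`, and the prompt ends with an open assistant turn. As a rough sketch, the helper below (the name `build_chatml_prompt` is illustrative, not part of any library) renders a message list into that layout. The tokenizer's own `apply_chat_template` remains the authoritative source and may, for example, insert a default system message.

```python
def build_chatml_prompt(messages):
    """Render role/content messages as a ChatML prompt ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# "What is the capital of Korea?"
prompt = build_chatml_prompt([{"role": "user", "content": "한국의 수도는?"}])
print(repr(prompt))
```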

---

## 🔧 Training Details

### Dataset
- **Source**: [MyeongHo0621/smol-koreantalk](https://huggingface.co/datasets/MyeongHo0621/smol-koreantalk)
- **Samples**: 200,000 Korean conversation pairs
- **Domain**: general conversation, instruction following, knowledge Q&A

### Training Configuration
| Hyperparameter | Value |
|----------------|-------|
| **Method** | QLoRA (4-bit NF4) |
| **LoRA Rank** | 64 |
| **LoRA Alpha** | 128 |
| **Learning Rate** | 2e-4 |
| **Batch Size** | 128 (effective) |
| **Epochs** | 3 |
| **Steps** | 4689 |
| **Max Length** | 2048 |
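
The step count in the table can be cross-checked against the sample count, effective batch size, and epoch count. The short calculation below is only a sanity check. It assumes the final partial batch of each epoch counts as a step (hence the ceiling) and uses the standard LoRA scaling factor `alpha / rank`; neither detail is stated in the table itself.

```python
import math

# Values taken from the Dataset and Training Configuration sections.
samples = 200_000
effective_batch = 128
epochs = 3
lora_rank, lora_alpha = 64, 128

# Assumption: the last, partially filled batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scaling applied to the adapter update.
lora_scale = lora_alpha / lora_rank

print(steps_per_epoch, total_steps, lora_scale)  # 1563 4689 2.0
```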

---

## 📊 Repository Structure

```
MyeongHo0621/Qwen2.5-3B-Korean/
├── config.json                 # Model configuration
├── model.safetensors           # Merged model (~6GB)
├── tokenizer.json              # Tokenizer
├── tokenizer_config.json
└── gguf/                       # GGUF files
    ├── qwen25-3b-korean-Q4_K_M.gguf   (~2GB) ⭐ recommended
    ├── qwen25-3b-korean-Q5_K_M.gguf   (~2.5GB)
    ├── qwen25-3b-korean-Q8_0.gguf     (~3.5GB)
    └── qwen25-3b-korean-F16.gguf      (~6GB)
```

---

## 🔗 Related Repositories

- **PEFT Adapter**: [MyeongHo0621/Qwen2.5-3B-Korean-QLoRA](https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA)
  - For cases where only the LoRA adapter is needed
  - For fine-tuning research
  - ~479MB (lightweight)

---

## 📝 Citation

```bibtex
@misc{qwen25-korean-2025,
  author = {MyeongHo Shin},
  title = {Qwen2.5-3B-Korean: Korean-Optimized Conversational Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean}},
}
```

---

## 🙏 Acknowledgments

- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) by Alibaba Cloud
- **Dataset**: [smol-koreantalk](https://huggingface.co/datasets/MyeongHo0621/smol-koreantalk)
- **Tools**: Unsloth, PEFT, vLLM, SGLang, Llama.cpp

---

## 📞 Contact

- **Author**: MyeongHo Shin
- **HuggingFace**: [@MyeongHo0621](https://huggingface.co/MyeongHo0621)

---

## ⚖️ License

Apache 2.0: commercial use, modification, and distribution are permitted.

---

## Evaluation results

### Benchmark Results

#### General Benchmarks

| Task | Score | Metric |
|------|-------|--------|
| gsm8k | 42.00% | acc |
| mmlu | 58.00% | acc |
| hellaswag | 71.00% | acc_norm |
| winogrande | 65.00% | acc |
| arc_easy | 78.00% | acc |
| arc_challenge | 48.00% | acc_norm |

**Average Score**: 60.33%
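
The average appears to be the unweighted mean of the six task scores. The snippet below reproduces it from the table values, as a sketch assuming every task is weighted equally.

```python
# Scores (in percent) copied from the General Benchmarks table.
scores = {
    "gsm8k": 42.00,
    "mmlu": 58.00,
    "hellaswag": 71.00,
    "winogrande": 65.00,
    "arc_easy": 78.00,
    "arc_challenge": 48.00,
}

# Unweighted mean across tasks (assumption: each task counts equally).
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}%")  # 60.33%
```

Note that this mean mixes two metrics (`acc` and `acc_norm`), so it is best read as a rough summary rather than a single comparable score.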