ModelHub XC 2b23ea1921 Initialize project; model provided by the ModelHub XC community
Model: MyeongHo0621/Qwen2.5-3B-Korean
Source: Original Platform
2026-06-18 08:50:22 +08:00

---
language:
- ko
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- qwen
- qwen2.5
- korean
- merged
- gguf
- conversational
library_name: transformers
pipeline_tag: text-generation
datasets:
- MyeongHo0621/smol-koreantalk
model-index:
- name: Qwen2.5-3B-Korean
  results: []
---
# Qwen2.5-3B-Korean
## Model Description
**Qwen2.5-3B-Korean** is a **merged model** produced by fine-tuning [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) on Korean data.
This repository provides the **complete model with the LoRA adapter already merged**, along with **GGUF files**.
> If you need the **PEFT/LoRA adapter**, see [MyeongHo0621/Qwen2.5-3B-Korean-QLoRA](https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA).
### 🎯 Key Features
- 🇰🇷 **Korean Optimization**: Trained on 200,000 high-quality Korean conversation samples
- 📦 **Ready-to-Use**: LoRA adapter already merged; usable immediately
- 🚀 **Multi-Format**: Safetensors (repository root) + GGUF (`gguf/`)
- 💻 **All Frameworks**: Transformers, vLLM, SGLang, Ollama, Llama.cpp
- ⚖️ **Apache 2.0**: Commercial use permitted
---
## 📦 Available Formats
| Format | Path | Use Case | Size |
|--------|------|----------|------|
| **Safetensors** | `/` (root) | Transformers, vLLM, SGLang | ~6GB |
| **GGUF Q4_K_M** | `gguf/qwen25-3b-korean-Q4_K_M.gguf` | Ollama, Llama.cpp (recommended) | ~2GB |
| **GGUF Q5_K_M** | `gguf/qwen25-3b-korean-Q5_K_M.gguf` | Higher quality | ~2.5GB |
| **GGUF Q8_0** | `gguf/qwen25-3b-korean-Q8_0.gguf` | Highest quality | ~3.5GB |
| **GGUF F16** | `gguf/qwen25-3b-korean-F16.gguf` | Benchmarking | ~6GB |
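
As a rough illustration of choosing among the GGUF files in the table, the sketch below picks the largest file whose approximate size fits a given budget. The sizes are the approximate figures from the table; the `pick_gguf` helper and its `budget_gb` parameter are hypothetical and not part of this repository, and memory use at inference time will be higher than the file size alone.

```python
from typing import Optional

# Approximate file sizes in GB, copied from the table above.
GGUF_FILES = {
    "Q4_K_M": ("gguf/qwen25-3b-korean-Q4_K_M.gguf", 2.0),
    "Q5_K_M": ("gguf/qwen25-3b-korean-Q5_K_M.gguf", 2.5),
    "Q8_0": ("gguf/qwen25-3b-korean-Q8_0.gguf", 3.5),
    "F16": ("gguf/qwen25-3b-korean-F16.gguf", 6.0),
}


def pick_gguf(budget_gb: float) -> Optional[str]:
    """Return the path of the largest GGUF file that fits the budget (hypothetical helper)."""
    fitting = [(size, path) for path, size in GGUF_FILES.values() if size <= budget_gb]
    if not fitting:
        return None  # nothing fits; even Q4_K_M needs about 2 GB
    return max(fitting)[1]  # largest size wins; sizes are unique


print(pick_gguf(3.0))
```

The returned path can be passed as the filename argument to the `huggingface-cli download` commands shown in the Ollama and Llama.cpp sections.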
---
## 🚀 Quick Start
### 1️⃣ Transformers (simplest)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model (no PEFT adapter required)
model = AutoModelForCausalLM.from_pretrained(
    "MyeongHo0621/Qwen2.5-3B-Korean",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("MyeongHo0621/Qwen2.5-3B-Korean")

# Build the prompt with the chat template
messages = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "한국의 수도는 어디인가요?"},  # "What is the capital of Korea?"
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True ensures the temperature setting takes effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
### 2️⃣ vLLM (Production Serving)
```python
from vllm import LLM, SamplingParams

# Load the merged model
llm = LLM(
    model="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes",  # optional: 4-bit quantization
    gpu_memory_utilization=0.6,
)

prompts = ["한국의 수도는 어디인가요?"]  # "What is the capital of Korea?"
params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(prompts, params)
for output in outputs:
    print(output.outputs[0].text)
```
**Server Mode:**
```bash
vllm serve MyeongHo0621/Qwen2.5-3B-Korean \
  --quantization bitsandbytes \
  --port 8000
```
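
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and assumes the server listens on `localhost:8000` and serves the model under its repository name; the `build_chat_request` and `ask` helpers are hypothetical names introduced for illustration.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": "MyeongHo0621/Qwen2.5-3B-Korean",  # assumed served model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request; requires the vllm serve process to be running."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("한국의 수도는 어디인가요?")
print(request.full_url)
```

Calling `ask("한국의 수도는 어디인가요?")` sends the request and returns the assistant's reply as a string. Because the server applies the chat template itself, the prompt is passed as a plain user message.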
---
### 3️⃣ SGLang (Fastest)
```python
import sglang as sgl

runtime = sgl.Runtime(
    model_path="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes",
)
sgl.set_default_backend(runtime)


@sgl.function
def chat(s, prompt):
    s += sgl.user(prompt)
    s += sgl.assistant(sgl.gen("response", max_tokens=512))


state = chat.run(prompt="한국의 수도는?")  # "What is the capital of Korea?"
print(state["response"])
```
---
### 4️⃣ Ollama (Local Desktop)
```bash
# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
  gguf/qwen25-3b-korean-Q4_K_M.gguf \
  --local-dir ./

# 2. Create the Modelfile
cat > Modelfile << 'EOF'
FROM ./gguf/qwen25-3b-korean-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
You are a helpful Korean assistant.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
EOF

# 3. Create and run the model
ollama create qwen25-korean -f Modelfile
ollama run qwen25-korean "한국의 수도는?"
```
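
The model created above can also be called programmatically through Ollama's local REST API. This is a minimal sketch that assumes Ollama is running on its default address (`localhost:11434`) and that the model was created under the name `qwen25-korean` as in step 3; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(prompt, model="qwen25-korean"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the request; requires a running Ollama server with the model created."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.load(resp)["response"]


req = build_generate_request("한국의 수도는?")
print(json.loads(req.data)["model"])
```

With `"stream": False`, the server returns a single JSON object whose `response` field holds the full completion, which keeps the client code short.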
---
### 5️⃣ Llama.cpp (CPU/Edge)
```bash
# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
  gguf/qwen25-3b-korean-Q4_K_M.gguf \
  --local-dir ./

# 2. Inference (GPU): -ngl 99 offloads the model layers to the GPU
# Note: recent llama.cpp builds name this binary llama-cli instead of main
./llama.cpp/main \
  -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
  -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 \
  --temp 0.7 \
  -ngl 99

# 3. Inference (CPU): -t 8 runs on eight threads
./llama.cpp/main \
  -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
  -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 \
  -t 8
```
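
The `-p` strings above follow the ChatML layout that also appears in the Ollama template. As a sketch of how such a prompt is assembled from a message list, the hypothetical `to_chatml` helper below reproduces that layout by hand. When the tokenizer is available, `tokenizer.apply_chat_template` is the more reliable route, since the model's own template may add details (such as a default system prompt) that this sketch does not.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([{"role": "user", "content": "한국의 수도는?"}])
print(prompt)
```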
---
## 🔧 Training Details
### Dataset
- **Source**: [MyeongHo0621/smol-koreantalk](https://huggingface.co/datasets/MyeongHo0621/smol-koreantalk)
- **Samples**: 200,000 Korean conversation pairs
- **Domain**: General conversation, instruction following, knowledge Q&A
### Training Configuration
| Hyperparameter | Value |
|----------------|-------|
| **Method** | QLoRA (4-bit NF4) |
| **LoRA Rank** | 64 |
| **LoRA Alpha** | 128 |
| **Learning Rate** | 2e-4 |
| **Batch Size** | 128 (effective) |
| **Epochs** | 3 |
| **Steps** | 4689 |
| **Max Length** | 2048 |
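
The step count in the table is consistent with the dataset size, effective batch size, and epoch count. The short check below works this out, assuming the final partial batch of each epoch is kept rather than dropped; it also shows the LoRA scaling factor implied by the rank and alpha values, assuming the standard `alpha / r` scaling.

```python
import math

num_samples = 200_000    # from the Dataset section
effective_batch = 128    # effective batch size from the table
epochs = 3

# 200,000 / 128 = 1562.5, so the last partial batch adds one more step per epoch.
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / r (assumption: no rsLoRA).
lora_scale = 128 / 64

print(steps_per_epoch, total_steps, lora_scale)
```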
---
## 📊 Repository Structure
```
MyeongHo0621/Qwen2.5-3B-Korean/
├── config.json              # Model configuration
├── model.safetensors        # Merged model (~6GB)
├── tokenizer.json           # Tokenizer
├── tokenizer_config.json
└── gguf/                    # GGUF files
    ├── qwen25-3b-korean-Q4_K_M.gguf  (~2GB) ⭐ recommended
    ├── qwen25-3b-korean-Q5_K_M.gguf  (~2.5GB)
    ├── qwen25-3b-korean-Q8_0.gguf    (~3.5GB)
    └── qwen25-3b-korean-F16.gguf     (~6GB)
```
---
## 🔗 Related Repositories
- **PEFT Adapter**: [MyeongHo0621/Qwen2.5-3B-Korean-QLoRA](https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA)
  - For cases where only the LoRA adapter is needed
  - For fine-tuning research
  - ~479MB (lightweight)
---
## 📝 Citation
```bibtex
@misc{qwen25-korean-2025,
author = {MyeongHo Shin},
title = {Qwen2.5-3B-Korean: Korean-Optimized Conversational Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean}},
}
```
---
## 🙏 Acknowledgments
- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) by Alibaba Cloud
- **Dataset**: [smol-koreantalk](https://huggingface.co/datasets/MyeongHo0621/smol-koreantalk)
- **Tools**: Unsloth, PEFT, vLLM, SGLang, Llama.cpp
---
## 📞 Contact
- **Author**: MyeongHo Shin
- **HuggingFace**: [@MyeongHo0621](https://huggingface.co/MyeongHo0621)
---
## ⚖️ License
Apache 2.0: commercial use, modification, and distribution are permitted.
---
## Evaluation results
### Benchmark Results
#### General Benchmarks
| Task | Score | Metric |
|------|-------|--------|
| gsm8k | 42.00% | acc |
| mmlu | 58.00% | acc |
| hellaswag | 71.00% | acc_norm |
| winogrande | 65.00% | acc |
| arc_easy | 78.00% | acc |
| arc_challenge | 48.00% | acc_norm |
**Average Score**: 60.33%
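
The average above is the unweighted mean of the six task scores, which mixes `acc` and `acc_norm` metrics. A quick check of the arithmetic, using the values from the table:

```python
# Scores in percent, copied from the benchmark table.
scores = {
    "gsm8k": 42.00,
    "mmlu": 58.00,
    "hellaswag": 71.00,
    "winogrande": 65.00,
    "arc_easy": 78.00,
    "arc_challenge": 48.00,
}

average = sum(scores.values()) / len(scores)  # 362 / 6
print(f"{average:.2f}%")  # 60.33%
```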