Initialize project; model provided by the ModelHub XC community

Model: MyeongHo0621/Qwen2.5-3B-Korean Source: Original Platform
2026-06-18 08:50:22 +08:00
commit 2b23ea1921
21 changed files with 152603 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,295 @@
+---
+language:
+- ko
+- en
+license: apache-2.0
+base_model: Qwen/Qwen2.5-3B-Instruct
+tags:
+- qwen
+- qwen2.5
+- korean
+- merged
+- gguf
+- conversational
+library_name: transformers
+pipeline_tag: text-generation
+datasets:
+- MyeongHo0621/smol-koreantalk
+model-index:
+- name: Qwen2.5-3B-Korean
+  results: []
+---
+
+# Qwen2.5-3B-Korean
+
+## Model Description
+
+**Qwen2.5-3B-Korean** is a **merged model**: [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) fine-tuned for Korean.
+
+This repository provides the **complete model with the LoRA adapter already merged**, along with **GGUF files**.
+
+> If you need the **PEFT/LoRA adapter**, see [MyeongHo0621/Qwen2.5-3B-Korean-QLoRA](https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA).
+
+### 🎯 Key Features
+
+- 🇰🇷 **Korean Optimization**: Trained on 200,000 high-quality Korean conversation samples
+- 📦 **Ready-to-Use**: LoRA already merged; usable immediately
+- 🚀 **Multi-Format**: Safetensors (repository root) + GGUF (`gguf/`)
+- 💻 **Supported Frameworks**: Transformers, vLLM, SGLang, Ollama, Llama.cpp
+- ⚖️ **Apache 2.0**: Commercial use permitted
+
+---
+
+## 📦 Available Formats
+
+| Format | Path | Use Case | Size |
+|--------|------|----------|------|
+| **Safetensors** | `/` (root) | Transformers, vLLM, SGLang | ~6GB |
+| **GGUF Q4_K_M** | `gguf/qwen25-3b-korean-Q4_K_M.gguf` | Ollama, Llama.cpp (recommended) | ~2GB |
+| **GGUF Q5_K_M** | `gguf/qwen25-3b-korean-Q5_K_M.gguf` | Higher quality | ~2.5GB |
+| **GGUF Q8_0** | `gguf/qwen25-3b-korean-Q8_0.gguf` | Highest quality | ~3.5GB |
+| **GGUF F16** | `gguf/qwen25-3b-korean-F16.gguf` | Benchmarking | ~6GB |
+
+---
+
+## 🚀 Quick Start
+
+### 1️⃣ Transformers (Simplest)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the model (merged weights)
+model = AutoModelForCausalLM.from_pretrained(
+    "MyeongHo0621/Qwen2.5-3B-Korean",
+    torch_dtype="auto",
+    device_map="auto"
+)
+
+tokenizer = AutoTokenizer.from_pretrained("MyeongHo0621/Qwen2.5-3B-Korean")
+
+# Apply the chat template
+messages = [
+    {"role": "system", "content": "You are a helpful Korean assistant."},
+    {"role": "user", "content": "한국의 수도는 어디인가요?"}
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
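+# do_sample=True makes the temperature setting take effect explicitly
+outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
+# Decode only the newly generated tokens, not the echoed prompt
+new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))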
+```
+
+---
+
+### 2️⃣ vLLM (Production Serving)
+
+```python
+from vllm import LLM, SamplingParams
+
+# Load the merged model
+llm = LLM(
+    model="MyeongHo0621/Qwen2.5-3B-Korean",
+    quantization="bitsandbytes",  # Optional: 4-bit quantization
+    gpu_memory_utilization=0.6
+)
+
+prompts = ["한국의 수도는 어디인가요?"]
+params = SamplingParams(temperature=0.7, max_tokens=512)
+
+outputs = llm.generate(prompts, params)
+for output in outputs:
+    print(output.outputs[0].text)
+```
+
+**Server Mode:**
+
+```bash
+vllm serve MyeongHo0621/Qwen2.5-3B-Korean \
+    --quantization bitsandbytes \
+    --port 8000
+```
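+
+Once the server is running, it can be queried over HTTP. The sketch below assumes vLLM's OpenAI-compatible `/v1/chat/completions` route on `localhost:8000` (matching `--port 8000` above). The helper names `build_chat_payload` and `post_chat` are hypothetical and not part of this repository; only the Python standard library is used.
+
+```python
+import json
+import urllib.request
+
+MODEL = "MyeongHo0621/Qwen2.5-3B-Korean"
+# Assumed endpoint: vLLM's OpenAI-compatible route on the port used above
+URL = "http://localhost:8000/v1/chat/completions"
+
+
+def build_chat_payload(prompt, temperature=0.7, max_tokens=512):
+    """Build an OpenAI-style chat request body (hypothetical helper)."""
+    return {
+        "model": MODEL,
+        "messages": [
+            {"role": "system", "content": "You are a helpful Korean assistant."},
+            {"role": "user", "content": prompt},
+        ],
+        "temperature": temperature,
+        "max_tokens": max_tokens,
+    }
+
+
+def post_chat(prompt, url=URL):
+    """Send the request and return the assistant's reply text."""
+    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
+    req = urllib.request.Request(
+        url, data=body, headers={"Content-Type": "application/json"}
+    )
+    with urllib.request.urlopen(req) as resp:
+        data = json.load(resp)
+    return data["choices"][0]["message"]["content"]
+```
+
+Calling `post_chat("한국의 수도는 어디인가요?")` requires the server to be up; `build_chat_payload` can be inspected offline.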
+
+---
+
+### 3️⃣ SGLang (Fastest)
+
+```python
+import sglang as sgl
+
+runtime = sgl.Runtime(
+    model_path="MyeongHo0621/Qwen2.5-3B-Korean",
+    quantization="bitsandbytes"
+)
+
+sgl.set_default_backend(runtime)
+
+@sgl.function
+def chat(s, prompt):
+    s += sgl.user(prompt)
+    s += sgl.assistant(sgl.gen("response", max_tokens=512))
+
+state = chat.run(prompt="한국의 수도는?")
+print(state["response"])
+```
+
+---
+
+### 4️⃣ Ollama (Local Desktop)
+
+```bash
+# 1. Download the GGUF file
+huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
+    gguf/qwen25-3b-korean-Q4_K_M.gguf \
+    --local-dir ./
+
+# 2. Create the Modelfile
+cat > Modelfile << 'EOF'
+FROM ./gguf/qwen25-3b-korean-Q4_K_M.gguf
+
+TEMPLATE """<|im_start|>system
+You are a helpful Korean assistant.<|im_end|>
+<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+"""
+
+PARAMETER stop "<|im_start|>"
+PARAMETER stop "<|im_end|>"
+PARAMETER temperature 0.7
+EOF
+
+# 3. Create and run the model
+ollama create qwen25-korean -f Modelfile
+ollama run qwen25-korean "한국의 수도는?"
+```
+
+---
+
+### 5️⃣ Llama.cpp (CPU/Edge)
+
+```bash
+# 1. Download the GGUF file
+huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
+    gguf/qwen25-3b-korean-Q4_K_M.gguf \
+    --local-dir ./
+
+# 2. Inference (GPU; -ngl 99 offloads all layers to the GPU)
+./llama.cpp/main \
+    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
+    -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
+    -n 512 \
+    --temp 0.7 \
+    -ngl 99
+
+# 3. Inference (CPU; -t 8 uses 8 threads)
+./llama.cpp/main \
+    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
+    -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
+    -n 512 \
+    -t 8
+```
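+
+The `-p` strings above and the Ollama `TEMPLATE` both follow the `<|im_start|>` / `<|im_end|>` turn layout. A minimal sketch of building such a prompt from a message list is shown below; `format_chatml` is a hypothetical helper, and the model's own chat template (as applied by `tokenizer.apply_chat_template` in the Transformers example) may differ in details such as a default system message.
+
+```python
+def format_chatml(messages, add_generation_prompt=True):
+    """Render messages in the <|im_start|>/<|im_end|> layout used above."""
+    parts = []
+    for msg in messages:
+        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
+    if add_generation_prompt:
+        # Open the assistant turn so the model continues from here
+        parts.append("<|im_start|>assistant\n")
+    return "".join(parts)
+
+
+prompt = format_chatml([{"role": "user", "content": "한국의 수도는?"}])
+print(prompt)
+```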
+
+---
+
+## 🔧 Training Details
+
+### Dataset
+- **Source**: [MyeongHo0621/smol-koreantalk](https://huggingface.co/datasets/MyeongHo0621/smol-koreantalk)
+- **Samples**: 200,000 Korean conversation pairs
+- **Domain**: General conversation, instruction following, knowledge Q&A
+
+### Training Configuration
+| Hyperparameter | Value |
+|----------------|-------|
+| **Method** | QLoRA (4-bit NF4) |
+| **LoRA Rank** | 64 |
+| **LoRA Alpha** | 128 |
+| **Learning Rate** | 2e-4 |
+| **Batch Size** | 128 (effective) |
+| **Epochs** | 3 |
+| **Steps** | 4689 |
+| **Max Length** | 2048 |
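+
+The step count in the table is consistent with the dataset size, batch size, and epoch count, assuming the final partial batch of each epoch counts as one step (the source does not state how the last batch was handled). The LoRA scale shown is the standard `alpha / rank` ratio, assuming rank-stabilized LoRA was not used.
+
+```python
+import math
+
+samples = 200_000        # dataset size from the Dataset section
+effective_batch = 128    # effective batch size from the table
+epochs = 3
+lora_rank, lora_alpha = 64, 128
+
+# Assumption: the last, partially filled batch of each epoch is kept
+steps_per_epoch = math.ceil(samples / effective_batch)
+total_steps = steps_per_epoch * epochs
+
+# Standard LoRA scaling factor applied to the adapter update
+lora_scale = lora_alpha / lora_rank
+
+print(steps_per_epoch, total_steps, lora_scale)  # 1563 4689 2.0
+```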
+
+---
+
+## 📊 Repository Structure
+
+```
+MyeongHo0621/Qwen2.5-3B-Korean/
+├── config.json                 # Model configuration
+├── model.safetensors          # Merged model (~6GB)
+├── tokenizer.json             # Tokenizer
+├── tokenizer_config.json
+└── gguf/                      # GGUF files
+    ├── qwen25-3b-korean-Q4_K_M.gguf  (~2GB) ⭐ recommended
+    ├── qwen25-3b-korean-Q5_K_M.gguf  (~2.5GB)
+    ├── qwen25-3b-korean-Q8_0.gguf    (~3.5GB)
+    └── qwen25-3b-korean-F16.gguf     (~6GB)
+```
+
+---
+
+## 🔗 Related Repositories
+
+- **PEFT Adapter**: [MyeongHo0621/Qwen2.5-3B-Korean-QLoRA](https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA)
+  - For when you only need the LoRA adapter
+  - For fine-tuning research
+  - ~479MB (lightweight)
+
+---
+
+## 📝 Citation
+
+```bibtex
+@misc{qwen25-korean-2025,
+  author = {MyeongHo Shin},
+  title = {Qwen2.5-3B-Korean: Korean-Optimized Conversational Model},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean}},
+}
+```
+
+---
+
+## 🙏 Acknowledgments
+
+- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) by Alibaba Cloud
+- **Dataset**: [smol-koreantalk](https://huggingface.co/datasets/MyeongHo0621/smol-koreantalk)
+- **Tools**: Unsloth, PEFT, vLLM, SGLang, Llama.cpp
+
+---
+
+## 📞 Contact
+
+- **Author**: MyeongHo Shin
+- **HuggingFace**: [@MyeongHo0621](https://huggingface.co/MyeongHo0621)
+
+---
+
+## ⚖️ License
+
+Apache 2.0 - commercial use, modification, and distribution are permitted
+
+
+
+---
+
+## Evaluation Results
+
+### Benchmark Results
+
+#### General Benchmarks
+
+| Task | Score | Metric |
+|------|-------|--------|
+| gsm8k | 42.00% | acc |
+| mmlu | 58.00% | acc |
+| hellaswag | 71.00% | acc_norm |
+| winogrande | 65.00% | acc |
+| arc_easy | 78.00% | acc |
+| arc_challenge | 48.00% | acc_norm |
+
+**Average Score**: 60.33%
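+
+The reported average is the unweighted mean of the six scores in the table, which can be checked with a short calculation:
+
+```python
+# Scores copied from the General Benchmarks table (percent)
+scores = {
+    "gsm8k": 42.00,
+    "mmlu": 58.00,
+    "hellaswag": 71.00,
+    "winogrande": 65.00,
+    "arc_easy": 78.00,
+    "arc_challenge": 48.00,
+}
+
+# Unweighted mean across tasks
+average = sum(scores.values()) / len(scores)
+print(f"{average:.2f}%")  # 60.33%
+```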