---
license: apache-2.0
---
## Model
base_model: yanolja/KoSOLAR-10.7B-v0.2
## Dataset
* Collected publicly available data
* Applied the deduplication algorithm from *Deduplicating Training Data Makes Language Models Better*
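
The idea behind the deduplication step can be sketched as follows. This is a minimal, illustrative example of document-level exact deduplication only; the paper's full method also covers suffix-array exact-substring matching and near-duplicate detection, and the `deduplicate` helper below is a hypothetical simplification, not this project's actual pipeline.

```python
import hashlib

def deduplicate(documents):
    """Drop exact duplicates by hashing whitespace-normalized text.

    Simplified sketch: real pipelines following the paper also remove
    repeated substrings (via suffix arrays) and near-duplicates (via MinHash).
    """
    seen = set()
    unique = []
    for doc in documents:
        # Normalize whitespace so trivially reformatted copies collide.
        key = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = ["안녕하세요 세계", "안녕하세요  세계", "다른 문서"]
print(deduplicate(corpus))  # → ['안녕하세요 세계', '다른 문서']
```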
## Code
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "jingyeom/KoSoLAR-10.7B-v0.2_1.4_dedup"

# Load the model in half precision to keep memory usage manageable.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
## Benchmark
**[Ko-LLM-Leaderboard](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard)**