---
license: other
license_name: hyperclovax-seed
license_link: LICENSE
pipeline_tag: text-generation
library_name: transformers
---

## Overview

HyperCLOVAX-SEED-Text-Instruct-1.5B is a text-to-text language model developed by NAVER. It performs competitively on major benchmarks covering Korean language and culture, and it supports a context length of up to 16k tokens, which lets it handle a wide range of tasks.

## Basic Information

- Model Architecture: Transformer-based architecture (Dense Model)
- Number of Parameters: 1.5B
- Input/Output Format: Text / Text (both input and output are in text format)
- Context Length: 16k
- Knowledge Cutoff Date: The model was trained on data prior to August 2024.

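As a rough illustration of what the context length means in practice, the sketch below checks whether a tokenized prompt plus the requested number of new tokens fits in the window. It assumes "16k" means 16 × 1024 = 16,384 tokens, which this card does not state explicitly. `CONTEXT_LENGTH` and `fits_in_context` are hypothetical names used only for illustration.

```python
# Assumption: "16k" is taken to mean 16 * 1024 tokens; the card does not give the exact figure.
CONTEXT_LENGTH = 16 * 1024


def fits_in_context(prompt_token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Return True if the prompt plus the requested new tokens fit in the window."""
    if max_new_tokens < 0:
        raise ValueError("max_new_tokens must be non-negative")
    return len(prompt_token_ids) + max_new_tokens <= context_length


# Placeholder token ids; real ids would come from the model's tokenizer.
prompt_ids = list(range(1000))
print(fits_in_context(prompt_ids, max_new_tokens=1024))   # True: 1000 + 1024 <= 16384
print(fits_in_context(prompt_ids, max_new_tokens=16000))  # False: 1000 + 16000 > 16384
```
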
## Training and Data

The training data for HyperCLOVAX-SEED-Text-Instruct-1.5B consists of diverse sources, including high-quality datasets. Training was carried out in four main stages:

- Pretraining Stage 1: the model learns from a large volume of documents.
- Pretraining Stage 2: additional training focused on high-quality data.
- Rejection sampling Fine-Tuning (RFT): aimed at enhancing the model's knowledge across various domains and its complex reasoning abilities.
- Supervised Fine-Tuning (SFT): improves the model's instruction-following capabilities.

Smaller models tend to handle long contexts poorly, and this weakness was observed here as well. To address it, long-context understanding was reinforced from the pretraining stages through to the SFT stage, so that the model stably supports context lengths of up to 16k tokens.

## Benchmark

| **Model**                               | **KMMLU (5-shot, acc)** | **HAE-RAE (5-shot, acc)** | **CLiCK (5-shot, acc)** | **KoBEST (5-shot, acc)** |
| --------------------------------------- | ----------------------- | ------------------------- | ----------------------- | ------------------------ |
| **HyperCLOVAX-SEED-Text-Base-1.5B**     | 0.4181                  | 0.6370                    | 0.5373                  | 0.6963                   |
| **HyperCLOVAX-SEED-Text-Instruct-1.5B** | 0.3933                  | 0.5674                    | 0.4947                  | 0.6490                   |
| **Qwen2.5-1.5B-instruct**               | 0.3696                  | 0.5160                    | 0.4772                  | 0.5968                   |
| **gemma-3-1b-it**                       | 0.3075                  | 0.3648                    | 0.3724                  | 0.5869                   |

## Hugging Face Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

chat = [
    {"role": "tool_list", "content": ""},
    # System prompt: the AI language model is named "CLOVA X" and was made by NAVER; today is Thursday, April 24, 2025.
    {"role": "system", "content": "- AI 언어모델의 이름은 \"CLOVA X\" 이며 네이버에서 만들었다.\n- 오늘은 2025년 04월 24일(목)이다."},
    # User prompt: explain the relationship between the Schrödinger equation and quantum mechanics in as much detail as possible.
    {"role": "user", "content": "슈뢰딩거 방정식과 양자역학의 관계를 최대한 자세히 알려줘."},
]

# Build the prompt with the model's chat template and move it to the model's device.
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = inputs.to(model.device)

# max_length caps prompt plus generated tokens; generation also stops at either stop string.
output_ids = model.generate(
    **inputs,
    max_length=1024,
    stop_strings=["<|endofturn|>", "<|stop|>"],
    tokenizer=tokenizer,
)
print(tokenizer.batch_decode(output_ids)[0])
```
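
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded string in the example above includes the full chat prompt. A minimal sketch of keeping only the reply is shown below. It uses plain Python lists as stand-ins for the token ids, and `new_token_ids` is a hypothetical helper, not part of `transformers`.

```python
def new_token_ids(output_ids, prompt_length):
    """Drop the first `prompt_length` ids from each generated sequence."""
    return [list(seq[prompt_length:]) for seq in output_ids]


# Placeholder ids standing in for one generated sequence:
# three prompt tokens followed by two newly generated tokens.
fake_output = [[11, 12, 13, 901, 902]]
print(new_token_ids(fake_output, prompt_length=3))  # [[901, 902]]
```

With the real objects from the usage example, the prompt length would be `inputs["input_ids"].shape[1]`, and slicing the tensor directly (`output_ids[:, prompt_length:]`) before calling `tokenizer.batch_decode(..., skip_special_tokens=True)` is likely simpler than the list-based helper.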