---
license: other
license_name: hyperclovax-seed
license_link: LICENSE
pipeline_tag: text-generation
library_name: transformers
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6512d9827fccffe1e9e28fa7/2iEXc-wcr6iezodh6rqI6.png)
## Overview
HyperCLOVAX-SEED-Text-Instruct-1.5B is a model developed by NAVER that can understand and generate text. It demonstrates competitive performance on major benchmarks related to Korean language and culture. In addition, it supports a context length of up to 16k tokens, enabling it to handle a wide range of tasks.
## Basic Information
- Model Architecture: Transformer-based architecture (Dense Model)
- Number of Parameters: 1.5B
- Input/Output Format: Text / Text (both input and output are in text format)
- Context Length: 16k
- Knowledge Cutoff Date: The model was trained on data prior to August 2024.
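The context length above bounds the prompt and the generated text together. The sketch below shows that arithmetic. It assumes "16k" means 16 * 1024 = 16384 tokens (the card only says "16k"), and the helper name `max_new_tokens_budget` is hypothetical.

```python
# Assumption: "16k" is read as 16 * 1024 = 16384 tokens; the card only states "16k".
CONTEXT_LENGTH = 16 * 1024


def max_new_tokens_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens may still be generated after a prompt of the given size."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Prompt and completion share one window, so the budget is whatever is left over.
    return max(context_length - prompt_tokens, 0)


print(max_new_tokens_budget(1_000))  # 15384
```

In practice `prompt_tokens` would be the length of the tokenized chat prompt.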
## Training and Data
The training data for HyperCLOVAX-SEED-Text-Instruct-1.5B comes from diverse sources, including high-quality datasets. Training proceeded in four main stages: Pretraining Stage 1, in which the model learns from a large volume of documents; Pretraining Stage 2, which continues training on high-quality data; Rejection sampling Fine-Tuning (RFT), aimed at enhancing the model's knowledge across various domains and its complex reasoning abilities; and Supervised Fine-Tuning (SFT), which improves the model's instruction-following capabilities. In addition, the model showed weakness in long-context handling, a characteristic of smaller models. To address this, long-context understanding was reinforced from the pretraining stages through the SFT stage, enabling the model to stably support context lengths of up to 16k tokens.
## Benchmark
| **Model** | **KMMLU (5-shot, acc)** | **HAE-RAE (5-shot, acc)** | **CLiCK (5-shot, acc)** | **KoBEST (5-shot, acc)** |
| --------------------------------- | --------------------------- | --------------------------- | ------------------------- | -------------------------- |
| **HyperCLOVAX-SEED-Text-Base-1.5B** | 0.4181 | 0.6370 | 0.5373 | 0.6963 |
| **HyperCLOVAX-SEED-Text-Instruct-1.5B** | 0.3933 | 0.5674 | 0.4947 | 0.6490 |
| **Qwen2.5-1.5B-instruct** | 0.3696 | 0.5160 | 0.4772 | 0.5968 |
| **gemma-3-1b-it** | 0.3075 | 0.3648 | 0.3724 | 0.5869 |
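For a quick single-number comparison, the table rows can be averaged. The snippet below is only a convenience sketch: the unweighted mean across the four benchmarks is not a metric reported in this card, and the names `scores` and `mean_score` are illustrative.

```python
# 5-shot accuracies copied from the benchmark table above,
# in the order: KMMLU, HAE-RAE, CLiCK, KoBEST.
scores = {
    "HyperCLOVAX-SEED-Text-Base-1.5B": [0.4181, 0.6370, 0.5373, 0.6963],
    "HyperCLOVAX-SEED-Text-Instruct-1.5B": [0.3933, 0.5674, 0.4947, 0.6490],
    "Qwen2.5-1.5B-instruct": [0.3696, 0.5160, 0.4772, 0.5968],
    "gemma-3-1b-it": [0.3075, 0.3648, 0.3724, 0.5869],
}


def mean_score(values):
    """Unweighted arithmetic mean of a list of accuracies."""
    return sum(values) / len(values)


averages = {name: mean_score(vals) for name, vals in scores.items()}

# Print models from highest to lowest average.
for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {avg:.4f}")
```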
## Hugging Face Usage Example
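```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

chat = [
    # An empty tool list: no tools are offered to the model in this example.
    {"role": "tool_list", "content": ""},
    # System prompt (Korean): the assistant is named "CLOVA X", was made by NAVER,
    # and today is Thursday, April 24, 2025.
    {"role": "system", "content": "- AI 언어모델의 이름은 \"CLOVA X\" 이며 네이버에서 만들었다.\n- 오늘은 2025년 04월 24일(목)이다."},
    # User prompt (Korean): explain the relationship between the Schrödinger
    # equation and quantum mechanics in as much detail as possible.
    {"role": "user", "content": "슈뢰딩거 방정식과 양자역학의 관계를 최대한 자세히 알려줘."},
]

# Render the chat with the model's template and tokenize it in one step.
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
# Move the inputs to wherever device_map="auto" placed the model.
inputs = inputs.to(model.device)

output_ids = model.generate(
    **inputs,
    max_length=1024,  # counts prompt tokens plus generated tokens
    stop_strings=["<|endofturn|>", "<|stop|>"],
    tokenizer=tokenizer,  # required when stop_strings is set
)
print(tokenizer.batch_decode(output_ids)[0])
```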
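For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the printed text above includes the rendered chat prompt. A minimal sketch of keeping only the continuation follows. It uses plain Python lists as stand-ins for model output, and the helper name `generated_only` is hypothetical.

```python
def generated_only(output_ids, prompt_length):
    """Drop the first prompt_length token ids from every sequence in the batch."""
    return [list(seq[prompt_length:]) for seq in output_ids]


# Toy ids standing in for real output: 3 prompt tokens followed by 2 new tokens.
toy_output = [[101, 102, 103, 7, 8]]
print(generated_only(toy_output, 3))  # [[7, 8]]
```

With the real tensors from the example above, the prompt length would be `inputs["input_ids"].shape[1]`, and slicing `output_ids[:, prompt_length:]` before `batch_decode` has the same effect.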