---
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
library_name: transformers
language:
- ko
- en
tags:
- text-generation
- korean
- bilingual
- qwen2
- built-with-qwen
- inheritune
- continued-pretraining
base_model: Qwen/Qwen2.5-3B
datasets:
- HuggingFaceFW/fineweb-edu
- uonlp/CulturaX
- wikimedia/wikipedia
pipeline_tag: text-generation
---
# 🐻 Gumini-1.5B (구미니)
<p align="center">
<img src="https://img.shields.io/badge/Parameters-1.54B-blue" style="display:inline-block; margin-right:6px;" />
<img src="https://img.shields.io/badge/Layers-16-green" style="display:inline-block; margin-right:6px;" />
<img src="https://img.shields.io/badge/Tokens-3.14B-red" style="display:inline-block; margin-right:6px;" />
<img src="https://img.shields.io/badge/Languages-Korean%20%7C%20English-orange" style="display:inline-block; margin-right:6px;" />
<img src="https://img.shields.io/badge/Built%20with-Qwen-purple" style="display:inline-block; margin-right:6px;" />
</p>
<p align="center">
<a href="https://linkedin.com/in/devgumin" target="_blank">
<img src="https://img.shields.io/badge/LinkedIn-Gumin%20Kwon-0A66C2?logo=linkedin&logoColor=white" style="display:inline-block; margin-right:6px;" />
</a>
<a href="https://x.com/Gumini_Research" target="_blank">
<img src="https://img.shields.io/badge/X-@Gumini__Research-black?logo=x&logoColor=white" style="display:inline-block; margin-right:6px;" />
</a>
<a href="https://www.instagram.com/gumini_research/" target="_blank">
<img src="https://img.shields.io/badge/Instagram-gumini__research-E4405F?logo=instagram&logoColor=white" style="display:inline-block;" />
</a>
</p>
<p align="center"><b>Built with Qwen</b></p>
> **5,700× less data, better performance.**
> Gumini-1.5B reaches a Korean perplexity (PPL) of 8.49 after only 3.14B tokens of continued pretraining, ahead of Qwen-2.5-1.5B (18T tokens, PPL 8.84).
## 🔥 Key Results
| Model | Params | Training Tokens | Korean PPL ↓ | Rank |
|-------|--------|-----------------|--------------|------|
| Qwen-2.5-7B | 7.62B | 18T | 6.39 | #1 |
| Gemma-2B | 2.0B | 2T | 8.15 | #2 |
| **Gumini-1.5B (Ours)** | **1.54B** | **3.14B** | **8.49** | **#3** |
| Qwen-2.5-1.5B | 1.5B | 18T | 8.84 | #4 |
| Llama-3.2-3B | 3.21B | 9T | 9.47 | #5 |
| EXAONE-3.5-2.4B | 2.4B | ~6.5T | 9.80 | #6 |
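
The table ranks models by perplexity (lower is better). This card does not describe the evaluation script, so the following is only a minimal sketch of the standard definition: perplexity is the exponential of the mean negative log-likelihood per token. The function name and the toy inputs are illustrative assumptions, not the code used to produce the numbers above.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy input (hypothetical): a model that assigns probability 0.5 to each of 4 tokens
toy_logprobs = [math.log(0.5)] * 4
toy_ppl = perplexity(toy_logprobs)
print(f"toy perplexity: {toy_ppl:.2f}")
```

A model that is always 50% sure of the next token has a perplexity of 2, which is why lower values indicate a better fit to the evaluation text.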
## 📊 Data Efficiency
| vs Model | Their Tokens | Gumini Tokens | Efficiency |
|----------|--------------|---------------|------------|
| Qwen-2.5 | 18T | 3.14B | **5,732×** less |
| Llama-3.2 | 9T | 3.14B | **2,866×** less |
| EXAONE-3.5 | ~6.5T | 3.14B | **~2,070×** less |
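
The efficiency column is a plain ratio of reported training tokens. The short check below reproduces it from the figures in the table; the variable names are illustrative.

```python
# Token counts as reported in the table above
GUMINI_TOKENS = 3.14e9

reported_tokens = {
    "Qwen-2.5": 18e12,
    "Llama-3.2": 9e12,
    "EXAONE-3.5": 6.5e12,  # approximate figure in the table
}

# Ratio of the other model's tokens to Gumini's, rounded to the nearest integer
ratios = {name: round(tokens / GUMINI_TOKENS) for name, tokens in reported_tokens.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,}x fewer tokens used by Gumini")
```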
## Model Description
**Gumini-1.5B** (구미니) is a bilingual Korean-English **base language model** trained with the *Inheritune* methodology. Starting from **Qwen 2.5 3B**, the model was grown progressively from 10 to 16 layers over 7 training stages, using **~3.14B tokens** of continued pretraining on a mixed Korean-English corpus.
> This is a **BASE model**, not instruction-tuned.
> It produces text continuations rather than conversational responses.
## Training Highlights
### Inheritune Progressive Layer Growing
```
Stage 0: 10 layers (1.08B) → 393M tokens
Stage 1: 11 layers (1.15B) → 393M tokens
Stage 2: 12 layers (1.23B) → 393M tokens
Stage 3: 13 layers (1.31B) → 393M tokens
Stage 4: 14 layers (1.39B) → 393M tokens
Stage 5: 15 layers (1.47B) → 393M tokens
Stage 6: 16 layers (1.54B) → 786M tokens ⭐
────────────────────────────────────────────
Total: 16 layers, 1.54B params, ~3.14B tokens
```
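
The stage schedule above can be checked with a few lines of arithmetic. This sketch only encodes the numbers from the listing (one extra layer per stage, a doubled token budget in the last stage); it is not the training script.

```python
# (stage, layers, tokens in millions), copied from the schedule above
STAGES = [
    (0, 10, 393),
    (1, 11, 393),
    (2, 12, 393),
    (3, 13, 393),
    (4, 14, 393),
    (5, 15, 393),
    (6, 16, 786),
]

total_tokens_m = sum(tokens for _, _, tokens in STAGES)
final_layers = STAGES[-1][1]

print(f"{len(STAGES)} stages, {final_layers} layers, {total_tokens_m / 1000:.2f}B tokens")
```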
## Model Details
| Attribute | Value |
|-----------|-------|
| **Researcher** | [Gumin Kwon (권구민)](https://linkedin.com/in/devgumin) |
| **Base Model** | Qwen/Qwen2.5-3B |
| **Training Method** | Inheritune + Pretraining |
| **Parameters** | 1.54B |
| **Layers** | 16 |
| **Hidden Size** | 2048 |
| **Attention Heads** | 16 |
| **KV Heads** | 2 (GQA) |
| **Vocab Size** | 151,936 |
| **Total Tokens Trained** | ~3.14B |
| **Precision** | BF16 |
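
As a rough cross-check of the parameter counts, the sketch below counts weights for a Qwen2-style decoder from the values in the table. Three inputs are assumptions not stated in this card and taken from the Qwen2.5-3B configuration as I understand it: an MLP intermediate size of 11008, bias terms on the Q/K/V projections only, and input/output embeddings that share weights.

```python
HIDDEN = 2048
HEADS = 16
KV_HEADS = 2
VOCAB = 151_936
INTERMEDIATE = 11_008  # assumption: not listed in this card


def count_params(num_layers):
    """Approximate parameter count for a Qwen2-style decoder with tied embeddings."""
    head_dim = HIDDEN // HEADS
    kv_dim = KV_HEADS * head_dim
    attn = (
        HIDDEN * HIDDEN + HIDDEN          # q_proj weight + bias
        + 2 * (HIDDEN * kv_dim + kv_dim)  # k_proj and v_proj weight + bias
        + HIDDEN * HIDDEN                 # o_proj weight, no bias
    )
    mlp = 3 * HIDDEN * INTERMEDIATE       # gate, up and down projections
    norms = 2 * HIDDEN                    # two RMSNorm weights per layer
    embed = VOCAB * HIDDEN                # shared by input and output (assumed tied)
    final_norm = HIDDEN
    return num_layers * (attn + mlp + norms) + embed + final_norm


for layers in (10, 16):
    print(f"{layers} layers: {count_params(layers) / 1e9:.2f}B parameters")
```

Under these assumptions the 10-layer and 16-layer counts come out at about 1.08B and 1.54B, in line with the table.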
## Training Data
| Dataset | Language | Weight |
|---------|----------|--------|
| FineWeb-Edu (sample-10BT) | English | 20% |
| CulturaX-ko | Korean | 50% |
| Wikipedia-ko | Korean | 30% |
**Total: 80% Korean, 20% English**
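
A minimal sketch of how such a mixture could be sampled, assuming the weights are per-example sampling probabilities (the card does not say whether they are applied per example or per token). The helper names are hypothetical.

```python
import random

# Dataset -> (language, weight), copied from the table above
MIX = {
    "FineWeb-Edu (sample-10BT)": ("en", 0.20),
    "CulturaX-ko": ("ko", 0.50),
    "Wikipedia-ko": ("ko", 0.30),
}


def language_share(mix):
    """Sum the weights per language."""
    share = {}
    for lang, weight in mix.values():
        share[lang] = share.get(lang, 0.0) + weight
    return share


def sample_source(rng):
    """Pick the dataset that the next training example is drawn from."""
    names = list(MIX)
    weights = [MIX[name][1] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


share = language_share(MIX)
print(share)
print(sample_source(random.Random(0)))
```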
### Optimization
```yaml
learning_rate: 2.0e-4
weight_decay: 0.1
lr_scheduler: cosine
warmup_ratio: 0.01
max_grad_norm: 1.0
precision: bf16
gradient_checkpointing: true
attention: PyTorch SDPA (Flash Attention)
```
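
To make the schedule concrete, here is a small sketch of a cosine learning-rate curve with the peak rate and warmup ratio listed above. It assumes linear warmup from zero and decay to zero, which the config does not specify; `total_steps` is a hypothetical value chosen only for the example.

```python
import math

PEAK_LR = 2.0e-4
WARMUP_RATIO = 0.01


def lr_at(step, total_steps):
    """Learning rate at `step`: linear warmup, then cosine decay to zero (assumed)."""
    warmup_steps = max(1, int(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 10_000  # hypothetical; the card does not state the step count
for step in (0, 50, 100, 5_050, 10_000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With a 1% warmup, the peak rate is reached after 100 of the 10,000 example steps and the rate falls to half the peak midway through the decay phase.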
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GuminiResearch/Gumini-1.5B-Base"

# Load the weights in BF16 and place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Base model: the prompt is continued, not answered
prompt = "저는 구미니입니다."  # "I am Gumini."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    repetition_penalty=1.2,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Using Pipeline
```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="GuminiResearch/Gumini-1.5B-Base",
    torch_dtype="bfloat16",
    device_map="auto",
)

output = generator(
    "저는 구미니입니다.",  # "I am Gumini."
    max_new_tokens=100,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7,
    repetition_penalty=1.2,
)
print(output[0]["generated_text"])
```
## Evaluation
| Stage | Layers | Parameters |
|-------|--------|------------|
| 0 | 10 | 1.08B |
| 5 | 15 | 1.47B |
| **6** | **16** | **1.54B** |
## Model Family
| Model | Layers | Params | Tokens | Status |
|-------|--------|--------|--------|--------|
| Gumini-1B | 10 | 1.08B | 393M | ✅ Released |
| **Gumini-1.5B** | **16** | **1.54B** | **3.14B** | ✅ **This Model** |
## Limitations
- **Base model**: No instruction-tuning or safety alignment
- **High repetition risk**: Use `repetition_penalty >= 1.2`
- May generate **incorrect or outdated information**
- Should not be used in **sensitive or safety-critical** contexts
- Knowledge is limited to the cutoff of the training data
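
For readers unfamiliar with `repetition_penalty`, the sketch below shows the rule as commonly implemented (and, to my knowledge, as used by the Transformers logits processor): logits of tokens already generated are divided by the penalty when positive and multiplied by it when negative, so repeated tokens become less likely either way. The function and the toy logits are illustrative, not library code.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated token ids made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] < 0:
            out[token_id] *= penalty  # push negative logits further down
        else:
            out[token_id] /= penalty  # shrink positive logits toward zero
    return out


# Toy vocabulary of three tokens; tokens 0 and 1 were already generated
penalized = apply_repetition_penalty([2.4, -1.0, 0.5], generated_ids=[0, 1])
print(penalized)
```

A penalty of 1.0 leaves the logits unchanged; values above 1.0 discourage repetition more strongly.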
## License
### Qwen Research License (Non-Commercial)
This model is **Built with Qwen** and derived from Qwen 2.5 3B.
```
Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT.
Copyright (c) Alibaba Cloud. All Rights Reserved.
```
**This model is for NON-COMMERCIAL / RESEARCH use only.**
For commercial use, contact Alibaba Cloud.
## References
### Inheritune Paper
```bibtex
@inproceedings{Sanyal2024inheritune,
title={Inheritune: Training Smaller Yet More Attentive Language Models},
author={Sunny Sanyal and Ravid Shwartz-Ziv and Alexandros G. Dimakis and Sujay Sanghavi},
year={2024},
url={https://arxiv.org/abs/2404.08634}
}
```
### Qwen 2.5
```bibtex
@misc{qwen2.5,
title={Qwen2.5: A Party of Foundation Models},
author={Qwen Team},
year={2024},
url={https://qwenlm.github.io/blog/qwen2.5/}
}
```
## Citation
```bibtex
@misc{gumini2025,
title={Gumini-1.5B: Bilingual Korean-English Language Model via Inheritune},
author={Gumin Kwon},
year={2025},
note={Built with Qwen. Trained with Inheritune progressive layer growing.},
url={https://huggingface.co/GuminiResearch/Gumini-1.5B-Base}
}
```
## Author
**[Gumin Kwon (권구민)](https://linkedin.com/in/devgumin)**
- LinkedIn: [linkedin.com/in/devgumin](https://linkedin.com/in/devgumin)
- HuggingFace: [GuminiResearch](https://huggingface.co/GuminiResearch)
---
<p align="center">
<b>Built with Qwen</b><br>
<i>Gumini - Small but smart AI (작지만 똑똑한 AI)</i>
</p>