---
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
library_name: transformers
language:
- ko
- en
tags:
- text-generation
- korean
- bilingual
- qwen2
- built-with-qwen
- inheritune
- continued-pretraining
base_model: Qwen/Qwen2.5-3B
datasets:
- HuggingFaceFW/fineweb-edu
- uonlp/CulturaX
- wikimedia/wikipedia
pipeline_tag: text-generation
---

# 🐻 Gumini-1.5B (구미니)

Built with Qwen

> **5,700× less data, better performance.**
> Gumini-1.5B reaches a Korean perplexity (PPL) of 8.49 with only 3.14B training tokens, outperforming Qwen-2.5-1.5B (18T tokens, PPL 8.84).

## 🔥 Key Results

| Model | Params | Training Tokens | Korean PPL ↓ | Rank |
|-------|--------|-----------------|--------------|------|
| Qwen-2.5-7B | 7.62B | 18T | 6.39 | #1 |
| Gemma-2B | 2.0B | 2T | 8.15 | #2 |
| **Gumini-1.5B (Ours)** | **1.54B** | **3.14B** | **8.49** | **#3** |
| Qwen-2.5-1.5B | 1.5B | 18T | 8.84 | #4 |
| Llama-3.2-3B | 3.21B | 9T | 9.47 | #5 |
| EXAONE-3.5-2.4B | 2.4B | ~6.5T | 9.80 | #6 |

## Data Efficiency

| vs Model | Their Tokens | Gumini Tokens | Efficiency |
|----------|--------------|---------------|------------|
| Qwen-2.5 | 18T | 3.14B | **5,732×** fewer |
| Llama-3.2 | 9T | 3.14B | **2,866×** fewer |
| EXAONE-3.5 | ~6.5T | 3.14B | **~2,070×** fewer |

## Model Description

**Gumini-1.5B** (구미니) is a bilingual Korean-English **base language model** trained using the *Inheritune* methodology. Starting from **Qwen 2.5 3B**, the model was grown progressively from 10 to 16 layers over 7 training stages, using **~3.14B tokens** of continued pretraining on a mixed Korean-English corpus.

> This is a **BASE model**, not instruction-tuned.
> It produces text continuations rather than conversational responses.

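
The token ratios in the Data Efficiency table, and the ~3.14B total, can be re-derived from the figures stated in this card. The sketch below is a minimal arithmetic check, not part of the training code. The names `GUMINI_TOKENS`, `reference_tokens`, `token_ratio`, and `stage_tokens_millions` are illustrative, and the per-stage token counts are taken from the stage schedule listed under Training Highlights.

```python
# Token counts as stated in this card (treated as exact for this check).
GUMINI_TOKENS = 3.14e9

reference_tokens = {
    "Qwen-2.5": 18e12,
    "Llama-3.2": 9e12,
    "EXAONE-3.5": 6.5e12,  # the card lists this one as approximate (~6.5T)
}


def token_ratio(their_tokens: float, our_tokens: float = GUMINI_TOKENS) -> float:
    """Return how many times more tokens the reference model was trained on."""
    return their_tokens / our_tokens


ratios = {name: token_ratio(tokens) for name, tokens in reference_tokens.items()}

# Per-stage token budgets in millions: six stages of 393M, then a final 786M stage.
stage_tokens_millions = [393] * 6 + [786]
total_tokens_millions = sum(stage_tokens_millions)

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f}x more training tokens than Gumini-1.5B")
print(f"Gumini-1.5B total: {total_tokens_millions}M tokens")
```

Rounding the computed ratios gives the 5,732×, 2,866×, and ~2,070× figures in the table, and the stage budgets sum to 3,144M tokens, which matches the stated ~3.14B.
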
## Training Highlights

### Inheritune Progressive Layer Growing

```
Stage 0: 10 layers (1.08B) → 393M tokens
Stage 1: 11 layers (1.15B) → 393M tokens
Stage 2: 12 layers (1.23B) → 393M tokens
Stage 3: 13 layers (1.31B) → 393M tokens
Stage 4: 14 layers (1.39B) → 393M tokens
Stage 5: 15 layers (1.47B) → 393M tokens
Stage 6: 16 layers (1.54B) → 786M tokens
────────────────────────────────────────────
Total:   16 layers, 1.54B params, ~3.14B tokens
```

## Model Details

| Attribute | Value |
|-----------|-------|
| **Researcher** | [Gumin Kwon (권구민)](https://linkedin.com/in/devgumin) |
| **Base Model** | Qwen/Qwen2.5-3B |
| **Training Method** | Inheritune + continued pretraining |
| **Parameters** | 1.54B |
| **Layers** | 16 |
| **Hidden Size** | 2048 |
| **Attention Heads** | 16 |
| **KV Heads** | 2 (GQA) |
| **Vocab Size** | 151,936 |
| **Total Tokens Trained** | ~3.14B |
| **Precision** | BF16 |

## Training Data

| Dataset | Language | Weight |
|---------|----------|--------|
| FineWeb-Edu (sample-10BT) | English | 20% |
| CulturaX-ko | Korean | 50% |
| Wikipedia-ko | Korean | 30% |

**Total: 80% Korean, 20% English**

### Optimization

```yaml
learning_rate: 2.0e-4
weight_decay: 0.1
lr_scheduler: cosine
warmup_ratio: 0.01
max_grad_norm: 1.0
precision: bf16
gradient_checkpointing: true
attention: PyTorch SDPA (Flash Attention)
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model in bfloat16 and let Accelerate place it on available devices.
model = AutoModelForCausalLM.from_pretrained(
    "GuminiResearch/Gumini-1.5B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("GuminiResearch/Gumini-1.5B-Base")

prompt = "저는 구미니입니다."  # "I am Gumini."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a continuation; the repetition penalty counters the base model's tendency to loop.
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    repetition_penalty=1.2,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using Pipeline

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="GuminiResearch/Gumini-1.5B-Base",
    torch_dtype="bfloat16",
    device_map="auto",
)

output = generator(
    "저는 구미니입니다.",  # "I am Gumini."
    max_new_tokens=100,
    do_sample=True,  # required for temperature to take effect
    temperature=0.7,
    repetition_penalty=1.2,
)
print(output[0]["generated_text"])
```

## Evaluation

| Stage | Layers | Parameters |
|-------|--------|------------|
| 0 | 10 | 1.08B |
| 5 | 15 | 1.47B |
| **6** | **16** | **1.54B** |

## Model Family

| Model | Layers | Params | Tokens | Status |
|-------|--------|--------|--------|--------|
| Gumini-1B | 10 | 1.08B | 393M | Released |
| **Gumini-1.5B** | **16** | **1.54B** | **3.14B** | **This Model** |

## Limitations

- **Base model**: No instruction tuning or safety alignment
- **High repetition risk**: Use `repetition_penalty >= 1.2`
- May generate **incorrect or outdated information**
- Should not be used in **sensitive or safety-critical** contexts
- Knowledge cutoff is determined by the training data

## License

### Qwen Research License (Non-Commercial)

This model is **Built with Qwen** and derived from Qwen 2.5 3B.

```
Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT.
Copyright (c) Alibaba Cloud. All Rights Reserved.
```

**This model is for NON-COMMERCIAL / RESEARCH use only.**
For commercial use, contact Alibaba Cloud.

## References

### Inheritune Paper

```bibtex
@inproceedings{Sanyal2024inheritune,
  title={Inheritune: Training Smaller Yet More Attentive Language Models},
  author={Sunny Sanyal and Ravid Shwartz-Ziv and Alexandros G. Dimakis and Sujay Sanghavi},
  year={2024},
  url={https://arxiv.org/abs/2404.08634}
}
```

### Qwen 2.5

```bibtex
@misc{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  year={2024},
  url={https://qwenlm.github.io/blog/qwen2.5/}
}
```

## Citation

```bibtex
@misc{gumini2025,
  title={Gumini-1.5B: Bilingual Korean-English Language Model via Inheritune},
  author={Gumin Kwon},
  year={2025},
  note={Built with Qwen. Trained with Inheritune progressive layer growing.},
  url={https://huggingface.co/GuminiResearch/Gumini-1.5B-Base}
}
```

## Author

**[Gumin Kwon (권구민)](https://linkedin.com/in/devgumin)**

- LinkedIn: [linkedin.com/in/devgumin](https://linkedin.com/in/devgumin)
- HuggingFace: [GuminiResearch](https://huggingface.co/GuminiResearch)

---

Gumini - Small but smart AI