ModelHub XC 8b82aad8b6 Initialize project; model provided by the ModelHub XC community

Model: GuminiResearch/Gumini-1B-Base
Source: Original Platform

2026-04-13 13:42:07 +08:00

Files (all from the same initial commit, 2026-04-13 13:42:07 +08:00):

.gitattributes
added_tokens.json
chat_template.jinja
config.json
generation_config.json
LICENSE
merges.txt
model.safetensors
NOTICE
optimizer.pt
README.md
rng_state.pth
scheduler.pt
special_tokens_map.json
tokenizer_config.json
tokenizer.json
trainer_state.json
training_args.bin
vocab.json

🐻 Gumini-1B (구미니)

Built with Qwen

Model Description

Gumini (구미니) is a bilingual Korean-English base language model created by inheriting the first 10 layers of Qwen 2.5 3B using the Inheritune methodology, followed by continued pretraining on a Korean–English mixed corpus (~393M tokens).

This is a BASE model, not instruction-tuned.
It produces text continuations rather than conversational responses.
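Because a base model only continues text, a task is usually framed as a document whose natural continuation is the answer. The sketch below is one hypothetical way to build such a few-shot prompt for English-to-Korean translation; the `English:`/`Korean:` labels and the helper name `build_continuation_prompt` are illustrative assumptions, not a format this model was trained on.

```python
def build_continuation_prompt(examples, query):
    """Format (input, output) pairs plus a final open-ended input as plain text."""
    lines = []
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"Korean: {target}")
        lines.append("")  # blank line between examples
    lines.append(f"English: {query}")
    lines.append("Korean:")  # left open so the model continues from here
    return "\n".join(lines)


# Hypothetical demonstration pairs.
examples = [("Hello.", "안녕하세요."), ("Thank you.", "감사합니다.")]
prompt = build_continuation_prompt(examples, "Good night.")
print(prompt)
```

The resulting string can be passed to `generate` like any other prompt; output quality for a model of this size is not guaranteed.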

What We Modified

The original Qwen 2.5 3B model was modified as follows:

1. Layer Inheritance (Inheritune)
   - Inherited the first 10 transformer layers out of 36
   - Reduced model size while preserving early linguistic abilities
2. Pretraining
   - Trained for 393M tokens on a Korean–English dataset
   - Maintains base-model behavior (not SFT or instruction-tuning)
3. Identity Injection
   - Added system-level identity tokens for model conditioning

This model inherits its early layers from Qwen 2.5 3B and is then retrained with progressive layer expansion, following the Inheritune methodology.
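The layer-inheritance step can be illustrated at the checkpoint level: keep every non-layer weight (embeddings, final norm) and only those transformer blocks whose index is below the cut. This is a minimal sketch rather than the author's actual script. It assumes parameter names follow the Hugging Face Qwen2 convention (`model.layers.<i>.…`), uses a toy dictionary in place of real tensors, and does not cover the continued pretraining or the progressive layer expansion.

```python
import re

# Assumed naming convention: "model.layers.<index>.<rest>"
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def inherit_first_layers(state_dict, n_layers):
    """Keep non-layer weights and the first `n_layers` transformer blocks."""
    kept = {}
    for name, tensor in state_dict.items():
        match = LAYER_KEY.match(name)
        if match is None or int(match.group(1)) < n_layers:
            kept[name] = tensor
    return kept


# Toy stand-in for a 36-layer checkpoint; values are placeholders, not tensors.
toy = {"model.embed_tokens.weight": "E", "model.norm.weight": "N"}
for i in range(36):
    toy[f"model.layers.{i}.self_attn.q_proj.weight"] = f"Q{i}"

small = inherit_first_layers(toy, 10)
print(len(small))  # 2 non-layer entries + 10 layer entries
```

With a real checkpoint, the model configuration's layer count would also need to be set to 10 so that the architecture matches the filtered weights.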

Model Details

| Attribute | Value |
|---|---|
| Researcher | Gumin Kwon (권구민) |
| Base Model | Qwen/Qwen2.5-3B |
| Training Method | Inheritune + Pretraining |
| Parameters | 1.08B |
| Layers | 10 |
| Hidden Size | 2048 |
| Attention Heads | 16 |
| KV Heads | 2 (GQA) |
| Vocab Size | 151,936 |
| Tokens Trained | 393M |
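The 1.08B parameter figure can be sanity-checked from the attributes above. The sketch below is a back-of-the-envelope count, not an official breakdown. Several inputs are assumptions taken from the public Qwen2.5-3B configuration rather than from this card: an MLP intermediate size of 11,008, biases on the query/key/value projections only, and input/output embeddings that share weights.

```python
def count_params(vocab=151_936, hidden=2048, layers=10, heads=16, kv_heads=2,
                 intermediate=11_008, tied_embeddings=True):
    """Approximate parameter count for a Qwen2-style decoder (assumed layout)."""
    head_dim = hidden // heads          # 128
    kv_dim = kv_heads * head_dim        # 256, shared across query groups (GQA)
    embed = vocab * hidden
    attn = (
        (hidden * hidden + hidden)          # q_proj weight + bias
        + 2 * (hidden * kv_dim + kv_dim)    # k_proj and v_proj weight + bias
        + hidden * hidden                   # o_proj weight, no bias
    )
    mlp = 3 * hidden * intermediate     # gate, up, and down projections
    norms = 2 * hidden                  # two RMSNorm weight vectors per layer
    total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
    if not tied_embeddings:
        total += vocab * hidden         # separate output projection
    return total


total = count_params()
print(f"{total / 1e9:.2f}B")  # prints 1.08B
```

Under these assumptions the embedding matrix alone accounts for roughly 0.31B of the total, which is why cutting 36 layers to 10 does not shrink the model by the same ratio.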

Training Data

| Dataset | Language | Weight |
|---|---|---|
| FineWeb-Edu | English | 20% |
| CulturaX-ko | Korean | 50% |
| Wikipedia-ko | Korean | 30% |
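To give a rough sense of scale, the weights above can be turned into per-dataset token budgets. This sketch assumes the weights are shares of the ~393M training tokens; the card does not say whether they are token shares or sampling probabilities, so the numbers are only indicative.

```python
TOTAL_TOKENS = 393_000_000  # approximate figure from the model card
MIX_PERCENT = {"FineWeb-Edu": 20, "CulturaX-ko": 50, "Wikipedia-ko": 30}

# The weights should cover the whole mixture.
assert sum(MIX_PERCENT.values()) == 100

# Assumption: each weight is a share of the total token count.
budget = {name: TOTAL_TOKENS * pct // 100 for name, pct in MIX_PERCENT.items()}
korean_share = MIX_PERCENT["CulturaX-ko"] + MIX_PERCENT["Wikipedia-ko"]

for name, tokens in budget.items():
    print(f"{name}: ~{tokens / 1e6:.1f}M tokens")
print(f"Korean share: {korean_share}%")
```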

Usage

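from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in bfloat16; device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(
    "GuminiResearch/Gumini-1B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GuminiResearch/Gumini-1B-Base")

# Base model: the prompt is continued as plain text, not answered as a chat turn
prompt = "저는 구미니입니다."  # "I am Gumini."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    repetition_penalty=1.2,  # see Limitations: reduces repetitive output
    do_sample=True,
    temperature=0.7
)
# The decoded text includes the prompt followed by the continuation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))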

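# Alternative: the high-level pipeline API
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="GuminiResearch/Gumini-1B-Base",
)

prompt = "저는 구미니입니다."  # "I am Gumini."
# do_sample=True is set explicitly so that temperature takes effect
output = generator(
    prompt,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
)

print(output[0]["generated_text"])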

Limitations

- Base model: no instruction-tuning or safety alignment
- High repetition risk: use repetition_penalty >= 1.2
- May generate incorrect or outdated information
- Should not be used in sensitive or safety-critical contexts
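To make the repetition advice concrete, here is an illustrative pure-Python version of the CTRL-style penalty that the `repetition_penalty` argument is generally understood to apply: logits of tokens that already appear in the output are pushed down, by dividing positive scores and multiplying negative ones. The function name and the toy logits are hypothetical, and the real implementation operates on tensors.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of three tokens; tokens 0 and 1 were already generated.
logits = [2.4, -1.0, 0.5]
adjusted = apply_repetition_penalty(logits, generated_ids=[0, 1, 0])
print(adjusted)
```

A penalty of 1.0 leaves the logits unchanged; larger values suppress repeats more strongly but can also discourage legitimate reuse of common words.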

License

Qwen Research License (Non-Commercial)

This model is Built with Qwen and derived from Qwen 2.5 3B.

Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT.
Copyright (c) Alibaba Cloud. All Rights Reserved.

This model is for NON-COMMERCIAL / RESEARCH use only.
For commercial use, contact Alibaba Cloud.

Inheritune Paper (CC BY 4.0)

@inproceedings{Sanyal2024inheritune,
  title={Inheritune: Training Smaller Yet More Attentive Language Models},
  author={Sunny Sanyal and Ravid Shwartz-Ziv and Alexandros G. Dimakis and Sujay Sanghavi},
  year={2024},
  url={https://arxiv.org/abs/2404.08634}
}

Citation

@misc{gumini2025,
  title={Gumini-1B: Bilingual Language Model Built with Qwen via Inheritune},
  author={Gumin Kwon},
  year={2025},
  note={Built with Qwen},
  url={https://huggingface.co/GuminiResearch/Gumini-1B-Base}
}

Author

Gumin Kwon (권구민)

LinkedIn: linkedin.com/in/devgumin
HuggingFace: GuminiResearch

Built with Qwen
Gumini: a small but smart AI
