NOTICE

Gumini-1B
Copyright (c) 2025 Gumin Kwon

========================================
Built with Qwen
========================================

QWEN ATTRIBUTION
----------------
Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT.
Copyright (c) Alibaba Cloud. All Rights Reserved.

MODIFICATIONS MADE:
- Reduced layers from 36 to 10 (inherited first 10 layers)
- Continued training on Korean-English dataset (~393M tokens)
- Added model identity information

This model is for NON-COMMERCIAL/RESEARCH purposes only.

========================================
INHERITUNE ATTRIBUTION (CC BY 4.0)
----------------------------------
Training method from:
"Inheritune: Training Smaller Yet More Attentive Language Models"
Sanyal et al., 2024
https://arxiv.org/abs/2404.08634

========================================
TRAINING DATA ATTRIBUTION
-------------------------
- FineWeb-Edu: ODC-By 1.0
- CulturaX: CC BY 4.0
- Wikipedia: CC BY-SA 3.0