Initialize project; model provided by the ModelHub XC community
Model: raincandy-u/Rain-100M
Source: Original Platform

---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# 🩵 Rain-100M — Model Card

**Rain-100M** is an experimental, roughly 100M-parameter language model built on the **Qwen3 architecture** and trained from scratch.

## 🧠 Training & Data

* **Training corpus**: `HuggingFaceFW/fineweb-edu`
* **Total tokens**: ~**3B**
* **Language**: English only
* **Tokenizer**: newly trained **16k-vocabulary BPE** tokenizer (sized for small, compact models)
* **Max sequence length**: 4096 tokens
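
For a rough sense of scale, the figures above can be turned into a tokens-per-parameter ratio and a count of full-length training sequences. This is a back-of-the-envelope sketch that assumes exactly 3B tokens and 100M parameters (the card gives both only approximately) and that every sequence was packed to the full 4096-token length; the card does not say how documents were packed.

```python
# Back-of-the-envelope scale figures for Rain-100M.
# Assumptions: exactly 3e9 training tokens and 1e8 parameters (the card says "~3B" and "~100M"),
# and every training sequence packed to the full 4096-token context (packing is not documented).

TOTAL_TOKENS = 3_000_000_000
NUM_PARAMS = 100_000_000
MAX_SEQ_LEN = 4096


def tokens_per_parameter(tokens: int, params: int) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def full_length_sequences(tokens: int, seq_len: int) -> int:
    """Number of complete seq_len-token sequences the corpus fills (any remainder is dropped)."""
    return tokens // seq_len


if __name__ == "__main__":
    print(f"tokens per parameter: {tokens_per_parameter(TOTAL_TOKENS, NUM_PARAMS):.1f}")
    print(f"full 4096-token sequences: {full_length_sequences(TOTAL_TOKENS, MAX_SEQ_LEN):,}")
```

Under these assumptions the run sees about 30 tokens per parameter, or roughly 732k full-length sequences.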

**Sample training metrics**:

```text
train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459
```
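
A cross-entropy loss is often easier to read as perplexity, `exp(loss)`. The sketch below assumes `train/loss` is the mean per-token cross-entropy in nats (natural log), which is the usual convention for causal-LM training; the card does not state this, and a loss logged on training batches is not a held-out evaluation result.

```python
import math

# Assumption: train/loss is the mean per-token cross-entropy in nats (natural log),
# as is usual for causal language-model training; the card does not say so explicitly.
TRAIN_LOSS = 3.4459  # value copied from the sample training metrics above


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll_nats)


def bits_per_token(mean_nll_nats: float) -> float:
    """Convert a loss in nats per token to bits per token."""
    return mean_nll_nats / math.log(2)


if __name__ == "__main__":
    print(f"perplexity     ~ {perplexity(TRAIN_LOSS):.1f}")
    print(f"bits per token ~ {bits_per_token(TRAIN_LOSS):.2f}")
```

If the assumption holds, the logged loss corresponds to a training perplexity of roughly 31 (about 5 bits per token).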

## 🏗️ Architecture (Qwen3-style)

* **Parameters**: ~100M
* **Layers**: 12 Transformer layers
* **Hidden size**: 768
* **Attention heads**: 12
* **MLP intermediate size**: 2048
* **Activation**: SiLU
* **Weight dtype**: bfloat16
* **RMSNorm eps**: 1e-6
* **RoPE θ**: 10000
* **Inference framework**: `transformers`
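
The listed hyperparameters roughly account for the ~100M total. The sketch below counts weights for a Qwen3-style decoder: bias-free q/k/v/o projections, per-head RMSNorm on queries and keys, a gated SiLU MLP with gate, up and down projections, and two RMSNorms per layer. Several inputs are not given on this card and are assumptions: a vocabulary of exactly 16,384 tokens for the "16k" tokenizer, a head dimension of 768 / 12 = 64, no grouped-query attention (12 key/value heads), and input embeddings tied to the output head.

```python
from dataclasses import dataclass


@dataclass
class Qwen3StyleConfig:
    # Values from the card:
    num_layers: int = 12
    hidden_size: int = 768
    num_heads: int = 12
    intermediate_size: int = 2048
    # Assumptions (not stated on the card):
    vocab_size: int = 16_384        # "16k" BPE taken as 2**14
    head_dim: int = 64              # hidden_size // num_heads
    num_kv_heads: int = 12          # no grouped-query attention
    tie_word_embeddings: bool = True


def count_parameters(cfg: Qwen3StyleConfig) -> int:
    """Count weights of a bias-free Qwen3-style decoder-only model."""
    h, d = cfg.hidden_size, cfg.head_dim
    q_and_o = 2 * h * (cfg.num_heads * d)       # q_proj and o_proj
    k_and_v = 2 * h * (cfg.num_kv_heads * d)    # k_proj and v_proj
    qk_norm = 2 * d                             # RMSNorm weights on per-head queries and keys
    mlp = 3 * h * cfg.intermediate_size         # gate_proj, up_proj, down_proj
    layer_norms = 2 * h                         # input and post-attention RMSNorm
    per_layer = q_and_o + k_and_v + qk_norm + mlp + layer_norms

    embeddings = cfg.vocab_size * h
    lm_head = 0 if cfg.tie_word_embeddings else cfg.vocab_size * h
    final_norm = h
    return cfg.num_layers * per_layer + embeddings + lm_head + final_norm


if __name__ == "__main__":
    tied = count_parameters(Qwen3StyleConfig())
    untied = count_parameters(Qwen3StyleConfig(tie_word_embeddings=False))
    print(f"tied embeddings:   {tied:,}")    # 97,538,304
    print(f"untied embeddings: {untied:,}")  # 110,121,216
```

Under these assumptions the tied count (about 97.5M) matches the card's "~100M", while an untied output head would push it to about 110M. Substituting the actual `head_dim`, key/value head count and vocabulary size from the model's `config.json` would refine the estimate.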

## ⚠️ Limitations

* Trained only on English data, so it has weak or no capability in other languages.
* Not suitable as a general-purpose chat model or for safety-critical use cases.
* No alignment tuning or safety hardening has been applied.

## 📄 License

The model is released under the Apache-2.0 license. When using it locally, also comply with the licenses of the `fineweb-edu` dataset and of the `transformers` / Qwen3-related components.