Model: raincandy-u/Rain-100M
Source: Original Platform
2026-06-27 02:07:16 +08:00

---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# 🩵 Rain-100M — Model Card
**Rain-100M** is an experimental language model trained from scratch based on the **Qwen3 architecture**.
## 🧠 Training & Data
* **Training corpus**: `HuggingFaceFW/fineweb-edu`
* **Total tokens**: ~**3B**
* **Language**: English only
* **Tokenizer**: Newly trained **BPE** tokenizer with a **16k** vocabulary (optimized for compact models)
* **Max sequence length**: 4096 tokens

**Sample training metrics**:
```text
train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459
```
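For intuition, the logged loss can be converted to a perplexity. The sketch below assumes that `train/loss` is the mean next-token cross-entropy in nats, which is the usual convention for causal language-model training but is not stated in this card. The helper names are illustrative only.

```python
import math


def loss_to_perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


def loss_to_bits_per_token(loss_nats: float) -> float:
    """Express the same loss in bits per token."""
    return loss_nats / math.log(2)


# Value copied from the sample training metrics above.
train_loss = 3.4459
print(f"perplexity: {loss_to_perplexity(train_loss):.2f}")
print(f"bits/token: {loss_to_bits_per_token(train_loss):.3f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 31. The figure is measured over this model's own 16k-token vocabulary, so it is not directly comparable to models that use a different tokenizer.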
## 🏗️ Architecture (Qwen3-style)
* **Parameters**: ~100M
* **Layers**: 12 Transformer layers
* **Hidden size**: 768
* **Attention heads**: 12
* **MLP dimension**: 2048
* **Activation**: SiLU
* **Weight dtype**: bfloat16
* **RMSNorm eps**: 1e-6
* **RoPE θ**: 10000
* **Inference framework**: `transformers`
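As a sanity check, the hyperparameters above can be turned into a rough parameter count. This is a minimal sketch, not the author's accounting. It assumes a 16,384-entry vocabulary, a head dimension of 64 (hidden size divided by heads), no grouped-query attention, tied input/output embeddings, and bias-free projections with per-head query/key RMSNorm. None of these values are confirmed by the card.

```python
def estimate_params(
    vocab_size: int = 16_384,     # assumption: "16k" read as 2**14
    hidden: int = 768,
    layers: int = 12,
    heads: int = 12,
    kv_heads: int = 12,           # assumption: no grouped-query attention
    head_dim: int = 64,           # assumption: hidden / heads
    mlp_dim: int = 2048,
    tie_embeddings: bool = True,  # assumption: shared input/output embeddings
) -> int:
    """Rough parameter count for a Qwen3-style decoder without biases."""
    embed = vocab_size * hidden
    lm_head = 0 if tie_embeddings else vocab_size * hidden
    # Q and O projections use all heads; K and V use kv_heads.
    attn = 2 * hidden * heads * head_dim + 2 * hidden * kv_heads * head_dim
    qk_norm = 2 * head_dim      # RMSNorm weights on queries and keys
    mlp = 3 * hidden * mlp_dim  # gate, up, and down projections
    norms = 2 * hidden          # pre-attention and pre-MLP RMSNorm
    per_layer = attn + qk_norm + mlp + norms
    final_norm = hidden
    return embed + lm_head + layers * per_layer + final_norm


total = estimate_params()
print(f"{total / 1e6:.1f}M parameters, ~{total * 2 / 1e6:.0f} MB in bfloat16")
```

With these assumptions the estimate comes to about 97.5M parameters, consistent with the stated ~100M. Untying the embeddings (`tie_embeddings=False`) would add roughly 12.6M more.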
## ⚠️ Limitations
* Trained only on English data, so it has weak or no capabilities in other languages.
* Not suitable as a general-purpose chat model or for safety-critical use cases.
* No system-level alignment or safety fine-tuning has been applied.
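Given these limitations, the model is best treated as a plain text-completion model. The following is a minimal sketch of loading it with `transformers`. It assumes that the repo id `raincandy-u/Rain-100M` is reachable from the Hugging Face Hub, that the installed `transformers` release includes Qwen3 support, and that `torch` is available. The `complete` helper and the sampling settings are illustrative, not recommendations from the author.

```python
MODEL_ID = "raincandy-u/Rain-100M"  # assumed Hub repo id for this model


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with sampled text from the base model."""
    # Imported lazily so the heavy dependencies load only when needed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.8,  # illustrative sampling settings, not tuned
            top_p=0.95,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The water cycle begins when")` downloads the weights on first use and returns the prompt followed by the sampled continuation. Because no alignment has been applied, outputs should be treated as unfiltered.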
## 📄 License
The model is released under the Apache-2.0 license. When using it locally, please also comply with the licenses of the `fineweb-edu` dataset and the `transformers` / Qwen3-related components.