ModelHub XC 1fd9b72469 Initialize project; model provided by the ModelHub XC community

Model: raincandy-u/Rain-100M
Source: Original Platform

2026-06-27 02:07:16 +08:00


license: apache-2.0
datasets: HuggingFaceFW/fineweb-edu
language: en
pipeline_tag: text-generation
library_name: transformers

🩵 Rain-100M — Model Card

Rain-100M is an experimental language model trained from scratch using the Qwen3 architecture.

🧠 Training & Data

Training corpus: HuggingFaceFW/fineweb-edu
Total tokens: ~3B
Language: English only
Tokenizer: newly trained BPE with a 16k vocabulary (optimized for small, compact models)
Max sequence length: 4096
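As a rough sanity check on the training budget, the figures above can be turned into a tokens-per-parameter ratio and a count of full-length sequences. This is a minimal sketch, assuming the approximate "~3B tokens" and "~100M parameters" are exactly 3.0e9 and 1.0e8, and that the corpus was packed into full 4096-token sequences; the card states neither assumption.

```python
# Back-of-envelope training-budget arithmetic for Rain-100M.
# ASSUMPTION: "~3B tokens" and "~100M parameters" are treated as exactly
# 3.0e9 and 1.0e8; the card only gives approximate figures.

TOTAL_TOKENS = 3.0e9   # "~3B" from the card
PARAMS = 1.0e8         # "~100M" from the card
MAX_SEQ_LEN = 4096     # max sequence length from the card


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def full_length_sequences(tokens: float, seq_len: int) -> float:
    """Number of sequences of exactly seq_len tokens the corpus would fill.

    ASSUMPTION: documents are packed into full-length sequences.
    """
    return tokens / seq_len


if __name__ == "__main__":
    print(f"tokens/param: {tokens_per_parameter(TOTAL_TOKENS, PARAMS):.1f}")
    print(f"4096-token sequences: {full_length_sequences(TOTAL_TOKENS, MAX_SEQ_LEN):,.0f}")
```

Under these assumptions the model saw about 30 tokens per parameter, or roughly 732k full-length sequences.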

Sample training metrics:

train/grad_norm: 0.6640625
train/learning_rate: 2.171853813e-11
train/loss: 3.4459
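The logged loss is easier to interpret as perplexity. The sketch below assumes `train/loss` is the mean per-token cross-entropy in nats, which is the usual convention for causal-LM training in transformers but is not stated in the card; it also treats the value as a single logged training step, not an evaluation result.

```python
import math

# ASSUMPTION: train/loss is the mean per-token cross-entropy in nats.
# The card does not state the units or how the value was aggregated.
TRAIN_LOSS = 3.4459


def perplexity(mean_ce_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_ce_nats)


def bits_per_token(mean_ce_nats: float) -> float:
    """Convert a cross-entropy in nats to bits by dividing by ln 2."""
    return mean_ce_nats / math.log(2)


if __name__ == "__main__":
    print(f"perplexity ~ {perplexity(TRAIN_LOSS):.2f}")
    print(f"bits/token ~ {bits_per_token(TRAIN_LOSS):.3f}")
```

With these assumptions, a loss of 3.4459 corresponds to a training perplexity of about 31.4, or just under 5 bits per token. Because the tokenizer is a custom 16k BPE, per-token figures like these are not directly comparable with models that use a different vocabulary.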

🏗️ Architecture (Qwen3-style)

Parameters: ~100M
Layers: 12 Transformer layers
Hidden size: 768
Attention heads: 12
MLP intermediate size: 2048
Activation: SiLU
Weight dtype: bfloat16
RMSNorm eps: 1e-6
RoPE θ: 10000
Inference framework: transformers
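The "~100M" figure can be cross-checked against the architecture list above. This is a back-of-envelope sketch, not the model's actual config: it assumes "16k" means a 16,384-entry vocabulary, a head dimension of 768 / 12 = 64, no grouped-query attention (12 key/value heads), a SwiGLU MLP with gate, up, and down projections, no bias terms, and tied input/output embeddings. Qwen3's small per-head query/key norm weights are ignored. None of these details are given in the card.

```python
def estimate_params(
    vocab: int = 16_384,          # ASSUMPTION: "16k" read as 16,384
    hidden: int = 768,
    layers: int = 12,
    heads: int = 12,
    head_dim: int = 64,           # ASSUMPTION: 768 / 12
    kv_heads: int = 12,           # ASSUMPTION: no grouped-query attention
    intermediate: int = 2048,
    tie_embeddings: bool = True,  # ASSUMPTION: input/output embeddings shared
) -> int:
    """Rough parameter count for a Qwen3-style decoder without bias terms."""
    embed = vocab * hidden
    q_proj = hidden * heads * head_dim
    kv_proj = 2 * hidden * kv_heads * head_dim   # separate K and V projections
    o_proj = heads * head_dim * hidden
    mlp = 3 * hidden * intermediate              # gate, up, down (SwiGLU with SiLU)
    norms = 2 * hidden                           # pre-attention and pre-MLP RMSNorm weights
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    total = embed + layers * per_layer + hidden  # plus the final RMSNorm
    if not tie_embeddings:
        total += vocab * hidden                  # separate output (lm_head) matrix
    return total


if __name__ == "__main__":
    print(f"tied embeddings:   {estimate_params():,}")
    print(f"untied embeddings: {estimate_params(tie_embeddings=False):,}")
```

This gives about 97.5M parameters with tied embeddings and about 110.1M with a separate output head, both consistent with "~100M". Under these assumptions the embedding matrix alone accounts for roughly 13% of the tied total, which is one reason a compact vocabulary matters at this scale.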

⚠️ Limitations

Trained only on English data, so capabilities in other languages are weak or absent. Not suitable as a general-purpose chat model or for safety-critical use cases.
No system-level alignment or safety fine-tuning has been applied.

📄 License

Rain-100M is released under the Apache-2.0 license. When using this model locally, please also comply with the licenses of the fineweb-edu dataset and of the transformers and Qwen3-related components.
