Model: raincandy-u/Rain-100M

  • license: apache-2.0
  • datasets: HuggingFaceFW/fineweb-edu
  • language: en
  • pipeline_tag: text-generation
  • library_name: transformers

🩵 Rain-100M — Model Card

Rain-100M is an experimental language model trained from scratch using the Qwen3 architecture.

🧠 Training & Data

  • Training corpus: HuggingFaceFW/fineweb-edu
  • Total tokens: ~3B
  • Language: English only
  • Tokenizer: Newly trained 16k BPE (optimized for small/compact models)
  • Max sequence length: 4096

Sample training metrics:

train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459
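
For a rough sense of scale, the logged loss can be converted to perplexity. The sketch below assumes that `train/loss` is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling in transformers but is not stated in this card.

```python
import math

# Value copied from the sample training metrics above.
train_loss = 3.4459

# Assumption: the loss is mean per-token cross-entropy measured in nats.
perplexity = math.exp(train_loss)
bits_per_token = train_loss / math.log(2)

print(f"perplexity ≈ {perplexity:.1f}")          # ≈ 31.4
print(f"bits per token ≈ {bits_per_token:.2f}")  # ≈ 4.97
```

Both figures are per token of the model's own 16k BPE vocabulary, so they are not directly comparable with models that use a different tokenizer.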

🏗️ Architecture (Qwen3-style)

  • Parameters: ~100M
  • Layers: 12 Transformer layers
  • Hidden size: 768
  • Attention heads: 12
  • MLP dimension: 2048
  • Activation: SiLU
  • Weight dtype: bfloat16
  • RMSNorm eps: 1e-6
  • RoPE θ: 10000
  • Inference framework: transformers
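
As a sanity check, the ~100M figure can be approximated from the listed hyperparameters. This is a back-of-the-envelope sketch, not the authors' accounting: it assumes a vocabulary of exactly 16,384 entries, a head dimension of 768 / 12 = 64, no grouped-query attention, tied input/output embeddings, and the usual Qwen3-style layer layout (bias-free projections, a gated MLP, and RMSNorm on queries and keys). None of these details are stated in the card.

```python
# Values taken from the architecture list above.
hidden_size = 768
num_layers = 12
num_heads = 12
mlp_dim = 2048

# Assumptions (not stated in the card):
vocab_size = 16_384                  # "16k" read as 2**14
head_dim = hidden_size // num_heads  # 64; Qwen3 configs can set this independently
num_kv_heads = num_heads             # no grouped-query attention
tie_embeddings = True                # output head shares the embedding matrix


def count_parameters() -> int:
    # Attention: Q and O projections span all heads, K and V span the KV heads.
    q_and_o = 2 * hidden_size * num_heads * head_dim
    k_and_v = 2 * hidden_size * num_kv_heads * head_dim
    # Qwen3-style RMSNorm weights applied per head to queries and keys.
    qk_norm = 2 * head_dim
    # Gated MLP: gate, up, and down projections, no biases.
    mlp = 3 * hidden_size * mlp_dim
    # Two RMSNorm weight vectors per layer.
    layer_norms = 2 * hidden_size
    per_layer = q_and_o + k_and_v + qk_norm + mlp + layer_norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + lm_head + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 97,538,304 parameters (~97.5M)
```

Under these assumptions the estimate is consistent with the stated ~100M. Setting `tie_embeddings = False` adds another 12,582,912 parameters for a separate output head, giving about 110.1M.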

⚠️ Limitations

  • Trained only on English data, so it has weak or no capabilities in other languages. It is not suitable as a general-purpose chat model or for safety-critical use cases.
  • No system-level alignment or safety fine-tuning has been applied.
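
Because no alignment or chat fine-tuning has been applied, plain text continuation is the natural way to try the model. The following is a minimal sketch, assuming the weights are available under the repository ID `raincandy-u/Rain-100M` and load through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes. The `complete` helper and the sampling settings are illustrative choices, not recommendations from the model authors.

```python
MODEL_ID = "raincandy-u/Rain-100M"  # assumed Hub repository ID


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with sampled text (hypothetical helper, not part of the model repo)."""
    # Imported here so that defining the helper does not load torch/transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,   # illustrative sampling settings
        temperature=0.8,
        top_p=0.95,
    )
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The water cycle begins when")` downloads the weights on first use and returns the prompt followed by a sampled continuation. Since the corpus is educational English web text, English expository prompts are the most reasonable inputs to try.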

📄 License

When using this model locally, please also comply with the licenses of the fineweb-edu dataset and the transformers / Qwen3-related components.


Chinese version (translated)

🩵 Rain-100M — Model Card

Rain-100M is an experimental language model trained from scratch using the Qwen3 architecture.

🧠 Training & Data

  • Training corpus: HuggingFaceFW/fineweb-edu
  • Token count: ~3B
  • Language: English only
  • Tokenizer: Newly trained 16k BPE (optimized for lightweight models)
  • Max sequence length: 4096

Training parameters:

train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459

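🏗️ Model Structure (Qwen3 specification)

  • Parameters: ~100M
  • Layers: 12 Transformer layers
  • Hidden size: 768
  • Attention heads: 12
  • Intermediate (MLP) dimension: 2048
  • Activation: SiLU
  • Weight dtype: bfloat16
  • RMSNorm eps: 1e-6
  • RoPE θ: 10000
  • Inference framework: transformers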

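⚠️ Limitations

  • Trained only on English data, so capability in other languages is limited. Not suitable for general-purpose chat or safety-sensitive tasks.
  • No system-level alignment or safety reinforcement has been applied.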

📄 License

When using this model locally, please comply with the licenses of the fineweb-edu dataset and of the transformers / Qwen3-related components.